Subscribe to Latest Posts

5 Dec 2011

Coin patterns

Posted by James. Comments Off

Suppose you lay out a long line of coins, all facing heads up, and number them 1, 2, 3, …

Suppose that you then go and flip over the 2nd, 4th, 6th, 8th, … coins. (You would then have a pattern H T H T H T H T… )

Then you go and flip over the 3rd, 6th, 9th, 12th, … coins.  (You would then have H T T T H H…)

And then you go and flip over the 4th, 8th, 12th, 16th, … coins.

And so on.

Which of the coins will remain heads up, and why?

Read the rest of this entry »

10 Nov 2011

What is short selling?

Posted by James. Comments Off

The simple approach to trading is to buy some shares and then sell them later.

For instance:  I could buy 1000 shares of NVDA (Nvidia Corp.) at $10, wait for the price to ramp up, and later sell the shares at $15, making a hearty $5000 profit (minus transaction fees).

Short selling is the reverse of this process: you sell the shares first and then buy them later.

But how can you possibly sell shares that you don’t have? Well, when you initiate a short sell, your broker borrows the shares from one of its retail customers and sells it on the market. Your account is credited with the proceeds, however you owe them the shares back. At a later point in time, you decide to cover your position by buying the stock, and the shares are returned to the retail customer.

So to continue the previous example, if we short (sell) 1000 shares of NVDA at $10, wait a little while, and then buy to cover the 1000 shares at $15, we make a loss of $5000. So when shorting, we are interested in the price going down rather than up!

I.e. even in bear markets you can make a profit.

8 Nov 2011

Random Number Generator

Posted by James. Comments Off

I got asked this question during a technical interview for Bloomberg:

You have a piece of hardware that generates random 0’s and 1’s at your request. How would you use it to generate random numbers between 0 and 199 (200 values)?

The obvious problem here is the set of values you’re interested in; had it been 0 to 255, then you only need to bung 8 bits in a byte and you’re done. But in this case we need to be careful and make sure the generator has a uniform distribution.

  0 = 00000000 in binary
199 = 11000111 in binary

During the interview, I said that you generate a byte’s worth of random numbers, and if the number is above 199, then simply query the hardware again until you get one below 199. The obvious flaw to this is that you might be unfortunate and have to call the function hundreds of times, theoretically up to an infinite number of times! However, in practice this is a suitable approach, and you are assured of it being a uniform distribution. That answer was accepted.

What’s the chance of having to re-call the function 20 times in a row? About (56/255)^20 = 6.81e-14. That means, if you called the RNG 100,000 times a second, on average this worst-case would happen once in every 5 years.

8 Nov 2011

Angle between the clock hands at 3:15

Posted by James. Comments Off

This question was asked to one of the interns at Citi.

Taking it one step at a time…

At 3:00 the big hand is at 12 and the little hand is at 3.

At 3:15 the big hand will be at 3, and the little won’t be at 3 – it’ll be 1/4 of its way towards the next number.

Therefore the angle between the hands is (1/4) * (1/12) * 360 = 7.5 degrees.

8 Nov 2011

Why are man-hole covers round?

Posted by James. Comments Off

This one was apparently asked by Microsoft in the US to their prospective employees in the early days. It’s designed to assess how the person thinks rather than their being a definitive answer.

These are the ones I thought of:

Rolling it – Manhole covers are heavy, and it might be helpful to be able to roll one across the ground instead of dragging or carrying it, and is potentially more safe.

Manufacturing it – Whatever factory makes manhole covers may well make a range of other round objects, and so its cheaper to use the same machine and be happy with cheaper round ones.

Geometric safety – A circle is the only shape that, given a certain angle and tilt, cannot fall in on the hole it is covering. This is important since manhole covers are heavy, and so difficult to retrieve, but also it’s a lot safer, since one reason you might be removing a manhole cover is to let people down below escape, and you don’t want it landing on your head.

Of course, to my knowledge, manhole covers in the UK seem to be all sorts of shapes, so I’m not sure any argument holds true!

3 Nov 2011

Simple portfolio management

Posted by James. Comments Off

Scenario: You have $1,000,000 to play with and want to invest in a some promising companies.

The basic questions to ask are…

  • How many shares should I buy in company XYZ?
  • How can I limit my exposure to new downward trends?

…and the companies we’re looking to invest in are trading at…

Company Symbol Price
ON Semiconductor ONNN $7.09
Apple AAPL $397.41
American Express Company AXP $50.32
Analog Devices ADI $35.50
Imperial Oil IMO $40.98

In order to minimise risk of ruin, we need to spread the risk of losing a lot of money amongst all the stocks in our portfolio.

Assuming that we are willing to divide the $1,000,000 evenly between these stocks, then the job is to simply divide the money we want to invest in each company by the current price.

Symbol Price Qty = $200,000/Price
ONNN $7.09 28209
AAPL $397.41 503
AXP $50.32 3975
ADI $35.50 5634
IMO $40.98 4880

COUNTER EXAMPLE: If we were to simply buy an equal number of shares in each company (1882 shares in each and totalling $999,906), then even a 1% drop in the prices would mean vastly disproportional exposure to loss across the portfolio: the lowest being $133 for ONNN and the highest $7479 for AAPL.

Now that we have bought the shares, we would like to safeguard against any downward spirals in the markets and so limit our possible losses. A (sell) stop order can be used as an automatic bail-out if a certain price is hit. (i.e. if the market value of company XYZ hits price X, then the order automatically sells the shares, in order to stop further losses. What you do after that is up to you.) If we are willing to experience a maximum loss of $10,000 then the maximum loss per company in our portfolio is $2,000.

Symbol Price Qty Stop price = Price – ($2,000/Qty)
ONNN $7.09 28209 $7.02
AAPL $397.41 503 $393.44
AXP $50.32 3975 $49.82
ADI $35.50 5634 $35.15
IMO $40.98 4880 $40.57

The interesting thing here is that, since Apple’s stocks vary so widely from day to day (easily as much as $10) a stop price of $393.44 doesn’t seem so sensible considering its volatility. We might prefer to hold fewer shares in AAPL and withstand a larger sway in price, but still limiting losses to $2,000. This means we would not be evenly distributing our cash.

With this very simple strategy you can see that potential losses have been limited, but potential profits are unlimited. For the purposes of this post I’ve picked a couple of random companies, but in practice we would want to pick stocks that we think are going to go up rather than down, and perhaps this is where Fundamental Analysis and Technical Analysis could come into play.

I am interested in developing this simply strategy a little further and then use historical data to test the algorithm. Further things to consider are…

  • How to pick the companies, and in which sectors
  • Use an average daily stock price change variable to help decide stop price
  • Decide on when to sell any profits

23 Feb 2011

Metaprogramming

Posted by James. Comments Off

Here’s my example of how to use template metaprogramming to calculate a value during compilation. It is the basis of more advanced techniques, such as those found in the STL for type computation.

The problem

Let’s say we want to convert from trinary to decimal values. As a reference:

Trinary number system
index: 8 7 6 5 4 3 2 1
value: 37 36 35 34 33 32 31 30
2187 729 243 81 27 9 3 1

So the trinary number 220 is equal to the decimal number 24 because:

2203 = 2*32 + 2*31 + 0*30

2203 = 18 + 6 + 0

2203 = 2410

Haskell solution

Haskell is a functional programming language, and so is metafunction programming. Therefore, we start with a functional solution. Here it is:

1:
2:
3:
4:
trinary :: Int -> Int
trinary n
 | n < 3          = n
 | otherwise      = trinary (n `div` 10) * 3 + n `mod` 10

So “trinary 220″ would be ((2) * 3 + 2) * 3 + 0 = 24 which is correct.

C++ solution

Now we implement the same thing but using templates instead:

 1:
 2:
 3:
 4:
 5:
 6:
 7:
 8:
 9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
#include <stdio.h>

template<int N>
struct trinary
{
	static int const value = trinary<N/10>::value * 3 + N%10;
};

template<>
struct trinary<0>
{
	static int const value = 0;
};

void main(void)
{
	printf("%d", trinary<220>::value);
}

//Example adapted from Abrahams 2005

This works in much the same way as the Haskell version does with one distinct difference. The termination condition is different, although it performs exactly the same job.

Walking this through:

17:

trinary<220>::value

This is reaching inside trinary and causes the compiler to instantiate trinary<220>
6:

value = trinary<22>::value * 3 + 0;

trinary<22> is instantiated
6:

value = trinary<2>::value * 3 + 2;

trinary<2> is instantiated
6:

value = trinary<0>::value * 3 + 2;

trinary<0> is instantiated
10:

struct trinary<0>

This template specialization is invoked and stops further recursion. 0 is “returned”.
6:

value = 0 * 3 + 2;

All the value’s are now computed (in reverse order as listed above)
6:

value = 2 * 3 + 2;

trinary<22>’s instantiation
6:

value = 8 * 3 + 0;

trinary<220>’s instantiation
17:

printf(“%d”, 24);

The compiler has got our result and the constant integer is used at run-time

Since all the computation is done at compile time, the program always runs in constant-time! It’s as if the compiler is doing what Haskell does when it runs, to only use the final result.

This example is adapted from: C++ Template Metaprogramming, David Abrahams and Aleksey Gurtovoy, Addison-Wesley Pearson Education, 2005. The book is an excellent introduction to advanced template techniques!

13 Feb 2010

ClientData

Posted by James. Comments Off

This article is about a real project I undertook during my internship with unexpected requirements and it illustrates why an unusual solution was the correct choice in this situation.

Read the rest of this entry »

6 Feb 2010

Anatomy of a hard drive

Posted by James. Comments Off

I decided to open up a few of the broken and old hard drives from work to see how they work.

The hard drive in the video is a Seagate Barracuda. If you look closely you can see a little blob of whitetack I used to hold open a plastic gate. In normal operation with the top on and the power switch off this gate holds the actuator arm out of harms way. When the drive is switch on, the platters spin and create enough air movement to flick the gate open and the actuator has free movement over the whole platter. So this appears to be a cheap and easy clutch system to keep your data safe when not in operation.

Below are images of another hard drive. This time it’s a Seagate, 7200, 500GB.

Hard disk drive partial deconstruction

A is the logic board and ATA interface. B shows the base casting with intact components. C is the top of the drive. X appears to get a gate like the one shown in the video. Notice what appears to be a filter in the top-left corner of B.

Hard disk drive full deconstruction

D and E are the bottom and top of the actuator magnets, and together they are very strong indeed. F is the actuator arm and you can see the copper coil that sits between D and E. Attached by ribbon cable is G, which seems to feed data from the actuator to the logic board via hole P. Next, H, I, J, K and L make up the platters, in order, and sit on the motor and spindle M. Not all hard drives have I, which appears to be some sort of shielding between the platters. N is the filter (as mentioned above) and finally, O appears to be silica gel, or at least something to take care of humidity / air pressure changes.

Conclusions

Hard drives haven’t really changed in the last 15 years, the only difference being capacity. The majority of drives only had 1 or 2 platters. All drives had the G, P, A configuration and some sort of air filter and/or silica gel.

To be honest, I’ve always thought it amazing hard drives work as well as they do, considering the amount of data now available in a single drive and the speed at which they work.

UPDATE

Opening a Maxtor 15k reveals more. The unit itself is much heavier than a 7k counterpart.

Maxtor hdd atlas 15k UltraSCSI with top opened

The platters are protected much more than in other drives.

Maxtor hdd atlas 15k UltraSCSI deconstructed

It’s apparent they’ve gone to quite some trouble making the spindle components light, presumably due to the higher RPM.

Tags:

6 Feb 2010

Cluster powersave

Posted by James. Comments Off

Background introduction

A cluster is many server units working together with a high speed interconnect. Each ‘node’ might have 8 cores and 24 GB RAM. The size of the cluster varies but let’s say its 64. So that’s 512 cores with 1.5TB RAM. Cost is something like £250-500K, but the sky’s the limit.

A single cluster

Multiple clusters are switched on all the time in the server room. Unsurprisingly this eats up a lot of power! Not just to keep them running but also keeping them cool.

I was given the task of writing some software to automatically switch nodes off when not in use.

In concept, it sounds so simple :)

Considerations

  • Jobs are submitted to the cluster for processing and are scheduled by LSF.
  • Jobs need a number of nodes, e.g. 8, for anywhere between 10 minutes and a month.
  • Boot times for nodes is very slow: up to 12 minutes.
  • Shutdown times are generally very quick.
  • Nodes may go up/down for servicing ‘at random’.
  • The hardware and operating system version on which the powersaving software runs could change from time to time.
  • Nodes should not be switched off too often since this is likely to cause hardware failures.
  • Whatever the solution, no one is going to be monitoring it closely so it needs to be robust.
  • The cluster(s) it controls will change over time, so it needs to be easily configurable.
  • Booting 64 nodes simultaneously will cause a power surge that is better avoided.
  • Interns of varying capability are likely to be looking after the system in years to come.

The most important factor is this: For the solution to be taken seriously, it must be robust and not interfere with the business use of the machines.

Implementation

Python was chosen for its ease and flexibility. Small Bash files were used where needed.

The obvious approach is to fire off the script and have it enter an infinite loop where it scans for changes in the cluster and reacts to them. To cater for maximum robustness I chose a different approach: have cron call the script at regular intervals and design the script around a one-shot reaction. What this means is each invocation will see a complete scan of the nodes, a decision on what should be done (if anything), actioning that decision and a quick exit.

The reason this is more robust is that just about every part of the script could fail at any point through no fault of its own. The networked environment is forever in a changing state, nodes seemingly disappear, the DNS configuration changes, the cluster gets re-imaged, the server on which powersave runs reaches 100% CPU for half an hour, the ssh keys are invalidated, etc, etc. Because we would like a robust system that does not need looking after, the solution is to design the script with a ’scan, action and exit’ mentality, and let cron do the pseudo-infinite-loop part. The premise is that if one call to the script fails, the next call hopefully wont because the environment has changed. If it still doesn’t work, keep trying, but die quietly every time.

In practice this approach works very well. The disadvantage is that nothing is held in memory between cron invocations!

As I often do with other scripting tasks, the ‘what’ and the ‘how’ are separate files. In theory, a novice can look over the easy to read ‘what’ script and get a general understanding for what happens. The gory technical ‘how’ side of it is called from the main script and only a programmer need look at it. There’s lots of reasons for doing this, one is that your manager or local system administrator can look over the ‘what’ script and feel confident about it.

Realisation

I would often walk past the computer room in the morning to find most of the nodes are off. There’s a job still processing from yesterday so those nodes are still on.

An hour later, the rest of the nodes are on because a job has been submitted.

That in itself is nice, but its even nicer when it works day in, day out.