Monday, February 25, 2013

Calculating variance at $1/$2 live

I've been playing $1/$2 live at my local casino since the summer of 2009, and I've recorded my results the whole time. When I get home I log the number of hours played and the amount won or lost. The results just live in an Excel spreadsheet.

At this point I have over 100 sessions and over 400 hours played. Results are OK: around break even, maybe down tree fiddy or so. Estimating 25 hands an hour live, that's a sample size of around 10k hands.

While I have the totals, and the per-hour rate can be readily calculated, I wanted to go further: determine whether my sample looks like a normal distribution, calculate my variance, and figure out how confident I can be in the results so far.

I looked at sessions first, since that is the form of the raw data. With sessions there's an immediate observation: they are not all the same length. Often I play 2 or 3 hours; less often, 4+ hours. Nonetheless, that's where I started.

With sessions, the standard deviation came out at $117 per session over the data set. Excel can generate histograms with a bit of setup, and using an interval size of $10 I could convince myself that the graph has a semblance of a normal curve.
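Something like this in Python would reproduce the per-session numbers outside Excel. This is just a minimal sketch; the session values here are made up, the real ones live in the spreadsheet.

    from statistics import mean, stdev
    from collections import Counter

    # Hypothetical per-session results in dollars (illustrative only)
    sessions = [-45, 120, -30, 15, -210, 80, -5, 60, -95, 40]

    print(f"mean per session:    ${mean(sessions):.2f}")
    print(f"std dev per session: ${stdev(sessions):.2f}")

    # Histogram with $10 intervals, like the Excel setup
    bins = Counter((s // 10) * 10 for s in sessions)
    for lo in sorted(bins):
        print(f"[{lo:>5}, {lo + 10:>5}): {'#' * bins[lo]}")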


Over to hours. For live players, per hour is, I believe, the way most of us think about cumulative results. People don't say "I lose $10 a session on average"; they say or calculate "I lose around $3 an hour playing."

Unfortunately the per-hour data isn't great quality. The problem is that results are recorded per session, not per hour, so within-session swings are lost: outliers are not accurately captured and true variance is underreported.

For example, suppose I play 3 hours and lose $30. Expanded hourly, that becomes {-$10, -$10, -$10}. In reality it was probably something like {lost $50 the first hour, broke even the second, won back $20 the third, went home}. In this example the average hourly deviation is $0 in the smoothed-out session, but about $27 an hour in reality. Alas, the actual hourly data just isn't recorded, so I can't know the true hourly variance for sure.
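A quick sanity check of those two numbers, using the average absolute deviation from the session's hourly mean (-$10/hour in both cases):

    # Average absolute deviation from the hourly mean
    smoothed = [-10, -10, -10]   # session result spread evenly
    actual   = [-50, 0, 20]      # what plausibly happened hour by hour

    def avg_deviation(hours):
        m = sum(hours) / len(hours)
        return sum(abs(h - m) for h in hours) / len(hours)

    print(avg_deviation(smoothed))  # 0.0
    print(avg_deviation(actual))    # 26.67 -> roughly $27/hour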

Using the "expanded" way of assigning the session hourly result evenly over the number of hours played. It comes to a variance of $32 an hour. The histograms actually aren't that bad, especially the $10 intervals graph looks fairly not terrible.

Alas, as noted, the true hourly standard deviation has to be higher; perhaps $60 an hour? I can't be sure.


There is a way, leaning on the central limit theorem, to partially overcome the loss of hourly accuracy from recording results as sessions. If I combine the sessions into 10-hour buckets, I can recover at least some of the lost accuracy. For example, suppose I play five 4-hour sessions for 20 hours total. Putting them into 10-hour buckets, there's no loss of accuracy for sessions 1, 2, 4, or 5; only session 3 straddles a boundary, with half of its result assigned to the first 10-hour bucket and half to the second.
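A rough sketch of that bucketing, splitting a session proportionally when it straddles a bucket boundary (session data is again made up):

    # Fill consecutive 10-hour buckets from (hours, result) session records,
    # splitting a session proportionally when it straddles a bucket boundary.
    def bucket_results(sessions, size=10):
        buckets = [0.0]
        room = size                    # hours left in the current bucket
        for hours, result in sessions:
            rate = result / hours      # assume the result accrued evenly
            while hours > 0:
                chunk = min(hours, room)
                buckets[-1] += chunk * rate
                hours -= chunk
                room -= chunk
                if room == 0:          # bucket full, start the next one
                    buckets.append(0.0)
                    room = size
        if room == size:               # drop an empty trailing bucket
            buckets.pop()
        return buckets

    # Five 4-hour sessions -> two buckets; session 3 is split half and half
    print(bucket_results([(4, -60), (4, 110), (4, -20), (4, 75), (4, -150)]))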

By using the per-session results to "fill" consecutive 10-hour buckets, splitting sessions across boundaries, the standard deviation came out to $176 per 10-hour block. That's fairly good considering there are only around 40 data points. The histograms are a bit sparse but not bad looking at $25 and $50 interval sizes. I found the 10-hour bucket scheme produced the highest sigma when determining confidence levels: around 7% higher confidence than working from individual sessions, which had about 100 data points.
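For reference, the sigma here is just the observed mean divided by the standard error of the mean. A sketch of the calculation (the example numbers in the comment are illustrative, not my actual results):

    from statistics import mean, stdev
    from math import sqrt

    def sigma_of_mean(samples):
        """How many standard errors the observed mean sits from zero."""
        se = stdev(samples) / sqrt(len(samples))   # standard error of the mean
        return mean(samples) / se

    # Illustrative only: 40 blocks averaging -$8 with SD $176 gives
    # sigma = -8 / (176 / sqrt(40)), about -0.29: far from conclusive.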


So that's not bad. I'm glad I went through this exercise. It took a little while to think through, about as long as I expected. Plus I now have the spreadsheets and Python helper scripts set up so they can be reused. The CSV file format can be your friend.
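For what it's worth, the CSV loading in the helper scripts is about this simple; the column layout shown is an assumption, not necessarily what my actual file uses:

    import csv

    # Assumed layout, one session per row: date,hours,result
    def load_sessions(path):
        with open(path, newline="") as f:
            return [(float(row["hours"]), float(row["result"]))
                    for row in csv.DictReader(f)]

    # sessions = load_sessions("results.csv")   # path is hypothetical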

It was a bit humbling: I was surprised by how high the standard deviation is and how low the sigmas are in the confidence levels. A fat tail can be good. Basically, 10k hands isn't really a good sample size; based on my direct observations here, 10k hands is a bare minimum to get some rough numbers. I should come back after roughly doubling the sample to 1,000 hours, and I suspect that would fill in and smooth out a lot of the graphs.


There's a reason I'm more systematically interested in $1/$2 results at the moment, and I hope to have something to post in the near future.
