Some days an interesting discussion pops up on the forum seemingly out of nowhere.
I hope the gentlefolk don’t mind my posting their messages here: not only are they interesting, but having them on the blog means they can be found again for future reference.
So, without further ado, here’s Patrick’s first post on the matter of value. Interestingly, he started off a little sceptical of it all.
This all started last week when I decided to investigate “Value”.
I don’t want to get barred from the site, but I have a confession…
I am not a believer!
I understand the concept of value, but only in the sense of coin tosses, dice rolls, etc.
Horse racing is a whole other kettle of fish.
To say a price is “value” because it is higher than you expect is highly subjective and so has little worth.
With this in mind, last week I decided to test your VOP figures.
I broke them down into 1% tranches, each starting and ending on a half percent (i.e. the 3% tranche runs from 2.5% to 3.5%, etc.).
I counted all the runners (excluding NRs) and winners in every percentage bracket, covering all 10 years of data in the database.
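For anyone wanting to repeat this bracket check without days of manual counting, here is a minimal Python sketch. It assumes each runner is available as a (VOP %, won) pair with non-runners already removed; the function name `calibration_table` and that data layout are hypothetical, not part of the database. Bracket n collects VOPs from n - 0.5% up to (but not including) n + 0.5%, matching the half-percent tranches described above.

```python
import math
from collections import defaultdict


def calibration_table(runners):
    """Group runners into 1% VOP brackets centred on whole percentages.

    runners: iterable of (vop_percent, won) pairs, non-runners removed.
    Bracket n covers n - 0.5 <= VOP < n + 0.5 (e.g. bracket 3 is 2.5% to 3.5%).
    Returns {bracket: (runners, winners, actual_win_percent)}.
    """
    counts = defaultdict(lambda: [0, 0])  # bracket -> [runners, winners]
    for vop, won in runners:
        bracket = math.floor(vop + 0.5)  # round half up to the bracket centre
        counts[bracket][0] += 1
        counts[bracket][1] += int(won)
    return {b: (n, w, 100.0 * w / n) for b, (n, w) in sorted(counts.items())}


# Hypothetical runners, purely to show the output shape.
sample = [(2.6, False), (3.4, True), (3.1, False), (3.0, False),
          (44.8, True), (45.2, False)]
for bracket, (n, w, pct) in calibration_table(sample).items():
    print(f"{bracket}% VOP: {w}/{n} won ({pct:.2f}%)")
```

Comparing the actual win percentage in each bracket against the bracket’s VOP is exactly the “VOP versus conversion” test described in the post.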
It took a few days to manually calculate things.
I was expecting only vague correlations between your VOP figures and the actual conversion percentages.
Imagine my surprise and excitement when I discovered this was not the case!!!
With the exception of the 1-4% range, the actual strike rates were very close to the VOP figures up to about 30%.
The 1-4% brackets converted at lower rates (1% = 0.39%, 2% = 1.02%, 3% = 1.94%, 4% = 3.35%).
In the upper ranges, the actual conversion rate is higher than the VOP predicts.
A large factor in this divergence at the upper end, I suspect, is non-runners.
The VOP figures are calculated on declared runners (or possibly on declared runners minus NRs already announced), but that still leaves the late NRs.
From 2014 to 2018 (I can’t calculate the numbers before that because of how NRs were recorded in the data), 7.72% of all declared runners became non-runners.
I had no idea until that point that there were so many non-runners!
(For example, in 2016 there were 129,971 declared runners and 118,648 actual runners, a gap of 11,323 runners, or 8.71% of the original total. This was calculated by filtering all the results to winners only and totalling the declared runners and actual runners; dead-heat races may introduce a built-in error because their numbers are counted twice, but this will be small.)
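The non-runner percentage is simple arithmetic on the two totals. A short Python sketch (the function name is illustrative) reproduces the 2016 figure:

```python
def non_runner_rate(declared, actual):
    """Return the number of non-runners and their share of declared runners (%)."""
    gap = declared - actual
    return gap, 100.0 * gap / declared


# 2016 totals quoted above (dead-heat double counting not corrected).
gap, pct = non_runner_rate(129_971, 118_648)
print(f"{gap:,} non-runners, {pct:.2f}% of declared runners")
```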
One or more non-runners in a race will increase the chances of all the remaining runners, so their actual success rate will exceed their calculated VOP.
Put another way, if the VOP were calculated in hindsight, knowing the NRs, every runner’s VOP would rise by the same proportion, meaning those with higher VOPs would see them rise by more in absolute terms.
This would account (in large part) for the VOPs at the higher end being off by more.
For example, runners with a 45% VOP actually won 56.05% of the time (averaged over the 10 years from 2009 to 2018).
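The hindsight adjustment Patrick describes can be sketched in Python, assuming the withdrawn runners’ share is redistributed in proportion to each remaining runner’s VOP. That is a simplifying assumption: the real market reaction to a late withdrawal need not be proportional. The horses and VOPs below are made up purely to show the direction of the effect; they are not a reproduction of the 45% to 56.05% result.

```python
def renormalise(vops, non_runners):
    """Rescale remaining runners' VOPs after late withdrawals.

    vops: {horse: vop_percent} for all declared runners.
    non_runners: horses withdrawn after the VOPs were calculated.
    Assumption: the withdrawn share is redistributed in proportion to each
    remaining runner's VOP, so every VOP is multiplied by the same factor.
    """
    remaining = {h: p for h, p in vops.items() if h not in non_runners}
    scale = sum(vops.values()) / sum(remaining.values())
    return {h: p * scale for h, p in remaining.items()}


# Made-up four-runner race in which "D" becomes a late non-runner.
before = {"A": 45.0, "B": 25.0, "C": 20.0, "D": 10.0}
after = renormalise(before, {"D"})
for horse, vop in after.items():
    print(f"{horse}: {before[horse]:.2f}% -> {vop:.2f}% (+{vop - before[horse]:.2f})")
```

Every VOP is multiplied by the same factor (100/90 here), so the 45% runner gains 5 points while the 20% runner gains only about 2.2, which is the pattern Patrick saw at the upper end.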
At this stage, I was rapidly becoming a convert to the concept of value.
Patrick continued the thread with his next message.
OK, so the VOP figure was turning out to be a pretty accurate representation of the actual likelihood of success.
My next step was to create an adjustment to the VOP based on ACTUAL performance.
I split each 1% into hundredths (steps of 0.01%) and adjusted each one according to actual performance.
I converted that into decimal odds for each 0.01%.
I then created a column of adjusted odds to take Betfair commission into account.
I added further columns based on this figure plus 10%, 20%, 25%, 50% and 100%.
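The odds conversions in the steps above can be sketched as follows. This is a Python illustration under stated assumptions, not Patrick’s actual spreadsheet formulas: fair decimal odds are taken as 100 / VOP%, Betfair commission is assumed to be charged on net winnings at a hypothetical 5% rate (the rate that actually applies varies), and each cushion (10%, 20%, 25%, 50%, 100%) is assumed to multiply the whole commission-adjusted price.

```python
def fair_odds(vop_percent):
    """Decimal odds that exactly match a VOP win probability given in %."""
    return 100.0 / vop_percent


def commission_adjusted(odds, commission=0.05):
    """Price needed so that net winnings after commission match `odds`.

    Commission is assumed to be charged on net winnings, so a winning 1-point
    bet at price P nets (P - 1) * (1 - commission). Solve for P.
    """
    return 1.0 + (odds - 1.0) / (1.0 - commission)


def required_odds(vop_percent, commission=0.05, cushion=0.0):
    """Minimum price for a value bet; cushion=0.10 asks for 10% more.

    Applying the cushion to the whole decimal price is an assumption.
    """
    return commission_adjusted(fair_odds(vop_percent), commission) * (1.0 + cushion)


for cushion in (0.0, 0.10, 0.20, 0.25, 0.50, 1.00):
    print(f"VOP 20%, cushion {cushion:.0%}: {required_odds(20, cushion=cushion):.2f}")
```

For a 20% VOP, fair odds are 5.0 and the commission-adjusted break-even price is about 5.21. At exactly that price, the 20% chance of netting (P - 1) × 0.95 balances the 80% chance of losing the stake, so the expected profit is zero; the cushions then demand a margin above break-even.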
I ended up with 18 different figures for each 100th of a percent, based on the original VOP, the figure adjusted using the full 10 years of data and a set based on just the last 5 years.
I added this table to the spreadsheet containing the full list of runners for 2009 (127,541 runners).
I then added look-ups for each runner in each race and each of the 18 different odds calculations based on the above.
I then added a value column for each of the 18 calculations to show whether the BFSP was greater than the odds required to achieve value on that basis.
The idea was that bets could be placed at BFSP with a minimum price set for each runner, depending on which of the 18 sets of values was finally selected.
If “Value” worked, bets could be placed on all runners with a price requirement built in.
If a runner met that requirement, the bet went on; if it didn’t, the bet was not taken up.
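The minimum-price rule can be sketched in Python as a filter over the runners followed by a level-stakes tally. Assumptions: 1-point stakes; a bet is struck only when BFSP is strictly greater than the required price (as in the value column described above); the default threshold is the unadjusted 100 / VOP conversion; and commission is off by default, so profits are gross. The `Runner` fields and the `summarise` function are hypothetical names, and any of the 18 thresholds could be passed in as `min_price`.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    vop: float  # VOP win probability in %
    bsp: float  # Betfair Starting Price as decimal odds
    won: bool


def summarise(runners, min_price=lambda vop: 100.0 / vop, commission=0.0):
    """Level-stake (1 point) results for runners whose BSP beats min_price.

    min_price defaults to the unadjusted VOP-to-decimal-odds conversion.
    commission is taken from net winnings; 0.0 gives gross figures.
    Returns (bets, winners, strike_rate_percent, profit_points).
    """
    bets = winners = 0
    profit = 0.0
    for r in runners:
        if r.bsp <= min_price(r.vop):
            continue  # price not above the value threshold: bet not taken up
        bets += 1
        if r.won:
            winners += 1
            profit += (r.bsp - 1.0) * (1.0 - commission)
        else:
            profit -= 1.0
    strike_rate = 100.0 * winners / bets if bets else 0.0
    return bets, winners, strike_rate, profit


# Hypothetical runners: only those priced above 100 / VOP are backed.
sample = [Runner(20, 6.0, True), Runner(20, 4.5, True),
          Runner(10, 12.0, False), Runner(50, 2.2, False)]
print(summarise(sample))                             # value bets only
print(summarise(sample, min_price=lambda vop: 1.0))  # back every runner
```

Note how the winner returned at 4.5 is filtered out because its price is below the 5.0 threshold; on a tiny scale, that mirrors the loss of winners Patrick found when he applied the value filter.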
I then started looking at how the figures stacked up for 2009, both for all runners and races and for only those runners that met the value requirements of each of the 18 scenarios laid out above.
Looking at all races in 2009, there were 11,937 winners from 127,540 runners (9.36% SR), producing a profit of -3,655.38 points to BFSP.
Even just using the non-adjusted value figure (i.e. VOP converted to decimal odds, with no allowance for BF commission, actual performance or a percentage cushion of value), the number of winners fell far more sharply than the number of runners:
3,256 winners from 75,557 runners (4.31% SR), producing a profit of -2,968.38 points.
I started to break this down further (HC is far better than non-HC, for example) and this is when I hit the problem of the spreadsheet freezing, the CPU hitting 100% usage, the fan on the computer (I assume it was the fan) screaming continuously, etc.
Conclusion to follow in the next post.
Patrick’s third post.
My conclusion on “Value”.
1. The VOP figures are surprisingly accurate and represent a horse’s true chance of winning (based on the last 10 years of data).
2. The very low percentages (1-4%) are over-enthusiastic and overestimate a horse’s chances.
3. The upper percentages underestimate a horse’s chances, and this is likely to be the result of NRs throwing off calculations that were made before the NRs were known.
4. Betting only value selections reduces the number of bets, but it also dramatically reduces conversion, and this affects profitability.
(i.e. most of the winners, 72.7% in 2009, which was all I had a chance to look at before the computer meltdown, won at non-value prices.)
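The 72.7% figure is consistent with the 2009 numbers from Patrick’s second post, assuming it was taken from the non-adjusted value filter (an inference, not something he states). A quick Python check:

```python
# 2009 figures from Patrick's second post (non-adjusted value filter).
all_runners, all_winners = 127_540, 11_937
value_runners, value_winners = 75_557, 3_256

non_value_winner_share = 100.0 * (all_winners - value_winners) / all_winners
value_runner_share = 100.0 * value_runners / all_runners

print(f"{non_value_winner_share:.1f}% of winners came at non-value prices")
print(f"{value_runner_share:.1f}% of runners qualified as value bets")
```

So roughly 59% of runners qualified as value bets, yet they produced only about 27% of the winners, which is the drop in conversion described in point 4.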
My original problem with the concept of Value in horse racing was getting an accurate probability to begin with.
How can you say a price is right or wrong, Value or not, if the probability figure underlying all of this is just guesswork?
Malc has solved this problem with an accurate probability figure, the VOP.
However, for the vast majority of punters out there, the concept of Value is either alien or calculated incorrectly.
The bulk of punters are taking non-value prices, so the market doesn’t offer enough value prices.
People are taking 10/11 on a coin toss as they don`t know the correct price is evens!
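To put numbers on the coin-toss example: fractional odds of a/b imply a win probability of b / (a + b), so 10/11 implies 11/21 (about 52.4%) against a true 50%. A small Python sketch using exact fractions (the function names are illustrative):

```python
from fractions import Fraction


def implied_probability(num, den):
    """Win probability implied by fractional odds num/den (e.g. 10/11)."""
    return Fraction(den, num + den)


def expected_profit(num, den, true_p):
    """Expected profit per 1-unit stake at fractional odds num/den."""
    return true_p * Fraction(num, den) - (1 - true_p)


coin = Fraction(1, 2)
print(implied_probability(10, 11), float(implied_probability(10, 11)))
print(implied_probability(1, 1))      # evens
print(expected_profit(10, 11, coin))  # negative: taking 10/11 loses long term
print(expected_profit(1, 1, coin))    # zero at the correct price
```

Taking 10/11 on a fair coin costs 1/22 of the stake (about 4.5%) per bet on average, while evens breaks even.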
I need to fix this issue with spreadsheets on my computer.
I am about a week and a half into this investigation.
There are signs that there might be a bulk angle into Value, and I don’t want to give up yet.
(2009 seems to show that it is possible to make almost 1,500 points in the year using Value betting, so I need to investigate this further and confirm it holds at a similar level for ALL years and isn’t a flash in the pan.)
This was immediately followed by Malcolm (not me) posting the following, which is a useful starting point for further investigation.
If it’s data analysis and/or modelling you’re interested in, I would highly recommend investing some time in exploring the R programming language (together with the excellent RStudio IDE).
R is open source and free to download and use. Because it’s open source, there are a ton of free resources available online.
Because R was designed for data analysis, it is an ideal environment for carrying out data analysis or building models.
Don’t let “programming language” put you off if you haven’t any programming experience – just learning the basics of R will probably only take a month or two, and you will be surprised how quickly you progress to using R for your analytics work.
If you’d like a few resources to check out as a starting point, these are a few of my favourites:
The Analytics Edge – this is an MIT course on the edX online learning platform. Almost all of the edX courses are free (unless you want certification). I have done several MOOCs on Udemy, edX etc., and the MIT courses stand out for both the quality of the teaching and the content. It’s a beautiful thing that some of the best universities in the world offer free learning, and this course is an excellent primer in using R for analytics. Its only drawback is that it focuses on Base R and none of the “tidyverse” packages, but don’t let that put you off (that might not make much sense now, but within a few weeks it will!). A really good starting point with high-quality teaching and good resources.
R for Data Science by Hadley Wickham & Garrett Grolemund
This is an excellent book that takes you from the basics of analytics using R through to some pretty advanced techniques. Hadley Wickham is something of a celebrity in the R world. He created many of the Tidyverse packages – which are basically a suite of packages that make data analysis easier, faster and more consistent when compared to using Base R on which they were built.
R in Action by Robert I. Kabacoff
A comprehensive reference for the R language that covers a lot of ground. It assumes no previous knowledge of the R language and is therefore a good choice if you are starting out from scratch.
Hands-On Programming with R by Garrett Grolemund
Another excellent beginner’s guide to the R language.
Applied Predictive Modeling by Max Kuhn & Kjell Johnson
This is my favourite book. It’s not cheap, but it’s the best book I have found on building predictive models in R. You would want to have some R programming experience under your belt to get the most out of it, but I can’t recommend it more highly.
There are tons of free tutorials on YouTube. Check out David Langer’s channel: he has a series of videos made with the beginner in mind, including a playlist called “Introduction to Data Science with R”, which is pretty good as an intro to analytics and modelling using R.
Anyway, my apologies if this isn’t what you are looking for. I’ve noticed a few posts on here recently from people mentioning ways to analyse the CSV files, so thought I’d stick up a post in case anyone might be interested. Whenever I see “Excel” in a post my heart drops through the floor and a piece of my soul dies.