Data Analysis Tool – A How To Guide – Lesson 3

By | 29th August 2018

So far we’ve improved our Windows environment a little, we’ve downloaded the DA Tool, sundry files and, of course, the data files. We’ve set up the required folders, we’ve got Excel running properly and, finally, we’ve run our first report.

We only just found out during the previous lesson that the report was simply looking at the Top Rated Horse within each race.

This is the most logical place to start when it comes to learning about the DA Tool because everyone should be familiar with the concept of the top rated horse in the race.

So let’s see if we can play around a little and modify our requirements a little.

But before we do that we need to know something about CSV files.

What is a CSV File?

Despite what you may think, or what your computer is trying to tell you, a CSV is most certainly not an Excel file.

What it is is a text file in which the various values are separated by commas. This is where it gets its name from: Comma Separated Value.

Because it has a title row as the first row of the CSV file (also separated by commas) the structure of the file is obvious. And this means that it can be readily read by any spreadsheet application like this CSV which you are seeing in its text form.

This is a part of the Daily CSV file and we can see that all of the fields are, indeed, separated by commas and, thus, even the most basic of spreadsheets can open it and display it.

What most of the spreadsheets can’t do, and Excel is no exception, is that they can’t seem to write back to the CSV without really making a mess of it.

So please remember the Golden Rule here: a CSV is not an Excel file but it’s a file that Excel can read and because of this never, ever consider making any corrections to it from a spreadsheet because it will really make a mess of it.

Spreadsheets are good at reading CSV files but they are really, really bad at writing to them.

With that out of the way, let’s continue.

The Daily CSV File

The first thing we need to look at is one of the sample data files. You can either open the Daily CSV file or one of the ones in your data folder. It doesn’t matter which one.

The file (above) is the Daily CSV file for today, which happens to be the 27th of August, 2018. There’s nothing special about this CSV file for our purpose here. And if you look closely you may see that this is the same CSV file as the one shown in the first illustration; but the only difference is that it’s opened in Excel.

The layout of the CSV file is always the same: whether it be the monthly CSV Results file (which is all of the Daily CSVs added together and the results added) so once we get to know the layout of the CSV then you can be assured that it will never change.

This is an important fact to remember, because if you feel that the name of the horse should be in a field before the RaceType then please don’t ask: it won’t be changed unless something is really, really broken.

As an aside, there’s a few duplicated fields in the CSV file and because I am not going to change the structure of this file these duplications will remain. This means that the CSV files that work today will always work in the future.

Of course, there’s a But here. If I have to add more fields to the CSV files then I will update all of the CSV from 2009 onwards. Your DA Tool code will still work but you will need to download a new version of the DA Tool and all of the CSV files once again. I don’t plan on doing this anymore unless there is something really, really broken or significant missing.

Have a look in this CSV file and read the column titles as you scroll to the right. You will find that there’s a load of different data fields in here. In fact, it would be fair to say that most of the data that you see in the PDF files is in here somewhere.

Please have a look around, get used to it. You’re going to be digging in here a lot now. And if you ever need any help with some of the column titles then this (hopelessly out of date) web page may help you a little: https://www.ukhorseracing.co.uk/Archives/CSVFields.asp

Now it’s worth knowing that the CSV currently holds around 450 columns of data and that means that the information that one should wish for is in there somewhere. So, if you can write down a requirement (as complex or as simple as you like) and provided that all of the criteria you are looking for is somewhere within these 450 columns then a DA Tool report can be made.

This morning we were asked on the Discussion Group if the DA Tool can be used to make a report for “What is the strike rate of Underscore RawAdj at Goodwood over 7f in the soft for a 3yo+ Handicap?”

The answer to that query is a definite yes. It’s a rather simple request but with a number of requirements and as all of the information is in the CSV file then it’s easy to do. The resulting number of horses that will be analysed won’t be massive because of the rather tight condition request; but for the purpose of asking whether this sort of thing can be done this demonstrates what the DA Tool can do.

A part of the page describing the CSV fields is shown below.

This page lists the fields and, where needed, a small explanation of the fields is also given.

The Integrated Development Environment

If the DA Tool isn’t open then please open it now.

Now the fun starts…

The next thing to do is to ensure that we can get into the IDE. Microsoft haven’t made this easy (again!) as for no explicable reason the IDE is blocked unless one enables the Developer tab.

If you can’t see the Developer tab above the Ribbon on the DA Tool (below) then you need to make it visible.

If you go into Options (the big round button on the top left) then click on the option shown (for Excel 2007 & 2010):

If you have a later version then follow these instructions:

We can now continue with the exploration of the IDE.

Press the Alt-F11 chord (that’s the Alt key at the same time as pressing the F11 key) and stand back because this then appears on the screen.

Don’t worry, it’s not as scary as it looks.

First of all, if you can’t see the Properties Window then press F4 and it appears. It does have a habit of vanishing and as it’s a useful window it’s best to keep it open.

Before you start screaming let’s have a wander around this screen. And where is a good place to start but in the top left corner at the Object Tree.

The Object Tree

You will see that all of the open Excel documents are shown in the Object Tree.

I have two objects in that tree: one of which has been expanded. The first object is that daily CSV file which has been imported (i.e. opened) into Excel. The second Excel file is the DA Tool itself.

It may be a good idea to explain next what we mean by Objects.

What is an Object?

At the simplest level an Object is a Thing. You’re a thing and I’m a thing and that makes us both Objects.

An object has Properties, which is way of describing an aspect of that object. For example, I am an object and one of my properties is Name which is ‘Malcolm’. I have a DateOfBirth property which is, of course, my date of birth.

I don’t have an Age property because I don’t really need one because my age could be easily calculated from today’s date and my DateOfBirth.

I am also right-handed which also is a property of my object. You may be left-handed, ambidextrous or right-handed.

That’s simple enough, but one can go further.

An object can contain other objects or a Collection of objects.

So my Me object (objects can be referred to as ‘Me’, honest: you will see this in code sometimes but let’s not worry about this now) has a Collection of eyes. Two in my case and, right now, the colour properties of each of my eyes are red. David Bowie, for example, would have one of his eye properties as blue and the other as green.

There’s a left eye and a right eye and both these left and right attributes are further properties of the eye.

So basically anything one can use to describe an aspect of an object is a property. Such as: I have a Height and a Weight property: I’ve told my GP that I am short for my weight but I don’t think that washed with him and he now considers that my Idiot property is set to True.

Before we go further there’s a thing called Class. A Class is really the blueprint of an object. So, my Me object belongs (mostly) to the Human, or Homo Sapiens, Class as do you.

So both my eye objects are of the Eye Class.

What other collections would my Me object have? Well, there’s a collection of ears, all are derived from the Ear class. I have two ear objects in my Ears collection. Vincent Van Gogh ended up with one in his collection and, as we know, Davy Crockett had three (left ear, right ear and wild frontier: I’ll get my coat!).

The thing with Classes is that they describe what the object will look like. It will tell us what sort of properties the object will have and what other objects, and collection, of objects it will consist of. And any new instantiated object will have those properties and objects. Like I have said, it’s a blueprint of what an instantiated object of that class will look like.

So the Data Analysis Tool object has a number of objects associated with it. The first thing is that it’s a Workbook object which consists of a number of Worksheets, or a collection of Worksheets to be precise.

This is a rather scruffy introduction to what an object is. There are other aspects of objects but, for the purpose of this series of lessons, this is all that we need to know. So if you ever hear of inheritance, overloading, events and methods just nod sagely: for the purposes of the DA Tool you won’t need to know any of that stuff.

One last thing. In software development objects need not be limited to tangible things such as spreadsheet workbooks, worksheets, cells, controls and the like. The
software developer can make classes (which are then used to instantiate objects) for non-tangible things such as lumps of code and this is what we’re going to do later on.

But now you’ve had a breather, let’s get back to what you see in front of you.

Back to the IDE

IDE? That’s the abbreviation for Integrated Development Environment. In other words everything you need to develop code is all in one place.

Excel and the other Office applications have their very excellent IDE all behind the application and is only an Alt-F11 chord away. So, next time you’re in Word look into the IDE and you’ll see something similar.

If in the Code Window the code for the Lesson 01 isn’t visible: you can see if the first few lines look like this below, then double click on ‘cls__Lesson_01’ in the Object Tree.

Right, scroll down the code so that you can see what the illustration below is showing.

The first thing to note is that there is a thing called a Comment. These are displayed in green (the IDE does this for you) and these comments aren’t executed. So they are handy for putting in comments, instructions or just for temporarily stopping some code from running.

There are two important lines of comment and these are shown in the above illustration by the last box and this is where you can do your coding. All of the rest of the module you can forget about apart from two other places and we’ll come to this in due course, but the important thing is that most of the work that you do goes into this small section that’s indicated.

Now as you may remember that the first run that you made with the DA Tool looks only at the top rated horse.

If you look at the Daily CSV sheet you will see that the Rating Position of the horse is in Column AZ. And as we’re looking for the top rated horse in each race we’re looking for the value of AZ to be 1.

If you look at the code above you may be able to work out from this information that we’re grabbing the value of column AZ and putting the value into what we call a variable. This variable has the name nRanking.

Then further down we’re checking to see if this variable has a value of 1.

Basically what we are doing is, for every item of information taken from the CSV file, is to take it in three steps: first, declare a variable, secondly fill the variable with the information from the CSV file and then, finally, test the contents of the variable to see if it meets our requirements. There’s other bits and bobs but, for now this is the gist of what happens.

Let’s explore first but let’s look at these things called variables.

Variables

A variable is something that you create (or declare to use the correct terminology) in your code that you use to store data. Then you can use the variable in many ways: you can test it against other variables or values and/or you can change the variable’s content if you like.

You declare a variable by using the Dim statement and then you give the name of the variable that you would like to use. And then you state what type of data is held in the variable.

In our code we can see that we’ve declared the variable nRanking as follows

Dim nRanking As Long

The Dim statement means that there’s a variable declaration going on.

I have chosen the name nRanking for the variable name (top tip: name your variables so that you can look at your code in six months time and know what the heck is going on) and then there’s the data type of Long.

In our code we are going to use a few Data Types. Our data that’s in the CSV file is either numeric or text. Anything that handles text data will need to be declared as type String.

When it comes to numbers, we are either dealing with whole numbers (integers) or numbers that can have decimal places in (real numbers).

If we’re going to use integers for anything (for example, the card number of a horse, the horse’s weight or perhaps it’s ranking within the ratings) then we’ll use the data type of Long (this is an abbreviation of Long Integer which refers to how the data is held within the code – this is something we don’t need to worry about).

If the data has decimal points, or the chance of a decimal point, then we use the Double data type (the name, once again, refers to the internal storage which we don’t need to worry about). We use Double data type for, for example, the Race Class value or the value of the horse’s RAdj field.

Now, you needn’t do what I do, but I put a little lower case prefix in front of my variables; n for numeric, s for string and, when you see them, Is for Boolean.

Boolean variables are either True or False . I use the prefix ‘Is’ to help me remember what True and False means.

For example, if I had a Boolean called IsTopRated and it was False then I could work out from the naming convention that the horse wasn’t top rated.

But at the end of the day, as long as your variables are clear then it doesn’t matter what convention you use for naming the things. Remember it’s you that has to come back in six months and debug the thing.

Onto Lesson Two

Goodness, this has taken a while to get here. But, not to worry as on the way you’ll have learned some things I am sure. And even if you don’t use the DA Tool much: learning to use the code in the IDE of Excel is always an useful skill to have.

Lesson One was to run the code which generated the Top Rated report. Now we’re going to look at the Top Rated in Handicaps only, so we’re going to edit the code so that we can make a second report.

The first thing that we’ll do is to go to the top of the class and find the line which has the CONST declaration of the variable CLASS_TITLE .

A CONST, by the way, is a Constant Variable. This is just simply a variable that’s given a value when it’s defined and it can’t ever be changed in the code.

But we can change the value here and what we’ll do is to change the value of this string from “Lesson 01” to “Lesson 02 – Top Rated Handicaps”. It’s good to give a little bit of information here as to what is going on because this title forms the title of our report and its file name.

So change that CONST value and note that this is the first place outside of those two comments which read ‘ ********* Place the selection criteria after here and ‘ ********* Place the selection criteria before here

We see that we have a variable already created called sHandicap. Now it gets its data from column BB of the CSV file and we can see there’s some code which refers to column BB so that’s good enough for us here.

So we can see that there is a variable called sHandicap which has been declared as being a string and we can see that this variable is given a value from Column BB.
That’s two of the three things that we do with a variable. To repeat they are:

  1. Declare a variable and its Data Type
  2. Assign a value to the variable
  3. Test the value of the variable

We’ve done the first two and now we need to do the third step ourselves. I’ve not helped you with this bit.

We can see that the variable testing is done a little further down here:

We will be changing the code so that it looks like this:

After the statement which checks to see if the nRanking is 1, we add the line

If Len(sHandicap) > 0 Then

Then the line before the End If statement, we add another End If.

Then it’s good to use the Tab key before the code lines to make the indentation look a little like mine in the picture below. This is the same as the previous illustration but without the boxes and arrows cluttering up the view.

So, what have we here then? Well, our first If statement checks for top rated horses and those which are top rated go into the If statement and those horses which aren’t top rated don’t go further into the If statement.

Then those horses which are top rated are then faced with another check. We are checking the Length of the string that’s in the sHandicap variable. Now have a close look at column BB in the CSV file.

Handicap races come in two forms: Handicaps and Nurseries. The latter is simply a handicap for two year olds, but it’s clear that any form of handicap has a string of characters, namely ‘Handicap’ or ‘Nursery’ and non-handicap races have an empty string. An empty string is simply a string that is of zero length.

So we’re just testing here in the second If statement for anything that passes the first test that has some characters (i.e. a length of more than zero characters) in the sHandicap variable.

And if it does, a Boolean variable, the one that I have previously defined is set to True. You will see that before the first If statement that it’s set to False.

So, if the horse is top rated (nRanking = 1) and is running in a Handicap (Len(sHandicap)>0) then we can set the variable bIsSelection to True.

Then at the foot of our section I test this Boolean variable and if it’s True then the program does something with something called oNodeHandler.AddResult() . Don’t worry about what that does because that’s the entrance into the rabbit hole and where our code, after having identified a horse that satisfied our criteria, goes next.

There’s no need to go down this particular rabbit hole. It’s best to leave it alone because madness lurks deep within the dark corners of the DA Tool’s code and it’s best just to leave it here.

And that’s it. Save the code with the Save button on the top left of the IDE (or press the chord ‘Ctrl-s’ to save it) and then find the front page of the DA Tool, and run it.

If all goes well then we’ve got our second report coming.

Now, that wasn’t hard at all, was it?