Race
results data can now be had in virtually unlimited quantities, in pre-digitized
format. Presumably, somebody, somewhere typed the stuff into a computer,
but it ain’t me anymore—thank goodness. The raw data is now
available, for a price, from several sources on the Web, and if you
are going to put it to your own use, the first thing you need to decide
is how large your study population should be.
With
horse racing statistics, the standard take has always been that bigger
is better. Everybody knows that tiny samples give invalid results;
small samples can give skewed results—big is good—humongous
is even better. Right? So, naturally, what horseracing statisticians
have tended to do is shoot for “humongous,” and at the same time, there
has been a shift in goals, from identifying handicapping factors
to identifying ROI (return on investment) factors over, say, 15,000
races.
To
get right to the point, this “homogenizes” the data.
With
all due respect to those horseracing statisticians who have labored
over massive data sets of race results—they are usually singing under
the wrong window. This is a game where short-term variability is crucial—and
big samples blur opportunities.
What
is often lost in these broad-based computer studies is the importance
of the variability that makes up the day-to-day reality of the two main
sets of information: racing data (times, etc., which Eric Langjahr termed
“The Cold Dope”) and tote board data (odds, etc.), which are part-and-parcel
of the “variance” that makes betting “scores” possible.
In
the 1960s and ‘70s, data sets had to be “punched in”—literally, on keypunch
machines, while squinting at microscopic print in the Form.
I did not wear glasses until I did a lot of this in the late ‘70s.
(You also had to stand at the machine, and there was often only
one machine per 20,000 or so students and faculty, so it was not unusual
to get your turn at 3 o’clock in the morning.)
As
a result, early data sets were small, but hopes were high—after all,
this was a computer the size of a tractor-trailer, so some miracle was
bound to happen. The goals were simple: looking for patterning of handicapping
factors in a fairly traditional sense. There was no miracle, but a
lot of the goals were met. We know much more about handicapping factors
today than we did then, thanks to the published works of William Quirin
and those who have followed.
I
vividly remember the frustration of simply getting the data then.
The charts were published in the paper Form, often hit-or-miss.
The day you thought the charts for a certain race day should be published,
they weren’t. It was difficult to even find a Form in
my area, and past Forms had to be ordered at higher-than-face
cost, and if you were lucky, they arrived in a tattered bundle, maybe
six weeks later. I am still waiting for several bundles I ordered in
the early ‘80s.
I
also distinctly remember wanting more! Bigger data sets! I
wanted humongous. I was wrong. Like virtually everyone else
then, I was constrained to looking at smaller data sets and smaller
questions—and that turned out to be a lucky stroke.
There
are many questions in horse racing where it would be nice to have a
population of 15,000 races, but there are many more where smaller, more
compact and focused populations identify patterning that large populations
completely obscure.
What
researchers have generally looked for in big-data runs are factors that
show a certain percentage profit or loss over, say, 15,000 instances.
An example of one of the simplest types of factors tested would be the
profit percentage of theoretical flat bets on favorites. A more complex
one might be the profit percentage of hypothetical bets on three-year-olds
returning from a layoff of a certain length after July 31 of the year.
It is a certainty that if you run enough of these little simulations,
you will find some that show various profits, always small.
It’s only a little tongue-in-cheek to ask: “Okay, now—have you got
15,000 bets—and the several years it would take to make them when the
angle arises?”
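For concreteness, here is a minimal Python sketch of the simplest test named above: flat win bets on favorites. Everything about the input is invented for illustration (a results.csv with fav_won and fav_win_payoff columns); real data vendors each have their own layouts.

    # Flat-bet ROI on favorites from a hypothetical results file.
    # Assumed columns: fav_won ("1" if the favorite won, else "0") and
    # fav_win_payoff (the favorite's $2 win payoff when it won, else "0").
    import csv

    def flat_bet_roi(path, stake=2.0):
        """Return (bets made, net profit, ROI) for $2 flat win bets on favorites."""
        bets, returned = 0, 0.0
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                bets += 1
                if row["fav_won"] == "1":
                    returned += float(row["fav_win_payoff"])  # payoff includes the stake
        staked = bets * stake
        net = returned - staked
        return bets, net, (net / staked if staked else 0.0)

    bets, net, roi = flat_bet_roi("results.csv")
    print(f"{bets} bets, net ${net:+.2f}, ROI {roi:+.1%}")

Run over a big enough file, this is exactly the sort of calculation that turns up those small paper profits.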
In
our little world of horse racing, variance happens. If you’re
going to take any of these statistical angles seriously, you’d better
have those bets and the time, because you might not score until bet
number 14,998, then lose it all on 14,999 and be back to zero again
at 15,000.
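To put toy numbers on that variance, here is a small Monte Carlo sketch. The hit rate and payoff are invented: a flat-bet angle winning 12 percent of the time at a $17 average payoff has roughly a 2 percent edge on paper, yet a single 15,000-bet path can spend most of its life in the red.

    # Toy illustration of long-run variance on a thin positive edge.
    # Numbers are invented: a 12% hit rate at a $17 payoff on a $2 stake
    # gives an expected profit of 0.12 * 17 - 2 = +$0.04 per bet (about +2% ROI).
    import random

    def bankroll_path(n_bets=15000, hit_rate=0.12, win_payoff=17.0, stake=2.0):
        """Running net profit after each of n_bets flat bets."""
        net, path = 0.0, []
        for _ in range(n_bets):
            net -= stake
            if random.random() < hit_rate:
                net += win_payoff  # the $2 payoff convention includes the stake
            path.append(net)
        return path

    random.seed(1)
    path = bankroll_path()
    print(f"final net:        ${path[-1]:+.2f}")
    print(f"deepest drawdown: ${min(path):+.2f}")
    print(f"bets spent in the red: {sum(p < 0 for p in path)} of {len(path)}")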
Extremely
large samples in horse racing are not totally useless, and I'm not suggesting
that you don't invest in some of the statistics-based studies that are
available. It is good to know that class-droppers tend to win more
than their fair share of races (duh), and to know the many other positive and
negative “impact” factors either identified or verified through large
sample studies. These are things everyone should be able to grab off
a synapse at the appropriate moment during the handicapping thought
process.
But
15,000-race samples completely blur the hour-to-hour, day-to-day, and
week-to-week variability, which creates the opportunities for bettors
to score. In the days when handicappers generally focused on one track,
Andy Beyer recommended taking a day before the season started in a closed
room with a year’s supply of last year’s Forms and a bottle of Jack
Daniel’s. The purpose was to develop “class-par times,” which I’ve never
been too crazy about, but the result—aside from a hangover, if
you followed his instructions literally—was a good overview of a
year’s racing at your home track. You couldn’t help but pick up
on both patterning and quirks in the results charts, which would help
you deal with the beginning of a new season.
If
you follow one or two home tracks, this is still fine advice, although
that’s about the only scenario in which I’d worry much about the pars
(or the variants for which they form the baseline, but that’s another
story). However, many of us today do not follow a single track or even
regional circuit, and are more likely to be placing bets at ten tracks
or more across the country, although not necessarily on the same day.
(Admittedly, there are accounts of system players who go much further than
that.) With almost unlimited availability of tracks for simulcast betting,
most bettors I know have broadened their field of play well beyond a
local circuit, though they still tend to focus primarily on tracks that
they know to some extent or have played before.
Large
populations of races for statistical studies are valuable for large,
fundamental questions, but they usually yield small profit. The variability that
we move on as value bettors is more often short-term—sometimes instantaneous—and
a lot more profitable.
With
comma-delimited past performance and results data available fairly cheaply
on the internet, and with spreadsheets now virtually standard equipment
on every computer, you may dream up your own approaches to identifying
short-term patterns at your tracks. My suggestion is not to worry about
humongous samples and fundamental questions of racing per se,
but think small and think local—local, at least, to the tracks you play,
which may be scattered across three time zones.
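As one sketch of that kind of small, local question, the following assumes a hypothetical charts.csv with date, track, and fav_won columns (the track code “PEN” is only an example) and asks how favorites have fared over a track’s last few cards.

    # Rolling, card-by-card view of favorite performance at one track.
    # File name and columns are assumptions, not any vendor's real format;
    # dates are assumed to sort correctly as text (e.g. yyyy-mm-dd).
    import csv
    from collections import defaultdict

    def recent_favorite_hits(path, track, last_n_days=5):
        by_day = defaultdict(lambda: [0, 0])  # date -> [favorite wins, races]
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row["track"] != track:
                    continue
                day = by_day[row["date"]]
                day[0] += int(row["fav_won"])
                day[1] += 1
        for date in sorted(by_day)[-last_n_days:]:
            wins, races = by_day[date]
            print(f"{date}: favorites {wins}/{races} ({wins / races:.0%})")

    recent_favorite_hits("charts.csv", "PEN")

A run of cards where the crowd nails six or seven races apiece is exactly the kind of short-term swing that a 15,000-race average would bury.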
If
computer analyses are not your idea of recreation, you can still look
for patterning and opportunities by simply eyeballing past performances
and results charts. It is extremely handy now to use the computer to
get to race results charts provided by a number of Web sites. If for
some reason I am going to try working a track I’m not familiar with,
or just haven’t worked for a while, I’ll usually pull up some recent
results charts on the Web to see what’s going on.
For
my style of play, I like to see some “normal” variability displayed
in the odds payoffs. By that I mean a few races will show $4.20, $3.60,
and $2.40, but there will also be a healthy mix of patterns like $26.80,
$5.60, and $4.80. I especially like to see patterns like $4.80, $11.80,
$3.60, because they often indicate handicappable place overlays and,
although none of these patterns predict future events, they suggest
that the field is open and that the crowd’s influence in shaping the
odds is normal.
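If you do want the machine to help with the eyeballing, the same idea can be roughed out numerically. This sketch assumes a hypothetical payoffs.csv with one win_payoff column per race; a healthy spread suggests the open field described above, while a low median with almost no spread flags the kind of crowd described next.

    # Summarize the spread of recent $2 win payoffs at a track.
    # The file name and its win_payoff column are assumed, not standard.
    import csv
    import statistics

    def payoff_spread(path):
        with open(path, newline="") as f:
            payoffs = [float(row["win_payoff"]) for row in csv.DictReader(f)]
        print(f"races:            {len(payoffs)}")
        print(f"median payoff:    ${statistics.median(payoffs):.2f}")
        print(f"spread (stdev):   ${statistics.stdev(payoffs):.2f}")
        print(f"payoffs over $10: {sum(p > 10 for p in payoffs)}")

    payoff_spread("payoffs.csv")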
Once
in a while, you’ll find a pattern where the public is “On.”
Early in the year, I pulled up results from (I believe it was) Penn
National, where all the races were 1 mi 70 yds and the crowd was nailing
every race for a period of at least several days. One way to find value,
which has become more difficult with simulcasting, is to find a really
dumb crowd—that one obviously wasn’t.
For
some approaches to betting, this scenario might be a goldmine, but not
for mine, so when it happens, I’m somewhere else.
However,
a great opportunity arrives each fall and requires no searching
through data: the Fall Fairs. The
fair circuit is big in California, Maryland, and my home state of
New Mexico. Some handicappers specialize in fairs, and for those handicappers
the fall season is like Christmas for Macy’s; it makes the bulk of
their annual profit. Except for a small percentage of serious handicappers,
fair crowds are rank amateurs; they can no more handicap a horse race
than a tractor pull. The horses come in from the surrounding circuit
tracks, which are generally on hiatus before changing to fall venues.
This is one of the few times when you can worry a lot less about fine-tuning
“value”—good old-fashioned handicapping comes to the forefront.
If you have a way of dealing with complete fields of shippers, this
is the time for good handicappers to reap the fall harvest.