Monday, July 23, 2018
Release the Data

It's time. Well, in truth it's overdue, but better late than never. Time for what? It's time to release the data -- the historical data that represents the results of Thoroughbred horse races. For argument's sake, let's say data that is at least six months old. You see, these data are not readily available to the general public, like box scores in baseball. Actually, that's not quite true. Historical racing data can be purchased. But that's different. It's time to give it all away. As far back in time as is viable.

Don't fret. I'm not in any way suggesting that live past performances be made available free of charge. Not that I couldn't suggest that, but such a fanciful notion would be quickly dismissed. The suppliers of past performance data for handicapping -- our friends at America's Turf Authority and several others -- would have a bit of an issue with that. Even the most rudimentary past performance data, as long as it was free, would quickly eat into their bottom line.

Instead I'm suggesting that historical data be set free, with a lag time of maybe six months. Yes, the revenue stream that currently accompanies the sale of these data would go away. But how significant can that revenue stream really be?

This is not to say that the Daily Racing Form, BRIS, recent plucky upstart TimeformUS and others would be prevented from selling their historical data to any interested buyers. Each data provider injects and derives their own unique value-added elements into the source data. The daily market for past performance data would be unchanged.

What we are left with to discuss then is the industry-run Equibase, the parent source of all flavors of racing data. And we're talking about a decent amount of data -- if every racing start is a data point then the volume of data theoretically available is nontrivial. But this really isn't a big deal from a consumer's perspective. Not anymore. In 2013 this kind of analysis can be done on a cheap laptop. Sitting on the couch. In your underwear. And in a few years' time your hand-held device or tablet will be able to handle it.

What good would it do for Equibase to become an "open-data" platform of sorts? It's their data. The effort, cost and infrastructure required to collect, store, and distribute racing data is far from negligible. That said, the Thoroughbred racing industry will benefit greatly in the long run if the data is set free.

A subset of all horseplayers are "database" players. They build databases of racing data and slice and dice to their heart's content looking for trends and angles. They're looking for an edge. The database players would no longer have to pay for the raw data that they need, unless they want to append proprietary speed figures of a certain flavor or some other value-added elements. And there are surely current horseplayers who would quickly evolve into database players (i.e., more engaged, more wagering) given free access to racing information. Give these players the data they need to search for their edge.
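For readers who have never seen what "slice and dice" looks like in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the column names (track, post_position, odds, finish_position), the handful of sample rows, and the helper summarize_by are all made up, and none of it reflects a real Equibase file layout. It groups starts by one factor and reports the win rate and the return on a flat one-unit win bet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: one row per racing start. The column names are
# assumptions for illustration, not a real Equibase schema.
SAMPLE_CSV = """track,post_position,odds,finish_position
SAR,1,2.5,1
SAR,2,4.0,3
SAR,1,6.0,2
BEL,1,3.0,1
BEL,2,1.5,1
BEL,3,9.0,4
"""

def summarize_by(rows, key):
    """Group starts by `key` and report starts, win rate, and flat-bet ROI."""
    stats = defaultdict(lambda: {"starts": 0, "wins": 0, "returned": 0.0})
    for row in rows:
        s = stats[row[key]]
        s["starts"] += 1
        if int(row["finish_position"]) == 1:
            s["wins"] += 1
            # A winning 1-unit bet at odds-to-1 returns the stake plus the odds.
            s["returned"] += 1.0 + float(row["odds"])
    return {
        k: {
            "starts": s["starts"],
            "win_rate": s["wins"] / s["starts"],
            # ROI per unit staked: (total returned - total staked) / total staked.
            "roi": (s["returned"] - s["starts"]) / s["starts"],
        }
        for k, s in stats.items()
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize_by(rows, "post_position")
for post, s in sorted(summary.items()):
    print(post, s)
```

Swap "post_position" for "track", or any other column, and you have a different angle. Six rows prove nothing, of course. The point is that once the data is in a machine-readable file, the tooling is the easy part.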

Most bettors, however, would not be directly impacted by free historical data. They lack the skills and desire to go hands-on with complex data manipulation and analysis. Instead they would interact with the data through the many "info-mediaries" that would pop up. Entrepreneurs with a vision would build products that leverage the data in some way. They'll build products to help bettors make their selections. These products and services already exist of course, but their numbers would grow exponentially. If there is a downside to this, I can't see it.

And the most interesting and beneficial (for the racing industry) uses of the data will be the ones we never saw coming. This is kind of the exciting part. Release the data. Let the innovators innovate. Maybe we'll hear someone say "you know, a vector of integers and fractions really isn't an intuitive way to visualize a horse running around an oval so we invented [cool, amazing new way to show past performances]."

Release the data. I still stubbornly cling to the belief that the racing industry wants more people to bet on horse racing and horse bettors to bet more. How can the end-game underlying any promotion or strategy be anything else? But when you look at the various initiatives under way these days, it's tough to make a case that increased wagering is the ultimate objective.

Maybe the real goal is to have racetrack attendees look like a cross between a Golden Globe Awards show and the beach at Panama City during spring break. That's all well and good, but it won't do your handle any favors.

Open access to historical racing data would directly create new horseplayers and encourage current horseplayers to wager more. X will cause Y. No imagination needed. Sure, I can't prove it. But I strongly believe it.

With minimal investment in setting up a platform to release the data, the ROI could be impressive. It won't take much. The data will have to exist in a reasonably convenient and machine-readable form. The data will be provided under the most lenient terms with few if any restrictions. That last part is terrifying, I understand.

Current costly industry initiatives that pass off impressions and website visits as important metrics do not readily reveal themselves as clear tactics to attract new bettors. But I could be way off base too. Who knows, we may hear this in a future NHC winner's victory speech: "There I was working the fields back in Kansas. I saw that bus go by on the highway and knew just what I had to do. I stepped down off that tractor and just kept walking..."

Release the data. And don't do it quietly. Make a loud splash.

Maybe you've heard of Kaggle. This company started out as a platform for predictive modeling competitions and still does that kind of thing. Imagine the racing industry hosting a kick-off modeling competition that accompanies the roll out of open access to racing data. Who can build the most accurate model to predict winners of Thoroughbred horse races? Figure out the particulars of the contest and the data. Offer a prize about the same as the annual salary of an America's Best Racing brand ambassador. Or better yet kick it up a notch. Imagine 200 teams from all over working with racing data, including teams of engineering and physics students from MIT and Stanford. Maybe some contestants will be inspired to drop out of school, check in to the Oasis Motel, and become degenerate horseplayers. I hope not.
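"Most accurate" needs a definition before anyone can win a prize. One common choice in prediction contests is log loss: each entry assigns a win probability to every horse in a race and is penalized by the negative log of the probability it gave the actual winner. The sketch below is just one possible scoring rule, not a proposal for the real contest. The function name race_log_loss, the input layout, and the two toy races are assumptions for illustration.

```python
import math

def race_log_loss(races):
    """Mean negative log of the probability assigned to each actual winner.

    `races` is a list of (probs, winner) pairs, where `probs` maps each
    horse in the race to its predicted win probability. Lower is better.
    """
    total = 0.0
    for probs, winner in races:
        if abs(sum(probs.values()) - 1.0) > 1e-6:
            raise ValueError("win probabilities in a race must sum to 1")
        # Clip to avoid log(0) when a model gave the winner no chance.
        p = max(probs[winner], 1e-12)
        total += -math.log(p)
    return total / len(races)

# Two made-up races: the first winner was the model's top pick,
# the second was its long shot.
races = [
    ({"A": 0.5, "B": 0.3, "C": 0.2}, "A"),
    ({"D": 0.25, "E": 0.75}, "D"),
]
score = race_log_loss(races)
print(round(score, 4))
```

A rule like this rewards well-calibrated probabilities rather than just picking the favorite, which is closer to what a bettor actually needs.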

While we're at it, let's let every statistics/data science department at every institute of higher learning in the United States and Canada know that there is a fun, new, huge data collection out there to hack away at for teaching purposes. I like the idea that thousands of horseplayers-to-be would become intimately familiar with racing data on a daily basis, don't you?

Equibase promises to "leverage information to serve the fan base and help promote the sport." Release the data. It just might be one of the very best ways to keep those promises.