A sports betting model is used to predict the probability of an outcome in a certain event. The goal is to take certain data about the teams or players and have a formula or algorithm produce an accurate picture of what is likely to happen in the contest.
Prediction models are just one of the tools that we use in our sports handicapping arsenal in order to profit over the books. The other we have talked about is top down sports handicapping.
If you are interested in building your own betting model then this guide will help get you started down the right path.
Starting Out With Building a Statistical Model for Sports Betting
Here’s a dirty little secret about modeling: you can start out by piecing situations together instead of going all in from the ground up. This can help you be directionally correct even if you aren’t 100% accurate.
What I mean by that is you can take something as simple as home field advantage and end up with a number on what that’s worth. You can take a look at rest situations and get yourself a solid number for what it means for teams to come off a bye week in the NFL, or to play on back-to-back nights in the NBA.
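As a rough illustration of putting a number on home field advantage, here is a minimal sketch that averages the home team's margin over a sample of games. The scores and the function name are made up for the example; a real estimate would need a much larger sample.

```python
from statistics import mean

# Hypothetical sample of final scores: (home_points, away_points).
# These numbers are invented purely for illustration.
games = [
    (27, 20),
    (17, 24),
    (31, 28),
    (21, 10),
    (14, 17),
    (24, 21),
]

def home_field_advantage(results):
    """Average home margin of victory, in points."""
    return mean(home - away for home, away in results)

hfa = home_field_advantage(games)
print(f"Estimated home field advantage: {hfa:+.2f} points")
```

Over a full season, team quality tends to wash out of this average because every team plays both at home and on the road. With a handful of games it does not, so treat a small-sample number as a starting point only.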
You don’t need to start out by sitting down and gathering every piece of data about every player in the NBA going back to 1990 and using that to model an exact score prediction for each NBA game. That’s a massive project so it’s better to start out small and work your way up.
I think it’s easier to begin with modeling situations. You have all of these building blocks that you can put together to see that a team has a certain edge over its opponent. Then use a power ranking system or something similar to see what the number should be without your situations, and compare the combined number with the betting odds.
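The building-block idea can be sketched as a base number from power ratings plus a sum of situational values, compared against the market. Every rating, situational value, and the market number below is hypothetical, and the market line is expressed as the expected home margin to keep the signs simple.

```python
# Hypothetical power ratings, in points relative to an average team.
power_ratings = {"Team A": 3.5, "Team B": -1.0}

# Hypothetical situational values in points, from the home team's point of view.
situations = {
    "home_field": 2.0,
    "home_off_bye": 1.0,
    "away_short_rest": 0.5,
}

def projected_home_margin(home, away, ratings, adjustments):
    """Rating difference plus the sum of the situational building blocks."""
    base = ratings[home] - ratings[away]
    return base + sum(adjustments.values())

def edge_vs_market(projection, market_home_margin):
    """Positive means the model likes the home side more than the market does."""
    return projection - market_home_margin

projection = projected_home_margin("Team A", "Team B", power_ratings, situations)
# Assume the market has the home team favored by 6, i.e. an expected margin of +6.
edge = edge_vs_market(projection, 6.0)
print(f"Projected margin: {projection:+.1f}, edge vs market: {edge:+.1f}")
```

Keeping each situation as its own named value makes it easy to add, remove, or re-estimate one block without touching the rest.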
How Can You Win By Modeling Sports?
I believe there are two ways that you can make your model better than someone else’s: having better information or being better at processing that information.
Today there are advanced statistics in nearly every sport. If you are trying to build a model from the same data that was available ten years ago you are going to be behind the eight ball.
Better Information
In baseball, if you are trying to model without using Statcast or PITCHf/x then you are at a big disadvantage. In basketball, if you don’t use pick and roll usage and efficiency ratings then you probably don’t know how well two teams match up as well as someone who does.
This can also be the latest news though. If you know a player isn’t playing that night and can model how much of an impact that has on his team, that’s a big edge you can get by having better information than the market.
However, getting breaking news before anyone else does is pretty difficult to do on a consistent basis.
Processing Information
The other way you can have an edge modeling is if you are better at processing the information. What you are looking for are metrics that are predictive of future performance.
One way to do that is to be better at cleaning the data than everyone else. In football there can be a lot of garbage time. Is the data from the second half of a game where a team is up 35-0 at halftime as valuable as the data from a one-score game? I don’t think so, but it’s up to you to decide where cleaning the data gives you an edge or has you missing out on key info.
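Here is a minimal sketch of filtering out garbage time before computing a stat. The `Play` structure, the sample plays, and the 22-point second-half cutoff are all assumptions for illustration; where to draw the line is exactly the judgment call described above.

```python
from dataclasses import dataclass

@dataclass
class Play:
    quarter: int
    score_margin: int  # offense score minus defense score before the snap
    yards: float

# Assumed cutoff: second-half plays with a margin of 22+ points are garbage time.
GARBAGE_MARGIN = 22

def is_garbage_time(play, margin=GARBAGE_MARGIN):
    """True for second-half plays where the game is already out of reach."""
    return play.quarter >= 3 and abs(play.score_margin) >= margin

def yards_per_play(plays):
    return sum(p.yards for p in plays) / len(plays)

# Hypothetical plays, invented for the example.
plays = [
    Play(1, 0, 4.0),
    Play(2, 7, 6.0),
    Play(3, 35, 12.0),
    Play(4, -35, 14.0),
    Play(4, 3, 5.0),
]

clean = [p for p in plays if not is_garbage_time(p)]
print(f"Raw yards per play:   {yards_per_play(plays):.2f}")
print(f"Clean yards per play: {yards_per_play(clean):.2f}")
```

Running the same metric on both the raw and cleaned data is a cheap way to check whether the cleaning rule actually changes the picture.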
Another way to clean the data is to look at the weather. Say you are looking at golf scores from a tournament where the morning wave got to play in absolutely perfect conditions, but the afternoon wave was stuck with high winds and rain. Of course the afternoon golfers aren’t going to play as well. But are those scores going to have any predictive value when golfers from different waves play the next day in the same conditions? Probably not much.
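One possible way to handle the wave problem, shown here only as a sketch, is to measure each golfer against the average of their own wave instead of the whole field. This is an assumed technique, not necessarily how any particular modeler does it, and the names and scores are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rounds: (golfer, wave, score). "AM" had calm conditions, "PM" had wind and rain.
rounds = [
    ("Golfer A", "AM", 68),
    ("Golfer B", "AM", 70),
    ("Golfer C", "PM", 74),
    ("Golfer D", "PM", 72),
]

def strokes_vs_wave(rounds):
    """Score minus the average score of the golfer's own wave (negative is better)."""
    by_wave = defaultdict(list)
    for _, wave, score in rounds:
        by_wave[wave].append(score)
    wave_avg = {wave: mean(scores) for wave, scores in by_wave.items()}
    return {golfer: score - wave_avg[wave] for golfer, wave, score in rounds}

adjusted = strokes_vs_wave(rounds)
for golfer, value in adjusted.items():
    print(f"{golfer}: {value:+.1f} strokes vs wave average")
```

This assumes the two waves are about equally skilled on average. If one wave happens to be stacked with better players, the adjustment will be off.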
How to Build a Betting Model That is Predictive
There are a few steps to building any model: gathering data, cleaning the data, figuring out the weights, testing the model, applying it to events that haven’t happened yet, and then constantly updating the model.
Gathering Data
You want to gather as much data from as many different sources as possible. I don’t want to give a complete list because if someone takes everything I do, and adds something of their own then they can pretty quickly have a more accurate model than I do.
But the basics are team and player statistics, as detailed as possible so you have a complete picture of what happened in the game. You’ll want to include external factors like weather or travel as well.
Cleaning the Data
Once you have gathered all the statistics and external factors that you want to use, it’s time to clean that data. Here you are looking for situations that might mean what happened in the game isn’t predictive for how a player or team will perform in the future.
Some examples of that are extreme weather, injuries, or blowout situations.
Figuring Out the Model Weights
Once you have all of the statistical metrics you are going to use, it’s time to figure out how much to weight them.
When I first started, linear regression in Excel was a tool that provided more than enough power to model and win. Now you are going to want to make sure that you have some programming knowledge in either R or Python.
You’ll want to try decision trees, regression, and neural networks to see which model does the best job of making predictions based on your dataset.
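To show what "figuring out the weights" means mechanically, here is a minimal pure-Python sketch of ordinary least squares. The two metrics (yards-per-play differential and turnover differential) and the data are hypothetical, and the margins were constructed from known weights so the result is easy to check. In practice you would lean on a statistics library in R or Python instead of hand-rolling the solver.

```python
def fit_linear_weights(rows, targets):
    """Ordinary least squares via the normal equations (X'X)w = X'y.

    rows: list of feature lists; an intercept column is added automatically.
    Returns [intercept, w1, w2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve for weights")
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

# Hypothetical games: (yards-per-play differential, turnover differential).
features = [(0.5, 1), (-0.2, 0), (1.0, -1), (-0.8, 2), (0.3, -2), (0.0, 1)]
# Made-up margins, built as 1.0 + 5.0 * ypp_diff + 4.0 * turnover_diff.
margins = [7.5, 0.0, 2.0, 5.0, -5.5, 5.0]

intercept, w_ypp, w_turnover = fit_linear_weights(features, margins)
print(f"intercept={intercept:.2f}, ypp weight={w_ypp:.2f}, turnover weight={w_turnover:.2f}")
```

Real data will never fit this cleanly. The point is only that the weights fall out of the data once you have chosen which metrics go in.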
Testing Your Model
You are going to want to hold back part of your dataset in order to test your model against it. This means if you are modeling based off of data from 2013-2023, you might build the model off of the data from 2013-2021 and then test it against the last two seasons to see how it would perform.
If you build the model off of the entire dataset then you run the risk of overfitting. You might be just explaining what happened instead of coming up with something predictive of what will happen in the future.
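The hold-out idea can be sketched with a split by season, fitting on the earlier seasons only and scoring on the later ones. The game records, the field names, and the deliberately tiny one-parameter "model" here are all hypothetical; the shape of the workflow is what matters.

```python
def split_by_season(games, last_train_season):
    """Earlier seasons go to training, later seasons are held back for testing."""
    train = [g for g in games if g["season"] <= last_train_season]
    test = [g for g in games if g["season"] > last_train_season]
    return train, test

def mean_absolute_error(predictions, actuals):
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

# Hypothetical data: one game per season with a rating difference and the actual home margin.
raw = [
    (2013, 3.0, 6), (2014, -2.0, 1), (2015, 1.0, 3), (2016, 4.0, 5),
    (2017, -1.0, 2), (2018, 0.0, 3), (2019, 2.0, 4), (2020, -3.0, -2),
    (2021, 5.0, 9), (2022, 1.0, 6), (2023, -2.0, -3),
]
games = [{"season": s, "rating_diff": d, "margin": m} for s, d, m in raw]

train, test = split_by_season(games, last_train_season=2021)

# Toy model: home margin = rating difference + a constant fitted on training data only.
home_edge = sum(g["margin"] - g["rating_diff"] for g in train) / len(train)

def predict(game):
    return game["rating_diff"] + home_edge

in_sample = mean_absolute_error([predict(g) for g in train], [g["margin"] for g in train])
out_of_sample = mean_absolute_error([predict(g) for g in test], [g["margin"] for g in test])
print(f"In-sample MAE: {in_sample:.2f}, out-of-sample MAE: {out_of_sample:.2f}")
```

If the out-of-sample error is much worse than the in-sample error, that gap is a warning sign that the model is explaining the past more than predicting the future.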
Using the Model to Make Predictions
Once you have your model weights, you want to take the data for the event you are trying to model, throw it into the model, and see what it spits out. Most modelers might stop there, but I like to see if there are any situational adjustments I want to make based on information that I haven’t included in the model.
A recent example of this is when Jacksonville played back-to-back games in London. I couldn’t put an exact weight on being able to adjust to the travel by staying a longer period of time, but I knew there had to be some edge. This is where a little bit of the art meets the science of modeling in order to improve your handicapping.
Updating the Model
You are going to come up with new ideas to include in your model. During the season a situation will arise where you will wonder how a team performs in a certain situation. Or a new stat will come out that looks promising.
Whatever it is, it’s time to go back and add the new data to your dataset and build a new model.
That’s what makes modeling difficult, because it’s not a one-and-done process, but that’s what makes it exciting. You are constantly trying to increase your edge over the books to win more of your bets.
How to Model Early in the Season Without Any Game Data
One of the trickier parts of modeling is early in the season, especially for college sports. You don’t have any current season data to use for your model, so you have to create priors. This can be done by taking previous year statistics and making modifications based on age and expected usage.
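One common way to use a prior, sketched below, is to treat it as being worth a fixed number of games and blend it with the current-season average as results come in. This blending rule, the `prior_weight` of 5 games, and the roster adjustment are assumptions for illustration and are not taken from any specific model.

```python
def preseason_prior(last_season_rating, roster_adjustment=0.0):
    """Last season's rating nudged by a hand-set adjustment (age, expected usage, and so on)."""
    return last_season_rating + roster_adjustment

def blended_rating(prior, observed_avg, games_played, prior_weight=5.0):
    """Weighted average of the prior and the current-season average.

    prior_weight is the number of games the prior is treated as being worth (an assumption).
    """
    total = prior_weight + games_played
    return (prior_weight * prior + games_played * observed_avg) / total

# Hypothetical team: rated +6.0 last season, downgraded 2 points for roster losses.
prior = preseason_prior(6.0, roster_adjustment=-2.0)

for games_played in (0, 5, 20):
    rating = blended_rating(prior, observed_avg=10.0, games_played=games_played)
    print(f"After {games_played:2d} games: rating {rating:+.2f}")
```

Before any games are played the rating is just the prior, and the prior's influence fades as the season goes on. How fast it should fade is something you would have to test for each sport.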
With new players it’s even trickier. You can use their physical attributes or recruiting ranks to give you an estimate, but at the end of the day you have to work hard to figure out accurate priors to use.
What Skills Do You Need for Modeling Sports?
Modeling isn’t the handicapping tool everyone should use. You need a little bit of an education in basic statistical concepts like probability and regression. You should have some coding ability.
If you don’t have much experience with those it’s not hard to learn, but you have to be willing to put in the work. There are books out there you can read such as Football Analytics with Python and R. That is a great place to start.
The other thing that’s important is that you start with one sport and specialize. You are going to put a lot of effort into building this model, and it’s going to need constant love. Once you get to a nice place with one sport, it’s ok to venture into another. But starting out, you are definitely going to want to put all of your time and attention into one sport and branch out from there.
I think you can also hire a lot of this out. You can definitely find someone who is willing to write scripts to scrape data for you. I’ve done that myself.
You can even hire data scientists to build the model, but the only caution I’d throw there is that the more people who know what goes into a profitable model the quicker the market is going to adjust.
Conclusion
Modeling is a great way to improve your sports betting. It will give you a number of your own, and if that number is more accurate than the sportsbooks’ number you can make a lot of profitable bets and win a lot of money. It’s a tool that I really think is a must-have in your handicapping toolbox.