I was cleaning out my Google Drive last weekend and came across a document I made back in 2010, with 7 games still to play in the English Premier League season, that predicted the final Premier League table perfectly. To do so I created a new model for calculating a team’s strength of schedule, and below I use it to predict this year’s final Premier League table.
The highlights for those tl;dr: it looks a lot like the current table, with Liverpool winning the league and Cardiff, Sunderland and Fulham going down. Below is a bit of explanation about my algorithm; the main point is that strength of schedule has a huge impact on a team’s final place in the table. I have made all of the data completely available. Let me know what you think!
Back in 2010 everyone was saying Manchester United’s run-in (the strength of schedule for their remaining games) was easier than Chelsea’s and that they would finish top of the table, and that Arsenal, with the easiest run-in of all, had a good chance of winning out.
Something just didn’t seem right to me, mostly that the pundits were not taking into account home field advantage, which has been shown to offer a statistically significant advantage to the home team (and, coincidentally, in European soccer more than in any other sport). Suggested reading by Freakonomics and Northeastern University if you’re interested, but the number one reason? The refs.
In America, determining the “Strength of Schedule” is a big deal. For college football, a team’s ranking decides whether they will play for the championship/playoff or in a bowl game, with millions of dollars of ad deals riding on the result. Their ranking is based on who they have played and how difficult those opponents were, judged by how those opponents did versus other teams. Calculated formally as:
This may look like some serious math, but it’s actually very simple. Basically they are valuing a team’s own performance as twice as valuable as the average of all their opponents’ performances, by simply averaging their win/loss record together with the average win/loss record of everyone they played.
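The description above matches the formula usually cited for BCS-era college football strength of schedule. Assuming that is the one meant here, it can be written as below, where OR is the combined win/loss record of a team’s opponents and OOR is the combined record of those opponents’ opponents. The symbol names are the conventional ones, not something taken from my spreadsheet.

```latex
\[
\mathrm{SOS} = \frac{2\,\mathrm{OR} + \mathrm{OOR}}{3}
\]
```

Read this way, each opponent’s own record counts twice as much as the record of the teams that opponent has faced.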
Footie Strength of Schedule
All that truly matters in European soccer is your league points. If you win a game, you get 3 points. A draw (a tie) is 1 point. This doesn’t determine placement in the playoffs or anything else (kind of with Champions League, but that’s another story). Simply put: whoever has the most points at the end of the season WINS THE LEAGUE. Since this is the key performance indicator (KPI), I started by simply taking all the teams a specific team still has to play and summing the points those teams have collectively. The more points they have, the harder the team’s schedule, as their opponents have won or drawn more games.
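As a minimal sketch of that first pass, here is the opponent-points sum in Python. The team names, points, and fixture lists are made up for illustration, and `opponents_points` is a hypothetical helper name, not something from my spreadsheet.

```python
# Hypothetical league table: current points per team (illustrative numbers only).
points = {"Alpha": 60, "Bravo": 45, "Charlie": 30, "Delta": 52}

# Hypothetical remaining fixtures: each team mapped to the opponents it still has to play.
remaining = {
    "Alpha": ["Bravo", "Charlie"],
    "Delta": ["Alpha", "Bravo"],
}


def opponents_points(team, remaining, points):
    """Sum the current league points of every opponent the team still has to play."""
    return sum(points[opponent] for opponent in remaining[team])


for team in remaining:
    # A higher total means a harder run-in.
    print(team, opponents_points(team, remaining, points))
```

In this toy example Delta has the harder run-in, because the teams it still has to face have collected more points.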
In this regard, the pundits were right. Arsenal’s opponents had a combined 290 points (1.34 points per game [ppg]), ManU’s opponents had 317 (1.46 ppg), and Chelsea’s opponents had a total of 325 points (1.50 ppg). That is a significant advantage for Arsenal, and a sizeable advantage for ManU over Chelsea. Remember, the fewer points their opponents have, the more likely that team is to win. It would seem Arsenal could win all their games, and ManU should do better than Chelsea. So far so good…
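The points-per-game figures can be checked with a few lines of Python. This is a sketch that assumes every one of the 7 remaining opponents had played 31 of its 38 games at that point; the totals are the ones quoted above, and `points_per_game` is just an illustrative helper.

```python
GAMES_LEFT = 7                    # from the post: 7 games still to play
GAMES_PLAYED = 38 - GAMES_LEFT    # assumption: each opponent has played 31 games

# Combined points of each contender's remaining opponents, as quoted above.
opponent_points = {"Arsenal": 290, "Man Utd": 317, "Chelsea": 325}


def points_per_game(total_points, n_opponents=GAMES_LEFT, games_each=GAMES_PLAYED):
    """Average points per game across all of a team's remaining opponents."""
    return total_points / (n_opponents * games_each)


ppg = {team: round(points_per_game(total), 2) for team, total in opponent_points.items()}

for team, value in ppg.items():
    print(f"{team}: {value:.2f} ppg")
```

Under that assumption the results come out at 1.34, 1.46 and 1.50, matching the figures in the text.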
But wait, there’s more
Problem is, things look very different once you factor in home vs away performance. Any Arsenal fan can tell you that although Stoke may look like a crap football team on paper (or in person), no one wants to go to the Midlands. Ever.
This is also true in American sports. Let’s say the Michigan Wolverines football team has two games on their schedule: the unranked Akron Zips and top-ranked Oklahoma. Which would be more impressive: beating the Oklahoma Sooners at home and the Zips away, or beating Akron at home and winning a road game at Oklahoma, something only two teams have done in over a decade from 2000-2010? This level of difficulty needs to be factored into the strength of schedule.
So, back to my rankings for the 2010 Premier League: what happens when you factor in the opponents’ home vs away performances throughout the season? Drastically different rankings. Chelsea’s opponents only have 166 points (x per game), ManU’s 175 points, and Arsenal’s a leading 178. So instead of winning all their games, Arsenal drew with Birmingham, lost to Tottenham, lost to Wigan, drew with Man City, lost to Blackburn, and won their final game. Ouch. Chelsea coasted through and won the title.
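One way to sketch the home/away adjustment in Python: for each remaining fixture, count only the points the opponent has earned at the venue where the game will be played. This is an interpretation of the idea rather than a copy of the spreadsheet, and the teams, numbers, and the `venue_adjusted_points` name are all hypothetical.

```python
# Hypothetical split table: points each club has earned at home and away so far.
split_points = {
    "Bravo": {"home": 30, "away": 15},
    "Charlie": {"home": 20, "away": 10},
}

# Remaining fixtures for a hypothetical team, as (opponent, venue from that team's view).
fixtures = [("Bravo", "away"), ("Charlie", "home")]


def venue_adjusted_points(fixtures, split_points):
    """Sum opponents' points, counting only the venue each game is played at."""
    total = 0
    for opponent, venue in fixtures:
        # If we travel, the opponent is at home, so use their home points, and vice versa.
        opponent_venue = "home" if venue == "away" else "away"
        total += split_points[opponent][opponent_venue]
    return total


print(venue_adjusted_points(fixtures, split_points))
```

Here a trip to Bravo counts Bravo’s 30 home points rather than their full 45, which is how a side that looks weak on paper can still make for a hard away day.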
So What About 2014?
Honestly, I look at the data and I’m surprised by the similarity between the current table and the projected table. One thing the algorithm does not take into account is the difference between Fulham playing Southampton at home versus Arsenal playing Southampton at home. That said, I believe that over 6 games the differences average out and the winner is clear.
There are two very important games coming up this weekend. One is the Arsenal vs Everton game. Everton has a much easier schedule, with over 11 pts expected versus 9 pts for Arsenal, but an Arsenal win this weekend will settle the race for 4th.
The other involves Sunderland, who have two matches in hand but still have to play Tottenham, Everton, Man City, Chelsea, and Man Utd (I know, I know). It starts with a crucial game versus Norwich this weekend.
Overall, the data speaks for itself, but upsets make the Premier League the best, most entertaining league in the world. Enjoy and please let me know what you think. Feedback appreciated!
Again, all the data is available on my Gdrive here: https://docs.google.com/spreadsheet/ccc?key=0Am3HmWheCrH3dExDb0xqcktjdUVJa3ljc0tSbVNneWc&usp=drive_web#gid=23