Today I want to take a break from SEO to talk about sports – specifically hockey – in terms of SEO concepts.
The 2020 NHL playoffs are (finally) underway.
So in the spirit of debate, let’s find out if it’s possible to measure sports using SEO techniques – and ultimately predict the next NHL Stanley Cup champion.
Measuring Sports Using SEO Techniques
These days, all the cool SEO tools like Screaming Frog and Sitebulb include a feature they call a crawl map.
Crawl maps are a pretty cool diagnostic tool you can use to find all kinds of SEO issues.
Here’s an example from WTFSEO so you can see what I’m talking about (and also so I can sneak in a link to it).
Anyway, we were doing some cool work with force-directed graphs at work the other day and it got me thinking of an old experiment I ran last year.
While discussing PageRank’s history of originally being used to measure citations and then being adapted to the web, I wondered: “What else could PageRank be adapted to do?”
Historically, PageRank has been really, really good at measuring the authority of webpages.
My question was: can it measure the authority of other things too?
What about using regular season sports data to see if we can find the most authoritative team in the NHL?
Wins and losses can be skewed because some teams get lucky and play my hometown Detroit Red Wings several times per season while other teams only get to play them once.
So instead of just looking at total wins, what if we could look at the quality of wins?
Looking at the Quality of Wins
If we have a crawl map, we should be able to calculate PageRank.
Several of us old school SEO professionals have been creating these crawl maps manually for years.
I used to love doing them in Google Fusion Tables but then Google took that tool away from us (somebody please bring a version of this back, pretty please?).
Before Fusion Tables, we used a free tool called Gephi.
Gephi has a bit of a learning curve, and there are a ton of posts out there on other SEO sites talking about how to use it for SEO – but that’s beyond the scope of this post.
The best part about Gephi is that once you have your graph implemented you can run PageRank with the click of a button.
I first started looking at sports victories as directed graphs a couple of years ago.
Below is a tweet where I did this with NCAA college football, and as you can see it worked out really well.
(Actually, eigenvector and harmonic centrality models worked better than PageRank. But I suspect the PageRank Google uses today is very different from the original algorithm. And besides, if we start using eigenvectors then we lose the SEO tie-in and I have to find someplace else to write this post.)
I took every 2018 NCAA D1 football game and turned it into a directed graph where every team “pointed” to the teams that beat them. Then I ran pagerank over it. The top 10 teams?
1 Alabama
2 Texas
3 Purdue
4 Oklahoma
5 Clemson
6 OSU
7 ND
8 Georgia
9 WASH
10 LSU
pic.twitter.com/rouV0MtKwC
— Ryan Jones (@RyanJones) January 31, 2019
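For anyone who wants to try the idea without Gephi, here is a minimal sketch of the approach described in the tweet: each loser "links" to the team that beat it, and PageRank is computed by simple power iteration. The `pagerank` function, the 0.85 damping factor, the iteration count, and the four placeholder teams are all assumptions for illustration; this is not the exact setup or data used for the rankings above.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of {node: [nodes it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Teams with no outgoing links (undefeated teams) would trap rank,
        # so their rank is spread evenly across every team.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for node, targets in links.items():
            if not targets:
                continue
            # Each loser splits its rank evenly among the teams that beat it.
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical results as (winner, loser) pairs; placeholder teams, not real games.
games = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

# Every loser points to the team that beat it.
links = {}
for winner, loser in games:
    links.setdefault(loser, []).append(winner)

scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy league, team A beats everyone and ends up with the highest score, while team D, which never wins, only receives the baseline share.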
So, what about hockey?
Well, last year PageRank successfully predicted that the St. Louis Blues would win the Stanley Cup.
THREAD: Got bored this weekend so I ran the PageRank algorithm over every NHL game this season. (losers were a “link” to winners.” It looks like this. pic.twitter.com/qpFGkP0afH
— Ryan Jones (@RyanJones) April 22, 2019
The thread above shows that STL, PIT, and BOS were the top three teams according to PageRank, and it just so happens that STL beat BOS for the Stanley Cup.
We might be onto something here.
Calculating the PageRank of NHL Regular Season Data
So, with the NHL season about to start up again, let’s apply this to NHL regular season data.
Below is a force-directed graph of every NHL game, represented like so:
Every team links to all the teams that beat them.
It’s kind of pretty once we add some color.
Side note: It’s amazing how many teams are red/blue. Here’s hoping Seattle picks a color scheme that isn’t the same as every other team.
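Since NHL teams meet more than once per season, one way to build the "every team links to all the teams that beat them" graph is to count one link per loss, so repeat wins carry more weight. The sketch below assumes that weighting (the post does not say whether repeat wins were counted separately), and the team names and results are placeholders rather than real game data.

```python
from collections import Counter, defaultdict

# Hypothetical (winner, loser) results; placeholder teams, not real games.
games = [
    ("Team A", "Team C"),
    ("Team A", "Team C"),
    ("Team B", "Team C"),
    ("Team B", "Team A"),
    ("Team C", "Team B"),
]

# One loser -> winner "link" per game, so beating a team twice counts twice.
edge_weights = Counter((loser, winner) for winner, loser in games)

# Group the weighted links by loser to get each team's outgoing edges.
out_links = defaultdict(dict)
for (loser, winner), weight in edge_weights.items():
    out_links[loser][winner] = weight

for loser, winners in out_links.items():
    print(loser, "->", winners)
```

A weighted edge list like this is the kind of input a graph tool such as Gephi can import as source, target, and weight columns.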
Now, all we have to do is run PageRank.
You can see the results of that below (as well as some other stats for you data geeks):
I’ve already gone ahead and bet 10 units on every first-round playoff matchup based on the team with the higher score here.
I’ve also put some money on the Colorado Avalanche to win the Cup.
As a Detroit fan, this kills me, but Go Avs!
Who’s your team?
Do you think this model will come close to reality or will COVID-19 ultimately make the statistics unreliable?
Image Credits
All screenshots taken by author, August 2020