AWS have released what they call the Fastest Driver in Formula 1, and even after reading their explanations, just, no.
And here’s the top 1️⃣0️⃣!
👀 ICYMI, more here >> https://t.co/ghznkqrmne #F1 @awscloud pic.twitter.com/zACUClNmTa
— Formula 1 (@F1) August 18, 2020
At the start the list doesn’t look too bad, but the further you go, the more bizarre it gets. The full top 20 is simply ridiculous. You can try to compare drivers across different eras with numbers and algorithms, but it just doesn’t work. I did this myself in my early days, and it was a mistake with no real basis. You can compare teammates up to a point, but once you start comparing teammates to their previous teammates, and then to those drivers’ teammates, it stops working.
A standard algorithm is still written by someone, and algorithms can be written with bias, whether intended or not. Stats can also be presented with bias by not giving the full picture. My day job often involves stats, and I know exactly how to bias stats and graphs so that they show what you want them to show. While I’m glad AWS have somewhat explained how they came to these numbers, the numbers still mean absolutely nothing.
AWS have said the figures came from machine learning, using every qualifying session since 1983 and comparing drivers to their teammates, but only where the drivers were teammates for 5 or more races, then comparing teammates of teammates and so on. This brings a problem: for one, there are not enough data points in that selection to be anywhere near accurate.
Using what AWS have said, to compare Hamilton and Verstappen, you’d have had to compare Hamilton to Button, Button to Alonso, Alonso to Raikkonen, Raikkonen to Vettel, Vettel to Ricciardo, and then Ricciardo to Verstappen. To compare Senna with some of the other drivers, the chain of links is even more ridiculous than that. In other applications these links may work fine, but in F1 they don’t: there are too many variables, and different drivers suit different cars.
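To put a rough number on why long chains are a problem, here is a minimal sketch. It is not AWS's method. The pairings are only the ones from the chain above, and the 0.1s per-link uncertainty and the assumption that each link's error is independent are mine, purely for illustration.

```python
from collections import deque
import math

# Teammate pairings taken only from the chain described above,
# not a full teammate dataset.
TEAMMATES = [
    ("Hamilton", "Button"),
    ("Button", "Alonso"),
    ("Alonso", "Raikkonen"),
    ("Raikkonen", "Vettel"),
    ("Vettel", "Ricciardo"),
    ("Ricciardo", "Verstappen"),
]


def shortest_chain(pairs, start, goal):
    """Breadth-first search for the shortest teammate chain between two drivers."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two drivers are not connected


def chained_uncertainty(per_link_sigma, links):
    """Combined uncertainty over a chain, ASSUMING independent errors per link."""
    return per_link_sigma * math.sqrt(links)


chain = shortest_chain(TEAMMATES, "Hamilton", "Verstappen")
links = len(chain) - 1
print(" -> ".join(chain))
# 0.1s per link is a made-up illustrative figure, not a measured one.
print(f"{links} links, combined uncertainty {chained_uncertainty(0.1, links):.3f}s")
```

Even under that generous independence assumption, the uncertainty grows with the square root of the number of links, so six links more than doubles the error of a single teammate comparison. And that is before accounting for cars, regulations and drivers changing between each pairing.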
While I have many more issues with this specific data set, I have even more issues with some of the other AWS graphics and data, specifically some of the ones they often show during practice, qualifying and race sessions. The tyre graphic means basically nothing. They cannot know the actual life of the tyre; it will be based on time data, laps on the tyre and the wear rate of the circuit. The number they give is purely a prediction, but commentators often take it as fact.
Then there are the ratings they give for high speed corners, which I think are out of 10, but they don’t say, and you can get the fastest driver having a rating of 7.6 or something like that. Where does this number even come from? It makes no sense.
Stats need proper context to be taken properly; without full context you can claim all sorts of stupid things. You can technically claim that Pastor Maldonado is the best F1 driver of all time, because he’s won 100% of the races he started on pole, while Fittipaldi only has 66.67%, Alonso has 63.64%, Button has 62.50% and Schumacher 57.35%. That’s obviously a ridiculous statement, and it’s meant to be, but it proves my point that context is required to use stats properly. I could also claim that Marcus Ericsson is the best Swedish F1 driver this century. Technically it’s correct, and anyone who disagrees is 100% wrong, because he is the ONLY Swedish F1 driver this century. Again, context.
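The pole-to-win trick is easy to reproduce. In the sketch below, the win and pole counts are assumptions picked to reproduce the percentages quoted above, so check them against a proper results database before relying on them. The point is just how a sample size of one ends up at the top of the ranking.

```python
# (wins from pole, pole positions) per driver.
# ASSUMED counts chosen to match the percentages quoted in the text.
POLE_RECORDS = {
    "Maldonado": (1, 1),
    "Fittipaldi": (4, 6),
    "Alonso": (14, 22),
    "Button": (5, 8),
    "Schumacher": (39, 68),
}


def conversion_rate(wins, poles):
    """Percentage of pole positions converted into race wins."""
    return 100.0 * wins / poles


# Rank by conversion rate alone, deliberately ignoring sample size.
ranked = sorted(
    POLE_RECORDS.items(),
    key=lambda item: conversion_rate(*item[1]),
    reverse=True,
)

for driver, (wins, poles) in ranked:
    # Printing n alongside the rate is the missing context.
    print(f"{driver:<11}{conversion_rate(wins, poles):6.2f}%  (n={poles})")
```

Show only the percentage column and Maldonado is "the best". Show the `n` column next to it and the claim falls apart on its own.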
There’s a Twitter account, @BadF1Stats. While they rarely tweet any more, a lot of their tweets are “technically correct” and utterly ridiculous. Stats need context, and AWS rarely give any.