The Athletic does some bad data journalism
Using a way too small sample when they don't have to
This morning I was checking my email and one of the stories from the daily Athletic newsletter caught my eye. A solution to Arsenal’s goalkeeping dilemma: Play Raya at home and Ramsdale on the road.
It is not quite as interesting as I had hoped. It rehashes a lot of the same story we have already heard: Mikel Arteta promised innovative usage and rotation but that hasn’t happened, David Raya has made some mistakes, and that opens up more questions.
Ultimately it ends with an interesting suggestion to switch from Cup/League rotation to Home/Away rotation.
I found the idea interesting but what has me here writing is not the suggestion but rather the graphic that accompanies it.
I thought it was interesting but it didn’t really match what I have seen given how much time I have spent on this topic. So I went about trying to replicate the data to understand the samples that they are using here.
The David Raya one is pretty easy to replicate and I got the exact same results, but it raises a big question: why would you want to use a sample this small, covering just his League games at Arsenal?
This is just 10 matches and 25 shots on target. That is a CRAZY small sample size, especially for something as volatile as save percentage. Even a single season’s worth of shots is just starting to scratch the surface of having more signal than noise in it.
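To put a rough number on how noisy 25 shots on target is, here is a minimal sketch that puts a 95% Wilson score interval around a save percentage. The 18-saves-from-25 line is a made-up example for illustration, not Raya's actual record, and `wilson_interval` is just a helper name I picked.

```python
import math

def wilson_interval(saves, shots, z=1.96):
    """Approximate 95% Wilson score interval for a save percentage."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    p = saves / shots
    denom = 1 + z**2 / shots
    centre = (p + z**2 / (2 * shots)) / denom
    half = z * math.sqrt(p * (1 - p) / shots + z**2 / (4 * shots**2)) / denom
    return centre - half, centre + half

# Hypothetical keeper: 18 saves from 25 shots on target (72%).
# These are illustrative numbers, NOT Raya's real figures.
small_low, small_high = wilson_interval(18, 25)

# Same hypothetical save rate over a much larger sample, for comparison.
large_low, large_high = wilson_interval(197, 274)

print(f"25 shots:  {small_low:.1%} to {small_high:.1%}")
print(f"274 shots: {large_low:.1%} to {large_high:.1%}")
```

With only 25 shots the plausible range spans roughly 30 percentage points, which is wider than the gap between a poor and an excellent shot stopper. And that is before splitting those 25 shots into home and away.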
The Ramsdale part wasn’t quite as neat to replicate from the data I pulled straight from the match logs on FBRef.com, but it is close enough that there is probably some minor query difference, such as some blocked shots being counted as shots on target faced (though it is still a little weird that the data was off). The FBRef data has him at a 66.1% save rate at home vs 65.6% in the data here, and 72.1% vs 71.6% away.
This at least is a decent-sized sample of 77 matches played and 274 shots on target faced.
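Even with a sample this size, it is worth checking how much a home/away gap like this stands out from noise. The sketch below runs a simple two-proportion z-test on the FBRef save rates. I am assuming the 274 shots on target split evenly (137 home, 137 away), which is a guess for illustration and not a figure from the match logs; `two_proportion_z` is a helper name of my own.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Save rates from FBRef (away 72.1%, home 66.1%).
# ASSUMPTION: shots split evenly, 137 each; the real split is not given.
z, p_value = two_proportion_z(0.721, 137, 0.661, 137)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under that assumed split the gap sits at about one standard error, so the home/away difference on its own is not something I would hang a selection policy on.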
What I don’t get is why they would throw away the other Premier League games both of these keepers have played. There doesn’t seem to be a hypothesis floated for why they should have these splits only at Arsenal and not at Brentford, Sheffield United, or Bournemouth.
Because I think that is relevant data I have gone and pulled it and presented it below:
However, the story they are trying to tell kind of falls apart here when looking at just save percentage. The idea itself doesn’t have to fall apart, but making the case does require looking at something a little more advanced than simple save percentage.
If they had used goals saved compared to post-shot xG instead of save percentage, they could have told the same story on stronger footing, without having to cherry-pick data to make their case.
I do wonder if they didn’t go this route because even at this the difference between them is not that great and it still shows that Ramsdale is at best right around average, with a pretty minimal gap to Raya.
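For anyone unfamiliar with the goals saved measure mentioned above, it is usually computed as post-shot xG faced minus goals conceded, so that harder shots count for more than soft ones. A minimal sketch follows, with made-up numbers and function names of my own choosing (none of these values are Raya's or Ramsdale's).

```python
def goals_prevented(psxg_faced, goals_conceded):
    """Post-shot xG faced minus goals conceded; positive is above average."""
    return psxg_faced - goals_conceded

def goals_prevented_per_shot(psxg_faced, goals_conceded, shots_on_target):
    """Same measure scaled by shots on target faced, to compare samples."""
    if shots_on_target <= 0:
        raise ValueError("shots_on_target must be positive")
    return goals_prevented(psxg_faced, goals_conceded) / shots_on_target

# Hypothetical keeper: 30.0 post-shot xG faced, 27 goals conceded,
# 100 shots on target. Illustrative values only.
total = goals_prevented(30.0, 27)
rate = goals_prevented_per_shot(30.0, 27, 100)
print(f"goals prevented: {total:+.1f} ({rate:+.3f} per shot on target)")
```

The per-shot version is what lets you compare a keeper with 25 shots faced against one with 274, although the small-sample caveats still apply just as much.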
Anyway, rant over. Thank you for indulging me. If you ever see anything like this that gets you going feel free to send it my way and I can take a look at it as well.
Great stuff, Scott
Good analysis by you and pretty shoddy by the Athletic (which is usually pretty good). One of my unprovable opinions about goalies is that fans lack sufficient data to make informed decisions about goalkeeper performance. This is particularly true on shot saving. Do some goalies make routine saves and more non-routine saves than other goalies? Presumably yes, but there aren’t enough examples in live play for me to realistically compare. On the other hand, goalkeeper coaches and team managers know from practice repetitions whether a player can cover the goal and what plays they can make. I also think that I am attracted by what looks really neat (bright and shiny objects), so if Ramsdale makes a great looking save against Leicester I assume he is a world beater. But I really can’t tell, for a dozen different reasons, if that proves anything. Same with the ability to make passes to different parts of the field. Team management see what Raya can do and what Rambo can do at practice, and that’s a better predictor of skill than me watching what looks like a bad pass but could be several other things. So, short summary: trust the manager on picking goalies and assume he knows much, much more than we do. It’s still bothersome because I really like Rambo, but the idea that I can tell if he’s better than Raya based on my look and feel doesn’t make much sense. Stats help but it’s still a real small data set.