Cannon Stats

Stats and analysis for all things Arsenal from Adam Rae Voge and Scott Willis. Going deep into match analysis, player scouting, transfer business and rumors, squad-building, and general transfer coverage.

Thinking about uncertainty

Jeff Charles
Nov 19, 2021

One of the most common complaints about xG I hear is about the ratings of single shots.

My common refrain is that for a single shot the error bars should be in the range of plus or minus 25%. This matches pretty well with work done by @WillTGM, who estimated that, at the 90% confidence level, the uncertainty for a single shot was in the 10-15% range, dropping to about +/- 5-7% at 100 shots and +/- 2-3% at 1,000 shots, and then settling at +/- 1-2% at 5,000 shots.

Reference: Will Gurpinar-Morgan, posts about bootstrapping on his blog 2+2=11.

With this in mind, I thought it would be good to think about ways of illustrating this uncertainty. We present these graphics and numbers to one or even two decimal places (I know I can be guilty of that at times), which implies a level of certainty that is not warranted, especially at the single-game level.

So, drawing inspiration from what others have done before, especially Martin Eastwood (@penaltyblog), I have made some adjustments to my standard xG visualizations.

Reference: Martin Eastwood, pena.lt/y.

The new running xG graphics

First, this is what the old one looks like:

And here is what the updated one looks like.

I have made a few changes here. First, I have changed my default font because I got bored looking at the old one.

Second, and more importantly, I have added confidence intervals for each shot. I have gone with a bit more uncertainty than in Will's estimates, sticking with +/- 25%. The high estimate is still capped so that a shot cannot have a value greater than 1, so for really big chances the gap between the xG value and the low estimate will be larger than the gap to the high estimate.

I have also added the range between the low and high estimates to the total at the top, to make the level of uncertainty clearer.
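A minimal Python sketch of the per-shot intervals and the totals at the top of the graphic may help make the capping concrete. It assumes the +/- 25% is a relative band around each shot's xG (so a 0.20 xG shot runs from 0.15 to 0.25) and that only the high end is capped at 1. The function names `shot_interval` and `total_with_range` are hypothetical, not the code behind the graphics above.

```python
from typing import List, Tuple


def shot_interval(xg: float, rel_error: float = 0.25) -> Tuple[float, float]:
    """Low/high estimate for one shot.

    Assumption: the +/- 25% is relative to the shot's xG, and only the
    high end needs capping because no shot can be worth more than 1 goal.
    """
    low = xg * (1.0 - rel_error)
    high = min(xg * (1.0 + rel_error), 1.0)
    return low, high


def total_with_range(shot_xgs: List[float], rel_error: float = 0.25) -> Tuple[float, float, float]:
    """Team xG total plus the summed low and high estimates."""
    total = low_total = high_total = 0.0
    for xg in shot_xgs:
        low, high = shot_interval(xg, rel_error)
        total += xg
        low_total += low
        high_total += high
    return total, low_total, high_total


# Example: one half-chance and one big chance (illustrative values only).
total, low, high = total_with_range([0.20, 0.90])
print(f"xG {total:.2f} (range {low:.2f} to {high:.2f})")
```

Under this reading, a 0.90 xG chance gets a low of about 0.675 and a high capped at 1.0, so the range below the xG value (0.225) is wider than the range above it (0.10).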

Overall, I am quite happy with how this looks, and the extra information seems intuitive (if it isn't, or if more information is warranted, please send me a message).

The third thing I have added is the simulated match result from the shots. This was originally inspired by what StatsBomb does, and it is something I had before, when I was still making these in Excel.

The simulated match result comes from a Monte Carlo simulation. In each simulated match, a random value between each shot's low and high estimates is chosen (at the suggestion of @elliott_stapley, which is much appreciated) and then compared to a second random number to decide goal or no goal. Each team's goals are added up, and the scores are compared across all the simulations to give the probabilities of a home win, draw, and away win. From these, I have also derived expected points for each team.
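A minimal sketch of the simulation as described, assuming the same relative +/- 25% band capped at 1, a uniform draw between the low and high estimates, and a goal whenever a second uniform random number falls below the drawn value. Expected points are assumed to use standard league scoring (3 for a win, 1 for a draw); the post does not spell out that formula. All function names here are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def shot_interval(xg: float, rel_error: float) -> Tuple[float, float]:
    # Assumed relative +/- band around the xG value, high end capped at 1.
    return xg * (1.0 - rel_error), min(xg * (1.0 + rel_error), 1.0)


def simulate_goals(shot_xgs: List[float], rng: random.Random, rel_error: float) -> int:
    """Goals scored by one team in one simulated match."""
    goals = 0
    for xg in shot_xgs:
        low, high = shot_interval(xg, rel_error)
        value = rng.uniform(low, high)  # pick a shot value inside its interval
        if rng.random() < value:        # second random number decides goal / no goal
            goals += 1
    return goals


def simulate_match(
    home_xgs: List[float],
    away_xgs: List[float],
    n_sims: int = 10_000,
    rel_error: float = 0.25,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Home/draw/away probabilities and expected points from shot-level xG."""
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        home = simulate_goals(home_xgs, rng, rel_error)
        away = simulate_goals(away_xgs, rng, rel_error)
        if home > away:
            home_wins += 1
        elif home == away:
            draws += 1
        else:
            away_wins += 1
    p_home, p_draw, p_away = home_wins / n_sims, draws / n_sims, away_wins / n_sims
    return {
        "home_win": p_home,
        "draw": p_draw,
        "away_win": p_away,
        # Assumed scoring: 3 points for a win, 1 for a draw.
        "home_xpts": 3 * p_home + p_draw,
        "away_xpts": 3 * p_away + p_draw,
    }


# Example with made-up shot values, seeded so the run is reproducible.
print(simulate_match([0.45, 0.08, 0.12], [0.90, 0.05], seed=7))
```

Passing a `seed` makes a run repeatable, which is handy when regenerating a graphic; the 10,000 default is an arbitrary choice, not necessarily the number of simulations behind the published graphics.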

Using the low and high estimates in these simulations flattens the odds slightly, and I think that is probably a good thing: it reflects that there is more uncertainty than the regular xG values suggest. Again, I am happy to have spent the time to get this into the visualization, and I like the current presentation.
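A short worked expectation may help show why the effect is slight. It assumes the relative +/- 25% band capped at 1 and a uniform draw inside the interval, which is a reading of the description rather than something the post confirms. Let $x$ be a shot's xG, $\ell = 0.75x$ and $h = \min(1.25x,\,1)$ its low and high estimates, $U \sim \mathrm{Uniform}(\ell, h)$ the drawn value, and $V \sim \mathrm{Uniform}(0, 1)$ the independent random number it is compared against:

```latex
P(\text{goal}) = P(V < U) = \mathbb{E}[U] = \frac{\ell + h}{2}
  = \begin{cases}
      x, & x \le 0.8,\\[4pt]
      \dfrac{0.75x + 1}{2}, & x > 0.8.
    \end{cases}
\qquad\text{e.g. } x = 0.9:\ \frac{0.675 + 1}{2} = 0.8375.
```

If this reading is right, shots below 0.8 xG keep their original chance of scoring, and the flattening comes from big chances whose high estimate is capped at 1.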

Thanks for reading, and if you have any suggestions, I am always open to hearing them.

© 2023 Scott Willis