At this point - would this be the worst way to train our back four?
So I've decided to turn my hand to data analytics. This is an interesting development because I understand neither data nor analysis.
I'm just having a go.
I'm a guy with some data, a laptop and a platform on which to express my uninformed view. Jeremy Hunt with a blog, if you will. I'm the Secretary of State for Idiots Having A Crack At Analysis.
I would LOVE some of those proper football analysts who I greatly admire (and are linked at the side of this blog) to take this article and turn it into something coherent, but for now you'll have to make do with my ham fisted attempts in the same way that we have to put up with Jack Whitehall being a "comedian".
2. It's Not The Years Honey, It's The Mileage
So, why am I writing this at all? Well, it's primarily due to the fact that West Ham appear to have forgotten how to defend. And I mean this genuinely. Trailing 3-0 at West Brom our entire set of outfield players went forward for a corner and then looked baffled when the home side broke away to score a fourth. That marked the twelfth occasion during the previous sixteen games that West Ham had conceded at least twice, and they repeated the trick again this weekend against Southampton.
Indiana Bilic is in search of the lost art of defending, and he probably won't find it anywhere near James Collins.
The genesis of this article was actually a moment during a game on Sunday 2nd February, 2003. West Ham were at home to Liverpool and 2-0 down within 9 minutes. As this was during our tragicomic attempt to avoid relegation under Glenn Roeder, it occurred to me that by continually going behind in games we were making our task significantly harder.
As it turned out, this was unsurprisingly correct, as that encounter was during a run where West Ham won just once in 17 games, and conceded first in a whopping 14 of those matches. Relegation followed and Roeder ended up in hospital with a brain injury, probably brought on by spending two years wondering why the fuck Harry Redknapp tried to replace Rio Ferdinand with Christian Dailly and Rigobert Song (Too soon? That might be too soon).
3. Fortune And Glory, Kid, Fortune And Glory
Conversely, last year West Ham recovered no fewer than 18 points from losing positions. And as I watched them repeat this trick through the season I began to wonder about how sustainable it was. Could a team really expect to go behind this often and keep picking up points? If this flaw doomed Roeder, then why wasn't it doing the same to Bilic?
As an aside, how is it even possible to go 2-0 down against Sunderland in the first place? Remarkable.
Anyway, in order to establish what this meant, it seemed to me that I first needed to understand the impact of going behind in a game.
In an ideal world, we would be able to calculate a Win Probability Added (WPA) statistic based upon a variety of factors each time a goal goes in. These would include:
- venue
- game situation
- time of goal
- strength of scoring team
- strength of conceding team
No need to imagine any of this folks. It's all fairly basic and West Ham have been good enough to concede all manner of comedy goals this year to help us illustrate the point.
Thus, Diego Costa scoring in the 89th minute of a game at Stamford Bridge to put Chelsea 2-1 up would appear to yield a far greater WPA than Nacer Chadli scoring to put West Brom 4-0 up at The Hawthorns in the 56th minute.
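For anyone who wants to see that intuition as numbers, here is a minimal sketch and very much not a proper WPA model. It assumes goals in the remaining minutes arrive as independent Poisson processes, which is a technique I have swapped in rather than anything taken from real match data. The scoring rates (1.4 and 1.2 goals per 90 minutes) are invented purely for illustration, and the function names are hypothetical. Venue and team strength would only enter through those two rates, and stoppage time is ignored.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def win_probability(lead, minutes_left, rate_for=1.4, rate_against=1.2, max_goals=10):
    """Chance the team currently `lead` goals ahead goes on to win.

    rate_for / rate_against are ASSUMED goals per 90 minutes, not real data.
    """
    mean_for = rate_for * minutes_left / 90
    mean_against = rate_against * minutes_left / 90
    win = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            if lead + scored - conceded > 0:
                win += poisson_pmf(scored, mean_for) * poisson_pmf(conceded, mean_against)
    return win


def win_probability_added(lead_before, minute):
    """Change in the scoring team's win probability caused by one goal."""
    minutes_left = 90 - minute
    before = win_probability(lead_before, minutes_left)
    after = win_probability(lead_before + 1, minutes_left)
    return after - before


# Costa-style goal: level at 1-1, scores in the 89th minute to lead 2-1.
late_winner = win_probability_added(lead_before=0, minute=89)
# Chadli-style goal: already 3-0 up, scores in the 56th minute to lead 4-0.
fourth_goal = win_probability_added(lead_before=3, minute=56)

print(f"Late winner WPA: {late_winner:.3f}")
print(f"Fourth goal WPA: {fourth_goal:.3f}")
```

Under these made-up rates the late winner adds almost an entire win, while the fourth goal adds next to nothing, which is the whole point of weighting goals by game situation and time.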
Again, this seems fairly obvious but it is important to document my train of thinking because I am desperately keen for people to believe that I have a train of thought, and am not just making this all up as I go, whilst randomly inserting a few pictures of Indiana Jones into the article to distract you.
"I don't know what linear regression is"
The problem with ideas like this is that the data necessary to produce such an analysis is not easily available or accessible. It is closely guarded by the likes of Opta and not available to people like me who presumably would only use it for evil purposes like proposing earth shattering theories like "West Ham should stop going behind in games".
It's a shame, as I think that the analytic community at large would advance the cause significantly if simply given the information, but that's not where we're at so I had to figure out what I could actually do myself. Answer - not much, but I did spend ages doing it. A bit like when my wife asks me to fold the washing.
4. 'X' Never, Ever Marks The Spot
As it was West Ham's 2015/16 season that triggered this all off, I decided to start there. Opta have made public the following table:
Team | Points Gained From Behind | Final League Position |
---|---|---|
Tottenham | 19 | 3 |
West Ham | 18 | 7 |
Leicester | 14 | 1 |
Southampton | 14 | 6 |
Liverpool | 13 | 8 |
Swansea | 13 | 12 |
Sunderland | 12 | 17 |
Arsenal | 11 | 2 |
Crystal Palace | 10 | 15 |
Man City | 10 | 4 |
WBA | 10 | 14 |
Chelsea | 9 | 10 |
Newcastle | 8 | 18 |
Stoke | 8 | 9 |
Everton | 7 | 11 |
Norwich | 7 | 19 |
Watford | 7 | 13 |
Bournemouth | 5 | 16 |
Aston Villa | 4 | 20 |
Man Utd | 4 | 5 |
Now the most obvious comment to make here is "Man Utd - LOL", but there are some other points worth noting.
a - The teams who did best were among the better teams in the league. This might be circular but it's also logical; you would expect good teams to have more chance of recovering points than, say, Aston Villa who were less a football team and more a collection of travelling acrobats.
b - The question here is therefore perhaps whether or not we think West Ham were a truly good side, or one who over performed and got lucky last year.
With an average return of 10.15 points for all Premier League teams from losing situations, we can see that last season West Ham picked up an additional 8 points from such positions as compared to the average team.
c - As fans we want to attribute this to some sort of voodoo magic wherein we establish that team spirit and resilience actually mean something, and that whatever "it" is, West Ham have "it". But I don't really think that's true. It's just a made up projection like Santa Claus or Eskimos.
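The arithmetic behind points a and b can be checked straight from the Opta table. This is a rough sketch using only the figures printed above; the variable names are mine, and splitting the league into top and bottom halves is just one crude way of asking whether better teams recover more points.

```python
# team: (points gained from behind, final league position), copied from the table
table = {
    "Tottenham": (19, 3), "West Ham": (18, 7), "Leicester": (14, 1),
    "Southampton": (14, 6), "Liverpool": (13, 8), "Swansea": (13, 12),
    "Sunderland": (12, 17), "Arsenal": (11, 2), "Crystal Palace": (10, 15),
    "Man City": (10, 4), "WBA": (10, 14), "Chelsea": (9, 10),
    "Newcastle": (8, 18), "Stoke": (8, 9), "Everton": (7, 11),
    "Norwich": (7, 19), "Watford": (7, 13), "Bournemouth": (5, 16),
    "Aston Villa": (4, 20), "Man Utd": (4, 5),
}

# League-wide average and West Ham's surplus over it
league_average = sum(points for points, _ in table.values()) / len(table)
west_ham_surplus = table["West Ham"][0] - league_average

# Crude split: teams finishing 1st-10th versus 11th-20th
top_half = [points for points, position in table.values() if position <= 10]
bottom_half = [points for points, position in table.values() if position > 10]
top_half_average = sum(top_half) / len(top_half)
bottom_half_average = sum(bottom_half) / len(bottom_half)

print(f"League average: {league_average:.2f}")            # 10.15
print(f"West Ham surplus: {west_ham_surplus:.2f}")        # 7.85
print(f"Top half average: {top_half_average:.1f}")        # 12.0
print(f"Bottom half average: {bottom_half_average:.1f}")  # 8.3
```

So the "additional 8 points" is 7.85 rounded up, and the top half recovered roughly four more points per team than the bottom half. One season is a tiny sample, so this shows the direction of travel and nothing more.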
However, without access to tonnes of historical data I didn't have much choice but to revisit West Ham's own historic results and at least figure out if this was unusual for us, let alone anybody else.
None of this was difficult as my employer once sent me on an Advanced Excel course and when I got back, my boss asked me what I had learned. I proudly showed him how to turn his Excel spreadsheets upside down, which led to a week of high jinks in the office before it was generally accepted that I probably hadn't made the most of that particular opportunity.
However, using the general excellence of EuroStats I found some interesting things (last 11 seasons of West Ham league results):
Result | Conceded First Goal | Didn't Concede First |
---|---|---|
Won | 26 | 119 |
Drawn | 36 | 40 |
Lost | 143 | 28 |
Goalless Draw | 0 | 34 |
Total (426 games) | 205 | 221 |
So, in our 426 games, we conceded first 205 times and came back to win on just 26 of those occasions.
Presented as percentages, in games where a goal was scored (i.e: Excluding all those Allardycian 0-0s) we see the following:
Result | Conceded First Goal | Scored First Goal |
---|---|---|
Won | 13% | 64% |
Drawn | 18% | 21% |
Lost | 70% | 15% |
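The percentages follow directly from the raw counts in the earlier table once the 34 goalless draws are left out. A quick sketch of that conversion, with names of my own choosing:

```python
# Raw counts from the results table, goalless draws excluded
results = {
    "conceded_first": {"won": 26, "drawn": 36, "lost": 143},
    "scored_first": {"won": 119, "drawn": 40, "lost": 28},
}


def as_percentages(counts):
    """Convert outcome counts to whole-number percentages of that column."""
    total = sum(counts.values())
    return {outcome: round(100 * n / total) for outcome, n in counts.items()}


percentages = {situation: as_percentages(counts) for situation, counts in results.items()}

for situation, row in percentages.items():
    print(situation, row)
```

The conceded-first column adds up to 101% because each figure is rounded separately, which is why 13, 18 and 70 look one too many.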
So, over an eleven season span (including one season in the Championship) West Ham have lost 70% of all matches when they have conceded first.
You guys, I'm beginning to wonder if we should stop letting in the first goal? Or maybe just stop letting in any goals at all? Feel free to mention this to Slaven Bilic next time you see him in Mothercare.
5. Nothing Surprises Me; I'm A Scientist
If we dig a little deeper, and break down the numbers by manager, we can see the following:
Year | Conceded First Goal | Points Gained (Conceded First) | Behind In Game | Points Gained (Behind In Game) | Finishing Position | Manager |
---|---|---|---|---|---|---|
2005/06 | 20 | 18 | 3 | 0 | 9 | Pardew |
2006/07 | 24 | 9 | 2 | 0 | 15 | Pardew/Curbishley |
2007/08 | 20 | 14 | 2 | 1 | 10 | Curbishley |
2008/09 | 17 | 8 | 3 | 0 | 9 | Curbishley/Zola |
2009/10 | 20 | 5 | 4 | 2 | 17 | Zola |
2010/11 | 22 | 9 | 4 | 0 | 20 | Grant |
2011/12 | 15 | 17 | 2 | 0 | 3 | Allardyce |
2012/13 | 18 | 7 | 5 | 0 | 10 | Allardyce |
2013/14 | 15 | 3 | 7 | 1 | 13 | Allardyce |
2014/15 | 18 | 9 | 2 | 0 | 12 | Allardyce |
2015/16 | 16 | 15 | 2 | 3 | 7 | Bilic |
Conceded First Goal - Any game where West Ham conceded first
Points Gained - Points gained from any game where West Ham conceded first
Behind In Game - Any game where West Ham took the lead and then lost it
Points Gained - Points gained from any game where West Ham took the lead and then lost it
Now, all sorts of conclusions can be drawn from this. Unsurprisingly to me, we see that the higher up the table the team finished, the more likely it is that they were able to rescue points from losing positions. Better teams having a better chance to recover - that seems uncontroversial.
Interestingly, we can also see that Allardyce didn't go behind very often but when his teams did, they rarely recovered. Having watched so many of his insipid away performances this fits perfectly with my recollection of his time in charge.
I can't help but think that Allardyce read these articles at The Power of Goals and 5 Added Minutes which detailed how it is borderline impossible to come back from a 2-0 deficit. As such, his teams dug in and tried to sneak an equaliser rather than risk going two behind. This explains a lot of very tedious games over that 4 year period.
IT ALSO DOESN'T EXPLAIN WATFORD, SLAVEN.
6. It's Time To Ask Yourself; What Do You Believe In?
However, there are examples of shit West Ham teams (Hi Avram!) who are also above average in this area, suggesting that recovery in itself isn't a skill per se that teams have, and is instead probably just random variation. We may see West Ham teams do well over the course of a single season, such as last year, but over longer periods of time that performance will invariably regress to the mean.
It is interesting that the 11 year average for West Ham has been to gain 11.0 points from losing positions per year, which isn't too far removed from the league average last year of 10.15. That is again too small of a sample to suggest anything but it does hint that last year was at least unusual.
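As a sanity check, the season-by-season table can be added up and compared with the other figures in this article. This is a sketch using only the numbers printed above. Treating "points from losing positions" as the sum of both Points Gained columns is my assumption, although it does reproduce the 11.0 average.

```python
# (season, games conceded first, points from those games,
#  games where the lead was lost, points from those games)
seasons = [
    ("2005/06", 20, 18, 3, 0),
    ("2006/07", 24, 9, 2, 0),
    ("2007/08", 20, 14, 2, 1),
    ("2008/09", 17, 8, 3, 0),
    ("2009/10", 20, 5, 4, 2),
    ("2010/11", 22, 9, 4, 0),
    ("2011/12", 15, 17, 2, 0),
    ("2012/13", 18, 7, 5, 0),
    ("2013/14", 15, 3, 7, 1),
    ("2014/15", 18, 9, 2, 0),
    ("2015/16", 16, 15, 2, 3),
]

conceded_first_games = sum(row[1] for row in seasons)
points_after_conceding_first = sum(row[2] for row in seasons)
points_after_losing_lead = sum(row[4] for row in seasons)

# Average points per season from any losing position (both columns combined)
average_per_season = (points_after_conceding_first + points_after_losing_lead) / len(seasons)

# Cross-check against the win/draw counts: 26 wins and 36 draws after conceding first
points_from_results_table = 26 * 3 + 36 * 1

print(conceded_first_games)          # 205, matching the results table
print(points_after_conceding_first)  # 114
print(points_from_results_table)     # 114
print(average_per_season)            # 11.0
```

The 2015/16 row also adds up to 15 + 3 = 18, which is the figure Opta give for West Ham in the league-wide table.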
For context, West Ham have this season conceded first in 4 games and lost them all, scored first once and lost, and scored first once and won. That seems to be an ultra quick regression.
7. You're Meddling With Powers You Can't Possibly Comprehend
What should be acknowledged is that Bilic did manage a twofold piece of good work last season. His team did not concede the first goal very often, relative to recent history, and they came back from such positions with great frequency.
He also managed the only instance in the last decade of a victory where West Ham took the lead, lost it and then regained it - in the final game at Upton Park against Manchester United. I was shocked to discover this, but if you think about that chain of events, it feels reasonable to assume that most of those goals for opposition teams would happen relatively late in the game, presumably not leaving much time for a comeback. Also West Ham have been crap for quite a lot of the last eleven years, which really does underpin a lot of this research.
It does also suggest, however, that if you do ludicrous things like go 2-0 up against Watford, and then go 4-2 down that the likelihood of recovering from that position is fairly minimal. Thankfully we don't do stupid stuff like that any more now we've moved to our new stadium.
8. You Call This Archaeology?
I did wonder if it made much difference whether we conceded the first goal at home or away. Well, here you go:
Home | Conceded First Goal | Scored First |
---|---|---|
Won | 16% | 66% |
Drawn | 18% | 20% |
Lost | 66% | 14% |
Away | Conceded First Goal | Scored First |
---|---|---|
Won | 11% | 56% |
Drawn | 18% | 23% |
Lost | 71% | 21% |
So, yes it does, but it's not moving the needle very much. Letting in the first goal is still about as good an idea as being President, building a wall along your border and telling your neighbouring government that they are going to pay for it.
9. And What Did You Find? Me...Illumination
It's entirely possible that you may hate statistics. Many fans I know view them as representative of all that is wrong with the game. They want to see beautiful flowing football, and they want to see Matthew Le Tissier or Gianfranco Zola or Paolo Di Canio doing things that can't be cooked up in laboratories by the likes of Sam Allardyce.
Now, I'm not here to tell you how to enjoy the game you love, unless you're a fan of one of those teams who play music when they score, in which case you should take at least a short look at yourself. No, I'm just here to make a case that data analytics in football are not looking to replace the free flowing nature of the game, but instead to enhance our understanding of it and help us interpret what we are seeing.
Simon Gleave, the Head of Analysis at Gracenote Sports was kind enough to humour me in a conversation about this very point, saying "The key for me has always been stories. The numbers are simply to support the story and are not the story in themselves".
This makes perfect sense to me.
I have also heard it said that football is simply too difficult to break down into pure numbers. That it is unplanned and spontaneous and balletic and therefore defies categorisation.
Well, if you turn that thinking on its head - how can one person watch a game of football for 90 minutes, with 22 players and hundreds, maybe thousands of discrete events happening in front of them and decide that their own opinions formed via one solitary glance at events could be sufficient to fully process and digest that game?
Analytics isn't trying to tell you what to think, it's simply trying to give you more information to think about. And that, to me, is a crucial development in the game.
It is unfortunate, however, if your first exposure to football analytics is through this column, as that would be like discovering reggae music through Sid Owen, but nonetheless everyone must start somewhere.
Bob Marley, basically
Please do not be misled into thinking that this is false modesty. I do not know what I am talking about. This is like getting Peter Andre to write a blog about Brexit. Proper analysts would identify problems and then tell you that they have fixed them with things like Poisson Distributions. The problem I have is that I don't know what that is.
I do know, however, that Poisson is French for "fish" so I instead had a tuna salad and just carried on, ignoring the glaring holes in my research. This is the beauty of being an idiot.
10. Trust Me
I apologise that this article has been so long. It's a lot to read about data and West Ham combined, and even though we've at long last brought Indiana Jones and Ricky Butcher together, I accept it's been a bit of a slog.
I also feel like the revelation at the end is a bit like that moment in Star Wars: The Force Awakens when you wait ages to find out who Kylo Ren is and then he takes off his mask and it turns out he looks like Graham from IT Support.
But still, West Ham need to stop conceding first. If they don't we will be relegated.
History doesn't always repeat but it's useful to know your own nevertheless.