In a previous post, I showed results from a model that gives Germany a 23 percent chance of victory in this year’s World Cup. It is the highest probability of any team. So, can I jump ahead and say that Germany will win it or, worse yet, bet my savings that it will do so?

The short answer is: no.

As for the long answer, let me start by reminding you that I know nothing about football (soccer) and can’t name a single player from Germany’s national team other than Bekenbauer, whose name I probably can’t even spell right. I am not even sure that he’s not a tennis player :-).

Keep in mind also that the model used to predict the winner of each match is very simple, and that the main goal of this exercise, aside from being a learning experience for the intern, is to provide, at best, a common-sense understanding of how this Cup might play out.

That being said, the rule of thumb for statistical predictions is to go back and check, whenever possible, how your model would have performed on events that have already happened.

So, what would this model have said before the last World Cup? I did not run it with data from the 2010 Cup. Nonetheless, it is safe to say that the winner, Spain, would probably have been rated no better then than it is now. In the current model, Spain has about a 5% chance of winning, and that figure already reflects the points it amassed in its winning campaign of 2010.

Thus, if, back in 2010, you had taken the country with the highest probability of winning and told everyone that it would certainly be the winner, you would have been wrong.

This is not to say that the model is or was absolutely wrong. The problem is abusing it by reading more into the results than they tell us. A 23% chance of winning, despite being the highest in the table, only says that Germany would win about one out of every 4 or 5 World Cups played under these conditions. That leaves 3 or 4 of those hypothetical World Cups in which Germany does not win.
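The "one in 4 or 5" reading above is just the reciprocal of the probability. Here is a minimal sketch of that arithmetic in Python; the function names are made up for illustration and are not part of the model from the previous post.

```python
def cups_per_win(p_win: float) -> float:
    """Average number of Cups per victory if the team won each one with probability p_win."""
    return 1.0 / p_win


def prob_not_winning(p_win: float) -> float:
    """Chance that the team does NOT win a given Cup."""
    return 1.0 - p_win


germany = 0.23  # the model's figure quoted in the text

print(f"About one win every {cups_per_win(germany):.1f} Cups")
print(f"Chance the favorite does not win: {prob_not_winning(germany):.0%}")
```

In other words, even for the top team in the table, the most likely outcome by far is that it does not win.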

Following this reasoning, an interesting exercise is to compare the model’s probabilities with the list of actual World Cup winners. The model gives Brazil a 22% chance of winning, which corresponds to a little over 4 of the 19 past World Cups. The actual number is 5. For Argentina, the model hits the mark: a 13% chance of winning is equivalent to winning about 2 of the 19 past Cups, which is what they have actually done.
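The comparison above can be reproduced with a few lines of Python. This is only a sketch: it assumes, for the sake of illustration, that today's probability applied unchanged to each of the 19 past Cups, which is a simplification and not how the model itself was built. The helper name `expected_wins` is hypothetical.

```python
PAST_CUPS = 19  # number of World Cups played so far, as stated in the text

# Probabilities from the model and actual titles, both quoted in the text.
model_prob = {"Brazil": 0.22, "Argentina": 0.13}
actual_wins = {"Brazil": 5, "Argentina": 2}


def expected_wins(p_win: float, n_cups: int = PAST_CUPS) -> float:
    """Expected number of titles if the team had p_win in each of n_cups Cups."""
    return p_win * n_cups


for team, p in model_prob.items():
    print(f"{team}: expected {expected_wins(p):.1f}, actual {actual_wins[team]}")
```

Brazil comes out slightly below its real record and Argentina slightly above it, which matches the rough comparison in the paragraph above.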

It is no surprise that the results are close, since the model is built on the past performance of teams in those 19 World Cups. The discrepancies, on the other hand, can tell us a few things. First, the model leaves some chance of victory for teams that have never won before, which is a good thing. Second, the fact that it is based on points and not on wins is evident from what it says about Uruguay, for which it predicts no victory. Despite having two wins, Uruguay has only about half as many points as Argentina, the other nation with two World Cup wins.

And lastly, it shows that this year, though we cannot say it will win for sure, Germany indeed seems to have a nicer path to victory than expected.

I thank Andre Luchine, Beto Boullosa, Charles Queiroz, Fernando Varejão, Marcio Eduardo Bezerra and Neca Boullosa for their advice on the inner workings of the World Cup, and Eduardo Viotti for questioning the model’s performance against the past.