01 September, 2015

On the abuse of the logistic extrapolation

I was riffling through the book of Juilland when I stumbled upon the page with record projections and predictions. Juilland obtains these predictions by extrapolating, based on exponential curves (his terminology), from records established between 1950 and 1999. (I surmise that by "exponential curves" he means a logistic expression, but more on this later.)

It was the discus throw that grabbed my attention, for there the predictions were really out of this world. Juilland predicts more than 77 m for 2000, 92 m for 2050 and 101 m for 2100. Today we know that Juilland's predictions will never materialise. Since the 80s and the 90s, women's throwing events have taken a serious downturn, and it took 25 years for women to reach 70 m again (with just two performers above this mark, and still more than 5 m below the world record).

How could Juilland's predictions be so wrong? A first answer to this is that, whatever you do, you cannot predict the evolution of the world record. Last year, while presenting a talk on athletics, I chanced the prediction that we should see the first throw beyond 80 m in women's hammer this year. That was an easy one and I would never chance a more daring prediction. So why do people, including really serious writers like Juilland, think they can make sensible predictions? The answer is: "too much confidence in mathematics", especially by people who are not really trained in mathematical modelling.

So let us see how the predictions can get off the mark. The usual tool for extrapolating from a set of given data on athletics is the logistic equation. It has the expression
$$L={A\over 1+\exp({b-t\over c})}$$
and its graph has a sigmoid form.

It is supposed to represent the mean evolution of records over time, with a period of rapid progress followed by a slow-down. For very large times the value of L asymptotically approaches A. The way the logistic equation is applied in practice is by obtaining the best fit of such an expression to a set of given data, which fixes the values of the parameters A, b and c. Once the latter are known, one can obtain the value of L for any time and thus make predictions for the future evolution of records.
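To make the fitting step concrete, here is a minimal sketch in Python. It is not the procedure used for the fits in this post: it is a brute-force least-squares search over b and c, using the fact that for fixed b and c the best A has a closed form. The data are synthetic, generated from the formula itself purely for illustration (they are not real records), and the names `fit_logistic`, `b_grid` and `c_grid` are hypothetical.

```python
import math

def logistic(t, A, b, c):
    """Logistic curve L = A / (1 + exp((b - t) / c))."""
    return A / (1.0 + math.exp((b - t) / c))

def fit_logistic(years, marks, b_grid, c_grid):
    """Least-squares fit by brute force over (b, c)."""
    best = None
    for b in b_grid:
        for c in c_grid:
            s = [1.0 / (1.0 + math.exp((b - t) / c)) for t in years]
            # For fixed b and c the model is linear in A, so the
            # least-squares optimum is A = sum(y*s) / sum(s*s).
            A = sum(y * si for y, si in zip(marks, s)) / sum(si * si for si in s)
            sse = sum((y - A * si) ** 2 for y, si in zip(marks, s))
            if best is None or sse < best[0]:
                best = (sse, A, b, c)
    return best[1], best[2], best[3]

# Synthetic, noise-free data generated from the formula (NOT real records).
true_A, true_b, true_c = 60.0, 1930.0, 10.0
years = list(range(1925, 1970, 5))
marks = [logistic(t, true_A, true_b, true_c) for t in years]

b_grid = [1920 + 0.5 * i for i in range(41)]  # 1920 to 1940 in half-year steps
c_grid = [5 + 0.5 * i for i in range(31)]     # 5 to 20 in half-year steps
A_fit, b_fit, c_fit = fit_logistic(years, marks, b_grid, c_grid)

print("fitted A, b, c:", A_fit, b_fit, c_fit)
print("extrapolation to 2015:", logistic(2015, A_fit, b_fit, c_fit))
```

With clean synthetic data the search recovers the parameters it was fed. Real record lists are neither smooth nor complete, which is exactly where the trouble starts.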

While the assumption that the progression of records follows a sigmoid curve is quite reasonable, blind faith in logistic-based predictions may lead to absurd conclusions. We can illustrate this through some examples. The fit below was based on the women's discus world records registered between 1923 and 1965, i.e. just before the 60 m barrier was broken.



The values I obtained were A=58.87 m, b=1923 and c=9.664 (both in years). This is really funny: the prediction for the future evolution of the world record gives a value which is lower than the already registered record! Once you have seen this, how much confidence can you have in logistic-based predictions?
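The ceiling can be checked directly by evaluating the logistic expression with the fitted values quoted above. The sketch below only plugs those three rounded numbers into the formula; the function name and the choice of sample years are mine. Because the denominator 1 + exp((b - t)/c) is always larger than 1, L stays below A for every finite t, so this fit can never reach 60 m.

```python
import math

def logistic(t, A=58.87, b=1923.0, c=9.664):
    """Logistic curve with the fitted 1923-1965 parameters quoted in the post."""
    return A / (1.0 + math.exp((b - t) / c))

# Sample years chosen for illustration only.
for year in (1965, 2000, 2015, 2100):
    print(year, round(logistic(year), 2), "m")
```

Note also that at t = b the curve sits at exactly half the asymptote, A/2.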

But how about the future, the "real" future now? I haven't tried to reproduce Juilland's predictions but rather used all existing world records from 1923 to 1988. In fact, I am not 100 % sure whether he used a logistic fit or just opted for an exponential one of the form
$$L=A(1-\exp({b-t\over c}))$$
In the figure below you can see the best fit, corresponding to parameters A=90.31 m, b=1939 yr and c=31.52 yr, leading to a prediction for today of almost 83 m.



However, we now know that the women's discus throw did not continue to evolve as it did in the 80s, and thus the fit based on the pre-88 records is misleading. Not having a good criterion that would allow me to eliminate suspicious performances, I decided to redo the fit by adding a fictitious 2015 world record equal to the current 76.80 m one.



The change in parameters is dramatic. We now find A=83.28 m, b=1934 yr and c=27.29 yr, which would lead to a 2015 prediction around 79 m.
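The sensitivity is easy to see by evaluating the exponential expression L = A(1 - exp((b - t)/c)) for both parameter sets. The sketch below only plugs in the rounded values quoted in the post; it assumes that the unlabelled 27.29 yr in the second fit is the parameter c, and the function and variable names are hypothetical.

```python
import math

def saturating_exp(t, A, b, c):
    """Exponential form L = A * (1 - exp((b - t) / c))."""
    return A * (1.0 - math.exp((b - t) / c))

# Rounded parameters as quoted in the post.
fit_to_1988 = dict(A=90.31, b=1939.0, c=31.52)
fit_with_2015 = dict(A=83.28, b=1934.0, c=27.29)  # 27.29 assumed to be c

p_old = saturating_exp(2015, **fit_to_1988)
p_new = saturating_exp(2015, **fit_with_2015)

print("2015 value, records up to 1988:      %.1f m" % p_old)
print("2015 value, with fictitious 2015 pt: %.1f m" % p_new)
print("shift in the asymptote A:            %.2f m"
      % (fit_to_1988["A"] - fit_with_2015["A"]))
```

A single added data point moves the 2015 value by roughly 3 m and the asymptote by about 7 m, which gives a feeling for how little the long-range numbers are pinned down by the data.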

What can we deduce from these analyses? For me the main conclusion is that a logistic fit is next to useless when it comes to predicting the future in athletics. And I am convinced that this is not just a quirk of the logistic equation. (In a future post I will review some of the expressions commonly used to make extrapolations based on a set of existing data.) It is the very nature of athletics that makes the evolution unpredictable in the long run. Rules, equipment and training methods, to name but a few, evolve over time and, as a consequence, produce serious bumps in the record evolution, which cannot be smoothed out by some mathematical tool.
