03 January, 2016

Record prediction: an absurdity

In a previous post I discussed the inadequacy of logistic extrapolation for predicting the evolution of records in athletics. I concluded that post by saying that I did not believe the logistic approximation itself was to blame. I therefore decided to repeat the analysis of the data used in that post with different mathematical expressions.

The first expression I am going to analyse here is the four-parameter logistic which has the form
$$L=A+{B\over 1+\exp({b-t\over c})}$$
where L is the length of the throw, t is the time, and A, B, b, c are parameters. The standard three-parameter logistic expression corresponds to A=0, so adding this term provides an extra degree of freedom which might improve the situation. The figure below shows the best fit of a four-parameter logistic expression to the data. A small improvement over the extrapolation of the three-parameter logistic used in the previous post is indeed obtained: where previously we found a prediction for the 2015 record of circa 83 m, the four-parameter expression gives 81 m, which is still way off.
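To make the role of each parameter concrete, here is a minimal Python sketch of the four-parameter logistic written above. The function name and the numerical values are assumptions chosen purely for illustration; they are not the values fitted to the discus data.

```python
import math

def logistic4(t, A, B, b, c):
    """Four-parameter logistic: L = A + B / (1 + exp((b - t) / c))."""
    return A + B / (1.0 + math.exp((b - t) / c))

# Hypothetical parameter values, for illustration only
# (NOT the result of the fit discussed in the post).
A, B, b, c = 40.0, 40.0, 1960.0, 10.0

# Far in the past the curve sits on the lower asymptote A.
lower = logistic4(b - 50 * c, A, B, b, c)

# Far in the future it saturates at the upper asymptote A + B,
# which is the "ultimate record" such a fit predicts.
upper = logistic4(b + 50 * c, A, B, b, c)

# At t = b the curve is exactly halfway between the two asymptotes.
middle = logistic4(b, A, B, b, c)

print(lower, middle, upper)
```

With this parametrisation, b sets the year of the midpoint, c sets the time scale of the rise, and calling `logistic4(t, 0.0, B, b, c)` recovers the three-parameter form.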

Next I turn to another expression widely used in various domains of biology and finance, the Gompertz law. While the logistic expression is a symmetrical sigmoid, the Gompertz function has a rapid initial growth and a slower approach to the final asymptote. The precise expression of the Gompertz law is
$$L=A+B\exp\left({-\exp({b-t\over c})}\right)$$
again a four-parameter function. It turns out that the Gompertz law leads to a much stiffer fit and extrapolates the 2015 record to beyond 85 m.
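The difference in shape between the two laws can be checked numerically. The sketch below is only an illustration: the function names and the normalised parameter values (A=0, B=1, b=0, c=1) are assumptions and have nothing to do with the fitted curves.

```python
import math

def logistic4(t, A, B, b, c):
    """Four-parameter logistic: L = A + B / (1 + exp((b - t) / c))."""
    return A + B / (1.0 + math.exp((b - t) / c))

def gompertz4(t, A, B, b, c):
    """Four-parameter Gompertz law: L = A + B * exp(-exp((b - t) / c))."""
    return A + B * math.exp(-math.exp((b - t) / c))

# Hypothetical normalised parameters, for illustration only.
A, B, b, c = 0.0, 1.0, 0.0, 1.0

# Fraction of the total rise B already achieved at t = b.
logistic_at_b = logistic4(b, A, B, b, c)   # 1/2
gompertz_at_b = gompertz4(b, A, B, b, c)   # 1/e, roughly 0.368

# Compare the distance from the lower asymptote at t = b - d
# with the distance from the upper asymptote at t = b + d.
d = 2.0
logistic_gap_before = logistic4(b - d, A, B, b, c) - A
logistic_gap_after = (A + B) - logistic4(b + d, A, B, b, c)
gompertz_gap_before = gompertz4(b - d, A, B, b, c) - A
gompertz_gap_after = (A + B) - gompertz4(b + d, A, B, b, c)

print(logistic_gap_before, logistic_gap_after)   # equal: symmetric curve
print(gompertz_gap_before, gompertz_gap_after)   # unequal: asymmetric curve
```

For the logistic the two gaps coincide, which is what "symmetrical" means here. For the Gompertz law the gap to the upper asymptote is much the larger of the two: the curve leaves the floor quickly but creeps towards its ceiling.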

What conclusion can one draw from this analysis? To sum it up in a catchy way: "garbage in, garbage out". It is not the mathematical tool used for the analysis that is at fault. What is to blame is the attempt to base the analysis purely on the past trend. No prediction can be made unless one takes into account the reality of the discipline. In the case of women's discus, the fact that the record has stagnated, with the best performances of the past decade several meters behind those of the late 80s, should essentially be attributed to the better anti-doping controls introduced in the 90s. Thus it is not the mathematical analysis that is inadequate but its use with inappropriate data.

At this point I cannot resist the temptation to make a remark, based on the analysis presented in an article, co-authored with Y. Charon, and published in New Studies in Athletics 29:4 (2014) page 37. The world record, despite its absolute and gripping character, should not be used to assess the progression of a discipline. In the figure below we show the progression of the 50th performer for women's triple jump from 1992 (shortly after the IAAF officially recognised this event) until 2012. While the world record has remained unbeaten since 1995, the performance of the 50th performer has been increasing steadily, faithfully representing the progress of the discipline.

Given the smooth progression, one would be tempted to use the data on the 50th performer and extrapolate towards the future. The dashed line in the figure does just that, using a three-parameter logistic formula. And once more we obtain an absurd result. The asymptotic value we find is a mere 14.73 m, a performance I am convinced will be reached (and surpassed) by the 50th performer before the end of the decade.

The whole point of this post is that record prediction is a very subtle business and no mathematical approach will ever work, even for short-term predictions. The situation is slightly better if one considers the performances not at the very top of the discipline (like the first 5) but rather those of the elite (say the 50th performer), provided one limits oneself to really short-term predictions. But this is something any competent coach can do without using any mathematics.
