Jeff Kaufman's Writing
https://www.jefftk.com/p
Lilacs Blooming
https://www.jefftk.com/p/lilacs-blooming
05 Oct 2011 08:00:00 EST<p><span>
Around 1848, Adolphe Quetelet decided to try to predict when lilacs
would bloom. [1] He found that the lilacs waited to bloom until it had
been a while since the last frost. Specifically, when:
</span>
<pre>
sum(t^2 for t in mean_daily_temperatures(
days since last frost)) > 4264C^2
</pre>
<p>
This is actually pretty strange: how does it make sense to be
squaring temperatures? [2] The rule is about frost, however, and 0C
happens to be the temperature at which frost forms, so each t is really
a distance above freezing. So we could rewrite this as:
</p>
<pre>
sqrt(sum((t-t_freezing)^2
for t in mean_daily_temperatures(
days since last frost))) > 65.3C
</pre>
<p>
This now makes sense, at least in terms of units. It also looks a lot
like the common mean squared error loss function (though with 'sum'
instead of 'mean'), and there might be an interpretation in which we
see the lilacs as predicting "it's going to freeze" and then
deciding "it's not going to freeze" when their total error since their
last successful prediction crosses a threshold. This might be the
same thing as the <a href="http://en.wikipedia.org/wiki/Residual_sum_of_squares">Sum of
Squared Errors of Prediction</a>?
</p>
<p>
[1] This comes from <a href="http://www.nytimes.com/books/98/04/19/reviews/980419.19graylt.html">Seeing
Like A State</a> p313 (James Scott, 1998), which sources it to Ian
Hacking's "The Taming of Chance", p62 (1990). Scott claims that
"the calculations must begin with an unpredictable event: the 'last
frost'. Since the date of the last frost can be known only in
retrospect, Quetelet's formula fails as a useful guide to action."
This is probably a misinterpretation of either Quetelet or Hacking:
instead of "the last frost of the season" the formula probably takes
as an input "the most recent frost".
</p>
<p>
Scott actually has the right hand side of the inequality as "(4264C)
squared", and Hacking has it as "(4264C)^2", but this is clearly
impossible, as it would require lilacs to wait through 40 years of 35C
days before blooming, while 4264 square degrees Celsius requires only
about 11 days of 20C. The errant "squared" might be meant to apply only
to the C, however, in which case it would make sense.
</p>
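<p>
A quick check of the arithmetic for the two readings of the threshold,
assuming a constant temperature every day and a 365.25-day year. The
helper name is hypothetical.
</p>

```python
DAYS_PER_YEAR = 365.25


def days_to_exceed(threshold, temp_c):
    """Smallest whole number of days at a constant integer temp_c (in C)
    for which days * temp_c**2 is strictly greater than threshold."""
    return threshold // temp_c ** 2 + 1


# Reading 1: the threshold is 4264 square degrees.
days_sq_units = days_to_exceed(4264, 20)

# Reading 2: the threshold is 4264 squared.
days_sq_number = days_to_exceed(4264 ** 2, 35)

print(days_sq_units)                              # 11
print(round(days_sq_number / DAYS_PER_YEAR, 1))   # 40.6
```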
<p>
Looking now, this is also in the 1918 <a href="http://books.google.com/books?id=7hO2AAAAIAAJ&lpg=PA389&ots=ie0sYS1Bvv&dq=quetelet%20lilac&pg=PA389#v=onepage&q=quetelet%20lilac&f=false">Publications
of the American Statistical Association, Volume 15, p389</a> and the
1850 <a href="http://books.google.com/books?id=zCYbAAAAYAAJ&lpg=PA39&ots=pZek557NlO&dq=quetelet%20lilac&pg=PA39#v=onepage&q=quetelet%20lilac&f=false">Edinburgh
review, Volume 92, p39</a>. Both give the right side of the
inequality as just "4264 centigrade", without noting that the units
are squared.
</p>
<p>
[2] The particular choice of units matters; squaring acts very
differently on temperatures measured in Kelvin or Fahrenheit because
of their different zero points. Adding squared temperatures
especially doesn't make sense. Imagine I chose a temperature scale
(J) with 0J = 100C. Then if you add squared temperatures your total
is lower for a series of 20C (-80J) days than 10C (-90J) days which
would not work with this formula. (<b>edit</b>: was 80J and 90J)
</p>
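<p>
The J-scale example can be checked directly. This is only a sketch:
the ten-day runs are made-up data, and <code>zero_point_c</code> is a
hypothetical parameter giving the Celsius temperature that a scale
calls zero.
</p>

```python
def sum_sq(temps_c, zero_point_c=0.0):
    """Sum of squared temperatures, measured from a scale's zero point (given in C)."""
    return sum((t - zero_point_c) ** 2 for t in temps_c)


warm = [20.0] * 10  # ten 20C days
cool = [10.0] * 10  # ten 10C days

# In Celsius the warmer run has the larger total, as the formula needs.
print(sum_sq(warm), sum_sq(cool))                 # 4000.0 1000.0

# On the J scale (0J = 100C) the order flips: -80J squared is less than -90J squared.
print(sum_sq(warm, 100.0), sum_sq(cool, 100.0))   # 64000.0 81000.0
```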
<p><i>Comment via: <a href="https://plus.google.com/103013777355236494008/posts/QawnZYPF7BT">google plus</a>, <a href="https://www.facebook.com/jefftk/posts/286002998078798">facebook</a></i></p>