I seem to have caused an uproar by disagreeing with Grant Foster's views on curve fitting data. The last two posts on this blog duplicate comments I made on Open Mind; I reposted them here for fear that my comments wouldn’t be approved. I have had several comments deleted, which seems disingenuous. In his latest post, entitled “teachable moment”, he says I made a grievous error in my look at the 1928-1960 data, because the “best” curve fit for that period was a quadratic, not a cubic. But if you look at my last post, you’ll see that my answer was specifically aimed at whether a cubic fit ending in 1960 would predict the remainder of the data, not at determining which fit was “best”. Readers of Open Mind would have seen that, except that Grant decided not to post that comment in the original thread (it answered a question posed by Bernd Palmer and is shown in my previous post).
I also posted a comment about how I approach problems as an engineer. That comment hasn’t yet seen the light of day and probably won’t. Grant decided that what I said (things like factors of safety and margins) was entirely too inflammatory, and that posting Allycat’s denunciation of all engineers as intellectually inferior was much more appropriate. Unfortunately, I didn’t make a copy of that comment for reposting here… live and learn, I guess…
With all that said, here is my reply to Grant’s “teachable moment”, unedited from what I posted:
OK, let me say this as carefully as I can.
1) We both agree that extrapolation is bad, and any result is fraught with danger. In this case, though, if you look across the years, extrapolating a linear fit matched the data much better than any of the polynomial fits did. Extrapolated polynomials fail dramatically.
2) In your piece, you fault Watts by saying that “there is acceleration” in the data and that he’s an idiot for not seeing it. Like Watts, I do not see the acceleration; I believe the acceleration you are claiming is simply an artifact of the polynomial fit and NOT a physical property of the data. Looking at other timeframes (1928-1960, 1928-2000) and using your methodology for determining the “best” fit, the polynomial fits are also all “better” than linear. By your criteria, an argument could be made for each timeframe that the trend was not linear. The problem, though, is that this acceleration (be it positive in the cubic or negative in the quadratic) is no longer visible once more time passes and more data is added. And all the polynomial fits fail miserably when extrapolated. The linear fit does not fail miserably when extrapolated! Given that, which is more likely: that the linear fit continues to match the data, or that the polynomials suddenly start matching it? Both are predictions, so we won’t know the answer until the data is recorded, but in any other endeavour the money would be on the trend that has held for 80 years continuing, and that trend is linear. In one of your comments, you stated that extrapolation to the near future is valid. Yet in no case does extrapolating the polynomials perform better than extrapolating the linear fit, even in the near term! Given that, why is linear so bad?
3) AIC is not in any way an indication of predictive skill. First, I did not check which fit was “the best” for the 1960 end-date case; I only checked which of the two (linear or cubic) was better. Cubic was better (so I did calculate the AIC correctly, and saying I “didn’t know what I was doing” seems argumentative). Cubic failed miserably to predict the future. You found that a quadratic was slightly better. It also failed miserably to predict the future. The only model that seemed to match the future was linear. In the 1928-2000 case, the “best” fit was actually a cubic, and yet it still failed to predict the future even as far as 2013 (see my previous blog post).
4) In the original post, you make the claim that using a linear model is foolish:
“What’s far more foolish is using such a model to extrapolate, not just to next year or the next few years, but all the way to the end of the century. Foolish.
Suppose I used the cubic model (demonstrably better than the linear one!) to extrapolate to the end of the century? That model predicts that sea level will rise between now and the end of the century by over 2.6 meters. Yes, that’s meters. Over 2600 mm. Over 100 inches. Over eight and a half feet.
But, honestly, it’s not valid to extrapolate this statistical model to the end of the century. Prediction is hard — especially about the future — and extrapolating simple statistical models far into the future is a very poor way to go about it.”
Your claim that using a linear model is foolish does not hold. You seem to base your whole argument on the assumption that worse AIC values mean worse predictive capability. You bash Watts for calling the trend linear because its AIC is not as good as that of a cubic fit (your exact words are that “claiming the trend is linear is foolish”). In doing so, you imply (but do not state) that the polynomial models, which all have better AIC values, have better predictive capability, hence your labeling of a linear model as foolish. In fact, in neither the 1928-1960 case (where the quadratic is better) nor the 1928-2000 case (where the cubic is better) does the polynomial fare better than linear at predicting what actually happened between the end of the data and 2013. If your method of choosing the most useful model held, then in both cases the polynomials would have matched the actual data better!
5) The only thing the “better” AIC does is give a more accurate match when INTERPOLATING the data. That’s it. If the difference between a linear fit and a cubic fit (seldom more than 20 mm, when annual variations are on the order of 100 mm) is important, then by all means use a cubic. In my engineering experience, such a small difference wouldn’t be worth the effort.
6) If your point is not to extrapolate the data (which I fully agree with), then why attack the choice of a linear model at all? Historically, a cubic model like the one you chose performs worse than a linear model when extrapolated into the future. If you want to claim there really is a physical acceleration in the data, then you’ll have to provide better evidence, because it’s just not there…
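The comparison argued in points 3), 5) and 6) can be sketched in code. This is a minimal sketch, assuming Python with NumPy and a synthetic series; it is not the calculation either Grant or I actually ran. The AIC form used here (n·ln(RSS/n) + 2k for least squares with Gaussian errors, with k counting the polynomial coefficients plus the error variance) is one common variant and is an assumption, as is the hypothetical function name `compare_fits`. For each polynomial degree it fits only the data up to a cutoff year, then reports the AIC on that window, how far the fit strays from the linear fit inside the window, and the RMSE when the fit is extrapolated past the cutoff.

```python
import numpy as np


def compare_fits(years, levels, cutoff, degrees=(1, 2, 3)):
    """Fit each polynomial degree to data up to `cutoff` and report, per degree:

    - aic: n*ln(RSS/n) + 2k on the fitting window (assumes i.i.d. Gaussian
      errors; k = deg + 2, i.e. coefficients plus error variance; lower is "better")
    - max_gap_vs_linear: largest |fit - linear fit| inside the fitting window
    - holdout_rmse: RMSE of the extrapolated fit on the years after `cutoff`
    """
    years = np.asarray(years, dtype=float)
    levels = np.asarray(levels, dtype=float)
    train = years <= cutoff
    if train.all():
        raise ValueError("need data after the cutoff to test extrapolation")
    x_tr, y_tr = years[train], levels[train]
    x_te, y_te = years[~train], levels[~train]
    x0 = x_tr.mean()  # center the time axis to keep polyfit well conditioned
    linear_in_window = np.polyval(np.polyfit(x_tr - x0, y_tr, 1), x_tr - x0)

    report = {}
    for deg in degrees:
        coeffs = np.polyfit(x_tr - x0, y_tr, deg)
        in_window = np.polyval(coeffs, x_tr - x0)
        rss = float(np.sum((y_tr - in_window) ** 2))
        n = y_tr.size
        extrapolated = np.polyval(coeffs, x_te - x0)
        report[deg] = {
            "aic": float(n * np.log(rss / n) + 2 * (deg + 2)),
            "max_gap_vs_linear": float(np.max(np.abs(in_window - linear_in_window))),
            "holdout_rmse": float(np.sqrt(np.mean((extrapolated - y_te) ** 2))),
        }
    return report


# Hypothetical synthetic series, NOT real tide-gauge data: a steady 2 mm/yr
# trend plus Gaussian year-to-year noise (sd 30 mm, an arbitrary choice).
rng = np.random.default_rng(0)
years = np.arange(1928, 2014)
levels = 2.0 * (years - 1928) + rng.normal(0.0, 30.0, years.size)

for cutoff in (1960, 2000):
    for deg, r in compare_fits(years, levels, cutoff).items():
        print(cutoff, deg, {k: round(v, 1) for k, v in r.items()})
```

Swapping a real annual series in for `years` and `levels` would let a reader run the same kind of check on each end date. With the synthetic series above the underlying trend is linear by construction, so any curvature the degree-2 and degree-3 fits pick up comes from the noise alone.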