{"id":304,"date":"2012-12-04T19:09:35","date_gmt":"2012-12-04T19:09:35","guid":{"rendered":"http:\/\/blog.fellstat.com\/?p=304"},"modified":"2012-12-04T19:09:35","modified_gmt":"2012-12-04T19:09:35","slug":"climate-misspecified","status":"publish","type":"post","link":"https:\/\/blog.fellstat.com\/?p=304","title":{"rendered":"Climate: Misspecified"},"content":{"rendered":"<p>I&#8217;m usually quite a big fan of the content syndicated on <a href=\"http:\/\/r-bloggers.com\">R-Bloggers<\/a> (as this post is), but I came across a <a href=\"http:\/\/www.statisticsblog.com\/2012\/12\/the-surprisingly-weak-case-for-global-warming\/\">post<\/a>\u00a0yesterday\u00a0that was as statistically misguided as it was provocative. In this post, entitled &#8220;The Surprisingly Weak Case for Global Warming,&#8221; the author (Matt Asher) claims that the trend toward hotter average global\u00a0temperatures\u00a0over the last 130 years is not\u00a0distinguishable\u00a0from statistical noise. He goes on to conclude that &#8220;there is no reason to doubt our default explaination of GW2 (Global Warming) &#8211; that it is the result of random, undirected changes over time.&#8221;<\/p>\n<p>These are very provocative claims which are at odds with the <a href=\"http:\/\/www.treehugger.com\/climate-change\/pie-chart-13950-peer-reviewed-scientific-articles-earths-climate-finds-24-rejecting-global-warming.html\">vast majority<\/a> of the extensive literature on the subject. So this extraordinary claim should have a pretty compelling analysis behind it, right?&#8230;<\/p>\n<p>Unfortunately, that is not the case. All of the author&#8217;s conclusions are perfectly consistent with applying an unreasonable model that is inappropriate to the data. This in turn leads him to rediscover regression to the mean. 
Note that I am not a climatologist (neither is he), so I have little relevant to say about global warming per se. Rather, this post will focus on how statistical methodologies should pay careful attention to whether the data generation process assumed is a reasonable one, and how model misspecification can lead to professional\u00a0embarrassment.<\/p>\n<h1 style=\"text-align: center;\">His Analysis<\/h1>\n<p>First, let&#8217;s\u00a0review\u00a0his methodology. He looked at the global\u00a0temperature\u00a0data available from NASA. It looks like this:<\/p>\n<figure id=\"attachment_305\" aria-describedby=\"caption-attachment-305\" style=\"width: 480px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/means.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-305\" title=\"means\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/means.png\" alt=\"\" width=\"480\" height=\"480\" \/><\/a><figcaption id=\"caption-attachment-305\" class=\"wp-caption-text\">Average global temperatures (as deviations from the mean) with cubic regression<\/figcaption><\/figure>\n<p>He then assumed that the year to year changes are\u00a0independent, and simulated from that model, which yielded:<\/p>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/randomWalk.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-306\" title=\"randomWalk\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/randomWalk-300x300.png\" alt=\"\" width=\"300\" height=\"300\" \/><\/a>Here the blue lines are temperature difference records simulated from his model, and the red is the actual record. From this he concludes that the climate record is rather typical, and consistent with random noise.<\/p>\n<p>A bit of a fly in the ointment, though, is that he found that his\u00a0independence\u00a0assumption does not hold. 
In fact, he finds a negative correlation between one year&#8217;s temperature\u00a0anomaly\u00a0and the next:<\/p>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/meanReg.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-307\" title=\"meanReg\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/meanReg-300x300.png\" alt=\"\" width=\"300\" height=\"300\" \/><\/a>Any statistician worth his salt would note (as indeed several of the commenters did) that this looks quite similar to what you would see if there were an unaccounted for trend leading to a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Regression_toward_the_mean\">regression to the mean<\/a>.<\/p>\n<h1 style=\"text-align: center;\">Bad Model -&gt; Bad Result<\/h1>\n<p>The problem with using an autoregressive model here is that it is not just last year&#8217;s\u00a0temperatures\u00a0which determine this year&#8217;s temperatures. Rather, it would seem to me as a non-expert, that temperatures from one year are not the driving force for temperatures for the next year (as an autoregressive model assumes). Rather, there are underlying planetary constants (albedo and such) that give a baseline for what the temperature should be, and there is some random variation which causes some years to be a bit hotter, and some cooler.<\/p>\n<p>Remember that first plot, the one with the cubic regression line. Let&#8217;s assume that the data generation process follows that regression line, with the same variance of residuals. We can then simulate from the model to create a\u00a0fictitious temperature\u00a0record. 
The advantage of doing this is that we know the process that generated this data, and know that there exists a strong underlying trend over time.<\/p>\n<figure id=\"attachment_308\" aria-describedby=\"caption-attachment-308\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/simulated.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-308\" title=\"simulated\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/simulated-300x300.png\" alt=\"\" width=\"300\" height=\"300\" \/><\/a><figcaption id=\"caption-attachment-308\" class=\"wp-caption-text\">Simulated data from a linear regression model with cubic terms<\/figcaption><\/figure>\n<p>If we fit a cubic regression model to the data, which is the correct model for our simulated data generation process, it shows a highly significant trend.<\/p>\n<pre>              Sum Sq  Df F value    Pr(&gt;F)\npoly(year, 3)  91700   3  303.35 &lt; 2.2e-16 ***\nResiduals      12797 127<\/pre>\n<p>We know that this p-value (essentially 0) is correct because the model is the same as the one generating the data, but if we apply Mr. Asher&#8217;s model to the data we get something very different.<\/p>\n<figure id=\"attachment_309\" aria-describedby=\"caption-attachment-309\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/randomWalkSim.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-309\" title=\"randomWalkSim\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/randomWalkSim-300x300.png\" alt=\"Auto regressive model fit to simulated data\" width=\"300\" height=\"300\" \/><\/a><figcaption id=\"caption-attachment-309\" class=\"wp-caption-text\">Auto regressive model fit to simulated data<\/figcaption><\/figure>\n<p>His model finds a non-significant p-value of .49. 
We can also see the regression to the mean in his model with this simulated data.<\/p>\n<figure id=\"attachment_310\" aria-describedby=\"caption-attachment-310\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/meanRegSim.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-310\" title=\"meanRegSim\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/12\/meanRegSim-300x300.png\" alt=\"\" width=\"300\" height=\"300\" \/><\/a><figcaption id=\"caption-attachment-310\" class=\"wp-caption-text\">Regression to the mean in simulated data<\/figcaption><\/figure>\n<p>So,\u00a0despite\u00a0the fact that, after you adjust for the trend line, our simulated data is generating\u00a0independent\u00a0draws from a normal distribution, we see a negative auto-correlation in Mr. Asher&#8217;s model due to model misspecification.<\/p>\n<h1 style=\"text-align: center;\">Final Thoughts<\/h1>\n<p>What we have shown is that the model proposed by Mr. Asher to &#8220;disprove&#8221; the theory of global warming is likely misspecified. It fails to detect the highly significant trend that was present in our simulated data. Furthermore, if he is to call himself a statistician, he should have known exactly what was going on because regression to the mean is a\u00a0fundamental,\u00a0100-year-old concept.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: center;\">&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n<p>The data\/code to reproduce this analysis are available <a href=\"http:\/\/www.fellstat.com\/files\/climateSimulation.R\">here<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m usually quite a big fan of the content syndicated on R-Bloggers (as this post is), but I came across a post\u00a0yesterday\u00a0that was as statistically misguided as it was provocative. 
In this post, entitled &#8220;The Surprisingly Weak Case for Global Warming,&#8221; the author (Matt Asher) claims that the trend toward hotter average global\u00a0temperatures\u00a0over the last [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-304","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/posts\/304","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=304"}],"version-history":[{"count":0,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/posts\/304\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=304"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=304"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=304"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}