This thread is more of an interesting (and somewhat mathematical) discussion than an actual question, so bear with me.
I got a question on social media earlier today about whether Tornado Alley is moving east. I have heard that it is, and since I am currently learning machine learning I thought it would be fun to analyze the SPC data. At first, just for fun, I checked the overall mean latitude and longitude for ALL tornadoes between 1950 and 2017 (excluding zero values as well as Hawaii, Alaska and Puerto Rico). It turned out that Diggins, MO, is the geometric "Tornado Capital" of the USA (at 37.169/-92.833).
Doing a linear regression on the data (year vs. longitude), I found a positive slope: longitude = 0.0134 * yr - 93.3613 (where year 1950 = 0). Converting longitude to miles, this suggests that the mean longitude is shifting approximately 2 miles per year to the east.
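For anyone who wants to reproduce the regression step outside Weka, here is a minimal sketch in plain Python. The sample records are made up, the sign convention (west longitudes negative) is an assumption, and the miles-per-degree factor assumes a spherical Earth with about 69.17 miles per degree at the equator.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def miles_per_degree_longitude(latitude_deg):
    # Assumption: spherical Earth, about 69.17 miles per degree at the equator.
    return 69.17 * math.cos(math.radians(latitude_deg))

# Hypothetical sample of (year, longitude) pairs, not real SPC data.
# Assumption: west longitudes are stored as negative numbers.
records = [(1950, -97.0), (1950, -95.0), (1951, -96.0), (1952, -94.0), (1952, -95.0)]
years_since_1950 = [year - 1950 for year, _ in records]
longitudes = [lon for _, lon in records]

slope, intercept = fit_line(years_since_1950, longitudes)
# Convert the slope from degrees per year to miles per year at the mean latitude.
drift_miles = slope * miles_per_degree_longitude(37.169)
print(f"slope = {slope:.4f} deg/yr, intercept = {intercept:.4f}, drift = {drift_miles:.2f} mi/yr")
```

Under these assumptions one degree of longitude at 37.169 degrees latitude is roughly 55 miles, so a slope of 0.0134 degrees per year would come out closer to 0.7 miles per year. The miles figure depends entirely on the conversion factor used, so that step is worth double-checking.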
Now, the correlation here is almost nonexistent (0.02! For non-mathematicians: a correlation of 1 or -1 is the strongest possible and 0 is the weakest, so this is about as random as it gets), so I would not put any weight on these numbers. This doesn't surprise me considering the spread of the data in longitude. Although the data set covers about 64,000 tornadoes, it cannot really be used this way given the geography of the States (5 tornadoes in California pull the mean further west than 10 in Kentucky pull it east).
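To make the correlation figure concrete, here is a small sketch of the Pearson coefficient computed by hand. The data points are invented for illustration; only the value 0.02 comes from my actual run.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, ranging from -1 to 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: years since 1950 and longitudes (west negative).
years = [0, 0, 1, 2, 2]
longitudes = [-97.0, -95.0, -96.0, -94.0, -95.0]

r = pearson_r(years, longitudes)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")

# The square of r is the share of the variance in longitude that a
# straight line in year accounts for.
reported_r = 0.02
print(f"variance explained at r = {reported_r}: {reported_r ** 2:.4%}")
```

Squaring the coefficient is a quick way to read it: at r = 0.02 the fitted line accounts for only about 0.04 percent of the variation in longitude.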
But, how would one determine/calculate whether "tornado alley" is shifting more towards the east? There are some ways of breaking down this question:
- Is Tornado Alley defined as the place where tornadoes appear during the "season" (i.e. April to June/July)? In other words, should all tornadoes outside of these months be excluded?
- Would a linear regression of the mean longitude per year be more useful? Or perhaps a sliding 5-year average?
- Should certain states be excluded completely (regardless of month)? For example, is a tornado in New York State or California a sign of Tornado Alley shifting? Not really...
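The three ideas above could be combined roughly as follows. This is only a sketch: the records, the choice of season months, and the excluded states are all assumptions picked for illustration, not a definition of Tornado Alley.

```python
from collections import defaultdict

# Hypothetical records: (year, month, state, longitude), west negative.
tornadoes = [
    (1950, 5, "OK", -97.5),
    (1950, 5, "KS", -98.5),
    (1950, 11, "AL", -86.8),  # outside the assumed season
    (1951, 6, "CA", -119.0),  # in an excluded state
    (1951, 4, "OK", -97.0),
    (1952, 5, "MO", -93.0),
    (1953, 6, "AR", -92.0),
    (1954, 4, "TN", -88.0),
    (1955, 5, "KY", -86.0),
]

SEASON_MONTHS = {4, 5, 6}       # assumption: April through June
EXCLUDED_STATES = {"CA", "NY"}  # assumption: states treated as outside the alley

def yearly_mean_longitude(records, months, excluded_states):
    """Mean longitude per year, keeping only in-season, non-excluded records."""
    by_year = defaultdict(list)
    for year, month, state, lon in records:
        if month in months and state not in excluded_states:
            by_year[year].append(lon)
    return {year: sum(lons) / len(lons) for year, lons in sorted(by_year.items())}

def sliding_average(values, window):
    """Simple moving average; assumes the values are for consecutive years."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

means = yearly_mean_longitude(tornadoes, SEASON_MONTHS, EXCLUDED_STATES)
smoothed = sliding_average(list(means.values()), window=5)
print(means)
print(smoothed)
```

Averaging per year first gives every year the same weight regardless of how many tornadoes were reported, which matters because reporting has changed a lot since 1950. The yearly means (or the smoothed series) could then be fed into a regression instead of the raw records.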
I am using Weka as my data mining tool, by the way, and although I have done my fair share of studies in math and statistics, that was a while ago, and I am just getting started with machine learning.
Any thoughts on this?