Forecasting Accuracy 2024 - Good or Bad?

Lots of watches and warnings the past few days with not much in the way of reports. I guess the main view is it's better to be safe than sorry when it comes to alerting the public. Any thoughts on why storms didn't produce? The components needed for severe storms were there, but I thought the risk levels seemed a bit ambitious if nothing else.
 
If we still have any readers from the SPC, I'm curious whether there is a policy against downgrading outlooks after 1300Z. There have been several examples of events where the environment very evidently seemed trashed by mid-morning (due to clouds/precip, an MCS scouring the moisture and shunting it south, or something else), yet the outlook remained at its 1300Z maximum. The Elmer day in 2015 was like that: the overnight MCS scoured moisture down to the Red River, yet the higher tornado risk level in Kansas and Nebraska from earlier outlooks was maintained. I figured there was some social science component to this - maybe the impact on public trust in the forecasts is worse when a downgrade is made than when the forecast busts? Again, I'm not bashing the SPC, as they do an excellent job, and busts are just part of dealing with the weather using our current science and tools.
 
I was told once, years ago, that they don't like downgrading outlooks because NWS and many other Federal and State agencies adjust the number of people they staff based on outlooks. I do not know if this is true.
 
I think you're on the right track, Warren. I remember reading an older thread about SPC risk classification where Rich Thompson mentioned something to that effect. He stated that it had an impact on the prior planning of outside organizations such as EMs. He also touched on the gamble of lowering the risk classification and then having storms overperform, basically saying it would have to be a sure thing on the part of the forecaster.

Again, I'd imagine it's generally better to bust high and have nothing happen than the other way around. Since I've been tracking the weather, the SPC has done an amazing job... and I was surprised to see an environment that looked otherwise adequate to my novice eye fail to produce. Dew points seemed sufficient in the last few cases, but maybe the moisture wasn't deep enough. Shear was plentiful as well, but instability was marginal, as is to be expected with early-season setups. If forecasting were easy, it wouldn't be as rewarding to nail a prediction.
 
He also touched on the gamble of lowering the risk classification and then having storms overperform, basically saying it would have to be a sure thing on the part of the forecaster.
To me, this makes the most sense--let's say the forecast downtrends from warranting a moderate-risk to an enhanced risk. How are you supposed to adequately convey to the public that they should still be alert for strong tornadoes, but that the expectation of an outbreak has passed? It's a catch-22 that could only promote lethargy on the part of the public when there may still be a relatively volatile environment in place. Also, imo, there is enough inherent uncertainty in any given forecast even the day / afternoon of (e.g. the all-high-probs tornado watch from May 20, 2019) that the marginal benefit of precision gained by performing such a downgrade is too small.

On the other hand, I too am perplexed by those cases that Dan mentioned, where, e.g., the warm front simply didn't reach as far north as expected, and yet a sig-tornado risk area clearly north of the warm front is maintained. This has always confused me.
 
I see the mission of SPC as producing the most accurate forecasts possible, not directly (by, say, not downgrading) affecting personnel decisions.
 
I'm not sure what entities this is referencing, but from the media side, I'd argue this is still a consequence of this not being an exact science. I have to wonder if high-resolution model simulation, including CAMs, plays a role here. Nonetheless, despite all of the tools we have at our disposal, and some of the best minds applying those tools, forecasting the behavior of a fluid system is sometimes going to be erroneous. I tend to focus on the bigger picture here. Our lead times for watches and warnings are strides above what they were decades ago, so it's not like we're dealing with pyrrhic victories when these setups evolve as forecast. On the microscale, I haven't noticed anything grossly inaccurate.

In terms of downgrading higher-end outlooks, particularly with SPC forecasts, there are social science nuances involved that have already been discussed. That's difficult because we often don't know which mesoscale factors will enhance or reduce severe risk in a given area when 12+ hour lead times need to be given. I think most agencies do a decent job of conveying wording that expresses potential caveats or uncertainties with high-end events. We don't always have a way to accurately forecast when and where, for example, kinematics may be there for a substantial severe weather risk but thermodynamics don't align at the eleventh hour. I still believe it's better to be proactive and help the general public understand that, while there are caveats, the atmospheric ingredients exist to put their location at a potentially substantial risk of severe weather. We just don't have the science to affirmatively tell Whoville that a QLCS tornado will arrive at 2:30 PM on April 1st.
 
I'd like to retract my previous statement about not having a lot of reports. Just went back and checked, and at least for the 9th, there are quite a few reports, which I'd imagine are based on surveys. I think the easiest answer to my question of why storms didn't appear to be as severe as forecast was, as Dan alluded to, overturning from prior convection and messy storm modes. I'd like to think that no one is complaining or unhappy with the outlooks. At least for me, I was more interested in getting opinions from those more knowledgeable than myself as to why the last few outlooks didn't appear to pan out initially. I should've known better and demonstrated a little patience in waiting for the final tally to come in. I'd also imagine storms down in the jungles would have fewer eyes on them to drive real-time reports. Live and learn, that's why I'm here, lol.
 
Here is a take based on some less-than-stellar forecasts for this severe season so far. Many forecasters, including ones far better than I am, sometimes don't pay enough attention to one very basic thing: will there be any storms at all in the area where parameters otherwise look great for supercells/tornadoes? Conversely, will there be *too many* storms? For an example of the former scenario, we need look no further than yesterday. In the area from SW Kansas to West Texas, parameters like the shear/CAPE combo on forecast soundings looked pretty optimal for supercells/tornadoes. The rub, however, was that there were only a handful of storms, and there were few severe reports and no tornadoes.

My concern all along for yesterday was that for several days, including leading right up to the event, the models, including the now pretty numerous CAMs, were very consistent in showing little or no convection over a rather large area, except for a zone in NW Texas. The attached frames are the 00Z Mon HREF ensemble forecasts for updraft helicity at 00Z Tue. Note the lack of any sort of signal in the KS/OK area. This is one snapshot of one set of models, but the signal was consistent not just across various models but also over quite a few days before the event.

My general point is that forecasters need to look not just at how good the environment looks from a parameter standpoint; they also need to examine several runs of various models to see whether storms will actually occur, or, in the latter scenario above, whether the storms might be so numerous that chasing will not be productive for seeing a photographable tornado or good storm structure.
 

Attachments

  • Screenshot 2024-04-16 at 9.05.52 AM.png
Maintaining forecast continuity is definitely a consistent practice. Whipsawing the forecast back and forth just causes confusion and undermines credibility, probably more so than a busted forecast. It’s an example of how precision can sometimes be the enemy of a forecast’s usefulness, relevance and accuracy - or, as Noah put it:

the marginal benefit of precision gained by performing such a downgrade is too small.

I think this comment by Noah was also insightful:
let's say the forecast downtrends from warranting a moderate-risk to an enhanced risk. How are you supposed to adequately convey to the public that they should still be alert for strong tornadoes, but that the expectation of an outbreak has passed?


I have seen explicit reference in NWS AFDs and in NHC discussions to maintaining continuity. For example, with winter storms here in Philadelphia it’s not uncommon to see reference to recent model runs backing off on snowfall totals, but the forecaster documents his/her explicit decision to defer changes to the forecast unless/until the model trend continues.

However, I have never seen SPC explicitly refer to maintaining continuity. I never see from SPC anything to the effect of, for example, “initiation is now in doubt, but we are maintaining the Enhanced Risk for continuity.” It almost seems like the discussion seeks to continue justifying the previous outlook, even while hedging with the wording. Yesterday, 4/15/24, seems to be an example of maintaining continuity. There was a 30% risk from at least Day 6. After a couple days of enhanced wording, the language softened, yet the risk area and level remained unchanged. It’s important to remember that convective outlooks are not really designed for the general public, and I assume the local NWS offices that rely on them understand these nuances. But I’m not sure about emergency managers or the media. Even during the day yesterday, there was a mesoscale discussion (MCD) for Nebraska that gave a 95% probability of watch issuance. No watch was issued, at least not within the ensuing four hours, but it’s not as if there was a subsequent MCD to that effect. That’s an example that might cause confusion for the audience of the product - what are they to make of the threat level as they wait for a watch that never comes?

As an aside, I find the concept of forecast continuity, the trade-off between precision and relevance/helpfulness of a forecast, and communicating uncertainty really interesting and I have applied it to my own profession as a CFO in communicating financial forecasts. Here’s a blog post I wrote for a professional journal if interested:

 
I think a lot of this has to do with the relative infancy we're still in regarding the science of convective severe storms meteorology. There is obviously a lot we still don't know, and that is evident time and time again. Add to that the coarseness of our surface and upper-air observations, and it's going to be an uphill battle. I wonder if we are all expecting too much of the enterprise? Could we have been spoiled by stretches of nailed forecasts for more classic setups? It does make me feel better about many of my busts when the best of the best struggle this frequently.
 
Here is a take based on some less-than-stellar forecasts for this severe season so far. Many forecasters, including ones far better than I am, sometimes don't pay enough attention to one very basic thing: will there be any storms at all in the area where parameters otherwise look great for supercells/tornadoes? Conversely, will there be *too many* storms? For an example of the former scenario, we need look no further than yesterday. In the area from SW Kansas to West Texas, parameters like the shear/CAPE combo on forecast soundings looked pretty optimal for supercells/tornadoes. The rub, however, was that there were only a handful of storms, and there were few severe reports and no tornadoes.

My concern all along for yesterday was that for several days, including leading right up to the event, the models, including the now pretty numerous CAMs, were very consistent in showing little or no convection over a rather large area, except for a zone in NW Texas. The attached frames are the 00Z Mon HREF ensemble forecasts for updraft helicity at 00Z Tue. Note the lack of any sort of signal in the KS/OK area. This is one snapshot of one set of models, but the signal was consistent not just across various models but also over quite a few days before the event.

My general point is that forecasters need to look not just at how good the environment looks from a parameter standpoint; they also need to examine several runs of various models to see whether storms will actually occur, or, in the latter scenario above, whether the storms might be so numerous that chasing will not be productive for seeing a photographable tornado or good storm structure.
I agree with much of what you said in principle. I'm just not sure how applicable some of it is to the 2024-04-15 event.

I'm seeing a remarkable amount of Monday morning quarterbacking since about 6p yesterday, mainly on social media. To be clear, I respect everyone's point of view here on ST (including yours, Matt) because we're having reasoned discussion, unlike most of social media.

But my perspective here is that yesterday's lack of robust supercells across KS, the Panhandles, and W OK was hardly obvious beforehand. Many global models showed CI along the dryline. The ECMWF was fairly consistent in this, with others like the GFS and ICON also showing it on many runs. Sure, the overall tendency was for the event to begin around 00Z, which is not great for chasing. But, to put it plainly, the world-renowned ECMWF did very poorly with this event -- despite having a consistent depiction that would've instilled confidence without the benefit of hindsight.

Combine the non-CAM model guidance with the fact that there were decent height falls projected by 00Z, and that the concerning morning cloud cover cleared out rather nicely by early afternoon for KS and the N Panhandles... and I don't think the SPC outlook, or people's choice to chase, was unreasonable. In some ways, you could look at this event as an alternate permutation of 2019-05-17, where the cap verified just a bit stronger than forecast, rather than the opposite.

Dryline setups like yesterday are inherently volatile in outcome, with everything riding on one tipping point. Close doesn't count in the outcome, but it should count in the evaluation of forecasts, IMO.

I may be biased and self-justifying since I was sitting in a gravel lot with my head hung at 8p yesterday, but that's my 2c.
 
I think a lot of this has to do with the relative infancy we're still in regarding the science of convective severe storms meteorology. There is obviously a lot we still don't know, and that is evident time and time again. Add to that the coarseness of our surface and upper-air observations, and it's going to be an uphill battle. I wonder if we are all expecting too much of the enterprise? Could we have been spoiled by stretches of nailed forecasts for more classic setups? It does make me feel better about many of my busts when the best of the best struggle this frequently.

Specifically with regard to chasing and only chasing, I actually hope forecasts, the related tools, and the observation network don’t get much better! The uncertainty is part of the appeal. I’m not a poker player, but I imagine it’s a similar inclination and personality type that enjoys weighing the blend of knowns, unknowns, probabilities, etc. and decision making in an environment of uncertainties. I don’t think chasing would be as satisfying if it was 100% successful all the time, instead of a more baseball-like .300 average. It’s the challenge that makes chasing fulfilling. Thinking of the recent eclipse, suppose the exact day, minute and path of an EF5 tornado were known in advance, and you could just show up to watch it (along with hundreds of others who had to do little more than look up the path and schedule online) - would you even want to chase anymore???
 
Thinking of the recent eclipse, suppose the exact day, minute and path of an EF5 tornado were known in advance, and you could just show up to watch it (along with hundreds of others who had to do little more than look up the path and schedule online) - would you even want to chase anymore???
For sure I would still chase if the batting average went up to 1.000. I would chase more. Busts are a serious drain on resources, which I did not appreciate until the economy began to go south in 2021-2022. In 2021 the idea of a two-day panhandle chase with an overnight stay was, while non-trivial, definitely doable. Not so much in 2022...and afterwards. Lots more same-day "there and back again" excursions. That's why I spend so much time trying to figure out on which days to chase, and when things go wrong, why they went wrong. To optimize the consumption of resources.
 
For sure I would still chase if the batting average went up to 1.000. I would chase more. Busts are a serious drain on resources, which I did not appreciate until the economy began to go south in 2021-2022. In 2021 the idea of a two-day panhandle chase with an overnight stay was, while non-trivial, definitely doable. Not so much in 2022...and afterwards. Lots more same-day "there and back again" excursions. That's why I spend so much time trying to figure out on which days to chase, and when things go wrong, why they went wrong. To optimize the consumption of resources.

Interesting. We probably have very different views on that. For sure I sometimes find myself frustrated and wondering if chasing is worth the time and money. But ultimately I have come to appreciate that the failures and “.300 average” are what make the successes so very satisfying. And I just enjoy the process. I suspect you do, also, more than you are letting on. To go as deep as you go in trying to learn, you’d have to find intrinsic satisfaction in it. Bottom line is you are still finding obvious enjoyment in the challenge of improving, and if it were made easier (guaranteeing the 1.000 average) all that would go away. Interesting topic in its own right, probably better for a DM conversation as I didn’t mean to take us OT from the original 2024 forecast accuracy thread!
 
I'm not sure I am qualified to comment on forecast accuracy so far this year, but it seems to me that the forecasts have not been all that bad.

However, I do temper any perception of accuracy with my understanding that the SPC forecasts "severe weather events" and I am more interested in the subset of "severe weather events" that are "chasable storms".

For instance, here is how the 1630Z Day One outlook "verified":

[Attached image: 1713374586463.png - 1630Z Day 1 outlook verification graphic]
I mean--if you grade based on "point-in-polygon" and weight by the categorical outlook, this forecast probably gets a very high score. Hard for me to argue this was bad. Chasable storms, though? I am waiting to see the reports come in.
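For fun, here is a toy sketch of that "point-in-polygon" grading idea. This is purely my own illustration, not SPC's actual verification method; the polygon, the reports, and the categorical weights below are all invented:

```python
# Toy illustration of "point-in-polygon" outlook grading. NOT SPC's actual
# method; the polygon, reports, and weights below are all made up.

def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt (lon, lat) falls inside polygon poly."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge cross the horizontal ray extending from pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical categorical weights and a toy SLGT polygon (lon, lat corners)
WEIGHTS = {"MRGL": 1, "SLGT": 2, "ENH": 3, "MDT": 4, "HIGH": 5}
slgt_poly = [(-100.0, 33.0), (-98.0, 33.0), (-98.0, 35.0), (-100.0, 35.0)]

# Invented storm reports; the last one falls outside the polygon
reports = [(-99.0, 34.0), (-99.5, 33.5), (-97.0, 34.0)]
score = sum(WEIGHTS["SLGT"] for r in reports if point_in_polygon(r, slgt_poly))
print(score)  # 2 hits x weight 2 = 4
```

Real verification is far more involved, of course, but the geometric core of "did this report land inside the polygon" really is about that simple.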
 
I might respectfully disagree as to the accuracy of the outlooks for the 15th.

The hatched hail forecast (right) was awful. There was only 1 report of ≥ 2" hail in spite of thousands of square miles of hatched areas. Few large hail events occurred in the red (30%) area.

As to tornadoes, there was one EF-2 "strong" tornado but that was at the edge of the 5% area (Greenwood County, KS). No strong tornadoes in the hatched area.

I'm not saying I did any better w/r/t tornadoes (except that I had fewer square miles of false alarm; see: [4:30pm Update] Central U.S. Tornado and Severe Thunderstorm Forecast).

Given the number and intensity of tornadoes that occurred yesterday morning (most after 7am, so they were after the period of the forecast shown), I think we dodged a bullet with the upper low being 6-8 hours later than forecast. Had that 125 kt jet stream been able to work on the 3,000+ J/kg of CAPE, things might have been really dangerous.
 

Attachments

  • Screenshot 2024-04-17 at 1.32.49 PM.png
  • Screenshot 2024-04-17 at 1.33.01 PM.png
Specifically with regard to chasing and only chasing, I actually hope forecasts, the related tools, and the observation network don’t get much better! The uncertainty is part of the appeal. I’m not a poker player, but I imagine it’s a similar inclination and personality type that enjoys weighing the blend of knowns, unknowns, probabilities, etc. and decision making in an environment of uncertainties. I don’t think chasing would be as satisfying if it was 100% successful all the time, instead of a more baseball-like .300 average. It’s the challenge that makes chasing fulfilling. Thinking of the recent eclipse, suppose the exact day, minute and path of an EF5 tornado were known in advance, and you could just show up to watch it (along with hundreds of others who had to do little more than look up the path and schedule online) - would you even want to chase anymore???
Definitely agree with this. While it is certainly frustrating driving for literally days on end spending precious time and money only to come up empty handed, it makes the successes that much sweeter.

To me, the very act/process of chasing is fun and enough to get me out there. There is just something inherently exciting about forecasting and planning with so many uncertainties. I love road trips, ghost towns, camping, photography, history, exploration/spontaneity, and severe weather, all of which are typically available in great abundance on any given chase. Not knowing where you will be or what you will see is intoxicating to me, because it's one of the few things you can actively pursue with true adventure every time. I love that cowboy feeling every chase day of making decisions on the fly which can make or break your whole day, and constantly learning from each chase to get better every time. To me, all of those factors are enough that I still have fun each time I go out and enjoy the overall experience. Yes, catching "the big one" for that day is the ultimate goal, but it doesn't mean the trip sucked if you didn't capture it. At that point it was ultimately a road trip, and those are always a great time in my book.

My brother/chase partner takes bust days a lot harder than I do. He comes out so desperately wanting and hoping to catch a photogenic tornado, and gets really bummed when it doesn't happen. I like to remind him that it's called storm chasing and not tornado catching, but he still gets pretty bummed out.

We can all sit here and bash the SPC (and probably all have at some point), but ultimately they are working with the same data we are, and can't really blame them when it doesn't exactly verify. They are an easy scapegoat to us, just like the local weather girl is to the general public, but they are just that, scapegoats. Without more detailed, frequent, and granular observation data for the models and for us in the field, we all can only work within these broad uncertainties and have to accept the realities of that.

Gotta do it for the love of the game, not the love of victory.
 
I contemplated this post as an Event or Pseudo-Event for 4/23/2024. As much as we have recently seen events with huge categorical risk polygons thinly populated with storm reports, I'd like to see some analysis of an event where it seems like the SPC did a surprisingly good job: what did they see in the conditions and model forecasts that prompted the tiny SLGT risk polygon in NW TX on 4/23/2024? Here is the graphical display of performance; no hail or tornado events--just wind:


[Attached image: 1713982321154.png - graphical display of the 4/23/2024 outlook performance]


Sometimes I wonder to what extent categorical risk polygons are set by the convex hulls* (or concave hulls) of the CAM model outputs, as in the following image:

[Attached image: Model Convex Hull.jpg]
Composite image of all Reflectivity Ensemble Paintball plots for 4/23/2024 19Z to 4/24/2024 06Z from the SPC HREF Ensemble Viewer. The model runs reflected in the ensemble paintballs were initialized on 4/23/2024 00Z.

I'm not suggesting that such a simplistic approach (convex hull of CAMs) was taken here, or anywhere for that matter. Just wanted to point out the "convex hull" observation on the way into the discussion.

What I'm really interested in is: what made such a focused forecast successful in this case?


* The convex hull is the smallest convex polygon that encloses the data of interest; in this case, the smallest convex polygon enclosing the paintball ensembles.
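Since the footnote mentions it, the hull itself is cheap to compute. Below is a small self-contained sketch (my own illustration only; I'm not suggesting SPC computes anything this way) using Andrew's monotone-chain algorithm on some made-up paintball points:

```python
# Convex hull via Andrew's monotone chain. Purely illustrative; the
# "paintball" points below are invented stand-ins for grid cells where
# CAM ensemble members produced simulated storms.

def cross(o, a, b):
    """2D cross product of vectors OA and OB; > 0 means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build lower hull left-to-right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right-to-left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop the duplicated endpoints

# Made-up paintball points (lon, lat); the interior point drops out
paintballs = [(-100, 34), (-99, 35), (-98, 34), (-99, 33), (-99, 34)]
hull = convex_hull(paintballs)
print(hull)
```

A concave hull, which the post also mentions, is trickier because it needs a tightness parameter (e.g. alpha shapes), which is partly why the convex version is the natural first guess.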
 
There are so many ways you could attack this accuracy/verification topic, and what I write here is not meant to solve anything. It is definitely an interesting topic, and I have definitely been part of these discussions before in my own right.
[Attached image: 1714138182575.png]

- Spatial coverage given the size of the polygon: if a Marginal area covers 25,000 sq miles and you get 1 report, is that a bust or a hit? Does it depend on who you ask?
- Does defining the risk trump accuracy? (How much should accuracy matter in terms of defining risk to the public?)
- How do they define ISOLD / SCT / Numerous / Widespread: coverage area in sq miles, or the resultant number of storm reports inside the polygon?
- Are storm reports part of the SPC's verification process for the previous day's forecast?
- Say you cover the entire Plains in Enhanced and you get 6 tornado reports: 2 EF0, 3 EF2, 1 EF4. Anyone located inside the EF4 path might be thankful for the Enhanced area; take a step back, and people might say that was a bust for such a large coverage area that only received 6 confirmed reports.
- How much does the public really rely on SPC outlooks vs. local NWS watches or warnings? (I really don't know; I assume some study has asked this kind of question.) I would guess services/businesses/EOCs/aviation/chasers rely on SPC more than the general public, but that's a total guess that just feels logical in my head, lol.
- How many severe storm reports does it take to avoid busting a lower-end risk?
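The first question, the "1 report in 25,000 sq miles" one, can at least be made comparable across days by normalizing report counts by polygon area. A toy sketch with invented numbers (not any official verification metric):

```python
# Toy report-density metric: reports per 10,000 sq mi of outlook polygon.
# All figures are invented for illustration; this is not an official metric.

def reports_per_10k_sq_mi(n_reports, area_sq_mi):
    """Report density normalized per 10,000 square miles of polygon area."""
    return n_reports / (area_sq_mi / 10_000)

# Hypothetical days: (label, reports inside polygon, polygon area in sq mi)
days = [
    ("large ENH area, 6 tornado reports", 6, 150_000),
    ("tiny SLGT area, 10 wind reports", 10, 8_000),
]
for label, n, area in days:
    print(f"{label}: {reports_per_10k_sq_mi(n, area):.2f} per 10k sq mi")
```

By this crude measure, the small hypothetical polygon "verified" over 30x better per unit area, which is one way to put numbers on the bust-vs-hit question above.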

I could honestly keep going and ask more questions, but at the end of the day, I know the SPC is the best at characterizing the risk for the day. Do busts happen? Sure. Are there meteorological reasons, missed by the models, that delay or cap convection? Absolutely. As seasoned chasers, especially ones with meteorological backgrounds, it's our job to see through the baseline and investigate the microscale to place ourselves in the location with the best potential.

and we ALL bust at one time or another.
 