HRRRv4 upgrade _June_ (now November) 2020 (also RAP to v5)

Jeff Duda

Quick brief:

Various NWS regional centers and science offices approved the proposed upgrades to the HRRR model at the science level in December 2019. Therefore, the upgrade of HRRR from its current version (v3, since June 2018) to version 4 is on time and currently scheduled to become operational on 2 June 2020 (yes, towards the end of chase season). The remaining obstacles are procedural/bureaucratic and computational in nature, and thus have nothing to do with the science or forecast performance.

Some of the bigger system enhancements for this upgrade include:
-More accurate initial conditions from using a storm-scale ensemble component (HRRRDAS) to initialize each forecast (actually, the ensemble mean will be used); previous versions initialized off of the RAP
-Improved data assimilation through the use of storm-scale (HRRRDAS) background error covariances in the hybrid 3DVar-EnKF formulation, which should make better use of observations and further improve ICs; previous versions used coarser ensemble error covariances
-Increased forecast lengths (forecasts every 6th hourly cycle will go out to 48 hours); the other cycles may go out to 21 or 24 hours, but I can't remember that for sure.
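
For anyone wondering what "hybrid" means in the second bullet, here is a minimal sketch of the general idea only, not the operational code: the background error covariance is a weighted blend of a static (climatological) covariance and a flow-dependent one estimated from the ensemble members. The function names, the weight `beta`, and the toy numbers are all assumptions made up for illustration. A real system also applies localization and never builds these matrices explicitly.

```python
def sample_covariance(ensemble):
    """Unbiased sample covariance of an ensemble given as a list of state vectors."""
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    means = [sum(m[i] for m in ensemble) / n_members for i in range(n_state)]
    # Perturbations of each member about the ensemble mean.
    perts = [[m[i] - means[i] for i in range(n_state)] for m in ensemble]
    return [
        [sum(p[i] * p[j] for p in perts) / (n_members - 1) for j in range(n_state)]
        for i in range(n_state)
    ]


def hybrid_covariance(ensemble, b_static, beta):
    """Weighted blend: (1 - beta) * static covariance + beta * ensemble covariance.

    beta is a hypothetical tuning weight between 0 (all static) and 1 (all ensemble).
    """
    b_ens = sample_covariance(ensemble)
    n = len(b_static)
    return [
        [(1.0 - beta) * b_static[i][j] + beta * b_ens[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy example: 3 members, 2 state variables (say, T and Td at a single point).
members = [[20.0, 10.0], [22.0, 11.0], [24.0, 15.0]]
b_static = [[1.0, 0.0], [0.0, 1.0]]
b_hybrid = hybrid_covariance(members, b_static, beta=0.5)
```

The point of the blend is that the ensemble part carries the "errors of the day" (here, the T and Td errors are correlated), while the static part keeps the estimate from collapsing when the ensemble is small.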

As far as storm chasers are concerned, there are a limited number of changes that you will probably notice:
-Previous versions of HRRR have been too aggressive with mixing out the PBL during the day, thus usually resulting in higher afternoon temps/lower afternoon dewpoints, higher LCLs, and greater propensity to falsely initiate storms. Through the improvement related to handling sub-grid clouds in the PBL scheme, this tendency has been completely eliminated, and in fact has been slightly overcorrected. HRRRv4 will likely exhibit a slight cool and moist bias during the afternoon, and has shown a slight tendency to have delayed CI near the dryline on weakly-forced convective days. For synoptically forced events, it should be pretty damn good on timing.
-A new bug was introduced into the reflectivity diagnostic, which may cause larger-scale precipitating bodies (like squall lines, MCSs, MCCs etc.) to have anomalously high reflectivity in the stratiform regions. A fix has currently been suggested, but I do not know at this time if it has been implemented during testing. It is tentatively scheduled to make it into the final operational version, though.

Please ask any other questions if you have them. I bet a TIN will be released sometime in the early spring (Mar-Apr) detailing this in a complete manner.

As always: current real-time parallel runs of the experimental version (which is currently identical to the proposed operational version that has passed evaluations) can be found on GSD's website here: HRRR Model Fields - Experimental
 
Are there any more updates? I see that it can now be accessed on COD site. Just wondering what to expect, when comparing it to the HRRRv3 we are all used to.
 
Implementation is still on schedule to update HRRRv3 to HRRRv4 operationally on roughly 8 June 2020.

If a forecast graphics site has now begun to produce graphical products for the real-time parallel, as it appears COD is now doing, then for you nothing is going to change. Barring some kind of computational issues during the 30-day test, this is happening.
 
Definitely going to be neat to see and compare HRRRv3 and v4, hadn't noticed COD was already showing v4!

@Jeff Duda I wonder if you could answer a couple dumb questions RE: HRRRDAS:
1) For initialization, do all grid point parameters just take a mean of the 36 members of the ensemble, or do some parameters use something other than averaging?
2) For the rolling strategy of ensemble members, should we expect the 10z and 22z HRRRv4 to be "more different" than other runs (e.g. might the 9z-to-10z delta in behaviors be more pronounced than the 8z-to-9z delta) since that's when half the ensemble members get swapped out?
2b) What led to that particular rolling strategy vs a more granular one, say where you swapped out 3 members every 2 hours? Computational limits? Other?
 
Answering @StephenHenry's questions:

1) I'm not sure what "other processes than averaging" you might be referring to. Not every field is averaged, but many, including some microphysics (i.e., mixing ratio even for some hydrometeor species) are.

2) This is a good question. We have seen some evidence of a step-function-like change in the behavior of HRRRE forecasts initialized depending on how old the HRRRDAS ensemble members are, which implies you could see some kind of delta in consecutive runs around the time of the swapping. This is not ideal behavior, but (2b) the choice to move to a rolling strategy was based on GFS forecasts valid in the early evening (i.e., 00-06Z) having a significant near-surface cold bias leaking into HRRRDAS members (since they used the GFS forecasts as ICs) and we were starting the partial cycling at 03Z, so right in the middle of that cold bias. It was showing up as significant cold biases in previous test versions of HRRRX (pre-v4) that were highly undesirable. So we moved to a rolling restart to reduce that bias.

The rolling restart of cycling may not remain forever, especially if the cold bias in the upstream GFS forecasts is ironed out. We have also experimented with using the RAP instead of the GFS to initialize HRRRDAS members since the RAP restarts at 09Z and 21Z, and the DA helps eliminate the cold bias from the GFS by the time 03Z comes around in the RAP.

We do not yet have a consistent treatment for this issue. Experimentation will continue beyond the release of HRRRv4. Keep in mind that HRRRDAS will become an operational product (although I doubt you will ever see images from it since they are just 1-hour cycles used for DA), but HRRRE will not be operational. I do not know if the moniker "HRRRE" will ever technically see the light of day in an operational sense (it will have a different name in three years when a formal storm-scale ensemble finally becomes an official operational product).
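
On question 1, a minimal sketch of what "taking the ensemble mean" of a field looks like, assuming a plain point-by-point average across members. The helper name and the numbers are hypothetical; the real HRRRDAS has 36 members and, as noted above, not every field is averaged.

```python
from statistics import fmean


def ensemble_mean(members):
    """Point-by-point mean across ensemble members.

    members: list of equally sized 1-D fields, one list of grid-point values per member.
    """
    return [fmean(values) for values in zip(*members)]


# Toy water vapor mixing ratio (g/kg) at three grid points from four made-up members.
qv_members = [
    [10.0, 12.0, 8.0],
    [11.0, 12.5, 7.0],
    [9.0, 11.5, 9.0],
    [10.0, 12.0, 8.0],
]
qv_ic = ensemble_mean(qv_members)
```

Averaging like this tends to smooth out features whose position differs from member to member.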
 
I dunno, maybe it would be fun to initialize on the MAX of the ensemble :)

Thanks for these answers Jeff. Always fascinated by how complex every little aspect of this is and I love hearing about all these little hacks to get things working best. Sorry I mean, intelligent workarounds.

And I will take your advice to wait until the 22z HRRRv4 UH streaks come out before committing to a chase target. ;)
 
Update:
The 30-day stability tests run by NCO for the HRRRv4 upgrade continued to experience failures due to bad lateral boundary conditions coming in from the driving model, the GFS. As a result, the test and implementation have been suspended until further notice. I do not know any more than that at this time.
 
That is sad to hear. Model testing is over my head, but anecdotally I was pretty impressed with its performance. It did appear to have the same issue with initializing ongoing convection correctly; however, it seemed to do much better with storm mode and placement than the HRRR. While the overmixing issue was supposedly addressed with the V4, it appeared to me to be about on par with the HRRR, if not worse at times. Curious to hear Jeff's thoughts there. It did pretty well on the Burkburnett, TX dustnado huge hail day, initializing storms on the outflow boundary northwest of the HRRR placement. I gambled on the V4 that day and it paid off for the most part. Another thing I noticed was that it seemed to be much more aggressive on UH streaks on a consistent basis. I'm not sure if this is closer to reality or actually overdone, but it appeared to be directionally more accurate than the HRRR. Again, this is all anecdotal and there were some failures, but I thought overall it was a sign of hope in a year of CAM failures, so I'm sad to see it's having issues.
 
Anecdotally as well, of course, but I felt largely similarly. The v4 was horribly overmixed, impressively so in a season where moisture was already low and temperatures high. If the v4 suggested 30-degree spreads, you could basically bet that they were 15-20. If the v4 said the dews in eastern Colorado would be 42, you could confidently assume they'd be pushing 50. That's a big difference in forecasting surface-based supercells, and that was nearly every single day. It was to the point where it wasn't even worthwhile viewing soundings, point or area averaged, for anything other than a wind profile, as the thermals were all messed up, throwing lapse rate, LCL, etc. calculations for a loop as well.
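
To put rough numbers on why that spread error matters for LCLs, here is a back-of-the-envelope sketch using Espy's rule of thumb (about 125 m of LCL height per degree Celsius of surface dewpoint depression). It assumes the spreads quoted above are in degrees Fahrenheit; the function name is made up, and this is a quick approximation, not a substitute for a proper parcel calculation.

```python
def lcl_height_m(spread_f):
    """Approximate LCL height (m AGL) from the surface T - Td spread in deg F.

    Uses Espy's rule of thumb: roughly 125 m per deg C of dewpoint depression.
    """
    # A temperature *difference* converts with the 5/9 factor only (no 32-degree offset).
    spread_c = spread_f * 5.0 / 9.0
    return 125.0 * spread_c


for spread in (30, 20, 15):
    print(f"{spread} F spread -> LCL near {lcl_height_m(spread):.0f} m AGL")
```

By this estimate, a forecast 30-degree spread implies an LCL a bit above 2000 m, while an observed 15-20 degree spread implies roughly 1000-1400 m.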

I agree that it seemed more correct on storm mode and placement, and I felt like it was on the mark with wind profiles (the original HRRR often veers the profiles at around FH12-18 in my experience, and the v4 did not do that). It was also pretty good with timing. The HRRR would often delay initiation by anywhere from 1-4 hours throughout the course of the day and bend toward the v4's original solution pointing at later initiation (or no sustained initiation in one instance on May 23).

The RAP seemed quite bullish on basically everything, but it was certainly more accurate on a regular basis in the thermal profile department than the v4 was. It would probably only require some small tweaks to get it back to being reliable, but what do I know? I work in software, not data modeling, and have no idea what goes into the verification metrics that I imagine are driving the decision to declare these trials unsuccessful and pull the models back.

I know that its unremarkable record is the elephant in the room in every model discussion, but holy cow the FV3 is having a rough time so far. I sometimes have dreams about convection-allowing models driven by the ECMWF. That said, those dreams often include tornadoes in the Plains, so I know they're unrealistic.
 
Thanks for the input. Yeah, we noticed it nailed the Wichita Falls supercell day (the split near Childress followed by a plunge to the southeast).

I am surprised to hear of the issue with overmixing. That hasn't really been discussed much. I will have to bring that up at our meetings.

Keep in mind that there was a processing issue with the UH fields in the HRRRv4 parallel run by NCO that was only fixed about two weeks ago. The issue made UH tracks look very pulsed with really large values of UH within the pulses. That has been fixed. So I don't know if that changes your impression of that aspect of the performance.
 
My anecdotal thoughts from using HRRRv3 vs. HRRRv4 on chase days, limited to the Plains on days with some type of supercell potential, obviously: I think v4 was less aggressive with mixing, and therefore forecast lower LCLs on average, than v3. It would be very surprising if this turned out not to be the case objectively, given that I understand GSL has specifically tweaked their implementation of the MYNN PBL scheme with that goal in mind. However, I agree that HRRRv4 still tended to be drier than most other CAMs, such as NCEP's HiresW runs and NAM Nest. Given the poor moisture in general this season, I'm not ready to conclude that HRRRv4's dryness was a worse bias than some of the other CAMs' moist bias, though. I know that the June 4-9 period many of us chased from KS to ND/MN, in particular, saw ASOS Td's verify woefully dry, compared to even a 24-h consensus forecast from the GFS/NAM/ECMWF. Overall, I preferred v4 to v3 in the aggregate, though there were specific cases at specific times of day where v3 had better forecasts (as one would expect).
 
I looked at this objectively after I saw the first replies this afternoon and can confirm that the HRRRv4-NCO parallel showed an afternoon dry bias (based on METARs) across the eastern half of the US. It wasn't necessarily too warm, though. The dry bias seemed to work in progressively throughout May and has been consistently at its current value for the last several weeks. There is a suspicion that the greenness vegetation fraction used in the LSM has been lagging behind observations this year due to the progressively building drought in the Plains.
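
For those who want to run a similar check themselves, a minimal sketch of the kind of statistic involved, assuming the bias is simply the mean of forecast minus observed dewpoint over paired station reports. The function name and the station values are invented for illustration; they are not the actual verification numbers.

```python
from statistics import fmean


def mean_bias(forecast, observed):
    """Mean of (forecast - observed). For dewpoint, a negative value means too dry."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must be paired one-to-one")
    return fmean([f - o for f, o in zip(forecast, observed)])


# Made-up afternoon dewpoints (deg F) at four stations, purely for illustration.
model_td = [42.0, 45.0, 50.0, 47.0]
metar_td = [48.0, 49.0, 53.0, 50.0]
bias = mean_bias(model_td, metar_td)  # negative here: model drier than the obs
```

The same function applied to temperature would show whether the model is also too warm, which is the distinction made above between a dry bias and a warm bias.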
 