What do you think is the single weakest link in the current
(or future) modeling scenarios?
What parts of the data sets are the worst?
(Tim Marshall once said "Upper Air" -- still true?)
How bad is it?
For short-term (Day 1) forecasting, how much of a problem is
machine epsilon / the butterfly effect?
I imagine that it is probably dwarfed by the data problem.
When we "idealize" a data set, do we fill in the gaps with
smooth transitions? Or do we do regression and fluctuate
around the "means" with a sort of random "grainy" character?
Any help would be appreciated.
Any good articles / books for someone who wants to get started
in the subject area of meso/micro-scale WX modeling?
In my opinion, the spatial resolution of the available data will be the biggest factor hindering significant improvements in small-scale NWP. The highest-resolution data we have to put into an NWP model comes from a network like the Oklahoma Mesonet, in which stations are probably 20-30 km apart on average, yet we're trying to run models at kilometer-scale resolution. There's at least an order of magnitude difference there. Non-surface-based data is even sparser: all we have is the NWS raob network, some number of airplane obs (of uncertain quality), and a small number of independent measurements or supplemental sounding sites.
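Just to put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The 25 km station spacing and 2 km grid spacing are illustrative values picked from the ranges above, not official figures:

    # Rough comparison of observation spacing vs. model grid spacing.
    # Numbers are assumed for illustration only.
    station_spacing_km = 25.0   # assumed typical mesonet station separation (midpoint of 20-30 km)
    model_grid_km = 2.0         # grid spacing of a "kilometer-scale" model

    ratio = station_spacing_km / model_grid_km
    columns_per_station = ratio ** 2   # grid columns each surface station must constrain

    print(f"Observations are ~{ratio:.0f}x coarser than the grid in each direction")
    print(f"Roughly {columns_per_station:.0f} grid columns per surface station")

With those assumed numbers, each surface station has to speak for on the order of a hundred model grid columns, which is why the order-of-magnitude mismatch matters so much.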
To answer your question about idealizing data sets, both types of assimilation are being researched. Current operational models generally use fixed-covariance data assimilation schemes, which spread the influence of the observations smoothly and evenly between data points. Assimilation schemes using flow-dependent covariances exist and can be used, but they are more computationally expensive, and you aren't likely to see them in widespread use anytime soon.
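Here is a minimal 1-D sketch, in Python, of what a fixed-covariance analysis (an optimal-interpolation-style scheme) does with sparse observations. The Gaussian length scale, error variances, and toy temperature field are assumptions for illustration, not any operational scheme's actual settings:

    # 1-D fixed-covariance analysis sketch: smooth spreading of obs onto a fine grid.
    import numpy as np

    grid = np.arange(0.0, 100.0, 1.0)      # 1-km analysis grid (km)
    xb = np.full_like(grid, 285.0)         # toy background temperature (K)

    obs_x = np.array([10.0, 40.0, 70.0])   # "mesonet" stations ~30 km apart
    obs_y = np.array([287.0, 284.0, 286.0])

    L = 15.0        # assumed fixed covariance length scale (km)
    sigma_b2 = 1.0  # assumed background error variance
    sigma_o2 = 0.5  # assumed observation error variance

    def cov(a, b):
        """Fixed Gaussian background-error covariance between point sets a and b."""
        return sigma_b2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / L) ** 2)

    B_oo = cov(obs_x, obs_x)                    # covariance among obs locations
    B_go = cov(grid, obs_x)                     # covariance between grid and obs
    innov = obs_y - np.interp(obs_x, grid, xb)  # observation minus background
    weights = np.linalg.solve(B_oo + sigma_o2 * np.eye(len(obs_x)), innov)
    xa = xb + B_go @ weights                    # analysis field

    print(xa[::10])  # the analysis varies smoothly between the widely spaced stations

Because the covariance is fixed and smooth, the analysis relaxes evenly back toward the background between the widely spaced stations; a flow-dependent scheme would instead let the covariances, and hence the spreading of information, follow the structure of the flow at analysis time.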
This is not to say that there is no hope for numerical weather prediction at small scales. Running models at the kilometer scale offers an immense amount of detail and realism that can't be obtained from lower-resolution models. That realism is most helpful for forecasting the general character of the weather rather than specifics. For example, a 2 km model may initiate storms along a dryline at 2115Z, while the actual storms develop at 2145Z and with position errors of several tens of kilometers. Compared to the size and length scales of the storms, those errors are terrible! But compared to the scales an NWS forecaster works with, they are acceptable, since the forecaster was given information on 1) the storm mode, 2) the approximate initiation time, 3) the approximate number of cells, 4) the approximate orientation of lines of cells, and 5) the evolution of the cells (either persistent discrete storms or upscale growth into an MCS), which is all very helpful despite the lack of strict accuracy.

We're really nudging up against the asymptotic limit of predictability once we're down to the kilometer scale, so it would be unrealistic to expect smaller errors than those described in the example above. We need to improve both the data used to feed the model and the model physics (in particular, at kilometer scales: the microphysics, PBL, land surface, and radiation schemes).
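To illustrate the predictability point with a toy calculation: small initial errors at these scales roughly double over some characteristic time until they are as large as the features themselves. The 5 km initial error, 3 h doubling time, and 50 km saturation scale below are assumptions chosen for illustration, not measured values:

    # Toy error-growth sketch: why tens-of-km displacement errors are expected
    # within a Day-1 forecast at convective scales. All numbers are assumed.
    initial_error_km = 5.0   # assumed initial position uncertainty
    doubling_time_h = 3.0    # assumed error-doubling time at convective scales
    saturation_km = 50.0     # errors comparable to storm spacing; further growth stops mattering

    t, err = 0.0, initial_error_km
    while err < saturation_km:
        t += doubling_time_h
        err *= 2.0
    print(f"Position error reaches ~{saturation_km:.0f} km after about {t:.0f} h")

With these assumed numbers the error saturates within a Day-1 window, so the tens-of-kilometer displacement errors in the dryline example above are about what you should expect, regardless of machine epsilon.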