WeatherMesh Benchmarks

Methodology

We calculate error as latitude-weighted RMSE (root mean squared error), a measure of how far the forecast is from the truth: we take the difference between prediction and truth at each grid point, square those differences, take their mean, and then take the square root. A lower RMSE means a forecast closer to the truth, and the squaring penalizes differences more the larger they get. We weight the error by the cosine of each point's latitude, so that points near the equator are weighted more heavily than points near the poles, as is standard in the weather forecasting community.
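The calculation described above can be sketched in a few lines. This is a minimal illustration, not our production code: the function name `lat_weighted_rmse` and the toy values are hypothetical, and it assumes the cosine weights are applied to the squared errors before the mean and square root are taken, which is the common convention. It uses only the standard library; a NumPy version would vectorize the same arithmetic.

```python
import math

def lat_weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a 2D grid.

    forecast, truth: lists of rows, one row per latitude.
    lats_deg: latitude in degrees for each row.
    Assumption: weights are cos(latitude), applied to squared errors.
    """
    weighted_sq_sum = 0.0
    weight_sum = 0.0
    for f_row, t_row, lat in zip(forecast, truth, lats_deg):
        w = math.cos(math.radians(lat))
        for f, t in zip(f_row, t_row):
            weighted_sq_sum += w * (f - t) ** 2
            weight_sum += w
    return math.sqrt(weighted_sq_sum / weight_sum)

# Toy 3x2 grid (hypothetical values): larger errors at high latitudes.
lats = [60.0, 0.0, -60.0]
truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
forecast = [[2.0, 2.0], [1.0, 1.0], [2.0, 2.0]]

score = lat_weighted_rmse(forecast, truth, lats)
print(round(score, 4))
```

In this toy case the unweighted RMSE would be the square root of 3, while the weighted score is the square root of 2.5, because the rows at 60 degrees carry half the weight of the equatorial row.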

For our source of "truth", we use ERA5: a dataset widely regarded as the world's best guess at what weather actually occurred around the globe. We also internally validate against observations at weather stations, as well as observations collected by our own balloon constellation; these results are consistent with those from the ERA5 comparison.

We compare against ECMWF models, both the deterministic model (HRES) and the ensemble (ENS), as they are generally accepted as the best operational models. We also validate against other operational models, such as GFS and AIFS, which will be added to this page in the future.

Why are metrics only available for these dates? We switched to our latest version of WeatherMesh, WeatherMesh-4, in April 2025, and validation data for this model are available starting April 23, 2025. For recent dates, there are two sources of latency. First, a forecast takes time to become reality: for a forecast made today, it will be ten days before we learn how its ten-day forecast performed. This is why more data points are available for 1-day forecasts than for 10-day forecasts. Second, ERA5 takes roughly a week to be released, meaning that we can't calculate stats for a ten-day forecast until about 17 days after the forecast is made. If you want to analyze our forecasts yourself against a lower-latency source of truth, contact us!
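The latency arithmetic above can be written out as a small sketch. The helper `earliest_score_date` is hypothetical, and the 7-day ERA5 delay is an assumption standing in for "roughly a week"; the actual release lag varies.

```python
from datetime import date, timedelta

# Assumption: ERA5 becomes available about 7 days after the valid time.
ERA5_DELAY_DAYS = 7

def earliest_score_date(init_date, lead_days, truth_delay_days=ERA5_DELAY_DAYS):
    """Earliest date a forecast can be scored: lead time plus truth latency."""
    return init_date + timedelta(days=lead_days + truth_delay_days)

first_init = date(2025, 4, 23)  # first date with WeatherMesh-4 validation data
one_day = earliest_score_date(first_init, 1)
ten_day = earliest_score_date(first_init, 10)

print(one_day.isoformat())  # 2025-05-01
print(ten_day.isoformat())  # 2025-05-10
```

Under these assumptions, the 1-day forecast from a given run can be scored nine days before the 10-day forecast from the same run, which is why short lead times always have more data points.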

Download and Verify Our Results

The best proof of forecast accuracy is providing forecasts ahead of time for you to verify against reality when the time comes. To that end, we publish the raw gridded outputs of our model runs as .nc files. To download and read these files, consult our API documentation.

500 mb geopotential forecasts at 24-hour resolution are fully public. Other model outputs are available only to researchers, since as a business we can't publicly release everything. If you are a researcher who would like fuller access, please contact us at contact@windbornesystems.com.

Grid: everything is evaluated on a 0.25-degree grid. Note that we do not forecast for latitude -90, only down to -89.75, so the shape of our data is (720, 1440) rather than (721, 1440). This makes no difference to the RMSE given the latitude weighting (see below), because the cosine weight at the pole is zero. RMSE is evaluated globally.

Latitude adjustment: as is standard (e.g. section 4.3 of the GraphCast paper), we take a latitude-weighted average when computing the RMSE. On a regular latitude-longitude grid, points crowd together near the poles, so an unweighted average would be unfairly biased towards the poles. You can see the weighting we use here.
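To make the effect of the weighting concrete, the sketch below builds the 720 latitude rows of the grid and compares how much of an average comes from latitudes poleward of 60 degrees with and without cosine weights. The north-to-south row order and the helper `polar_share` are assumptions made for illustration, and the weights here are plain cos(latitude), which may differ in detail from the weighting we link to.

```python
import math

STEP = 0.25
# Assumed row order: north to south, from +90 down to -89.75 (720 rows).
lats = [90.0 - STEP * i for i in range(720)]
weights = [math.cos(math.radians(lat)) for lat in lats]

def polar_share(values, lats_deg, cutoff=60.0):
    """Fraction of the total contributed by rows poleward of the cutoff."""
    total = sum(values)
    polar = sum(v for v, lat in zip(values, lats_deg) if abs(lat) > cutoff)
    return polar / total

# Every row counts equally when unweighted (all rows have 1440 points).
unweighted_share = polar_share([1.0] * len(lats), lats)
weighted_share = polar_share(weights, lats)

# Weight the missing -90 row would have carried (zero up to rounding).
south_pole_weight = math.cos(math.radians(-90.0))

print(f"unweighted: {unweighted_share:.3f}, weighted: {weighted_share:.3f}")
print(f"weight at -90: {south_pole_weight:.1e}")
```

Without weighting, the regions poleward of 60 degrees would make up about a third of the average even though they cover roughly 13% of the Earth's surface; the cosine weights bring their share back in line with area. The weight at -90 is zero up to floating-point rounding, so the absent row cannot change the score.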

Want to learn more?

Want more details on our models? Our blog covers our methodology and results in more depth. Interested in using WeatherMesh or our atmospheric data? Contact us!
