In order to describe the weather situation at a given place appropriately, various data sources are needed.
The data used by meteoblue originate from various national weather services as well as further sources. Initial conditions are usually determined with the help of measurements and observations, which cover the most important locations and weather variables, but describe only a fraction of the Earth's surface and of the weather processes. These measurements are then incorporated into model simulations (data assimilation) to determine the state of the weather over the entire globe. After the model run, the output data may be validated and corrected with measurement and observation data, using post-processing techniques such as downscaling, statistics, machine learning and nowcasting.
To achieve the highest accuracy level it is necessary to combine these different data sources in an intelligent way, which is a key value proposition offered by meteoblue. The offer is complemented with additional geographical information like location names, population density, vegetation health and soil characteristics, which makes meteoblue a unique offer for private and professional applications.
The main data sources are:
- Weather simulation data - These are data obtained by numerical weather models or reanalysis
- Observations - These are records obtained by manual or mechanical sensing and transformed into weather variables
- Measurements - These are data obtained by instruments that measure weather variables
- Post-processing - These are technical approaches to combine simulations with measurements and observations
- Geographical data - These are location attributes which are static (at a given point in time)
- Air quality - available for history & forecast
- Ocean models - available for history & forecast
You can find further details about each data source by clicking on the link to the respective page.
Characteristics of the weather data from various sources
Generally, meteoblue offers the dataset with the highest accuracy by default (for data available through the meteoblue website, API or history+). Nevertheless, for some professional purposes, the right choice of weather data is crucial for the success of the project. To make this choice successfully, many different aspects have to be taken into account.
For long time series of more than 10 years, the only consistent data series available that include all basic variables are NEMSGLOBAL and ERA5. Both models have the same spatial (30 km) and temporal (1 h) resolution. ERA5 data have the advantage of being recalculated with local measurements. However, this does not guarantee a generally higher precision (especially in places where no measuring stations are available), and data before 2000 are not yet available. Also, ERA5 data are only updated irregularly and usually with a delay of several months, whereas NEMSGLOBAL produces seamless datasets from 1984 until 7 days ahead, for every point on Earth. Which model to choose depends on the use case. Generally, it is recommended to consult both models and to derive an estimate of the uncertainty from the differences between them.
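The recommendation to estimate uncertainty from the differences between two models can be sketched as follows. This is a minimal illustration, not meteoblue's method: the function `model_spread` and the sample values are hypothetical, and the two series are assumed to be already aligned (same place, same hours, same unit).

```python
from statistics import mean


def model_spread(series_a, series_b):
    """Return the per-time-step absolute differences between two aligned
    series, together with their mean, as a rough uncertainty indicator."""
    if len(series_a) != len(series_b):
        raise ValueError("series must be aligned and of equal length")
    diffs = [abs(a - b) for a, b in zip(series_a, series_b)]
    return diffs, mean(diffs)


# Hypothetical hourly 2 m temperatures (deg C) for the same place and hours.
nemsglobal = [10.0, 11.0, 12.5, 14.0]
era5 = [9.0, 11.5, 12.5, 15.0]

diffs, mean_diff = model_spread(nemsglobal, era5)
print(diffs, mean_diff)
```

A large mean difference suggests that the value at this place is uncertain; a small one does not prove that both models are right, since they can share the same error.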
For precipitation data specifically, the precipitation observation datasets CMORPH and CHIRPS2 are available, also for long time series.
For the last 1 to 5 years, many more weather models are available, with a higher spatial resolution. Generally, as many models as possible should be looked at to get a better understanding of the uncertainty. If local measurements are available, some models might be discarded, depending on their performance when compared to measurements. Unfortunately, there are not many universal rules about the accuracy of the various models. The best model or model combination has to be evaluated on a case-by-case basis. Some general points:
- Higher resolution does NOT necessarily mean higher accuracy. It mainly means more spatial detail. This is especially true for precipitation accuracy, which might decline at higher resolution (below 5 km)
- The use of more than one model and making comparisons is beneficial
- Extreme events - like storms, thunderstorms, very high winds - are highly unpredictable and will not be accurately seen in any simulation or reanalysis data
- Local microclimatic details - like warmer climate on a south facing slope - cannot be reproduced.
- Spatial and temporal aggregations are more accurate than hourly data at a specific location
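Two of the points above, comparing several models against local measurements and the higher accuracy of aggregated values, can be illustrated with a small sketch. It is an assumption-laden example: the helper names (`mae`, `block_means`, `rank_models`), the model names and all numbers are hypothetical, and mean absolute error is only one of several possible scores.

```python
from statistics import mean


def mae(predicted, observed):
    """Mean absolute error between two aligned series."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def block_means(values, size):
    """Temporal aggregation: average consecutive blocks of `size` steps."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


def rank_models(models, observed, block=1):
    """Rank models by MAE against measurements, best (lowest error) first.
    With block > 1, both series are aggregated before scoring."""
    obs = block_means(observed, block)
    scores = {name: mae(block_means(vals, block), obs)
              for name, vals in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical hourly measurements and two hypothetical model series.
measured = [10.0, 12.0, 14.0, 12.0]
models = {
    "model_a": [11.0, 11.0, 15.0, 11.0],   # scattered errors, no bias
    "model_b": [10.5, 12.5, 14.5, 12.5],   # small constant bias
}

hourly = rank_models(models, measured)            # score each hour
two_hourly = rank_models(models, measured, 2)     # score 2-hour means
print(hourly)
print(two_hourly)
```

In this constructed example, the scattered errors of `model_a` cancel out when aggregated, while the constant bias of `model_b` does not, so the ranking changes with the aggregation level. This is why a model should be evaluated at the resolution the application actually needs.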
An overview of advantages and limitations of the various weather data sources is given in the table below:
Weather data availability
As stated above, the data sources differ in their temporal and spatial availability. Availability can be described along three dimensions:
- Temporal availability (available time range and time resolution)
- Spatial availability (resolution and available region)
- Update frequency (automated updates and delay)
The best availability is offered by NEMSGLOBAL, which covers all time ranges since 1984 and is updated automatically every day. This means it is available from 01.01.1984 until 7 days ahead. ERA5 will become available for the same time range, but at present it is only available from 2000 to 2017. Other weather data sources are available for varying periods, from some years back up to seven days into the future. Please note that observation data are raw data from third-party providers or official networks. Thus, they can show substantial data gaps.
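The availability ranges stated above can be encoded in a small lookup to check which dataset covers a requested period. This is only a sketch: the functions `availability` and `datasets_covering` are hypothetical and not part of any meteoblue API, and the exact ERA5 boundaries (1 January 2000 to 31 December 2017) are an assumption derived from the year range given in the text.

```python
from datetime import date, timedelta


def availability(today):
    """Temporal availability per dataset as (first_day, last_day).
    The ERA5 boundary dates are assumed from the stated 2000 to 2017 range."""
    return {
        "NEMSGLOBAL": (date(1984, 1, 1), today + timedelta(days=7)),
        "ERA5": (date(2000, 1, 1), date(2017, 12, 31)),
    }


def datasets_covering(start, end, today):
    """Return the names of datasets that fully cover [start, end]."""
    return sorted(
        name
        for name, (first, last) in availability(today).items()
        if first <= start and end <= last
    )


today = date(2020, 1, 1)  # fixed reference date for a reproducible example
nineties = datasets_covering(date(1990, 1, 1), date(1999, 12, 31), today)
two_thousands = datasets_covering(date(2005, 1, 1), date(2010, 12, 31), today)
forecast = datasets_covering(date(2020, 1, 2), date(2020, 1, 8), today)
print(nineties, two_thousands, forecast)
```

The check covers only the temporal dimension; spatial coverage and update delay would need additional fields per dataset.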
The figure below gives an overview for the most important data sources: