Which quality controls are done when producing indicators?

There is a detailed workflow for calculating the climate and water indicators (CI) provided in the two Climate Information tools: the Site-specific report and the Data Access Platform. Many checks are carried out throughout production to ensure that the indicators are of reliable quality.

The chart below (Fig. 1) describes the steps in the workflow and highlights (in orange) where quality control procedures take place. Each procedure is adapted to the dataset it is applied to (to account for different variables, ranges and more) and can be repeated throughout the workflow. Essential Climate Variables (ECV) from Global and Regional Climate Models (GCM/RCM) are downloaded from the Earth System Grid Federation (ESGF), the largest archive of climate data worldwide. ESGF already applies standards that the climate community follows so that climate model output can be made available to the scientific community.

Data Production Workflow
Figure 1. Workflow describing the quality checks taking place during the production of climate and water indicators

Here are the different types of quality control procedures that are included in the orange boxes in the figure above:

File format checks/pre- and post-processing – applied in steps 1-10

The file format check is used in all steps of data processing and ensures that the files are in the right format. The following points are checked and, if necessary, corrected (the correction is described with each point):

  • Data gaps, overlapping periods or missing values: If a single time step is missing in the climate scenario data, the time step before or after it is copied in its place. Some scenarios have overlapping data between the historical period and the RCP period. RCP data begins in 2006, and data sets are cut to follow this standard.
  • Units are appropriate for each indicator: For example, climate model temperature is generally given in kelvin (K), and values are then converted to degrees Celsius (°C).
  • Data dimensions and domain dimensions are correct: Data should cover the full CORDEX domain. If not, the data is excluded from the production. Data must also have a daily time series from 1971 to 2100 to be used for the calculation of the indicators. Some scenarios end in 2098 or 2099, and if so, the last year is copied and added to extend the time series to 2100.
  • Metadata is complete and correct: Metadata should follow the CF (Climate and Forecast) metadata standards. All files are edited to make the metadata homogeneous.
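
The corrections listed above can be sketched in a few lines of Python. This is a minimal illustration and not the production code: the function names, the list-based time series and the year-keyed dictionary are assumptions made for the example, and missing values are represented by None.

```python
from __future__ import annotations

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def fill_single_gaps(values: list[float | None]) -> list[float | None]:
    """Fill isolated missing time steps by copying a neighbouring step.

    Gaps longer than one time step are left untouched (still None).
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        # Neighbours that exist in the series (one at the edges, else two).
        neighbours = [values[j] for j in (i - 1, i + 1) if 0 <= j < len(values)]
        if neighbours and all(n is not None for n in neighbours):
            # Copy the step before the gap, or the step after at the start.
            filled[i] = neighbours[0]
    return filled


def kelvin_to_celsius(values: list[float]) -> list[float]:
    """Convert temperatures from kelvin to degrees Celsius."""
    return [v - KELVIN_OFFSET for v in values]


def pad_to_end_year(by_year: dict[int, list[float]],
                    end_year: int = 2100) -> dict[int, list[float]]:
    """Copy the last available year until the series reaches end_year."""
    padded = dict(by_year)
    last_year = max(by_year)
    for year in range(last_year + 1, end_year + 1):
        padded[year] = list(by_year[last_year])
    return padded
```

For instance, `fill_single_gaps([1.0, None, 3.0])` copies the preceding value into the gap, while a series with two consecutive missing steps is returned unchanged so that it can be inspected separately.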

Data pre-processing

Data pre-processing quality procedures are completed in step 1 (1 in the figure). This step is needed to make sure all datasets are in a format that can be handled in the production of the indicators. All files must be in the same format to be comparable. The following points are the main steps in this quality control procedure:

  • Convert the calendar to the standard calendar, with a standard time reference
  • Remap data to the HydroGFD 0.5 degree grid for the bias adjustment
  • Sea masking by data points used in HydroGFD, plus data points needed by World-Wide HYPE for the hydrological assessment.
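
The sea masking step can be illustrated as follows. This is a sketch under simplifying assumptions: the grid is a nested list, grid points are identified by hypothetical (row, column) index pairs, and masked points are set to None. The points to keep are the union of the points used in HydroGFD and the extra points needed by World-Wide HYPE.

```python
from __future__ import annotations


def apply_mask(field: list[list[float]],
               keep: set[tuple[int, int]]) -> list[list[float | None]]:
    """Return a copy of field where every point not in keep is set to None."""
    return [
        [value if (i, j) in keep else None for j, value in enumerate(row)]
        for i, row in enumerate(field)
    ]


# Hypothetical example: two sets of grid points that must be retained.
hydrogfd_points = {(0, 0)}
hype_points = {(1, 1)}
masked = apply_mask([[1.0, 2.0], [3.0, 4.0]], hydrogfd_points | hype_points)
```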

Range check on Essential Climate Variable (ECV) data

All ECV climate scenario data (time series of both raw and bias-adjusted data) are tested against constant minimum and maximum values from a Global Climate Model (GCM) ensemble in steps 1, 3 and 6 (see figure). This ensures that the data do not contain unrealistic values. Scenarios with data outside the GCM range are flagged for further investigation. If a dataset has too many values outside the expected range, it can be excluded from the workflow based on expert judgement.
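
A constant range check of this kind could look like the sketch below. The function names and the limits used in the example are illustrative assumptions, not the values used in production; the decision to exclude a flagged scenario remains with the experts.

```python
from __future__ import annotations


def fraction_outside(values: list[float | None],
                     lower: float, upper: float) -> float:
    """Return the share of non-missing values outside [lower, upper]."""
    valid = [v for v in values if v is not None]
    if not valid:
        return 0.0
    outside = sum(1 for v in valid if v < lower or v > upper)
    return outside / len(valid)


def flag_scenarios(scenarios: dict[str, list[float | None]],
                   lower: float, upper: float) -> dict[str, float]:
    """Map each scenario with out-of-range values to its out-of-range share."""
    flagged = {}
    for name, values in scenarios.items():
        share = fraction_outside(values, lower, upper)
        if share > 0:
            flagged[name] = share
    return flagged


# Hypothetical temperature limits in degrees Celsius, for illustration only.
scenarios = {"scenario_a": [10.0, 20.0], "scenario_b": [10.0, 75.0, 5.0, 0.0]}
flagged = flag_scenarios(scenarios, lower=-90.0, upper=60.0)
```

Reporting the share of out-of-range values, and not only a yes/no flag, gives the expert a basis for judging whether "too many" values fall outside the range.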

Range check vs HydroGFD climatology

HydroGFD (Berg et al., 2018) is a global forcing data set used for evaluation and bias adjustment of climate scenario data. It is used to define ranges of minimum and maximum daily values from a 30-day moving window, using daily data from the full reference period 1981-2010. This results in a grid-point-specific annual cycle of min/max value ranges in the HydroGFD data set, which constitutes the reference in the range checks. The procedure is used in steps 1, 2, 3 and 6. All climate ECV data (non-adjusted and bias-adjusted) are compared to the ranges during the reference period. The same ranges are also used to evaluate the full climate scenario period, taking into account the expected change for each variable. This procedure gives a more detailed check of the data than the GCM range used for ECV data, because the ranges are calculated for each grid box and therefore take spatial differences in the data into account. If a scenario has values outside the range, it is inspected manually to assure the quality. If the quality of the scenario is too poor, it is excluded from the project.
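
For a single grid point, the moving-window reference ranges and the comparison against them can be sketched as below. This is an illustration under stated assumptions, not the production method: every reference year is assumed to have the same number of days (for example with leap days removed), the window wraps around the turn of the year, and the `allowance` parameter is a hypothetical way to represent the expected change of a variable in the scenario period.

```python
from __future__ import annotations


def window_ranges(daily_by_year: list[list[float]],
                  window: int = 30) -> list[tuple[float, float]]:
    """Min/max per day of year over all years within a moving window."""
    n_days = len(daily_by_year[0])
    start = -(window // 2)
    ranges = []
    for day in range(n_days):
        # Collect all values inside the window, wrapping across the new year.
        values = [
            year[(day + offset) % n_days]
            for year in daily_by_year
            for offset in range(start, start + window)
        ]
        ranges.append((min(values), max(values)))
    return ranges


def outside_days(series: list[float],
                 ranges: list[tuple[float, float]],
                 allowance: float = 0.0) -> list[int]:
    """Return the day indices where the series leaves the reference range."""
    return [
        day
        for day, (value, (low, high)) in enumerate(zip(series, ranges))
        if value < low - allowance or value > high + allowance
    ]
```

Repeating the calculation for every grid point gives the grid-point-specific annual cycle of ranges; days returned by `outside_days` are candidates for manual inspection.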

Range check vs HydroGFD climate indicators

To evaluate the ranges of the CI calculated from CORDEX, they are compared manually with the CI calculated from HydroGFD for the historical period 1981-2010. Experts check whether the patterns are within a reasonable range. This test is made in steps 7 and 9.

This test gives information on outliers and possible extreme values in need of further investigation. This test is also made in steps 7, 9 and 10.

Evaluation of bias adjustment

Statistics for climatological periods, such as the mean, median, minimum and maximum values over the full ensemble, are calculated for every grid point. Experts use these to inspect the performance of the bias adjustment. Statistics on missing values are used to identify scenarios where the bias adjustment did not work properly. Bias adjustment with the Distribution Based Scaling method (DBS; Yang et al., 2010) requires post-processing for temperature, in which possible shifts between maximum and minimum temperature are corrected. This procedure is made in step 2.
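
The statistics mentioned above can be sketched for one grid point as follows. The function names are hypothetical. In particular, swapping the two values where the minimum temperature exceeds the maximum is only one possible way to correct such shifts and is an assumption of this example; the source does not specify the correction used.

```python
from __future__ import annotations

import statistics


def ensemble_stats(members: list[float]) -> dict[str, float]:
    """Summary statistics over the ensemble members at one grid point."""
    return {
        "mean": statistics.mean(members),
        "median": statistics.median(members),
        "min": min(members),
        "max": max(members),
    }


def missing_fraction(values: list[float | None]) -> float:
    """Share of missing (None) values in a bias-adjusted time series."""
    if not values:
        return 0.0
    return sum(1 for v in values if v is None) / len(values)


def fix_min_max(tmin: list[float],
                tmax: list[float]) -> tuple[list[float], list[float]]:
    """Swap daily values wherever the minimum exceeds the maximum."""
    fixed_min = [min(lo, hi) for lo, hi in zip(tmin, tmax)]
    fixed_max = [max(lo, hi) for lo, hi in zip(tmin, tmax)]
    return fixed_min, fixed_max
```

A high `missing_fraction` for a scenario would be a signal that the bias adjustment did not work properly and that the scenario needs expert inspection.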

Mapping of climate scenarios to HYPE catchments

The global hydrological assessment is performed with WW-HYPE (Arheimer et al., 2020). Manual inspections are made to confirm the height correction created from HydroGFD weights. Reference period mean statistics are calculated and plotted for each scenario, and difference plots are created for comparison with HydroGFD. The plots confirm that the weights and the height correction have been calculated correctly for each sub-basin in the hydrological model. This procedure is made in step 4.
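
The comparison behind the difference plots can be illustrated numerically. In this sketch the sub-basin identifiers, the grid-cell weights and the tolerance are hypothetical, and the height correction itself is not reproduced; the example only shows an area-weighted mapping of grid cells to a sub-basin and the difference against a reference mean.

```python
from __future__ import annotations


def catchment_mean(cell_values: list[float], weights: list[float]) -> float:
    """Weighted mean of the grid cells that overlap one sub-basin."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(cell_values, weights)) / total_weight


def large_differences(scenario_means: dict[str, float],
                      reference_means: dict[str, float],
                      tolerance: float) -> dict[str, float]:
    """Sub-basins whose mean differs from the reference by more than tolerance."""
    differences = {}
    for basin, value in scenario_means.items():
        diff = value - reference_means[basin]
        if abs(diff) > tolerance:
            differences[basin] = diff
    return differences
```

Sub-basins returned by `large_differences` correspond to the areas that would stand out in a difference plot and would be inspected manually.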

WW-HYPE output files

A qualitative assessment of the validity of the HYPE output data is performed through diagnostic map plots. The validity of spatial patterns in hydrological variables is assessed through visual inspection of mapped aggregates (averages, sums) of HYPE output variables. Values of HYPE variables at selected spatial points, e.g. large river outlets, are assessed semi-quantitatively by comparing them with expected ranges based on external data, e.g. observations or previous HYPE model results, to make sure the values are within a reasonable range. This procedure is made in step 5.

References

Arheimer, B., Pimentel, R., Isberg, K., Crochemore, L., Andersson, J. C. M., Hasan, A., and Pineda, L.: Global catchment modelling using World-Wide HYPE (WWH), open data, and stepwise parameter estimation, Hydrol. Earth Syst. Sci., 24, 535–559, https://doi.org/10.5194/hess-24-535-2020, 2020.

Berg, P., Donnelly, C., and Gustafsson, D.: Near-real-time adjusted reanalysis forcing data for hydrology, Hydrol. Earth Syst. Sci., 22, 989–1000, https://doi.org/10.5194/hess-22-989-2018, 2018.

Yang, W., Andréasson, J., Graham, P. L., Rosberg, J., and Wetterhall, F.: Distribution-based scaling to improve usability of regional climate model projections for hydrological climate change impacts studies, Hydrology Research, 41 (3-4), 211–229, https://doi.org/10.2166/nh.2010.004, 2010.