# Recalibration/Bias Adjustment of UKCP18 2.2km Climate Data
## UKCP18 2.2km Data - notes
Am just using the 2.2km data Gavin shared, but really this should be automated to fetch the latest from the CEDA archive going forward
https://catalogue.ceda.ac.uk/uuid/d5822183143c4011a2bb304ee7c0baf7#collapseTwo
Depending on the variable, the data is hourly or daily, for the three time slices (1981-2000, 2021-2040 and 2061-2080)
For each year, 12 of the 15 runs are available
Therefore 12 * 60 (60 years of projections) = 720 files (each file contains all of the projections for the indicated year)
## Recalibration/Bias adjustment
Need to clarify whether 'recalibration' is the right term, as suggested in last week's meeting - in the Met Office docs they do refer to it as 'bias adjustment', so I think that is the correct phrasing?
https://www.metoffice.gov.uk/binaries/content/assets/metofficegovuk/pdf/research/ukcp/ukcp18-guidance---how-to-bias-correct.pdf (Useful pdf, wish I'd found it earlier!)
Some key points from the Met Office guidance:
- Lots of contention about the most appropriate method, and even applying it in the first place
- Needs at least 10, usually 30 years of observation data to do the calibration:
- Can use HadUK-Grid data - this is what the Met Office scripts used (*script ref*: regrid_obvs_calling.py in vmfileshare/ClimateData/Scripts/ukcp18.met.office.python.recalibration, *data ref*: https://catalogue.ceda.ac.uk/uuid/4dc8450d889a491ebb20e724debe2dfb) - as an aside, there is a very commonly used mental health measure also called HADS and this will get very confusing to me haha
- The physical consistency of the different climate variables may not be maintained if they are bias-corrected independently (e.g. temps could be sub-zero but snowfall not increased) -- this is super relevant because Emma/LCAT want all of the variables they are interested in bias-corrected, not just temperature
- Methods of bias correction relate to the statistic they are trying to correct - the mean, the variance, the distribution or the long-term trend
- Compares four methods of interest (e.g. quantile mapping, trend-preserving quantile mapping)
- There is an R package already to implement some of these: https://github.com/SantanderMetGroup/downscaleR/wiki
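To make the method families above concrete, here is a minimal empirical quantile-mapping sketch (one of the methods the guidance compares). The function name and the toy arrays are mine, not from the Met Office scripts, and real implementations handle ties, tails and seasonality far more carefully:

```python
import numpy as np

def empirical_quantile_map(obs, mod_hist, mod_fut):
    """Map each future model value to the observed value at the same
    empirical quantile of the historical model distribution."""
    # position of each future value within the historical model CDF
    q = np.interp(mod_fut, np.sort(mod_hist), np.linspace(0, 1, len(mod_hist)))
    # read off the observed values at those quantiles
    return np.quantile(obs, q)

# toy check: if observations are simply the model + 2 degrees,
# quantile mapping should shift future values by ~2 as well
mod_hist = np.linspace(0.0, 10.0, 101)
obs = mod_hist + 2.0
mod_fut = np.array([2.5, 5.0, 7.5])
print(empirical_quantile_map(obs, mod_hist, mod_fut))
```

The trend-preserving variants the guidance mentions add steps to stop the correction eating the model's climate-change signal.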
### Scaled distribution mapping
However, the paper on scaled distribution mapping (SDM) - the method used in the Met Office scripts shared (*script ref*: BC_timeslice_run.py) - suggests SDM performs better than quantile mapping (QM) at least, and the authors note that QM is widely regarded as the 'best' method: https://hess.copernicus.org/articles/21/2649/2017/hess-21-2649-2017.pdf
**Notes on SDM from Switanek et al., 2017**
- The paper lays out methods for adjusting precipitation and temperature, and applied the method differently depending on the variables:
- *'There are a few important differences between the implementation of SDM for precipitation and temperature. First, SDM scales the distribution of precipitation by a multiplicative or relative amount and temperature is scaled by an absolute amount. Second, only values of positive precipitation exceeding a specified threshold (e.g., 0.1 mm) are used to build the distributions, while with temperature all values are used. Third, temperature data is first detrended, then bias corrected, and finally, the trends are added back in. As a result, the variance is not inflated by temporal trends.'*
- I think this therefore has implications if Emma/LCAT really do want other climate variables such as wind speed, particularly as the Met Office climate guidelines for the 2.2km data just state 'Available but not evaluated' - this is something to communicate!
- There is a Python implementation (albeit with limited docs: https://pycat.readthedocs.io/en/latest/intro.html) - also the method is fairly clear in the paper
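The temperature recipe quoted above (detrend, bias correct, re-add the trend) can be sketched in a few lines. This is a heavily simplified stand-in that only corrects the mean and spread, rather than scaling the full fitted distributions as Switanek et al. do; the function and all names are mine, not from the Met Office scripts:

```python
import numpy as np

def sdm_temperature_sketch(obs, mod_hist, mod_fut):
    """Heavily simplified sketch of the SDM temperature recipe:
    detrend the future series, correct mean and spread against the
    observed historical distribution, then re-add the trend so the
    variance is not inflated by the temporal trend."""
    t = np.arange(len(mod_fut))
    slope, intercept = np.polyfit(t, mod_fut, 1)
    trend = slope * t + intercept - mod_fut.mean()  # zero-mean trend component
    detrended = mod_fut - trend
    # shift/scale the detrended series towards the observed distribution
    corrected = (detrended - mod_hist.mean()) * (obs.std() / mod_hist.std()) + obs.mean()
    # re-add the trend last
    return corrected + trend
```

For precipitation the paper instead scales multiplicatively and only uses wet days above a threshold, so this sketch applies to temperature-like variables only.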
### Hads UK data
Daily historic observations for tasmax, tas and rainfall downloaded/downloading to the vmfileshare (I am using wget on the VM for this - hopefully this is ok!). For now just going to do tasmax and assess how large the data is
We might need wind too but am leaving that for now
The HadUK data is on a 1km grid so presumably needs to be regridded/aggregated to the same 2.2km grid
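As a placeholder for that regridding step, here is a crude bilinear resampling sketch with scipy. The Met Office script (regrid_obvs_calling.py) presumably does something more careful such as area-weighted averaging, and the coordinates/names here are mine:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def regrid_bilinear(src_y, src_x, field, tgt_y, tgt_x):
    """Bilinearly interpolate a 2-D field from one regular grid to
    another; points outside the source extent become NaN."""
    interp = RegularGridInterpolator((src_y, src_x), field,
                                     bounds_error=False, fill_value=np.nan)
    yy, xx = np.meshgrid(tgt_y, tgt_x, indexing="ij")
    pts = np.column_stack([yy.ravel(), xx.ravel()])
    return interp(pts).reshape(yy.shape)

# toy "1km" source grid holding a linear field, resampled to "2.2km" spacing
src_y = np.arange(0.0, 11.0, 1.0)
src_x = np.arange(0.0, 11.0, 1.0)
field = src_y[:, None] + src_x[None, :]
tgt_y = np.arange(0.0, 10.0, 2.2)
tgt_x = np.arange(0.0, 10.0, 2.2)
coarse = regrid_bilinear(src_y, src_x, field, tgt_y, tgt_x)
```

Note that 1km does not divide evenly into 2.2km, so simple block averaging won't work here; interpolation or proper area weighting is needed.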
I *think* we can just use the observational data from 1981-2000 to match with the historic UKCP projections, so I will delete the rest (although we could potentially also use 2021 observational data)
For some reason there don't seem to be daily observations for *tas* - something to check out potentially
HADs data is on the fileshare now; here are the data links (copied from the readme in the fileshare)
The Hads data is to be used for calibration of the UKCP projections
Have downloaded a few of the variables required only
The file '00README_catalogue_and_licence.txt' is the met office readme
The folders contain the following:
Tasmax - https://data.ceda.ac.uk/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/HadUK-Grid/v1.1.0.0/1km/tasmax/day
(For some reason it downloads into a horrible load of nested folders, so I just copy and paste the files out and then delete all of the nested folders)
Rainfall - https://data.ceda.ac.uk/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/HadUK-Grid/v1.1.0.0/1km/rainfall/day
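The two download locations above differ only in the variable name, so the fetch could be scripted rather than pasted by hand. A sketch of the URL construction only (actually downloading the files still needs CEDA authentication, e.g. the wget approach used above; the function name is mine):

```python
# Directory URLs for HadUK-Grid 1km daily observations on CEDA,
# parameterised by variable name ('tasmax', 'rainfall', ...).
BASE = ("https://data.ceda.ac.uk/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/"
        "HadUK-Grid/v1.1.0.0/1km/{var}/day")

def daily_dir_url(var: str) -> str:
    """URL of the daily-data directory for a HadUK-Grid variable."""
    return BASE.format(var=var)

print(daily_dir_url("rainfall"))
```

This would also make it easy to add wind later if LCAT confirm they need it.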
## Ongoing list of things to clarify
- Seems likely the reason the Met Office doesn't share recalibrated data as standard is the disagreement/lack of consensus on the best method, and the wide documentation available for the different methods. It would still be useful as an open collaborative repo though imo (i.e. people can add their methods for use directly on the climate data/most recent observational data etc)
- Another paper compares four methods and suggests distribution mapping (they just refer to the method as DM, but it is the same as the Switanek paper's approach): https://iwaponline.com/jwcc/article/13/4/1900/87422/Evaluation-of-four-bias-correction-methods-and
- LCAT have requested the bias-adjusted data, for the whole UK, for all their variables of interest, by autumn. Given that they present their climate variables as averages per season per decade, I doubt bias adjustment will change those values much, because of the averaging at that timescale (although it might a little if the adjustment shifts everything in the same direction, rather than towards the mean). I have asked whether this is actually how they will present the climate variables, and it will be useful to ask the Met Office climate scientist whether adjustment is necessary over this time period - not sure I understand the value added here (but I definitely see the value added in a Met Office-Turing collab for an open-source recalibration pipeline)
- Do other folks in Turing outside of DyME have time to work on this if the met office collab goes ahead?
## Jen's Notes 23 July
- CEDA Archive: part of NERC's Environmental Data Service (EDS) and is responsible for looking after data from atmospheric and earth observation research
- Centre for Environmental Data Analysis
- UKCP Local Projections at 2.2km Resolution for 1980-2080
- Data download page https://data.ceda.ac.uk/badc/ukcp18/data/land-cpm/uk/2.2km
- Questions
- Can we do an auto download from CEDA archive?
- Reasoning for time slices
- What does the "bias" come from in the "bias adjustment"
- "Bias-corrected independently - inconsistency" > does this mean we must bias-correct all variables together?
- Most methods require adjusting variables separately
- Geotiff - point with climate variables
- Per day, a temp for each grid cell
- 12 probabilistic runs - 12 projections
- 2.2km model allows for less uncertainty
- Challenge: how to deal with the data rather than what it is
- Package for UK specific data + observations
- Choosing a bias adjustment method
- Input // Output?
- First dataset: 2.2km daily, tas (avg temp for day), max tas (max temp day), (maybe rainfall)
- Scale MSOA vs. Cornwall vs. National
- Auto-correlatory effect: impact of surrounding area on it?
- Check against climate scientist
- How to link historic projected data with observational data
- Match by geo point and day