# Data Readiness, Collection and Monitoring
### Overall, how would you describe our ability to create dedicated data assets to address issues related to the COVID-19 pandemic?
**Random thoughts**
- who is "our"? DS community? Gov? PHE?
- Clearly the UK DS community has the ability to create dedicated data assets, but perhaps this ability is not being leveraged by gov/PHE
**Final message**
1. The UK has made an important effort to provide open datasets that have helped researchers, journalists and citizens understand the evolution of the pandemic, but there is a lot of room for improvement in making these datasets truly FAIR (findable, accessible, interoperable, reusable).
### Share an example of a data asset that was used to positively address issues faced during the COVID-19 pandemic
**Random thoughts**
- Issue: we want an accurate understanding of how many people have died as a result of the pandemic. The FT used open data to challenge the official number of COVID-19 deaths. Open data gives us the ability to run independent checks.
**Final message**
- Open data from different institutions (ONS and PHE) has given third parties the ability to cross-check important statistics such as the real death toll of the pandemic.
### Give a specific example of an aspect of data collection or monitoring that may have been neglected or not taken up by government/policymakers and the possible reasons for this?
**Random thoughts**
- Testing data is not available at high geographic granularity as a function of time. This is necessary to understand how effectively testing power is being distributed across the country (much more so than just reporting the total number of tests). Possible reasons:
- adding a geographic breakdown is just harder and takes more effort
- But another possible reason this was not shared is that testing power was not being efficiently distributed where it was most needed.
- mention the FOI
**Final message**
1. Testing data is not available at high geographic granularity as a function of time. Our project submitted an FOI request for this data; the information was withheld under Section 22, which states that public bodies are not obliged to disclose information that is intended for future publication. To date, testing data is only available at the national level.
2. Data storage and format have not been consistent over time. Older datasets can only be found on archived pages. The disaggregation of testing data by pillar stopped being produced in the summer, making the work of some researchers more difficult.
### What could be done to further work towards inclusion and equality in data collection and monitoring?
**Random thoughts**
- Link to previous point on testing
- What kinds of data
- Testing
- Only certain groups of people will be able to pay to access tests without having symptoms. But even then, it can be practically very difficult to get a test: for example, the test centre is far away and you don't have a car (and can't take public transport), or your work schedule does not allow you to get tested easily.
- Mortality
- Long term effects
- Have to have a well designed strategy to understand this, start taking data now, keep at it
- Mental health
- People without easy access to healthcare might be missed when tracking long term effects
- two types:
- You don't ask a question that you should ask when collecting data, for example collecting information on ethnicity.
- Or, you don't include everyone in the data collection process.
**Final message**
1. Lack of transparency on the geographical distribution of testing means that citizens don't know if their regions are served with the right capacity of tests (so we cannot check if testing capacity is distributed equitably).
2. There should be open access to local positivity rates.
3. If you can't afford to pay for a test, it should be provided for you. Otherwise the testing programme will not be inclusive.
### What can be learnt from the experiences in this pandemic to further improve the way that data is collected and monitored and its impact on policy?
**Random thoughts**
- Confusion is created if data labels are not standardised/unified between different providers.
- For example, there is no standard definition of a COVID-19 mortality, which makes counting the number of deaths due to the pandemic needlessly complicated. Different bodies publishing mortality counts use different definitions, and hence arrive at different counts. PHE, for example, use an arbitrary 28-day cutoff and therefore significantly underestimate the total number of excess deaths due to COVID-19. The number of COVID-19 deaths has strong policy implications and is a key figure used by the public to judge the success of the policies being rolled out by the government.
- Similarly, a monolithic test count is misleading / useless, as there are so many different types of tests, with different efficacies, false positive rates, etc. Perhaps we could come up with a "testing power" metric which leads to a more functional definition of how powerful the testing regime is in a country. For example, given the number of tests of each type being performed, and their efficacies and FPRs, it should be possible to work out what fraction of theoretical positive cases you are catching (and then report that number rather than total tests).
- New types of rapid tests with unusual features (e.g. high false negative rates) should not be considered identical to other forms of testing
**Final message**
1. It would be useful to standardise (and keep fixed in time) not only the structures of the datasets, but also the definitions of common data labels. For example, there are many different ways to count the number of COVID-19 mortalities, and different protocols regarding the dates that mortalities are assigned to. This could lead to perceived inconsistencies if different definitions are used to report on what should be conceptualised as the same thing.
2. A unified place to retrieve reliable data should be set up and well maintained.
3. Dataset structure, data labels, or data aggregation schema should not change over time.
4. More useful (i.e. granular) reporting on testing capacity is needed. With the ever-increasing number of different types of test, it is useful to record not only the number of tests, but also the type of test. This can be used (along with awareness of the different efficacies and turnaround times) to better understand the aggregate "testing capacity" of the nation.
### Any other comments that are important to our learning review and did not easily sit in the other question headings