# AZMet QA/QC
- Matt "plugged in" rounding to correct precision--only applied to *new* data (since the last month or so)
- Part of `azmetr` could be checking for correct precision
- Matt has a place online where we can edit the measured variables and it will propagate to derived variables (we think).
- Split tasks: modeling and workflow automation
- Phase one is alerting Jeremy to extreme values and imputed values (a daily report table published with Quarto, for example)
- Could fit a multivariate model as an additional step (e.g., it can't be raining all day and also have high solar radiation); see the sketch after this list
- Could detect things in derived variables that we don't see in measured variables because of transformations?
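A minimal sketch of the cross-variable idea, assuming daily data from `azmetr::az_daily()`; the column names and thresholds below are placeholders and would need to be checked against the actual azmetr output:

```r
# Sketch: flag days where substantial rain coincides with high solar radiation.
# Column names (meta_station_id, datetime, precip_total_mm, sol_rad_total) and
# the thresholds are assumptions, not verified against azmetr.
library(azmetr)
library(dplyr)

daily <- az_daily(start_date = Sys.Date() - 30, end_date = Sys.Date() - 1)

daily |>
  mutate(suspect = precip_total_mm > 25 & sol_rad_total > 28) |>
  filter(suspect) |>
  select(meta_station_id, datetime, precip_total_mm, sol_rad_total)
```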
## Report refinements
Report currently uses **all data** for rule-based validations and **just one day** for forecast-based validations. This is inconsistent. We need some flexibility in which days are shown in the report and consistency between the two types of validations.
- [ ] Get forecast-based validation to work for past dates and potentially store results in a dataset somewhere
Then there are two options:
1. Create reports with 1 week of data weekly and save them. Make a Quarto website to easily view weekly reports.
2. Make the report a Shiny app / flexdashboard with a date-range selector.
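For option 2, a minimal Shiny sketch with a date-range selector that re-queries `azmetr` (what gets rendered here is just a placeholder table):

```r
# Sketch: Shiny report with a date-range selector over azmetr daily data.
library(shiny)
library(azmetr)

ui <- fluidPage(
  titlePanel("AZMet QA/QC report"),
  dateRangeInput("dates", "Date range",
                 start = Sys.Date() - 7, end = Sys.Date() - 1),
  tableOutput("daily_tbl")
)

server <- function(input, output, session) {
  daily <- reactive({
    # re-query whenever the selected date range changes
    az_daily(start_date = input$dates[1], end_date = input$dates[2])
  })
  output$daily_tbl <- renderTable(head(daily(), 50))
}

shinyApp(ui, server)
```

The rule-based and forecast-based validations would then run inside the reactive on whatever range is selected.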
There's also the issue that the `pointblank`-style tables are not very customizable and are sometimes hard to read (e.g., with segmented validations).
## Kickoff meeting (Oct 7)
Agenda:
- look at proposal
- set some dates
When the CCT system was down, Matt Harmon put some static QA/QC checks into place
Jeremy will send a spreadsheet of active stations
Check if stations haven't reported for > 6 hrs
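A rough sketch of that check, written against whatever recent hourly data frame is on hand (e.g. from `azmetr::az_hourly()`); the column names are assumptions:

```r
# Sketch: list stations whose latest hourly report is more than 6 hours old.
# Column names (meta_station_id, date_datetime) are assumptions.
library(dplyr)

find_stale_stations <- function(hourly, max_gap_hrs = 6) {
  hourly |>
    group_by(meta_station_id) |>
    summarize(last_report = max(date_datetime), .groups = "drop") |>
    filter(difftime(Sys.time(), last_report, units = "hours") > max_gap_hrs)
}
```

Stations that stopped reporting entirely won't appear in the pull at all, so this should be cross-checked against Jeremy's active-station spreadsheet.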
Dynamic QA/QC
- Probability distributions for parameters using historical data; flag values outside some threshold (see the sketch after this list)
- Look how other systems do weather QA/QC
- https://data.ess-dive.lbl.gov/view/doi:10.15485/1823516
- https://github.com/WSWUP/pyWeatherQAQC
- Compare to climatology products (e.g. Daymet) or use those products for modeling
- Compare to forecast data products (e.g. from NOAA). Use ensemble forecasts to create interval for QA/QC.
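A sketch of the probability-distribution idea using per-station, per-month quantile bounds from historical data (the column names, quantiles, and variable shown are all placeholders):

```r
# Sketch: flag values outside the 0.5%-99.5% historical range for that
# station and calendar month. Column names are assumptions.
library(dplyr)
library(lubridate)

make_bounds <- function(history, var) {
  history |>
    mutate(month = month(datetime)) |>
    group_by(meta_station_id, month) |>
    summarize(
      lower = quantile({{ var }}, 0.005, na.rm = TRUE),
      upper = quantile({{ var }}, 0.995, na.rm = TRUE),
      .groups = "drop"
    )
}

flag_outliers <- function(new_data, bounds, var) {
  new_data |>
    mutate(month = month(datetime)) |>
    left_join(bounds, by = c("meta_station_id", "month")) |>
    mutate(flag = {{ var }} < lower | {{ var }} > upper)
}

# e.g. bounds <- make_bounds(history, temp_air_maxC)
#      flag_outliers(new_obs, bounds, temp_air_maxC)
```

Comparisons against Daymet or NOAA ensemble forecasts could slot into the same pattern: build an expected interval, then flag observations outside it.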
## Initial Meeting
AZMet update
- equipment maintenance / optimization
- modernization of data workflow
- need to update website
QA/QC needs
- sensor ranges
- check for physically impossible values (see the `pointblank` sketch after this list)
- Some (but not all) stations have data loggers that deal with small errors (e.g. RH = 100.1%)
- Collection frequency = hourly
- Maybe a Shiny dashboard (or other data visualization) to see the data plotted in addition to just alerts
- Basic Shiny apps for this exist
- Done some research on industry standards for QA/QC of environmental sensors. Can use this as a source for QA/QC checks
- Reach goal: inter-station comparisons
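A minimal sketch of the sensor-range / physically-impossible-value checks using `pointblank` (already used in the report). The columns and limits are placeholders; real limits would come from sensor specs or the ranges table:

```r
# Sketch: range checks on a daily data frame with pointblank.
# Column names and limits are placeholders, not real sensor specs.
library(pointblank)

daily <- azmetr::az_daily()  # recent daily data; defaults may need adjusting

agent <- create_agent(tbl = daily, label = "AZMet daily range checks") |>
  col_vals_between(vars(relative_humidity_mean), left = 0, right = 100, na_pass = TRUE) |>
  col_vals_between(vars(temp_air_meanC), left = -40, right = 60, na_pass = TRUE) |>
  col_vals_gte(vars(precip_total_mm), value = 0, na_pass = TRUE) |>
  interrogate()

agent  # printing the agent gives the validation report table
```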
Q: Is there software already in existence for meteorology QA/QC?
A: Haven't seen anything from mesonets. Need to look around.
- FluxNet has shared tools for QA/QC
David suggestion---for every data point, what's the probability that it's an error given history? What's the probability it's a "true" data point?
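A crude sketch of that idea, treating the station's own history as the reference distribution (a stand-in for a real error model):

```r
# Sketch: two-sided empirical tail probability of a new value, given a
# numeric vector of historical values for the same station/variable/season.
# Values near 0 are candidates for review; this is not a real error model.
tail_prob <- function(new_value, history) {
  history <- history[!is.na(history)]
  p <- mean(history <= new_value)   # empirical CDF at the new value
  2 * min(p, 1 - p)
}

# e.g. tail_prob(48, hist_max_temps)
```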
Matt Harmon
- was responsible for porting AZMet to more modern infrastructure
- dataloggers -> JSON -> Python scripts
- some basic range checking and normalization is in the Python object
- Future idea that ranges would get pulled from a database table
- E.g. some of the weather stations are in golf courses that use sprinklers
- ranges in database table could be dynamic
- Build simple validations into the code that pulls the data (e.g. laws of physics), and have more sophisticated validations that rely on spatial and temporal variation in another layer.
3 layers:
1. laws of physics checks---hard coded
2. range checks from ranges in database (seasonally dynamic?)
3. spatio-temporal checks on data output, including calculated values (see the sketch after these notes)
Layer 1 is in the existing Python code (or can be added)
Layer 2 could maybe be added to the existing Python code
Layer 3 would be a dashboard
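One possible flavor of a layer-3 check is comparing each station to the rest of the network at the same timestamp (a crude stand-in for a proper spatial model; column names and the threshold are placeholders):

```r
# Sketch: flag hourly observations far from the cross-station median at the
# same timestamp. Column names and the 10-degree threshold are assumptions,
# and this ignores real elevation/climate differences between stations.
library(dplyr)

flag_spatial <- function(hourly, threshold = 10) {
  hourly |>
    group_by(date_datetime) |>
    mutate(
      network_median = median(temp_airC, na.rm = TRUE),
      flag = abs(temp_airC - network_median) > threshold
    ) |>
    ungroup()
}
```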
Notifications:
- Some of this done currently using SQL database
- Whatever tools do QA/QC just need to be able to interact with SQL database or REST API for notifications
Q: Are all tables exposed as REST API?
A: Depends on what we want to do with data. Currently API hides some things---can't address a particular table or see metadata without authorization.
Data Integrity:
- Previously some manual manipulation of data coming off data loggers without any tracking
- Now manual modifications and updates are versioned
Q: Do you want provisional data (ASAP) and validated data? What's the desired turn-around for validated data?
A: Not a problem to have near-real-time data being provisional. Don't know what the turn-around should be for validated data. Need to figure it out.
Q: Could you define what kind of dashboard or automated reporting would be most useful?
A: Need to think through it still
- Need a separate dashboard for QA/QC for use in the field for diagnosing problems (doesn't need to be constrained to the AZMet website)
- Separate dashboard for data consumers
- For website, focus on tabular data presentation (`gt` package? see the sketch after this list)
- For website, haven't gone beyond built-in capabilities of Drupal
- Haven't gone into visualizations for website
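If the website presentation stays tabular, a minimal `gt` sketch (columns are placeholders):

```r
# Sketch: a simple gt table of recent daily data for the website.
# Column names are assumptions.
library(gt)
library(dplyr)

daily <- azmetr::az_daily(start_date = Sys.Date() - 7)

daily |>
  select(meta_station_name, datetime, temp_air_maxC, temp_air_minC, precip_total_mm) |>
  gt() |>
  fmt_number(columns = c(temp_air_maxC, temp_air_minC), decimals = 1) |>
  cols_label(
    meta_station_name = "Station",
    datetime = "Date",
    temp_air_maxC = "Max temp (°C)",
    temp_air_minC = "Min temp (°C)",
    precip_total_mm = "Precip (mm)"
  )
```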
API conversation
- As long as we (David and Eric) only have read access to the database, that's a good thing.
Next steps:
- Drafting proposal for data incubator
- Jeremy will get it started and send around
## Meeting
Loading time isn't a big issue since the report is only used internally
Likes the Shiny app interface with the date selector
AZMet is in a good position to fund continued work
First step: consistency checks, date slider, daily data Shiny app
The report just pulls data from `azmetr`; it doesn't need a `targets` pipeline
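i.e. the report's setup chunk can query the API directly; a minimal sketch assuming `az_daily()` and a date range chosen for the report:

```r
# Sketch: report setup pulls data straight from the AZMet API via azmetr,
# with no targets pipeline in between.
library(azmetr)

report_start <- Sys.Date() - 7
report_end   <- Sys.Date() - 1

daily <- az_daily(start_date = report_start, end_date = report_end)
```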