AZMet QA/QC
- Matt "plugged in" rounding to correct precision; only applied to new data (since the last month or so)
- Part of azmetr could be checking for correct precision
- Matt has a place online where we can edit the measured variables and it will propagate to derived variables (we think).
- Split tasks: modeling and workflow automation
- Phase one is alert Jeremy of extreme values and imputed values (a daily report table published with Quarto, for example)
- Could fit multivariate model as additional step (can't be raining all day and have high solar radiation)
- Could detect things in derived variables that we don't see in measured variables because of transformations?
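The rain-vs-solar rule mentioned above could start as a plain cross-variable flag before fitting any multivariate model. A minimal base-R sketch; the column names and thresholds here are illustrative assumptions, not AZMet's actual schema:

```r
# Hypothetical cross-variable consistency rule: heavy rain and high total
# solar radiation shouldn't co-occur in the same daily record.
flag_rain_solar <- function(df, precip_mm = 10, solar_mj = 25) {
  df$precip > precip_mm & df$solar_rad > solar_mj
}

obs <- data.frame(
  precip    = c(0, 15, 12),  # daily total, mm (assumed units)
  solar_rad = c(28, 5, 27)   # daily total, MJ/m^2 (assumed units)
)
flag_rain_solar(obs)  # only the third day is suspicious
```

The same pattern generalizes to other physically linked pairs (e.g. dew point vs. air temperature).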
Report refinements
The report currently uses all data for rule-based validations but just one day for forecast-based validations, which is inconsistent. Need some flexibility in which days are viewed in the report and consistency between the two types of validations.
Then there are two options:
- Create reports with 1 week of data weekly and save them. Make a Quarto website to easily view weekly reports.
- Make the report a Shiny app / flexdashboard with a date-range selector.
There's also the issue of the pointblank-style tables not being very customizable and not being very readable sometimes (e.g. with segmented validations).
Kickoff meeting (Oct 7)
Agenda:
- look at proposal
- set some dates
When the CCT system was down, Matt Harmon put some static QA/QC checks into place
Jeremy will send spreadsheet of active stations
Check if stations haven't reported for > 6 hrs
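The staleness check above could run against whatever table holds each station's most recent report. A base-R sketch; the `station` and `last_report` columns are hypothetical names, not AZMet's schema:

```r
# Flag stations whose latest reading is older than max_gap_hrs.
flag_stale <- function(latest, now = Sys.time(), max_gap_hrs = 6) {
  gap <- as.numeric(difftime(now, latest$last_report, units = "hours"))
  latest$station[gap > max_gap_hrs]
}

now <- as.POSIXct("2023-10-07 12:00:00", tz = "UTC")
latest <- data.frame(
  station     = c("az01", "az02"),
  last_report = as.POSIXct(c("2023-10-07 11:00:00",
                             "2023-10-07 01:00:00"), tz = "UTC")
)
flag_stale(latest, now)  # "az02" (11 hours since last report)
```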
Dynamic QA/QC
- Probability distributions for parameters using historical data, flag if outside some threshold
- Look how other systems do weather QA/QC
- Compare to climatology products (e.g. Daymet) or use those products for modeling
- Compare to forecast data products (e.g. from NOAA). Use ensemble forecasts to create interval for QA/QC.
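The probability-distribution idea in the first bullet could start as a simple z-score against monthly climatology before anything fancier. A base-R sketch with assumed column names (`month`, `value`) and an arbitrary threshold of 4 SDs:

```r
# Flag an observation if it falls more than k standard deviations from the
# historical mean for its calendar month. Grouping by month is a stand-in
# for whatever climatological window ends up being appropriate.
flag_climatology <- function(obs, hist_df, k = 4) {
  mu  <- tapply(hist_df$value, hist_df$month, mean)
  sdv <- tapply(hist_df$value, hist_df$month, sd)
  key <- as.character(obs$month)
  abs(obs$value - mu[key]) > k * sdv[key]
}

hist_df <- data.frame(month = 1, value = c(10, 12, 14))  # mean 12, sd 2
obs     <- data.frame(month = c(1, 1), value = c(30, 13))
flag_climatology(obs, hist_df)  # the 30 is flagged; the 13 is not
```

A forecast-ensemble interval (e.g. from NOAA products) could slot into the same shape, with the interval bounds replacing mean ± k·SD.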
Initial Meeting
AZMet update
- equipment maintenance / optimization
- modernization of data workflow
- need to update website
QA/QC needs
- sensor ranges
- check for physically impossible values
- Some (but not all) stations have data loggers that deal with small errors (e.g. RH = 100.1%)
- Collection frequency = hourly
- Maybe a Shiny dashboard (or other data visualization) to see the data plotted in addition to just alerts
- Basic Shiny apps for this exist
- Done some research on industry standards for QA/QC of environmental sensors. Can use this as a source for QA/QC checks
- Reach goal: inter-station comparisons
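The RH = 100.1% example suggests a three-way outcome for static range checks: pass, clamp small overshoots (as some data loggers already do), and flag everything else. A sketch with a made-up tolerance band:

```r
# Range check for relative humidity with a small clamping tolerance.
# Values in [0, 100] pass; values up to `tol` over 100 are clamped to 100
# (mimicking logger behavior); anything else is flagged and set to NA.
check_rh <- function(rh, tol = 1) {
  status <- ifelse(rh >= 0 & rh <= 100, "ok",
            ifelse(rh > 100 & rh <= 100 + tol, "clamped", "flagged"))
  value  <- ifelse(status == "clamped", 100,
            ifelse(status == "ok", rh, NA))
  data.frame(raw = rh, value = value, status = status)
}

check_rh(c(55, 100.1, 130))  # ok / clamped-to-100 / flagged
```

Applying the same check in software would give stations without smart data loggers the same behavior as those with them.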
Q: Is there software already in existence for meteorology QA/QC?
A: Haven't seen anything from mesonets. Need to look around.
- FluxNet has shared tools for QA/QC
David's suggestion: for every data point, what's the probability that it's an error given the history? What's the probability it's a "true" data point?
Matt Harmon
- was responsible for porting AZMet to more modern infrastructure
- dataloggers -> JSON -> python scripts
- some basic range checking and normalization is in python object
- Future idea that ranges would get pulled from a database table
- E.g. some of the weather stations are in golf courses that use sprinklers
- ranges in database table could be dynamic
- Simple validations built into code that pulls data (e.g. laws of physics) and have more sophisticated validations that rely on spatial and temporal variation in another layer.
3 layers:
- laws-of-physics checks: hard-coded
- range checks from ranges in database (seasonally dynamic?)
- spatio-temporal checks on data output including calculated values
Layer 1 in existing python code (or can be added)
Layer 2 maybe added to existing python code
Layer 3 dashboard
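The layer mapping above could be sketched as composable flag functions, here for a single made-up variable (air temperature) with invented bounds; in the real system layer 2's bounds would come from the database table:

```r
# Layer 1: hard-coded laws-of-physics bounds.
layer1_physics <- function(df) df$temp_c < -90 | df$temp_c > 60

# Layer 2: bounds pulled from a (possibly seasonally dynamic) ranges table.
layer2_range <- function(df, r) df$temp_c < r$lo | df$temp_c > r$hi

# Run the layers in order and attach the flags; layer 3 (spatio-temporal
# checks on outputs, incl. derived variables) would consume this table.
run_checks <- function(df, r) {
  cbind(df,
        physics_fail = layer1_physics(df),
        range_fail   = layer2_range(df, r))
}

ranges <- list(lo = -10, hi = 48)  # stand-in for a database row
run_checks(data.frame(temp_c = c(21, 55, 120)), ranges)
```

Keeping each layer a pure function of the data (plus its config) makes it easy to move layer 1 into the existing Python ingest code while the later layers live elsewhere.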
Notifications:
- Some of this done currently using SQL database
- Whatever tools do QA/QC just need to be able to interact with SQL database or REST API for notifications
Q: Are all tables exposed as REST API?
A: Depends on what we want to do with the data. Currently the API hides some things: can't address a particular table or see metadata without authorization.
Data Integrity:
- Previously some manual manipulation of data coming off data loggers without any tracking
- Now manual modifications and updates are versioned
Q: Do you want provisional data (ASAP) and validated data? What's the desired turn-around for validated data?
A: Not a problem to have near-real-time data being provisional. Don't know what the turn-around should be for validated data. Need to figure it out.
Q: Could you define what kind of dashboard or automated reporting would be most useful?
A: Need to think through it still
- Need a separate dashboard for QA/QC for use in the field for diagnosing problems (doesn't need to be constrained to AZmet website)
- Separate dashboard for data consumers
- For website, focus on tabular data presentation (gt package?)
- For website, haven't gone beyond built-in capabilities of drupal
- Haven't gone into visualizations for website
API conversation
- As long as we (David and Eric) only have read access to the database, that's a good thing.
Next steps:
- Drafting proposal for data incubator
- Jeremy will get it started and send around
Meeting
Loading time isn't a big issue since only used internally
Likes the shiny app interface with date selector
AZMet in good position to fund continued work
First step:
Consistency checks, date slider, daily data shiny app
Report just pulls data from azmetr; doesn't need a targets pipeline