# AZMet QA/QC
- Matt "plugged in" rounding to correct precision--only applied to *new* data (since the last month or so)
- Part of `azmetr` could be checking for correct precision
- Matt has a place online where we can edit the measured variables and it will propagate to derived variables (we think).
- Split tasks: modeling and workflow automation
- Phase one is alerting Jeremy to extreme values and imputed values (a daily report table published with Quarto, for example)
- Could fit a multivariate model as an additional step (e.g., it can't be raining all day and also have high solar radiation); see the sketch after this list
- Could detect things in derived variables that we don't see in measured variables because of transformations?
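A minimal sketch of the cross-variable idea, assuming daily data from `azmetr::az_daily()`; the column names and thresholds below are placeholders and would need to be checked against the actual azmetr output:

```r
# Sketch: flag days where substantial rain coincides with high solar radiation.
# Column names (meta_station_id, datetime, precip_total_mm, sol_rad_total) and
# the thresholds are assumptions, not verified against azmetr.
library(azmetr)
library(dplyr)

daily <- az_daily(start_date = Sys.Date() - 30, end_date = Sys.Date() - 1)

daily |>
  mutate(suspect = precip_total_mm > 25 & sol_rad_total > 28) |>
  filter(suspect) |>
  select(meta_station_id, datetime, precip_total_mm, sol_rad_total)
```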
## Report refinements
Report currently uses **all data** for rule-based validations and **just one day** for forecast-based validations. This is inconsistent. We need some flexibility in which days are shown in the report and consistency between the two types of validations.
- [ ] Get forecast-based validation to work for past dates and potentially store results in a dataset somewhere
Then there are two options:
1. Create reports with 1 week of data weekly and save them. Make a Quarto website to easily view weekly reports.
2. Make the report a Shiny app / flexdashboard with a date-range selector.
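For option 2, a minimal Shiny sketch with a date-range selector that re-queries `azmetr` (what gets rendered here is just a placeholder table):

```r
# Sketch: Shiny report with a date-range selector over azmetr daily data.
library(shiny)
library(azmetr)

ui <- fluidPage(
  titlePanel("AZMet QA/QC report"),
  dateRangeInput("dates", "Date range",
                 start = Sys.Date() - 7, end = Sys.Date() - 1),
  tableOutput("daily_tbl")
)

server <- function(input, output, session) {
  daily <- reactive({
    # re-query whenever the selected date range changes
    az_daily(start_date = input$dates[1], end_date = input$dates[2])
  })
  output$daily_tbl <- renderTable(head(daily(), 50))
}

shinyApp(ui, server)
```

The rule-based and forecast-based validations would then run inside the reactive on whatever range is selected.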
There's also the issue that the `pointblank`-style tables are not very customizable and are sometimes hard to read (e.g., with segmented validations).
## Kickoff meeting (Oct 7)
Agenda:
- look at proposal
- set some dates
When the CCT system was down, Matt Harmon put some static QA/QC checks into place
Jeremy will send a spreadsheet of active stations
Check if stations haven't reported for > 6 hrs
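A rough sketch of that check, written against whatever recent hourly data frame is on hand (e.g. from `azmetr::az_hourly()`); the column names are assumptions:

```r
# Sketch: list stations whose latest hourly report is more than 6 hours old.
# Column names (meta_station_id, date_datetime) are assumptions.
library(dplyr)

find_stale_stations <- function(hourly, max_gap_hrs = 6) {
  hourly |>
    group_by(meta_station_id) |>
    summarize(last_report = max(date_datetime), .groups = "drop") |>
    filter(difftime(Sys.time(), last_report, units = "hours") > max_gap_hrs)
}
```

Stations that stopped reporting entirely won't appear in the pull at all, so this should be cross-checked against Jeremy's active-station spreadsheet.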
Dynamic QA/QC
- Probability distributions for parameters using historical data; flag values outside some threshold (see the sketch after this list)
- Look how other systems do weather QA/QC
- https://data.ess-dive.lbl.gov/view/doi:10.15485/1823516
- https://github.com/WSWUP/pyWeatherQAQC
- Compare to climatology products (e.g. Daymet) or use those products for modeling
- Compare to forecast data products (e.g. from NOAA). Use ensemble forecasts to create interval for QA/QC.
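A sketch of the probability-distribution idea using per-station, per-month quantile bounds from historical data (the column names, quantiles, and variable shown are all placeholders):

```r
# Sketch: flag values outside the 0.5%-99.5% historical range for that
# station and calendar month. Column names are assumptions.
library(dplyr)
library(lubridate)

make_bounds <- function(history, var) {
  history |>
    mutate(month = month(datetime)) |>
    group_by(meta_station_id, month) |>
    summarize(
      lower = quantile({{ var }}, 0.005, na.rm = TRUE),
      upper = quantile({{ var }}, 0.995, na.rm = TRUE),
      .groups = "drop"
    )
}

flag_outliers <- function(new_data, bounds, var) {
  new_data |>
    mutate(month = month(datetime)) |>
    left_join(bounds, by = c("meta_station_id", "month")) |>
    mutate(flag = {{ var }} < lower | {{ var }} > upper)
}

# e.g. bounds <- make_bounds(history, temp_air_maxC)
#      flag_outliers(new_obs, bounds, temp_air_maxC)
```

Comparisons against Daymet or NOAA ensemble forecasts could slot into the same pattern: build an expected interval, then flag observations outside it.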
## Initial Meeting
AZMet update
- equipment maintenance / optimization
- modernization of data workflow
- need to update website
QA/QC needs
- sensor ranges
- check for physically impossible values (see the `pointblank` sketch after this list)
- Some (but not all) stations have data loggers that deal with small errors (e.g. RH = 100.1%)
- Collection frequency = hourly
- Maybe a Shiny dashboard (or other data visualization) to see the data plotted in addition to just alerts
- Basic Shiny apps for this exist
- Done some research on industry standards for QA/QC of environmental sensors. Can use this as a source for QA/QC checks
- Reach goal: inter-station comparisons
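A minimal sketch of the sensor-range / physically-impossible-value checks using `pointblank` (already used in the report). The columns and limits are placeholders; real limits would come from sensor specs or the ranges table:

```r
# Sketch: range checks on a daily data frame with pointblank.
# Column names and limits are placeholders, not real sensor specs.
library(pointblank)

daily <- azmetr::az_daily()  # recent daily data; defaults may need adjusting

agent <- create_agent(tbl = daily, label = "AZMet daily range checks") |>
  col_vals_between(vars(relative_humidity_mean), left = 0, right = 100, na_pass = TRUE) |>
  col_vals_between(vars(temp_air_meanC), left = -40, right = 60, na_pass = TRUE) |>
  col_vals_gte(vars(precip_total_mm), value = 0, na_pass = TRUE) |>
  interrogate()

agent  # printing the agent gives the validation report table
```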
Q: Is there software already in existence for meteorology QA/QC?
A: Haven't seen anything from mesonets. Need to look around.
- FluxNet has shared tools for QA/QC
David suggestion---for every data point, what's the probability that it's an error given history? What's the probability it's a "true" data point?
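A crude sketch of that idea, treating the station's own history as the reference distribution (a stand-in for a real error model):

```r
# Sketch: two-sided empirical tail probability of a new value, given a
# numeric vector of historical values for the same station/variable/season.
# Values near 0 are candidates for review; this is not a real error model.
tail_prob <- function(new_value, history) {
  history <- history[!is.na(history)]
  p <- mean(history <= new_value)   # empirical CDF at the new value
  2 * min(p, 1 - p)
}

# e.g. tail_prob(48, hist_max_temps)
```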
Matt Harmon
- was responsible for porting AZMet to more modern infrastructure
- dataloggers -> JSON -> Python scripts
- some basic range checking and normalization is in the Python object
- Future idea that ranges would get pulled from a database table
- E.g. some of the weather stations are in golf courses that use sprinklers
- ranges in database table could be dynamic
- Build simple validations into the code that pulls the data (e.g. laws of physics), and have more sophisticated validations that rely on spatial and temporal variation in another layer.
3 layers:
1. laws of physics checks---hard coded
2. range checks from ranges in database (seasonally dynamic?)
3. spatio-temporal checks on data output, including calculated values (see the sketch after these notes)
Layer 1 is in the existing Python code (or can be added)
Layer 2 could maybe be added to the existing Python code
Layer 3 would be a dashboard
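One possible flavor of a layer-3 check is comparing each station to the rest of the network at the same timestamp (a crude stand-in for a proper spatial model; column names and the threshold are placeholders):

```r
# Sketch: flag hourly observations far from the cross-station median at the
# same timestamp. Column names and the 10-degree threshold are assumptions,
# and this ignores real elevation/climate differences between stations.
library(dplyr)

flag_spatial <- function(hourly, threshold = 10) {
  hourly |>
    group_by(date_datetime) |>
    mutate(
      network_median = median(temp_airC, na.rm = TRUE),
      flag = abs(temp_airC - network_median) > threshold
    ) |>
    ungroup()
}
```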
Notifications:
- Some of this done currently using SQL database
- Whatever tools do QA/QC just need to be able to interact with SQL database or REST API for notifications
Q: Are all tables exposed as REST API?
A: Depends on what we want to do with data. Currently API hides some things---can't address a particular table or see metadata without authorization.
Data Integrity:
- Previously some manual manipulation of data coming off data loggers without any tracking
- Now manual modifications and updates are versioned
Q: Do you want provisional data (ASAP) and validated data? What's the desired turn-around for validated data?
A: Not a problem to have near-real-time data being provisional. Don't know what the turn-around should be for validated data. Need to figure it out.
Q: Could you define what kind of dashboard or automated reporting would be most useful?
A: Need to think through it still
- Need a separate dashboard for QA/QC for use in the field for diagnosing problems (doesn't need to be constrained to the AZMet website)
- Separate dashboard for data consumers
- For website, focus on tabular data presentation (`gt` package? see the sketch after this list)
- For website, haven't gone beyond built-in capabilities of Drupal
- Haven't gone into visualizations for website
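If the website presentation stays tabular, a minimal `gt` sketch (columns are placeholders):

```r
# Sketch: a simple gt table of recent daily data for the website.
# Column names are assumptions.
library(gt)
library(dplyr)

daily <- azmetr::az_daily(start_date = Sys.Date() - 7)

daily |>
  select(meta_station_name, datetime, temp_air_maxC, temp_air_minC, precip_total_mm) |>
  gt() |>
  fmt_number(columns = c(temp_air_maxC, temp_air_minC), decimals = 1) |>
  cols_label(
    meta_station_name = "Station",
    datetime = "Date",
    temp_air_maxC = "Max temp (°C)",
    temp_air_minC = "Min temp (°C)",
    precip_total_mm = "Precip (mm)"
  )
```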
API conversation
- As long as we (David and Eric) only have read access to the database, that's a good thing.
Next steps:
- Drafting proposal for data incubator
- Jeremy will get it started and send around
## Meeting
Loading time isn't a big issue since the report is only used internally
Likes the Shiny app interface with the date selector
AZMet is in a good position to fund continued work
First step: consistency checks, date slider, daily data Shiny app
The report just pulls data from `azmetr`; it doesn't need a `targets` pipeline
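i.e. the report's setup chunk can query the API directly; a minimal sketch assuming `az_daily()` and a date range chosen for the report:

```r
# Sketch: report setup pulls data straight from the AZMet API via azmetr,
# with no targets pipeline in between.
library(azmetr)

report_start <- Sys.Date() - 7
report_end   <- Sys.Date() - 1

daily <- az_daily(start_date = report_start, end_date = report_end)
```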