# 2023 DMAC
## IOOS Office Updates - Becky Baltes
- Becky is the acting office chief while Derick is distracted
- NOAA strategic goals
- Climate ready nation
- Equity
- Accelerate growth in an information-based blue economy
- IRA planning
- 2024 Code Sprint in DC
- Coastal and Ocean Modeling
- Highlights
- Gliders
- github.com/ioos/glider-dac
- Marine Life
- Passive Acoustics SoundCOOP
- HF-Radar
- HAB
- Working through Congress and likely to get their blessing
- Product updates
- ioos.us
- New tiling tech
- Environmental sensor map
- new version coming out shortly
- Model viewer
- Also a new version coming soon
- Metrics
- Built from the region progress reports
## Global Ocean Monitoring and Observing (GOMO) and IOOS: Future opportunities for aligning data communities of practice (Ann Zinkann, Alyse Larkin, Cindy Garcia/GOMO)
- What is GOMO
- Global Ocean Monitoring and Observing program
- Provide and support high quality global ocean observations...
- Support ~3k platforms
- No formal data strategy
- What now
- Develop a streamlined, robust, and integrated GOMO data ecosystem
- Alignment with IOOS Data Management
## CIOOS Updates (Ray Brunsting/CIOOS & Scott Bruce)
- Update number 5, 5 years in
- Funded at about 2% of IOOS's level
- Not really paying for gear to be put in the water
- Focused on the data side
- Working together a lot
- National committees
- Shared tooling
- Shared infra
- 2 year funding cycle
- CIOOS has become a regional association of GOOS as an IOOS sibling rather than combining into a North America region
- Infra
- Each region runs their own CKAN and ERDDAP
- Institutions often run their own ERDDAP and/or CKANs
- National CKAN that harvests from the regions
- National tools
- Data explorer
- More comprehensive tool, filter, aggregate, and download various datasets
- Can fall back to aggregate ERDDAP data if CKAN isn't available
- Integrated GLOS data this way
- Catalogue map
- Data catalogue
- Driven by CKAN
- Hassle Matt Foster for more info
- variable categories
- Bioecosystems
- biogeochemical
- cross-disciplinary
- Physical
- Integrated into Ocean InfoHub
- Choosing to federate early has worked out well
- Atlantic
- Hurricane Dashboard
- Historical storm details
- Near-real-time data and model predictions from Environment Canada
- Under development, talk to Scott
- Saint Lawrence
- Navigation
- Boating aid too
- Current, wind, tide, weather forecasts
- HAB monitoring
- Model developed by Department of Fisheries and Oceans
- Pacific
- Ocean Connect - https://oceanconnect.ca/
- Salish sea focused
- Trying to use higher res model compared to what Windy is doing
- Bringing in
- Areas of focus
- Continued development of national and regional tools
- Integration with global data catalogs
- Model data integration
- AI
## OTN, ATN, Nodes, and OBIS (Jon Pye/OTN, Angela Dini/OTN)
- History
- Started in 2008
- Anomaly for Canadian programs as it's supposed to work with foreign partners
- Initially had a single centralized station with 300 million detections
- Infrastructure has moved funding sources to a 6 year repeating cycle
- Now coordinating other communities and helping them stay compatible
- Nodes
- 'Everything is related to everything else. But near things are more related than distant things'
- More trusting of neighbors with their data
- PostGIS database
- But it's more a social problem than a tech problem
- OTN can host nodes if folks don't have the funding to do so locally
- Hosted by others
- SECOORA
- Hundreds of millions of detections
- Hosted by Axiom
- MARACOOS and Smithsonian
- ...
- US Nodes
- Publication pipelines
- OBIS
- Any OTN node can push up to OBIS
- Python based, any node manager can run
- Data sharing initiatives
- DaViT
- visualizes acoustically detected animal movement data on a per species and per project level
- Adding more QC checks
- Rshiny
- subsetting and downloading from PATH
- Animal tracking network
- AniBOS, wifi equipped seals
## Towards a Prototype National Harmful Algal Bloom DAC: Successes, Challenges, and Lessons Learned (Rob Bochenek/Axiom)
- Establish a California regional HAB hub
- Imaging Flow CytoBot
- real time photos of critters
- very ad-hoc research data processing currently
- Transitioning to operations
- ML models from random forests to convolutional neural networks
- operational data processing pipeline for processing and classification
- improve quality of training data set through data governance
- Impacts
- improved accuracy from 70% to 90%
- moved latency from weeks to near real time
- Regional expansion
- National implementation plan to be released soon
- Work with regional pilots to assimilate models
- Processing pipeline
- Rsyncing raw data
- Instruments need to push data due to federal IT requirements
- Dask for analysis
- ML model classifiers are run in real time on a 4 V100 GPU cluster
- $12-15 an hour to get that power in the cloud
- machine in Axiom's data center
- PyTorch and Tensorflow models run on the same feed for comparison
- WHOI dashboard for instrument operators
- Postgres and Streamlit for alert and other management for operators
- Lots of the processing was managed by people
- Focus on performance
- calculations are done as close to data as possible
- FastAPI
- reduced query times from 20+ seconds to <1
- Pre-calculating winners
- Assing data using streamlit
- can be worked on by many levels of engineer/scientists
- Visions
- Want to make it plug and play for IFCBs
- Rapidly view sample images on dashboard
- Compare regional classifiers side by side
- Also how much processing can happen on the instruments?
- Annotated image libraries
- Automated product development
- Script libraries for data archiving
- Regional Diversity
- Management Products People Want
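
One reading of "Pre-calculating winners" above is that the top-scoring class for each image is computed once at ingest and stored, so the API never has to rank scores at query time. A minimal sketch under that assumption; the class names, the 0.5 threshold, and both function names are hypothetical and not taken from the talk:

```python
def pick_winner(class_scores, threshold=0.5):
    """Return the top-scoring class, or "unclassified" if it is below the threshold."""
    label, score = max(class_scores.items(), key=lambda item: item[1])
    return label if score >= threshold else "unclassified"


def precompute_winners(scores_by_image, threshold=0.5):
    """Build an image id -> winning class lookup once, ahead of any query."""
    return {
        image_id: pick_winner(class_scores, threshold)
        for image_id, class_scores in scores_by_image.items()
    }


# Hypothetical classifier output for two images.
example_scores = {
    "img_0001": {"class_a": 0.91, "class_b": 0.06, "class_c": 0.03},
    "img_0002": {"class_a": 0.40, "class_b": 0.35, "class_c": 0.25},
}
winners = precompute_winners(example_scores)
```

Storing the lookup alongside the raw scores keeps the option of re-deriving winners later with a different threshold.
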
## SoundCoop - Passive Acoustic Monitoring Access Network (Carrie Wall/NCEI)
- Federal acoustic monitoring and management
- Existing portal capacity to compare or access standard projects
- SanctSound
- PACM
- Passive acoustic cetacean map
- Whale detection data over time
- NCEI
- Passive acoustic data viewer
- How to leverage all the systems by advancing standards and community tools
- Passive Acoustic Monitoring National Cyberinfrastructure Center (SoundCoop)
- Community focused
- IOOS, BOEM, NAVFAC, ONR
- Progression - 3 year effort
- 1 - Build national repository
- NCEI archive
- 2 - Integrated access
- Build infrastructure to discover and access existing raw files and data products across separate repositories
- 3 - Visualization tools
- 4 - ?
- User profiles
- Temporal sound level analysis in the Arctic oceans
- Spatial sound level analysis of NERACOOS, CeNCOOS, and SECOORA
- Monhegan and Petit Manan
- Offshore energy development datasets
- Integrate BOEM and state funding
- North Atlantic right whale detections
- International datasets
- Synergy with international efforts
- NEFS - Tim Rowell
- Working with AWI/OPUS and JOMOPANS
- Tasks
- Run MANTA to create sound level metrics
- daily files of 1 min resolution hybrid millidecade band spectra
- Standalone MATLAB, no license needed
- Evaluate pypam
- As a comparative tool for sound level metrics
- Archive data and products at NCEI with a NetCDF standard
- Establish workflows to access data in cloud
- ...
- Passive acoustics community is largely producing CSVs
- No idea what NetCDF is or why metadata is helpful
- Power of millidecade NetCDFs
- Audio data
- 4 TB, 51k files
- ~ 77 MB, Wav
- Hybrid Millidecade spectra
- 5.4 GB
- 1,440 files
- 15 MB
- CSV, netCDF, and pngs
- Hybrid Millidecade spectra - daily
- ...
- Cyberinfra framework
- 3 PAM repos
- NCEI Google Cloud
- Hybrid millidecade from 14 recording sites across 8 sites using MANTA netCDF
- https://www.ncei.noaa.gov/products/passive-acoustic-data
- MBARI AWS
- Hybrid from 1 site using pypam netCDF
- Axiom in house
- Hybrid millidecade from 1 site using MANTA
- Audio files
- Standard derived products and metadata
- Common software stack
- Jupyter
- MANTA
- ERDDAP
- Visualizing
- SoundCOOP portal
- points show recording sites
- Level one product
- Development of community tools
- Notebooks
- How to access netCDF
- Everyone currently has to recreate the wheel
- Comparison with other types of data
- Next steps
- Finalize processing of all datasets
- Develop user environment to pull raw data from repository and create standard metrics
- Finalize portal and co-visualize results
- Working on making sure it's possible to go back to the raw files from the daily millidecade netCDFs
## Data Management Workflow and Challenges in Developing a California Ocean Acidification and Hypoxia Data Portal (Marine Lebrec/CeNCOOS)
- Objectives
- improve data quality, interoperability, and access
- streamline data ingestion for enhanced
- Ingesting new datasets
- NOAA PMEL West Coast ocean acidification cruises
- via NCEI
- CTD
- Underway pCO2
- Pteropod abundance/dissolution
- Using the Axiom research workspace
- Co-located environment and plankton monitoring
- via ERDDAP
- MBARI underway pCO2 via MySQL
- 1993-present
- cruise data aggregated via 1km grids
- Water quality dashboard for shellfish growers
- Update hourly
- 14-day trend
- QARTOD filtering
- Growers wanted a simple user interface to access water quality data
- JSON parameter file for each shore station
- build URLs for ERDDAP query
- Fetch data via erddapy, write to pandas df
- Calculate hourly mean, remove failed QARTOD
- Future direction
- Integration of other datasets
- Development of next gen dashboards with Axiom
- Shellfish growers
- Water management
- Fisheries
- MPAs
- Notification/alert system for users
- Still scoping
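
The dashboard steps listed above (per-station JSON parameters, ERDDAP query URL, hourly mean with failed QARTOD values dropped) can be sketched roughly as follows. This is not the CeNCOOS code: it swaps erddapy and pandas for the standard library so it runs without dependencies, and the server, dataset ID, variable names, and sample rows are all made up for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime
from statistics import mean
from urllib.parse import quote

# Hypothetical per-station parameter file contents (normally read from disk).
STATION_JSON = """
{
  "server": "https://erddap.example.org/erddap",
  "dataset_id": "example_shore_station",
  "variables": ["time", "sea_water_temperature", "sea_water_temperature_qc_agg"],
  "days_back": 14
}
"""

QARTOD_FAIL = 4  # QARTOD flags: 1 pass, 2 not evaluated, 3 suspect, 4 fail, 9 missing


def build_tabledap_url(params):
    """Build an ERDDAP tabledap CSV request covering the last N days."""
    variables = ",".join(params["variables"])
    # Percent-encode ">" in the constraint; keep "&" and "=" literal.
    constraint = quote(f"&time>=now-{params['days_back']}days", safe="&=")
    return f"{params['server']}/tabledap/{params['dataset_id']}.csv?{variables}{constraint}"


def hourly_means(rows):
    """rows: (ISO timestamp, value, QARTOD flag) tuples -> {hour start: mean value}."""
    buckets = defaultdict(list)
    for stamp, value, flag in rows:
        if flag == QARTOD_FAIL or value is None:
            continue  # drop failed or empty observations before averaging
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Made-up observations standing in for a fetched ERDDAP response.
SAMPLE_ROWS = [
    ("2023-06-01T00:10:00", 12.0, 1),
    ("2023-06-01T00:40:00", 14.0, 1),
    ("2023-06-01T00:50:00", 99.0, 4),  # failed QARTOD, excluded
    ("2023-06-01T01:05:00", 13.0, 1),
]

station = json.loads(STATION_JSON)
query_url = build_tabledap_url(station)
means = hourly_means(SAMPLE_ROWS)
```

With erddapy and pandas, the URL building and the hourly averaging would each collapse to a few calls; the filtering logic stays the same.
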
## Standardizing Marine Biological Data working group update (Tim Van Der Stap/Hakai)
- Started during the 2019 code sprint
- Aligning marine biological data to Darwin Core to share to OBIS
- Practical and informal
- Distracted by the idea of lunch
## ATN data pipeline updates - GTS/OceanOPS and NCEI (satellite telemetry sharing) (Megan McKinzie/ATN)
- Multi-agency funding so serving their needs as well as the scientific community
- DAC
- run by Axiom
- Receives data from multiple sources/tag types
- Enables archiving at NCEI,...
- Primarily satellite data from Argos
- Serving animal borne profiles to GTS
- Proposed
- PI, taxon, or region specific configs
- Additional tests
- Animal borne profiles
- increase in ocean model quality and decreased error as well as storm and weather predictions
- Data processing pipeline
- Metadata in ADR
- Near real time auto ingesting
- Every two hours
- Run through QARTOD
- viz.pmel.noaa.gov/omsc/
- AniBOS OceanOPS dashboard
- Need to translate QARTOD flags into GTS flags
- ATN Seal of Approval
## Application of Cloud-Native Solutions to challenging datasets (Kelly Knee/Audra Luscher)
- Reaching for the cloud
- Decrease barriers for entry
- Prototype roadmap
- Direct access from S3
- Ingest
- Storage and discovery
- Processing
- Analysis and Presentation
- Case studies
- NCDIS 40 year reanalysis - CO-OPS
- Outlooks for sea level rise
- Climatological
- Gridded
- National Coastal Data Information System Multi-decadal Reanalysis
- Long term reanalysis of water levels and waves
- Integrating models and observations to predict flooding between tide stations
- Fills extensive gaps between observation locations
- 500m grid
- ~70 TB of storage
- Monthly high tide outlook
- Sea level trends
- National water model
- 5 GB per day, updated hourly
- networked
- 2.7 million reaches
- Additional hydrologic information on 1km, 250m, and 100m grids
- NODD S3 bucket
- Lat/lon not within the existing files
- Kerchunk to add lat/lon
## NCOS Coastal Reports and Ocean Finder: Real Time Geoanalytics of IOOS Data Streams in Support of Marine Spatial Planning (Rob Bochenek/Axiom)
- If you end up in Anchorage, please drop by Axiom to visit or if you need a place to work
- Coastal Reports
- Building on Ocean Reports (previous tool)
- Draw a polygon and produce analytics and reports
- Ocean Finder
- Spatial suitability analytics across user defined criteria across multidisciplinary datasets across the US EEZ
- Jupyter lab, Spark, Sedona, FastAPI, Parquet, GeoTIFF, netCDF
- H3 hierarchical spatial indexing
- triangles have 12 neighbors
- squares have 8 neighbors
- hexagons always have 6 neighbors and share edges and vertices
- Aperture 7 mesh
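
The neighbor counts above are easy to check on a flat grid. The sketch below does not use the H3 library; it uses plain axial hex coordinates to show that all six hexagon neighbors sit at the same distance, while a square's eight neighbors split into edge-sharing and corner-only ones. The cell-count helper relies on H3's 122 base cells (12 of them pentagons) and aperture 7 refinement; the function names are illustrative.

```python
import math

# Axial (q, r) offsets of the six neighbors of a hexagon on a flat hex grid.
HEX_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# The eight cells surrounding a square cell (edge and corner neighbors).
SQUARE_OFFSETS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def hex_center(q, r):
    """Cartesian center of a pointy-top hexagon with unit circumradius."""
    return (math.sqrt(3) * (q + r / 2), 1.5 * r)


def square_center(x, y):
    """Cartesian center of a unit square cell."""
    return (float(x), float(y))


def neighbor_distances(offsets, center):
    """Distinct center-to-center distances from the origin cell to its neighbors."""
    origin = center(0, 0)
    return sorted({round(math.dist(origin, center(*o)), 6) for o in offsets})


def h3_cell_count(resolution):
    """Cells on the globe at an H3 resolution: 122 at res 0, roughly x7 per level."""
    return 2 + 120 * 7 ** resolution


hex_distances = neighbor_distances(HEX_OFFSETS, hex_center)           # one distance
square_distances = neighbor_distances(SQUARE_OFFSETS, square_center)  # two distances
```

A single neighbor distance is what makes hexagons convenient for smoothing and movement-style analyses: no neighbor needs special weighting.
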
## Extending data services beyond the US EEZ (Felimon Gayanilo, GCOOS)
- Hard to know what is going on in the Gulf of Mexico without knowing what is going on around the Yucatan Peninsula
## GLOS Updates: Seagull Update/Overview + Technical Overview (Tim Kerns, Joe Smith/GLOS)
- Wanted to build a tech platform
## HF Radar Range Series Archival Project Status Update (Shane St. Savage/Axiom)
- Goals
- Organize near-real time HF radar range series and instrument config files from all IOOS HF operators
- New features
- Data and config download via ERDDAP
- Broke tables and had to switch to the lazy loading of directory only access
- Next steps
- Expand to more operators
- Enable multi-file downloads with a UI tool to generate curl commands
- Improve UI plot rendering performance
- Make deployment more robust
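
The planned curl generator for multi-file downloads could be as small as the sketch below. The URLs are placeholders on a made-up server, the function name is hypothetical, and the real tool may well emit different flags.

```python
import shlex
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def curl_commands(urls):
    """Return one shell-safe curl command per file URL."""
    commands = []
    for url in urls:
        # Name the local file after the last path segment of the URL.
        filename = PurePosixPath(urlsplit(url).path).name
        commands.append(
            f"curl --fail --location --output {shlex.quote(filename)} {shlex.quote(url)}"
        )
    return commands


# Placeholder file URLs; a real list would come from the user's selection.
example_urls = [
    "https://erddap.example.org/erddap/files/example_station/2023/file_0001.rs",
    "https://erddap.example.org/erddap/files/example_station/2023/file_0002.rs",
]
script = "\n".join(curl_commands(example_urls))
```

Quoting with `shlex.quote` keeps the generated script safe to paste even if a file name contains spaces or shell metacharacters.
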
## Serving QC solutions: New approaches (Eugene Burger/PMEL)
- https://github.com/NOAA-PMEL/QCaaS
## Water Level QC AI (Lindsay Abrams/CO-OPS, James Spore, Hassan Moustahfid/IOOS)
- Current QC workflow
- Primary and backup sensors
- Onsite QC
- Preliminary
- Manual processing and verification
- Takes a lot of people and resources
- Common issues
- Spikes
- Missing
- Flats
- Rare events
- but actually legit, so should be identified as good
- Tsunami
- Ideal AI workflow
- preprocessing
- classify good and bad using AI model
- currently here
- correct bad data automatically using secondary algorithm
- Model testing
- Logistic regression
- Random Forest
- Gradient Boost
- Neural Net
- All had around 99.7% accuracy, except regression was around 98.9%
- Neural network approach
- Inputs
- primary
- backup
- predictions
- residual level
- sigma water level...
- two hidden layers
- scikit learn, TensorFlow, Keras
- Regional model training
- Started building models for specific regions for better accuracy
- NorthEast tends to be more accurate than all but Hawaii
- NW did especially poorly
- NE Model
- loss stabilized
- NW model
- loss never stabilized
- Some bad backup data (Tacoma) caused some overfitting
- Takeaways
- ML approaches can be used to classify good/bad water level data
- Neural net model is best approach from testing so far
- Quality of data used for training is very important
- GPUs can be used to perform many experiments at once and accelerate AI/ML research
- An automated AI-enabled QC application would reduce CO-OPS resource requirements and could be used as a community tool
- Next steps
- Troubleshoot issues with AI model for classifying bad data
- Bad data is rare
- Make sure it's better represented in training datasets
- Develop algorithms to correct bad data
- Initial results are promising using a regression model
- Using HPC
- Converted to Parquet to work with RAPIDS
- OpenACC Hack-a-thon allowed use of GPUs and HPC
- Questions for crowd
- Employed automatic QC?
- Interested in a QC tool that automatically identifies and corrects bad data
- What gap filling algorithms do you currently use?
- Recommendations on AI/ML methods for gap filling
- Opportunities to collaborate
- Can we use your water level datasets to test our model
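
Because bad data is rare, overall accuracy says little about how well the bad class is caught, which is why the next step about representing it better in training matters. The sketch below uses made-up labels (10 bad points in 1,000, not CO-OPS numbers) to show accuracy next to recall, plus a naive oversampling helper. All names are illustrative and none of this is the team's code.

```python
from itertools import cycle, islice


def scores(truth, predicted, positive="bad"):
    """Accuracy plus precision and recall for the rare (positive) class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def oversample(samples, labels, minority="bad"):
    """Repeat minority-class rows until they match the majority count.

    Assumes the minority class really is the smaller one.
    """
    rows = list(zip(samples, labels))
    minority_rows = [row for row in rows if row[1] == minority]
    majority_rows = [row for row in rows if row[1] != minority]
    if not minority_rows:
        return rows
    repeated = list(islice(cycle(minority_rows), len(majority_rows)))
    return majority_rows + repeated


# Made-up labels: 990 good points, 10 bad, and a model that always says "good".
truth = ["good"] * 990 + ["bad"] * 10
always_good = ["good"] * 1000
baseline = scores(truth, always_good)
balanced = oversample(range(1000), truth)
```

Here the do-nothing model reaches 99% accuracy with zero recall on bad data, so per-class recall and precision are the numbers worth tracking as the training set is rebalanced.
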
## IOOS/NOAA Cloud Sandbox Update (Patrick Tripp/RPS, Tiffany C. Vance/IOOS)
- Running regional coastal models in the cloud
- Gulf of Maine is an OFS domain
- Lynker is currently deploying the sandbox on NOAA's cloud
## IOOS Sensor Map ERDDAP Integration Status Update (Shane St. Savage/Axiom)
- RAs should have more control over what shows in the map
- Beta
- https://beta.sensors.ioos.us/
- Limited to data served by RAs
- may include other sources to augment spatial coverage
- Discovered in IOOS catalog
- Served via ERDDAP tabledap
- Queries IOOS CKAN search daily
- Harvests data every 15 minutes
- Current status
- Monthly processing reports emailed to RA data managers
- ~12k stations where old one had ~46k
- Transitioning back to using the IOOS catalog for most RAs
- Stable
- Kafka > TimescaleDB
- Hex binning
- Temporal binning in API endpoints
- Comprehensive view of data but doesn't reflect RA curation
- Next steps
- Quality of life tooling for ERDDAP
- docker-erddap
- erddap-gold-standard
- Reaching out to RAs to sort out ERDDAP service availability or data issues
- Improve operational status reporting
- Coverage gap analysis
## ERDDAP update
- New NDBC division chief
- Meet the regions where they are
- IT systems are Frankenstein monsters
- Need to get that fixed
- Priorities for the next few years for ERDDAP development
- Improve the testing workflow
- Takes maybe a day to run
- Sharing the private backlog of issues and requests publicly
- Wants to move to more standard libraries but need to be aware of memory usage since it's already very heavy
- Reduce the complexity of the current codebase
- Part of the role of the committee
- Update directory structure to match other Maven projects
- ERDDAP for archiving
- ERDDAP slaps a timestamp in NetCDFs on build
- in the history
- doesn't matter if the source data hasn't changed
- RAs are supposed to generate netCDFs manually in the WAF
- What does CIOOS need out of ERDDAP/what are their plans?
- Friendlier UI and downloader
## Links
- Quality Control as a service - https://github.com/NOAA-PMEL/QCaaS
- ERDDAP util Tomcat log parsing - https://erddaputil.readthedocs.io/en/latest/tomtail.html
- Axiom intake repos - https://github.com/search?q=org%3Aaxiom-data-science+intake&type=repositories
- Intake ERDDAP - https://github.com/axiom-data-science/intake-erddap
- Xpublish Intake serving - https://github.com/axiom-data-science/xpublish-intake
- Intake Axiom - https://github.com/axiom-data-science/intake-axds
- Intake CO-OPS - https://github.com/axiom-data-science/intake-coops
- ERDDAP getting started - https://coastwatch.pfeg.noaa.gov/erddapinfo/index.html
- Axiom hacking - https://github.com/axiom-data-science/hackathon-tabular
- Axiom intake catalogs to load model data - https://github.com/axiom-data-science/mc-goods
- Coastwatch satellite course? - https://coastwatch.pfeg.noaa.gov/courses/satellite_course_info.html
- NCEI passive acoustic data - https://www.ncei.noaa.gov/products/passive-acoustic-data