# 2023 DMAC

## IOOS Office Updates - Becky Baltes

- Becky is the acting office chief while Derick is distracted
- NOAA strategic goals
  - Climate ready nation
  - Equity
  - Accelerate growth in an information-based blue economy
- IRA planning
- 2024 Code Sprint in DC
- Coastal and Ocean Modeling
- Highlights
  - Gliders
    - github.com/ioos/glider-dac
  - Marine Life
  - Passive Acoustics SoundCOOP
  - HF-Radar
  - HAB
    - Working through Congress and likely to get their blessing
- Product updates
  - ioos.us
    - New tiling tech
  - Environmental sensor map
    - new version coming out shortly
  - Model viewer
    - Also a new version coming soon
  - Metrics
    - Built from the region progress reports

## Global Ocean Monitoring and Observing (GOMO) and IOOS: Future opportunities for aligning data communities of practice (Ann Zinkann, Alyse Larkin, Cindy Garcia/GOMO)

- What is GOMO
  - Global Ocean Monitoring and Observing program
  - Provide and support high quality global ocean observations...
  - Support ~3k platforms
  - No formal data strategy
- What now
  - Develop a streamlined, robust, and integrated GOMO data ecosystem
  - Alignment with IOOS Data Management

## CIOOS Updates (Ray Brunsting/CIOOS & Scott Bruce)

- Update number 5, 5 years in
- About 2% of the funding that IOOS has
  - Not really paying for gear to be put in the water
  - Focused on the data side
- Working together a lot
  - National committees
  - Shared tooling
  - Shared infra
- 2 year funding cycle
- CIOOS has become a regional association of GOOS as an IOOS sibling rather than combining into a North America region
- Infra
  - Each region runs their own CKAN and ERDDAP
  - Institutions often run their own ERDDAP and/or CKANs
  - National CKAN that harvests from the regions
- National tools
  - Data explorer
    - More comprehensive tool: filter, aggregate, and download various datasets
    - Can fall back to aggregate ERDDAP data if CKAN isn't available
    - Integrated GLOS data this way
  - Catalogue map
  - Data catalogue
    - Driven by CKAN
    - Hassle Matt Foster for more info
  - Variable categories
    - Bioecosystems
    - Biogeochemical
    - Cross-disciplinary
    - Physical
  - Integrated into Ocean InfoHub
- Choosing to federate early has worked out well
- Atlantic
  - Hurricane Dashboard
    - Historical storm details
    - Near-realtime data and model predictions from Environment Canada
    - Under development, talk to Scott
- Saint Lawrence
  - Navigation
    - Boating aid too
    - Current, wind, tide, weather forecasts
  - HAB monitoring
    - Model developed by Department of Fisheries and Oceans
- Pacific
  - Ocean Connect
    - https://oceanconnect.ca/
    - Salish Sea focused
    - Trying to use higher res model compared to what Windy is doing
    - Bringing in
- Areas of focus
  - Continued development of national and regional tools
  - Integration with global data catalogs
  - Model data integration
  - AI

## OTN, ATN, Nodes, and OBIS (Jon Pye/OTN, Angela Dini (OTN))

- History
  - Started in 2008
  - Anomaly for Canadian programs as it's supposed to work with foreign partners
  - Initially had a single centralized station with 300 million detections
  - Infrastructure has moved funding sources to a 6 year repeating cycle
  - Now coordinating other communities and helping them stay compatible
- Nodes
  - 'Everything is related to everything else. But near things are more related than distant things'
  - More trusting of neighbors with their data
  - PostGIS database
    - But it's more a social problem than a tech problem
  - OTN can host nodes if folks don't have the funding to do so locally
  - Hosted by others
    - SECOORA
      - Hundreds of millions of detections
      - Hosted by Axiom
    - MARACOOS and Smithsonian
    - ...
  - US Nodes
- Publication pipelines
  - OBIS
    - Any OTN node can push up to OBIS
    - Python based, any node manager can run
- Data sharing initiatives
  - DaViT
    - visualizes acoustically detected animal movement data on a per species and per project level
    - Adding more QC checks
  - Rshiny
    - subsetting and downloading from PATH
- Animal tracking network
  - AniBOS, wifi equipped seals

## Towards a Prototype National Harmful Algal Bloom DAC: Successes, Challenges, and Lessons Learned (Rob Bochenek/Axiom)

- Establish a California regional HAB hub
  - Imaging FlowCytobot (IFCB)
    - real time photos of critters
    - very ad-hoc research data processing currently
- Transitioning to operations
  - ML models from random forests to convolutional neural networks
  - operational data processing pipeline for processing and classification
  - improve quality of training data set through data governance
- Impacts
  - improved accuracy from 70% to 90%
  - moved latency from weeks to near real time
- Regional expansion
  - National implementation plan to be released soon
  - Work with regional pilots to assimilate models
- Processing pipeline
  - Rsyncing raw data
    - Instruments need to push data due to federal IT requirements
  - Dask for analysis
  - ML model classifiers are run in real time on a 4 V100 GPU cluster
    - $12-15 an hour to get that power in the cloud
    - machine in Axiom's data center
  - PyTorch and TensorFlow models run on the same feed for comparison
  - WHOI dashboard for instrument operators
  - Postgres and Streamlit for alert and other management for operators
    - Lots of the processing was managed by people
  - Focus on performance
    - calculations are done as close to data as possible
    - FastAPI
    - reduced query times from 20+ seconds to <1
    - Pre-calculating winners
  - Assing data using Streamlit
    - can be worked on by many levels of engineer/scientists
- Visions
  - Want to make it plug and play for IFCBs
  - Rapidly view sample images on dashboard
  - Compare regional classifiers side by side
  - Also how much processing can happen on the instruments?
  - Annotated image libraries
  - Automated product development
  - Script libraries for data archiving
  - Regional Diversity
  - Management Products People Want

## SoundCoop - Passive Acoustic Monitoring Access Network (Carrie Wall/NCEI)

- Federal acoustic monitoring and management
- Existing portal capacity to compare or access standard projects
  - SanctSound
  - PACM
    - Passive acoustic cetacean map
    - Whale detection data over time
  - NCEI
    - Passive acoustic data viewer
- How to leverage all the systems by advancing standards and community tools
- Passive Acoustic Monitoring National Cyberinfrastructure Center (SoundCoop)
  - Community focused
  - IOOS, BOEM, NAVFAC, ONR
- Progression
  - 3 year effort
  - 1 - Build national repository
    - NCEI archive
  - 2 - Integrated access
    - Build infrastructure to discover and access existing raw files and data products across separate repositories
  - 3 - Visualization tools
  - 4 - ?
- User profiles
  - Temporal sound level analysis in the Arctic oceans
  - Spatial sound level analysis of NERACOOS, CeNCOOS, and SECOORA
    - Monhegan and Petit Manan
  - Offshore energy development datasets
    - Integrate BOEM and state funding
    - North Atlantic right whale detections
  - International datasets
    - Synergy with international efforts
    - NEFS - Tim Rowell
    - Working with AWI/OPUS and JOMOPANS
- Tasks
  - Run MANTA to create sound level metrics
    - daily files of 1 min resolution hybrid millidecade band spectra
    - Standalone MATLAB, no license needed
  - Evaluate pypam
    - As comparative tool for sound level metrics
  - Archive data and products at NCEI with a NetCDF standard
  - Establish workflows to access data in cloud
  - ...
- Passive acoustics community is largely producing CSVs
  - No idea what NetCDF is or why metadata is helpful
- Power of millidecade NetCDFs
  - Audio data
    - 4 TB, 51k files
    - ~ 77 MB, Wav
  - Hybrid Millidecade spectra
    - 5.4 GB
    - 1,440 files
    - 15 MB
    - CSV, netCDF, and pngs
  - Hybrid Millidecade spectra - daily
    - ...
- Cyberinfra framework
  - 3 PAM repos
    - NCEI Google Cloud
      - Hybrid millidecade from 14 recording sites across 8 sites using MANTA netCDF
      - https://www.ncei.noaa.gov/products/passive-acoustic-data
    - MBARI AWS
      - Hybrid from 1 site using pypam netCDF
    - Axiom in house
      - Hybrid millidecade from 1 site using MANTA
  - Audio files
  - Standard derived products and metadata
  - Common software stack
    - Jupyter
    - MANTA
    - ERDDAP
- Visualizing
  - SoundCOOP portal
    - points show recording sites
    - Level one product
- Development of community tools
  - Notebooks
    - How to access netCDF
    - Everyone currently has to reinvent the wheel
    - Comparison with other types of data
- Next steps
  - Finalize processing of all datasets
  - Develop user environment to pull raw data from repository and create standard metrics
  - Finalize portal and co-visualize results
  - Working on making sure it's possible to go back to the raw files from the daily millidecimated netCDFs

## Data Management Workflow and Challenges in Developing a California Ocean Acidification and Hypoxia Data Portal (Marine Lebrec/CeNCOOS)

- Objectives
  - improve data quality, interoperability, and access
  - streamline data ingestion for enhanced
- Ingesting new datasets
  - NOAA PMEL West Coast ocean acidification cruises
    - via NCEI
    - CTD
    - Underway pCO2
    - Pteropod abundance/dissolution
    - Using the Axiom research workspace
  - Co-located environment and plankton monitoring
    - via ERDDAP
  - MBARI underway pCO2 via MySQL
    - 1993-present
    - cruise data aggregated via 1km grids
- Water quality dashboard for shellfish growers
  - Update hourly
  - 14-day trend
  - QARTOD filtering
  - Growers wanted a simple user interface to access water quality data
  - JSON parameter file for each shore station
    - build URLs for ERDDAP query
    - Fetch data via erddapy, write to pandas df
    - Calculate hourly mean, remove failed QARTOD
- Future direction
  - Integration of other datasets
  - Development of next gen dashboards with Axiom
    - Shellfish growers
    - Water management
    - Fisheries
    - MPAs
  - Notification/alert system for users
    - Still scoping

## Standardizing Marine Biological Data working group update (Tim Van Der Stap/Hakai)

- Started during the 2019 code sprint
- Aligning marine biological data to Darwin Core to share to OBIS
- Practical and informal
- Distracted by the idea of lunch

## ATN data pipeline updates - GTS/OceanOPS and NCEI (satellite telemetry sharing) (Megan McKinzie/ATN)

- Multi-agency funding so serving their needs as well as the scientific community
- DAC
  - run by Axiom
  - Receives data from multiple sources/tag types
  - Enables archiving at NCEI,...
  - Primarily satellite data from Argos
- Serving animal borne profiles to GTS
  - Proposed
    - PI, taxon, or region specific configs
    - Additional tests
- Animal borne profiles
  - increase in ocean model quality and decreased error as well as storm and weather predictions
- Data processing pipeline
  - Metadata in ADR
  - Near real time auto ingesting
    - Every two hours
  - Run through QARTOD
  - viz.pmel.noaa.gov/omsc/
  - AniBOS OceanOPS dashboard
  - Need to translate QARTOD flags into GTS flags
- ATN Seal of Approval

## Application of Cloud-Native Solutions to challenging datasets (Kelly Knee/Audra Luscher)

- Reaching for the cloud
  - Decrease barriers for entry
- Prototype roadmap
  - Direct access from S3
  - Ingest
  - Storage and discovery
  - Processing
  - Analysis and Presentation
- Case studies
  - NCDIS 40 year reanalysis
  - CO-OPS
    - Outlooks for sea level rise
    - Climatological
    - Gridded
- National Coastal Data Information System Multi-decadal Reanalysis
  - Long term reanalysis of water levels and waves
  - Integrating models and observations to predict flooding between tide stations
  - Fills extensive gaps between observation locations
  - 500m grid
  - ~70 TB of storage
- Monthly high tide outlook
- Sea level trends
- National water model
  - 5 GB per day, updated hourly
  - networked
  - 2.7 million reaches
  - Additional hydrologic information on 1km, 250m, and 100m grids
  - NODD S3 bucket
  - Lat/lon not within the existing files
    - Kerchunk to add lat/lon

## NCOS Coastal Reports and Ocean Finder: Real Time Geoanalytics of IOOS Data Streams in Support of Marine Spatial Planning (Rob Bochenek/Axiom)

- If you end up in Anchorage, please drop by Axiom to visit or if you need a place to work
- Coastal Reports
  - Building on Ocean Reports (previous tool)
  - Draw a polygon and produce analytics and reports
- Ocean Finder
  - Spatial suitability analytics across user defined criteria across multidisciplinary datasets across the US EEZ
  - Jupyter lab, Spark, Sedona, FastAPI, Parquet, GeoTIFF, netCDF
  - H3 hierarchical spatial indexing
    - triangles have 12 neighbors
    - squares have 8 neighbors
    - hexagons always have 6 neighbors and share edges and vertices
    - Aperture 7 mesh

## Extending data services beyond the US EEZ (Felimon Gayanilo, GCOOS)

- Hard to know what is going on in the Gulf of Mexico without knowing what is going on around the Yucatan Peninsula

## GLOS Updates: Seagull Update/Overview + Technical Overview (Tim Kerns, Joe Smith/GLOS)

- Wanted to build a tech platform

## HF Radar Range Series Archival Project Status Update (Shane St. Savage/Axiom)

- Goals
  - Organize near-real time HF radar range series and instrument config files from all IOOS HF operators
- New features
  - Data and config download via ERDDAP
    - Broke tables and had to switch to the lazy loading of directory only access
- Next steps
  - Expand to more operators
  - Enable multi-file downloads with a UI tool to generate curl commands
  - Improve UI plot rendering performance
  - Make deployment more robust

## Serving QC solutions: New approaches (Eugene Burger/PMEL)

- https://github.com/NOAA-PMEL/QCaaS

## Water Level QC AI (Lindsay Abrams/CO-OPS, James Spore, Hassan Moustahfid/IOOS)

- Current QC workflow
  - Primary and backup sensors
  - Onsite QC
  - Preliminary
  - Manual processing and verification
    - Takes a lot of people and resources
- Common issues
  - Spikes
  - Missing
  - Flats
  - Rare events
    - but actually legit, so should be identified as good
    - Tsunami
- Ideal AI workflow
  - preprocessing
  - classify good and bad using AI model
    - currently here
  - correct bad data automatically using secondary algorithm
- Model testing
  - Logistic regression
  - Random Forest
  - Gradient Boost
  - Neural Net
  - All had around 99.7% accuracy, except regression was around 98.9%
- Neural network approach
  - Inputs
    - primary
    - backup
    - predictions
    - residual level
    - sigma water level...
  - two hidden layers
  - scikit-learn, TensorFlow, Keras
- Regional model training
  - Started building models for specific regions for better accuracy
  - NorthEast tends to be more accurate than all but Hawaii
  - NW did especially poorly
  - NE Model
    - loss stabilized
  - NW model
    - loss never stabilized
    - Some bad backup data (Tacoma) caused some overfitting
- Takeaways
  - ML approaches can be used to classify good/bad water level data
  - Neural net model is best approach from testing so far
  - Quality of data used for training is very important
  - GPUs can be used to perform many experiments at once and accelerate AI/ML research
  - An automated AI-enabled QC application would reduce CO-OPS resource requirements and could be used as a community tool
- Next steps
  - Troubleshoot issues with AI model for classifying bad data
    - Bad data is rare
    - Make sure it's better represented in training datasets
  - Develop algorithms to correct bad data
    - Initial results are promising using a regression model
  - Using HPC
    - Converted to Parquet to work with RAPIDS
    - OpenACC Hack-a-thon allowed use of GPUs and HPC
- Questions for crowd
  - Employed automatic QC?
  - Interested in a QC tool that automatically identifies and corrects bad data?
  - What gap filling algorithms do you currently use?
  - Recommendations on AI/ML methods for gap filling
  - Opportunities to collaborate
    - Can we use your water level datasets to test our model?

## IOOS/NOAA Cloud Sandbox Update (Patrick Tripp/RPS, Tiffany C. Vance/IOOS)

- Running regional coastal models in the cloud
- Gulf of Maine is an OFS domain
- Lynker is currently deploying the sandbox on NOAA's cloud

## IOOS Sensor Map ERDDAP Integration Status Update (Shane St. Savage/Axiom)

- RA should have more control over what shows in the map
- Beta
  - https://beta.sensors.ioos.us/
  - Limited to data served by RAs
    - may include other sources to augment spatial coverage
  - Discovered in IOOS catalog
  - Served via ERDDAP tabledap
  - Queries IOOS CKAN search daily
  - Harvests data every 15 minutes
- Current status
  - Monthly processing reports emailed to RA data managers
  - ~12k stations where old one had ~46k
  - Transitioning back to using the IOOS catalog for most RAs
  - Stable
  - Kafka > TimescaleDB
  - Hex binning
  - Temporal binning in API endpoints
  - Comprehensive view of data but doesn't reflect RA curation
- Next steps
  - Quality of life tooling for ERDDAP
    - docker-erddap
    - erddap-gold-standard
  - Reaching out to RAs to sort out ERDDAP service availability or data issues
  - Improve operational status reporting
  - Coverage gap analysis

## ERDDAP update

- New NDBC division chief
  - Meet the regions where they are
  - IT systems are Frankenstein monsters
    - Need to get that fixed
- Priorities for the next few years for ERDDAP development
  - Improve the testing workflow
    - Takes maybe a day to run
  - Sharing the private backlog of issues and requests publicly
  - Wants to move to more standard libraries but need to be aware of memory usage since it's already very heavy
  - Reduce the complexity of the current codebase
    - Part of the role of the committee
  - Update directory structure to match other Maven projects
- ERDDAP for archiving
  - ERDDAP slaps a timestamp in NetCDFs on build
    - in the history
    - doesn't matter if the source data hasn't changed
  - RAs are supposed to generate netCDFs manually in the WAF
- What does CIOOS need out of ERDDAP/what are their plans?
  - Friendlier UI and downloader

## Links

- Quality Control as a service - https://github.com/NOAA-PMEL/QCaaS
- ERDDAP util Tomcat log parsing - https://erddaputil.readthedocs.io/en/latest/tomtail.html
- Axiom intake repos - https://github.com/search?q=org%3Aaxiom-data-science+intake&type=repositories
- Intake ERDDAP - https://github.com/axiom-data-science/intake-erddap
- Xpublish Intake serving - https://github.com/axiom-data-science/xpublish-intake
- Intake Axiom - https://github.com/axiom-data-science/intake-axds
- Intake CO-OPS - https://github.com/axiom-data-science/intake-coops
- ERDDAP getting started - https://coastwatch.pfeg.noaa.gov/erddapinfo/index.html
- Axiom hacking - https://github.com/axiom-data-science/hackathon-tabular
- Axiom intake catalogs to load model data - https://github.com/axiom-data-science/mc-goods
- Coastwatch satellite course? - https://coastwatch.pfeg.noaa.gov/courses/satellite_course_info.html
- NCEI passive acoustic data - https://www.ncei.noaa.gov/products/passive-acoustic-data
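
A rough sketch of the shellfish-grower dashboard steps noted in the California Ocean Acidification and Hypoxia portal talk (JSON parameter file per shore station, build an ERDDAP query URL, drop failed QARTOD values, compute hourly means). This is not the CeNCOOS code: the notes say the real workflow uses erddapy and pandas, while this version sticks to the Python standard library so it runs anywhere. The server URL, dataset ID, variable names, and JSON keys are all hypothetical placeholders, and keeping "suspect" (flag 3) values is an assumption; only flag 4 (fail) is dropped here.

```python
import json
from collections import defaultdict
from datetime import datetime
from urllib.parse import quote

# QARTOD flag scheme: 1 pass, 2 not evaluated, 3 suspect, 4 fail, 9 missing
QARTOD_FAIL = 4

# Hypothetical per-station parameter file contents (keys are assumptions)
STATION_JSON = """
{
  "server": "https://example.org/erddap",
  "dataset_id": "shore_station",
  "variable": "ph",
  "qc_variable": "ph_qc_agg",
  "start": "2023-05-01T00:00:00Z",
  "end": "2023-05-15T00:00:00Z"
}
"""


def build_tabledap_url(params):
    """Build an ERDDAP tabledap CSV request URL from a station parameter dict."""
    variables = ",".join(["time", params["variable"], params["qc_variable"]])
    constraints = f"&time>={params['start']}&time<={params['end']}"
    # Percent-encode the comparison operators and colons in the constraints
    query = quote(variables + constraints, safe=",&=")
    return f"{params['server']}/tabledap/{params['dataset_id']}.csv?{query}"


def hourly_mean(rows):
    """Average values per hour, skipping rows that failed QARTOD.

    rows: iterable of (iso_time, value, qartod_flag) tuples.
    Returns a dict mapping the start of each hour to the mean value.
    """
    buckets = defaultdict(list)
    for iso_time, value, flag in rows:
        if flag == QARTOD_FAIL or value is None:
            continue
        stamp = datetime.strptime(iso_time, "%Y-%m-%dT%H:%M:%SZ")
        buckets[stamp.replace(minute=0, second=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


params = json.loads(STATION_JSON)
url = build_tabledap_url(params)
```

In practice the rows would come from the CSV returned by `url`; with pandas the same filtering and averaging is usually a boolean mask followed by an hourly resample.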