# Welcome to the CLIVAR CMIP6 Bootcamp

https://hackmd.io/@pangeo/clivar-2022

![](https://i.imgur.com/fIHHpD1.png)

###### tags: `clivar` `pangeo-data`

Thank you for joining the CLIVAR CMIP6 Bootcamp! We're delighted to have you here 🎉

**What?** [Pangeo](https://pangeo.io/) is a community-centric initiative that promotes open, reproducible, and scalable geoscience.

***All questions, comments and recommendations are welcome!***

### Code of conduct :heavy_check_mark:

* [Take a moment to read this](https://github.com/pangeo-data/governance/blob/master/conduct/code_of_conduct.md)

### Sign-up :pencil:

**Name + an emoji to represent your mood today ([emoji cheatsheet](https://github.com/ikatyang/emoji-cheat-sheet/blob/master/README.md))**

*(Remember that this is a public document. You can use a pseudonym if you'd prefer.)*

* Anne Fouilloux, Simula Research Laboratory, Norway, @AnneFouilloux, 😃
* Tina Odaka, UMR-LOPS, IFREMER, France, :sunflower:
* Marion, Princeton University, United States, :face_in_clouds:
* Ruth, DMI, :pensive:
* Jakob, Geophysical Institute, University of Bergen, 🌧️
* Phoebe, University of Edinburgh, :yawning_face:

### Course material

- [Introduction to the Pangeo ecosystem](https://docs.google.com/presentation/d/1XB9jmKlPnyAtUWRG_xzGC9h3qn_88gVSegOI3uDcaKo/edit?usp=sharing)
- [Pangeo Tutorial at the CLIVAR CMIP6 Bootcamp](https://pangeo-data.github.io/clivar-2022/intro.html), with all the Jupyter notebooks in the [`tutorial/pangeo101` folder](https://github.com/pangeo-data/clivar-2022/tree/main/tutorial/pangeo101).

### GitHub organisation to develop our code together

https://github.com/orgs/clivar-bootcamp2022/

### Table that each workgroup needs to fill in

| Name | Input data missing in cloud | Temporary data size needed (cloud bucket size for your workgroup) | Any guess for computation size? |
| ---- | --------------------------- | ------------------------------------------------------------------ | ------------------------------- |
| waffles (Phoebe, Jakob, Marion) | OMIP2 - Omon/SImon/Ofx - gn - areacello, deptho, mlotst, siconc, so, thetao, umo, vmo, zos [1958-2018 only (the last 61 years for each model)] | 1.2 TB | *maybe ...* |
| BLWG (Ali D, Robbie, Vicki, Antonie, Julia) | All **historical**: <br> 3hr: tslsi, hfss, rlus, rlds, clwvi <br> CFday: clwvi <br> LImon: snc, sftflfl, snd, siitdsnthick, tsn (and dtesn?), sitepsnic <br> 6hrPlevPt: ta, ts, tas <br> SIday: siconca, sithick, sisnthick | | |
| exchanges (ocean) | HighResMIP - <br> hist-1950, highres-future: <br> Monthly data (Omon) <br> Models: <br> 1. HadGEM3-GC31 (LL, MM, HM, HH) <br> 2. ECMWF-IFS (LR, MR, HR) <br> 3. CNRM-CM6 (LR, HR) <br> 4. EC-Earth3P (LR, HR) <br><br> 3D variables: uo, vo, thetao, so, thkcello <br><br> 2D variables: sos, tos, sea ice (simass, sivol, sithick, siconc), zos | 1.5 TB | |
| Storylines | mlotst (Omon) (not available for all models) - need at least one realisation for historical, ssp370 and ssp585 for as many models as possible (available from the full ESGF archive). Other Omon variables requested for upload to cloud storage accessible using Pangeo: primary productivity (e.g. chl, chlos, intpp, phycos, pp, ppos), ph, dfe, no3, si, po4 | | |
| GREENLAND | | | |
| PAMIP | | | |
| MOSAIC | | | |
| | | | |

### Q&A :question:

*(Add here any question or issue you need assistance with. Feel free to put it below.)*

**Question from Robbie:** My understanding was that if you call:

```python
from netCDF4 import Dataset

d = Dataset('mydata.nc')
chunk_of_data = d['surf_temp'][10:20]
```

then you load a chunk of size 10.
That is to say, the netCDF4 `Dataset` does lazy loading and doesn't read all of `mydata.nc` into RAM. So something like:

```python
for i in range(0, 200, 10):
    chunk_of_data = d['surf_temp'][i:i+10]
    some_analysis_func(chunk_of_data)
```

basically does a chunked load and processes `mydata.nc` in chunks of 10. So what's the point of Zarr?

**Answer from Tina:** The point of Zarr is that it is natively chunked, so it is much easier and simpler than NetCDF for writing chunked output that each Dask worker can read later. Each chunk is stored as a separate file, which avoids concurrent access to a single file. Just as Xarray simplifies indexing compared to doing the computation with plain NumPy, Zarr simplifies parallel I/O in the Python framework compared to NetCDF (see the sketch at the end of these notes).

### Feedback

Please provide some feedback (one up and one down):

#### What went well:
+ 

#### What could be improved:
- 

> *This HackMD is adapted under a CC-BY license from [_the EDS_ show and tell template](https://github.com/alan-turing-institute/environmental-ds-book/blob/master/book/community/templates/template-coworking-showtell.md)*
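### Appendix: Zarr vs. chunked NetCDF reads (sketch)

To make the answer above concrete, here is a minimal sketch, not part of the course material, that assumes a hypothetical `mydata.nc` file with a `surf_temp` variable carrying a `time` dimension, mirroring the question above. Opening it with Xarray and Dask chunks and writing it to Zarr stores every chunk as its own object/file, so workers can read in parallel without contending for one file.

```python
# Minimal sketch, assuming a hypothetical "mydata.nc" with a "surf_temp"
# variable that has a "time" dimension (mirroring the question above).
import xarray as xr

# Open the NetCDF file lazily; each 10-step slice along "time" becomes
# one dask chunk, much like the manual loop in the question.
ds = xr.open_dataset("mydata.nc", chunks={"time": 10})

# Write it out as a Zarr store: every dask chunk is stored as its own
# object/file, so dask workers never contend for a single file.
ds.to_zarr("mydata.zarr", mode="w")

# Reading back is lazy too, and workers can fetch chunks in parallel.
ds_z = xr.open_zarr("mydata.zarr")
print(ds_z["surf_temp"].mean(dim="time").compute())
```

The same pattern works against object storage in the cloud deployments used during the bootcamp, with a storage URL in place of the local `mydata.zarr` path.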