# Container Camp, Day 1 - March 15th

----

## Topic: Introduction to Containers

**Container Camp Materials/Useful links**

- Course Homepage
    - [https://learning.cyverse.org/projects/cyverse-container-camp/en/latest/index.html](https://learning.cyverse.org/projects/cyverse-container-camp/en/latest/index.html)
- Course Schedule
    - [https://learning.cyverse.org/projects/cyverse-container-camp/en/latest/agenda.html](https://learning.cyverse.org/projects/cyverse-container-camp/en/latest/agenda.html)
- Containers vs. Images:
    - [https://phoenixnap.com/kb/docker-image-vs-container](https://phoenixnap.com/kb/docker-image-vs-container)
- Cleaning up after Docker (this is for on your own, don't worry about this with Atmosphere):
    - [https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes](https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes)

---

### Discussion and Notes:

**General notes:**

Please Introduce Yourself below *(Name, Institution, What would you like to use containers to do?, Strangest Class You Took in Undergrad)*:

- Ryan Bartelme, CyVerse, make reproducible workflows for scientific analyses, Vampires in Literature & Film
- Tyson Swetnam, CyVerse, all the things! Exploration of Mars.
- Tina Lee, CyVerse-support team for Camp, what are containers?, college was sooo last century I can't recall.
- Mery Touceda-Suárez, UofA, I would like containers to simplify my life when I use software from other people, strangest class: honestly my undergrad didn't allow me to take any cool classes, but I did a UN modeling course that was very bizarre
- Mike Trizna; Smithsonian Institution; I would like to use containers to make workflows reproducible -- and easily parallelized; Insects in Human Society
- Alice Jacques, NSF's NOIRLab, analyze scientific data easily, Into the Wilderness
- Michael Huggins, Cal Poly, Make reproducible tools, make containers in which troublesome libraries can be installed, trampoline.
- Andrew Antaya, Research Specialist, School of Natural Resources UArizona, I would like to learn how to set up and run containers in the cloud (e.g., CyVerse). Strangest class: Intermediate Free-Heel Skiing (Cross-Country and Telemark Skiing)
- Rhiannon Peery, UAlberta, Longevity of data/analyses in our lab's workflows, strangest class was probably economics - as a field biologist I just didn't get it
- Bill Morgan, College of Wooster (OH), Use Docker container to run circlator to assemble mtDNA genomes, microbial genetics (for its format)
- Chris Weber, School of Government and Public Policy, UA. Share data workflow, particularly in R, and facilitate collaboration with coauthors and others. Create a data analysis tool/environment for PhD methods seminars. Strangest class: Applied learning theories in psychology. I trained chickens to do tricks using schedules of reinforcement.
- Travis Simmons, The College of Coastal Georgia / UofA, I would like to use containers to share geospatial processing tools, my strangest class was a fisheries methods course!
- Matt Aiello-Lammens, Pace University, I would like to containerize some of my analyses and workflows, and get a general sense for containers as well; strangest class was The American Radical Tradition
- Jinlong Ru. Helmholtz Center Munich. Use container to package and share my pipeline. I took standard classes.
- Artin Majdi, Data7, Learn about the use of containers within the CyVerse infrastructure.
- Nathalia Graf-Grachet, UofA going into industry soon, containerize analysis pipelines & reproducibility. My undergrad classes were standard and all ag. related, sorry :(
- Sierra D. Miller, National Institute of Standards and Technology, reproducible workflows for standard pipelines and datasets, no clue - I took pretty standard classes
- Daniele Filiault, Gregor Mendel Institute, increase reproducibility of work for collaboration (sharing pipelines), either cross country skiing or Francophonie in Switzerland
- Kristina Riemer, University of Arizona, I already use containers in my work but need to understand them better to be able to fix problems and bugs, my intro stats professor had multiple raps she would do to teach us different concepts
- Christian Ayala, University of Arizona, I would like to use containers to package scripts and data analysis pipelines so they can be used multiple times with different datasets. Strangest class: Cinema appreciation
- Jessica Guo, UofA digital agriculture group, reproducible workflows for scientific publications, Masterpieces of Western Music
- Chris Langfield, Columbia U Medical Center; package scripts, data, and results for easy sharing; General Relativity (I got a 27 on the final)
- Laura Timm, University of Alaska Fairbanks/NOAA AFSC Auke Bay Lab Genetics, interested in integrating containers into pipeline development (using genomics to support stock assessments and fisheries management) and understanding how to access existing containers, strangest class: Texas A&M "Fish Camp", where incoming freshmen learn about all the Texas A&M Aggie traditions
- Matt Bunting, University of Arizona, I want to learn how to containerize everything!
Strangest class: Reverse Engineering the Fly
- Arminda Estrada, University of Arizona, run simulator, Popular Music in America
- Tim Bailey, Watershed Research and Training Center, running large Geospatial workflows with containers. Particularly for Forestry applications. I had a Cosmology and Culture class that got very weird. Good but Weird.
- Minh Tran, University of Arizona. I'm currently working on a project including a large amount of data stored on Cyverse and I've been struggling because I have no experience with Cyverse before and I'm not comfortable with it. I would like to get more familiar with Container Technology and I want to make reducible workflows. The strangest class that I took could probably be Animal Learning.
- Kristen Wade, University of Colorado-Anschutz medical campus, I have a genome analysis pipeline that is a combination of multiple custom scripts and I would like to learn how to use Docker to create a reproducible workflow that I can publish with the associated manuscript. My strangest undergrad class was Russian Literature.
- Nick Bielski, University of Arizona, I would like to use containers to package up snippets of code into a pipeline, Introduction to Film
- Eric Sokol, National Ecological Observatory Network (NEON), Battelle, Boulder, CO. I'm interested in learning how to use containers because we are using them in our data publication pipeline. Strange class is cult archaeology.
- Kenneth (Kenny) Acosta, Rutgers University, I would like to use containers to make my research easier to reproduce, elementary French
- Shannon Quinn, University of Georgia. I want to containerize all the research I'm doing, both for reproducibility and for creating larger applications consisting of multiple different containers. The strangest class I ever took was a modern philosophy course - Hd of new everything in computational
- Jonathan Sprinkle, Associate Professor at University of Arizona.
Our projects are storing significant amounts of data on CyVerse now, and we're ready to start containerizing and porting our on-laptop research code to be crontabbed and containerized! Strangest class I took in undergraduate...probably an honors colloquium on outdoors food/wilderness survival.
- Celine Caseys, University of California Davis, I would like to learn how to stabilize packages for R and python by using dockers. weirdest class was behavior ecology where the prof was explaining seduction techniques in some insects.
- Roman Golota, University of Arizona, easily reproducible workflows and containers on cyverse.
- Safwan Elmadani, University of Arizona, I would like to use the containers to deploy applications on cyverse.
- Joseph Ahrens, University of Colorado Anschutz Medical Campus, Release Codebases for public use without people needing to worry about a ton of obscure dependencies. I took a class called "Big Bang, Black Holes, No Math" where we learned about the general ideas behind supersymmetry, quantum mechanics, dark matter, and expanding spacetime from an MIT physicist without getting into differential equations, etc.
- Arpit Mishra, University of Washington Seattle Campus, use containers to make a working example with a small set of user case data for publication
- Hendrick, Taipei Medical University, learning how to deploy application into container
- Dajiang Ding, Smithsonian Institution, system admin, not a data scientist but like to learn how container technology is used for data science
- Marc Crepeau, University of California at Davis. Just trying to stay relevant at my UCD staff job. I remember when I thought I was going to be a biologist and now I spend all my time trying to get computers to do stuff. As an undergrad I took a tractor driving course.
- Sylvain Korzennik - Center for Astrophysics | Harvard & Smithsonian.
Learned programming using punch cards ;-) 30+yrs of HPC experience (some of the machines I used are now museum pieces).
- Kulbir Kaur, Columbia University Medical Center. Use containers for data sharing and analysis.

---

### Questions for Nirav:

*Please post any questions you have for Nirav in this section.*

- While container performance > VM, isn't container performance < native mode, and by approximately what factor?

    Answer: [Evaluation of Docker Containers for Scientific Workloads in the Cloud](https://dl.acm.org/doi/pdf/10.1145/3219104.3229280). See the Conclusion section, bullet #2.

- Are there tools for scanning a container for malware? Or do you just run normal malware-scanning software inside the container?

    Answer from Edwin (CyVerse): [Docker Hub](https://hub.docker.com) and [Quay](https://quay.io), two of the most popular public container image registries, provide security scanning for images that are uploaded to their sites. Here are articles that provide details about how to enable or use their security scanning features:

    - https://developers.redhat.com/blog/2019/06/26/using-quay-io-to-find-vulnerabilities-in-your-container-images/
    - https://docs.docker.com/docker-hub/vulnerability-scanning/

    There are also tools to scan your container images without using Docker Hub and Quay, including [Anchore](https://anchore.com/), [Clair](https://github.com/quay/clair), and [Trivy](https://github.com/aquasecurity/trivy). It seems like new container security scanning tools are being developed all the time, and each takes a different approach to scanning, so you might need to experiment to find the tool that works for your workflow. The easier tools to use are Anchore, which can be used as a container itself, and Trivy, which can be installed by a package manager. When using Singularity, there is built-in integration with Clair using Singularity's tools.
    Information about Singularity tools can be found here: https://github.com/singularityhub/stools

- Do you have resources for GPU and containers? Aiming to stabilize R modules using NVIDIA CUDA.

    Yes, contact Tyson on Slack, or email tswetnam@cyverse.org for details.

---

## What does Reproducibility mean for your research?

- Porting my software to HPC and cloud
- Less installation for users attempting to reproduce an analysis
- To check someone else's work, see if you can do it yourself and get the same results
- Reproducibility is being able to get similar results or outcomes without having the specific conditions that are present in the first instance.
- When other people are able to get the same results executing the same set of instructions on the same data.
- Being able to download, install and execute another lab's code and run on a provided test dataset (or the data published with the manuscript) to get the exact same result as the lab that published the code
- If I can't get the same results you did, following your recipe, I am inherently mistrustful of your results. I expect my science to be held to the same standard.
- Ability for someone to redo the experiment, whether that be analytical or other methods
- One big aspect of reproducibility for me is being able to help colleagues troubleshoot errors. I often cannot reproduce their environments or data.
- Being able to use the same data that was used in an analysis, follow the same analysis steps, and get the same quantitative answers (and perhaps qualitative answers too)
- Honestly, for my own science it isn't a big deal right now. But I often read science that I don't fully understand, and the best way for me to understand it is to actually see the full computational methods, which I can't do if they aren't provided.
- Sufficient information to reproduce previous research findings. This includes information about procedures, materials, etc.
- To be able to redo analysis from others
- Maintaining identical or comparable performance on different machines.
- Our goal is to be able to do the same computations on different machines and get the same results. Some of the challenges we'd address are to speed up processing (or distribute it) so that we can achieve this goal.
- I'm worried about the future- what it means to reproduce or replicate our workflow 5, 10, 50 years down the road.
- Being able to hand over a tool and data to a user and they get the same results as I do
- Having the entire scientific workflow in a paper that is replicable (at least in the analysis)
- Other people can actually reproduce what I can actually reproduce
- Faster, easier, and more accurate science. Data cleaning and analysis not being restricted to only one person. Easier to go back and reuse previous work.
- Allowing data and code to produce the same outcome between researchers
- Able to reproduce by independent labs
- As a staff scientist at NEON, I want to make workflows open and accessible to data users, and facilitate sharing of workflows among users. As an ecologist, I want to be able to share analysis workflows I create so others can assess (e.g., in peer review) and use in their science.
- It means that the same results can be obtained by whoever reanalyzes the same dataset.

---

**Breakout notes:**

---

### Homework

- Poke around in Atmosphere, find a container and run it--bring your questions for tomorrow.
- FYI, a common error for folks who are following along with connecting to Atmosphere from the Mac terminal or Windows WSL is that their CyVerse username is different from the username on their laptop. By default `ssh` logs in with your local username, so to connect properly you need to give your CyVerse username explicitly: `ssh yourusername@128.196.x.x`, e.g. `ssh edwins@128.196.142.1`

----
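The username fix from the homework note can be sketched in a POSIX shell as below. `atmo_target` is a hypothetical helper name, not a CyVerse tool. The username and IP are the example values from the note, so substitute your own CyVerse username and the IP shown for your instance.

```shell
# Hypothetical helper: build the ssh target from your CyVerse username
# (not your laptop username) and the IP of your Atmosphere instance.
atmo_target() {
    cyverse_user="$1"
    instance_ip="$2"
    printf '%s@%s\n' "$cyverse_user" "$instance_ip"
}

# Example values taken from the homework note.
target="$(atmo_target edwins 128.196.142.1)"

# Show the command that would be run.
echo "ssh $target"

# Uncomment to actually connect:
# ssh "$target"
```

Running this prints `ssh edwins@128.196.142.1`. The equivalent form `ssh -l edwins 128.196.142.1` also works, since `-l` sets the login name on the remote machine.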