# CodeRefinery Workshop @ NTNU February 25-27

You find this page at **http://bit.ly/CodeRefineryFeb25**.

This page is created from a common Markdown file. Anyone with the link can edit the page.

## Before we begin...

https://coderefinery.org/workshops/2020-02-25-trondheim/

- Come on in, please try to sit next to someone, because we do lots of group work.
- **Any installation issues? Please talk to us.**
- Check the [git configuration instructions (https://bit.ly/2wnTNV6)](https://coderefinery.github.io/git-refresher/01-setup/#configuring-git).
- If you want, check the general [Git refresher (https://bit.ly/2VQExub)](https://coderefinery.github.io/git-refresher/).

**Please test the following when arriving**:

- Test that you have Git installed: `git --version`
- Test that you have Python installed: `python --version`
- Verify that you can use both of the above commands in the same environment/terminal.

## Present

- Tor - Chemistry - Fortran / Python
- Bartul - Neuroscience - Python
- Nicholas - Neuroscience - Python/Julia/Fortran/C++
- Yihan Cao - Statistics - R
- Anne - Geosciences - Python/Fortran/R
- Caitlin - Ecology - R
- Diogo Kramel - Industrial Ecology - Python
- Bård - NTNU-IT - python / perl
- Anders - Industrial Ecology - MATLAB / Learning Python
- Kristoffer Jan Zieba - Geoscience - Python/Matlab
- Sameer Hassan - Genomics and Transcriptomics - Python
- Roberto Agromayor - Energy Engineering - Python/C++/Matlab
- Jorge Mendoza - Structural Engineering - Matlab/Python
- Xiangping Hu - Industrial Ecology - Matlab/R/Python/Fortran/Julia/C/ShellScripting
- Tor Erik, Topological Data Analysis, Python
- Tufan Arslan, Python/Fortran
- Asanthi - Energy & process tech - matlab/python
- Yuemin - Energy & process engineering - Matlab/Python/C/C++/java/
- Tomasz - EPT - matlab/learning python
- Xingji - EPT - matlab/java/learning python
- Mojtaba - Python, R

## CodeRefinery Code Repository

https://source.coderefinery.org/users/sign_in

More information at https://coderefinery.org/repository/

## h2

    mkdir recipe
    cd recipe
    git init

# Social Coding - Trondheim 2020

### Exercise - sharing or not sharing?

**Discuss in pairs or groups (5-10 minutes):**

## Group-1

#### Reasons for sharing your scripts/code/data

- Debugging/finding issues
- Fosters citing your work
- Advertisement of our research

#### Reasons for not sharing

- Someone may steal the work legally
- A job with many issues might damage your image
- We may lose the opportunity to earn money

#### Why is software often treated differently from papers?

- Paper can be cited but software cannot be

## Group-2

#### Reasons for sharing your scripts/code/data

- Peer review feedback
- Open for collaborations
- To improve the code and its documentation
- Increasing the reproducibility of science

#### Reasons for not sharing

- Not publishing while the code is not yet working well and documented, as this could harm the reputation of the project.

#### Why is software often treated differently from papers?

- No credit currently for coding

## Group-3

#### Reasons for sharing your scripts/code/data

- Increase the popularity of your algorithm
- Reproducibility
- Saves time
- Accountability
- Looks good on the CV / job market

#### Reasons for not sharing

- Others can profit from it sooner than you
- Fear of being found out for the monster that you are
- More work (people ask questions)

#### Why is software often treated differently from papers?

## Group-4

#### Reasons for sharing your scripts/code/data

### Group/lab:

- Save time (not done twice)
- Help

### Public:

- Others can build on our work
- Transparency
- Robust
- Direct feedback
- Increases citations

#### Reasons for not sharing

- Stealing, theft, uncited copying

#### Why is software often treated differently from papers?

- Peer review

## Group-5

#### Reasons for sharing your scripts/code/data

- Recognition
- Reproducibility of research

#### Reasons for not sharing

- Protecting IP
- Security
- Commercial collaboration

#### Why is software often treated differently from papers?

- tool vs method

# Exercise - What contributes to reuse?

## Group-1:

- Debugging and documentation
- Demonstration with showcases
- Technical support

## Group-2:

- Conduct workshops
- Proper documentation
- Publish in conferences or journals
- Add a section on how to contribute to the project

## Group-3:

- Documentation with examples
- Ease of use
- Modular

## Group-4:

- Documentation, including examples
- Adaptable and modular code

## Group-5:

- A friendly code of conduct
- The founder(s) need to devote a lot of time initially

# Exercise - Which of these is derivative work?

- Download some code from a website and add on to it
- Download some code and use a function in your code
- Changing the code
- Extending the code
- Completely rewriting the code
- Rewriting the code to a different programming language
- Linking to libraries (static or dynamic), plug-ins, and drivers
- Clean room design (somebody explains the code to you but you have never seen it)
- You read a paper, understand the algorithm, and write your own code

# Exercise - Licenses

- What is the StackOverflow license for code you copy and paste?
- A journal requests that you release your software during publication. You have copied a portion of the code from another package, which you have forgotten. Can you satisfy the journal's request?
- You want to fix a bug in a project someone else has released, but there is no license. What risks are there?
- How would you ask someone to add a license?
- You incorporate MIT, GPL, and BSD3 licensed code into your project. What possible licenses can you pick for your project?
- You do the same as above but add in another license that looks viral. What possible licenses can you use now?
- Do licenses apply if you don't distribute your code? Why or why not?
- Which licenses are most/least attractive for companies with proprietary software?

# Day 2

## Modular Code Development

We do the group discussion in new groups. The groups are sorted according to programming language (almost).

### Group 1 (Fortran)

#### What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?

- Divide the code; have a clear strategy for routines and subroutines
- Small main program
- Functions in separate files
- Start early with a makefile and a changelog (git or other)

#### What would you recommend your colleague who starts in the same programming language?

- Look at best practices
- Use git
- Be aware of available tools
- Understand how different languages interact
- Be aware of license issues for compilers
- Get an overview of useful libraries to reduce work time and complexity
- If working on existing code -> learn the tree structure

#### How do you deal with code complexity in your projects?

- See best practices
- Use a proper debugger
- Good comments in source code
- Know the program tree structure
- Clear definition of variables

### Group 2 (Matlab)

- What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?
  - Use Matlab functions, classes, and structures
  - Specific comments in sections
  - help %%
  - Careful planning before starting to code
  - Use small functions
  - Use something that already exists
  - Use version control

  For Python here:
  - Dig into existing libraries. There is a huge number of packages; just use them to start.
  - Use a web-based GUI, such as Jupyter
  - Pack the intermediate code when it is ready
- What would you recommend your colleague who starts in the same programming language?
  - Use cells (%%) for intermediate debugging
  - To build a function, begin with a script and make it a function once a working version is achieved.
  - Start with simple examples and get your hands dirty
  - Use the Matlab toolboxes
- How do you deal with code complexity in your projects?
  - Use modular coding and start with simple functions
  - Good comments for the code
  - Report your work with a diary
  - Prepare a document describing the developed functions: input, output, data structure...
  - Visual diagram showing the dependency of functions. Is there any tool for this? https://www.mathworks.com/help/matlab/matlab_prog/identify-dependencies.html
  - Use Source Insight to read complex Matlab code
  - Get help from experts on Matlab

### Group 3 (R)

- What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?
  * Create R Projects
  * Integrate with Git
  * Use R libraries
  * Use R markdown for documentation
  * Use formatR and/or lintr
- What would you recommend your colleague who starts in the same programming language?
  * Take an actual course in R if at all possible
  * Read other people's code (choose the right people)
  * Attend R meetings
  * Detailed documentation
- How do you deal with code complexity in your projects?
  * Write functions for tasks you will repeat
  * Write packages to contain a related set of functions
  * Good documentation

### Group 4 (Python)

- What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?
  1. Get hands-on: write the code to get results first, then refactor it into functions and make it well structured.
  2. Get a project leader who structures the project and splits it into small functions/tasks before the project starts.
  3. Make functions, then modules, separate modules into different files, and pack all into a package.
- What would you recommend your colleague who starts in the same programming language?
  1. Write reusable functions
  2. Start with small chunks of code
  3. Search for solutions to errors on the web, e.g. Stack Overflow
  4. Document what the code is doing and why
  5. Give meaningful names to variables/functions...
  6. Use the PEP8 style guide in e.g. PyCharm
  7. Use a debugger in e.g. PyCharm
- How do you deal with code complexity in your projects?

### Group 5 (Python)

- What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?
  1. Try to make functions for *everything*
  2. Naming and commenting: CamelCase for classes and underscores for functions (aka stick to some principle)
- What would you recommend your colleague who starts in the same programming language?
  1. Stack Overflow!
  2. Good tutorial pages, e.g. Real Python
  3. Re-implement functions from other programming languages
- How do you deal with code complexity in your projects?
  1. Divide into functions, modules, packages

# Github username

- annefou
- arslantuf
- bartulem
- torlarse
- tommuszy
- rezasabzi1986
- jorgemendozaesp
- caitlinmandeville
- AsanthiJ
- shelly77
- xiangph
- DiogoKramel
- btesaker
- sameerh
- torhaugl
- IoELab
- AndersArvesen
- yuxingji
- nichchris
- MortezaHaddadi
- codeseyed

# Good practices, tools and books

## Code-completion engine for vim

https://github.com/ycm-core/YouCompleteMe

## The Pro Git book

https://git-scm.com/book/en/v2

## The Architecture of Open Source Applications

http://aosabook.org/en/index.html

## Understanding pip and conda

https://www.anaconda.com/understanding-conda-and-pip/

## Linking a pull request (PR) to an issue:

https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue

## Where to get help

https://innsida.ntnu.no/wiki/-/wiki/English/IT-support+for+PhD+students+-+Mimes+Brønn

### Emails

- help@hpc.ntnu.no. You will get a ticket number.
- orakel@ntnu.no, write that the email is to "IT Utvikling Forskningsstøtte". You will get a ticket number.
- research-data@ntnu.no

Zulip, https://coderefinery.zulipchat.com

# Jupyter ecosystem

## Group-1

- Allows for programming to be combined with a "narrative"
- Good for teaching, presenting data, testing & debugging, widgets
- Useful for data science

## Group-2

- Show code for presentation purposes (educational)
- Supports markdown, equations and also Julia/Python/R
- To quickly test commands instead of REPL
- It's in a web browser and looks nice
- Nicely integrates with GitHub and similar
- For smaller codes (Don't make a huge code in Jupyter!)
- Reproducible and easy to follow research

## Group-couch

- Easy way to visualize and give explanations
- Simple tool for visualizing plots and simple small scripts without having many windows open (and having to close them...)
- Awesome for teaching situations
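
## Example - narrative and code in one file

A minimal sketch of the "programming combined with a narrative" point above, not something shown at the workshop. It assumes the version 4.4 notebook JSON layout (a list of markdown and code cells); the helper name `make_notebook` and the cell contents are made up for illustration.

```python
import json


def make_notebook(title, explanation, code):
    """Return a minimal notebook as a plain dict (assumed nbformat 4.4 layout)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # Narrative part: rendered as formatted text in the browser.
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": f"# {title}\n\n{explanation}",
            },
            # Code part: executed by the kernel; outputs start out empty.
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code,
            },
        ],
    }


notebook = make_notebook(
    title="Quick test",
    explanation="A short note on what the next cell checks.",
    code="print(sum(range(10)))",
)

# Serialise to the JSON text that would be stored in an .ipynb file.
notebook_text = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

Saving `notebook_text` to a file ending in `.ipynb` should give something Jupyter can open. Because the file is plain JSON, it can also be put under Git, although diffs get noisy once outputs are stored.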