Programming formalisms for life scientists and bioinformaticians

# Programming formalisms

- [Course website](https://uppmax.github.io/programming_formalisms/)
- [Meeting notes](https://uppmax.github.io/programming_formalisms/meeting_notes/)

## Invitation

### Living document Feb 12 version Richel

The goal of this highly interactive 5-day course is to be able to develop academic software that you can trust to be 'good enough'. We assume you have written code 'that (sometimes) just works'. The course follows a formal development process from start to finish, with a selection of the topics and best practices we think are most important, with the goal of developing academic software that is actually good enough.

- When: May 5-9 from 9:00-16:00 each day
- Where: online, Zoom links will be sent out
- [Course material](https://uppmax.github.io/programming_formalisms/)
- [More info and registration (soon to open)](https://docs.uppmax.uu.se/courses_workshops/programming_formalisms/)

### Living document Feb 12 (First update)

This full 5-day course aims to give scientists, bioinformaticians and other research engineers with some experience in programming and scripting an understanding of the underlying principles of software development, design, and programming. The course aims to strengthen the understanding of more advanced programming concepts, to build the ability to produce more reusable scripts through modular programming, and to enable a better understanding of how to evaluate the performance of a script or program.

We will cover a formal development process from start to finish. We use Test-Driven Development as a good example of a development process, alongside requirements modeling, risk assessment and structured design. Some of the topics covered are modular development and (code) reusability, testing and optimization. We will pair theory with practical examples and applications to strengthen the theoretical understanding of the principles.
When: May 5-9, at 9:00-16:00

[More information and registration (opens soon)](https://docs.uppmax.uu.se/courses_workshops/programming_formalisms/).

### Old

This full 5-day course aims to give scientists, bioinformaticians and other research engineers with some experience in programming and scripting an understanding of the underlying principles of software development, design, and programming. The course aims to strengthen the understanding of more advanced programming concepts, to build the ability to produce more reusable scripts through modular programming, and to enable a better understanding of how to evaluate the performance of a script or program.

We will cover a formal development process from start to finish. We use Test-Driven Development as a good example of a development process, alongside requirements modeling, risk assessment and structured design. Some of the topics covered are modular development and (code) reusability, testing and optimization. We will pair theory with practical examples and applications to strengthen the theoretical understanding of the principles.

When: November 18-22, onboarding: November 15

For more information and registration, please visit: https://www.uu.se/en/centre/uppmax/study/courses-and-workshops/programming-formalisms.

## Working document for the project

### Description

#### Brief

```text
Do some analysis on data from an Uppsala weather station.
```

Details

- You have a temperature data file (Uppsala)
- Build a program that performs and presents some analysis
    - graph of running means
    - other statistics
        - daily maximums/min
        - yearly maximums/min
    - meta data extraction
        - name of station
        - dates, times and variable from which column
    - presentation
        - header of graphs (name)
        - axis labels
- When the program is run, the user should be able to decide which analysis to perform, for instance with arguments from the command line.
- How?
    - modular design
    - Different teams get different task focuses
- Start with the Uppsala file
- If time allows, make the program more general:
    - make it possible to read other files (defined by)
    - there is a document that connects filenames to stations
    - the argument can be a station name and the program reads the right file

#### The outline of the project ...

- needs analysis and risk assessment
- uses Python and matplotlib to plot the data
- uses `scipy` and `numpy`, or alternatively writes the functions from scratch, depending on the needs in the Algorithms and TDD sections.
- uses modular design to be able to swap out the prediction functions.
- uses object-oriented design and development

#### Technical requirements

- possibility for different file names and arguments defining what the program should do
- modularity seems interesting --> modularity session split and mentioned earlier in the project start

Previous brief:

!!! info "The brief"

    We want to see (an approximation of, and clean-looking) hourly daily temperature data from Uppsala, using a long-lasting dataset that recorded the temperature a couple of times per day. It needs to be approximate, as there are only a few recordings per day.

    - It needs to be clean-looking as [Björn takes over here!]
    - We want the average curve to be displayed to reduce the noise of the visualization.
    - This project is based on the work of Bergström and Moberg: [Bergström and Moberg 2002](https://www.smhi.se/polopoly_fs/1.175744!/Bergstr%C3%B6m_Moberg_Uppsala.pdf)

The outline of the project ...

- needs analysis and risk assessment
- uses Python and matplotlib to plot the data
- uses `scipy` and `numpy`, or alternatively writes the functions from scratch, depending on the needs in the Algorithms and TDD sections.
- uses modular design to be able to swap out the prediction functions.
- uses object-oriented design and development

opendata-download-metobs.smhi.se/ portal to get station data: https://www.smhi.se/data/meteorologi/ladda-ner-meteorologiska-observationer/airtemperatureInstant/97530

`smhi-opendata_1_97530_20250122_131109.csv`: example from Uppsala airport with data and metadata in the same file. However, it is in Swedish.

Questions for team(s)

- OK, we know that we should do something with the data (we give hints)
- What data do we have?
- How to read it?
- What about quality flags?
- What to include?
- Is reformatting needed?
- How can we get information about the station name and label our results accordingly?
- Which functions are needed?
- Modules that do different statistics?
- Use many/all stations to find dependence on latitude/altitude?

Needed from us?

- Translation of the metadata, or have one Swedish speaker in each group

Cons

- the filename does not tell the station name
- ugly files with metadata in the same file

Pros

- meta content in the same file
- more possible analyses to be made with several stations

Suggestion from Björn

- Many raw files (perhaps not all stations in Sweden) collected in a data folder
- Different teams get different tasks focusing on different modules/analyses
- In the end, build a main program which the user runs with arguments to perform the needed analysis

I am happy there is some love for the project, in our efforts to make a simpler student project. However, I do wonder whether we are falling into the trap of making it complex again :-)

> Many raw files (perhaps not all stations in Sweden) collected in a data folder

Seems great to me! We start with the simplest files :-)

> Different teams get different tasks focusing on different modules/analyses

Sounds good. It is close to our use of issues 👍

> In the end build a main program which user run with arguments to perform the needed analysis

Can do. We do not discuss program arguments anywhere yet, and I will volunteer to put some time into that if it does not fit better elsewhere.
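To make the discussion about program arguments and swappable analysis modules more concrete, here is a minimal sketch of how a main program could dispatch to separate analysis functions. Everything in it is an assumption for illustration: the names `running_mean`, `daily_extremes`, `ANALYSES` and `EXAMPLE_READINGS`, the `--window` option, and the made-up readings (which are not SMHI data). It deliberately skips file reading, so it says nothing about the real SMHI file layout.

```python
import argparse
from collections import defaultdict
from datetime import datetime

# Made-up readings for illustration only: (timestamp, temperature in Celsius).
EXAMPLE_READINGS = [
    (datetime(2025, 1, 1, 6), -2.0),
    (datetime(2025, 1, 1, 12), 1.0),
    (datetime(2025, 1, 1, 18), -1.0),
    (datetime(2025, 1, 2, 6), -4.0),
    (datetime(2025, 1, 2, 12), 0.0),
]


def running_mean(values, window):
    """Return the mean of every full run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def daily_extremes(readings):
    """Map each date to the (minimum, maximum) temperature of that day."""
    by_day = defaultdict(list)
    for timestamp, temperature in readings:
        by_day[timestamp.date()].append(temperature)
    return {day: (min(temps), max(temps)) for day, temps in sorted(by_day.items())}


# Registry of analyses: a team adds one function and one entry here.
ANALYSES = {
    "running-mean": lambda readings, args: running_mean(
        [temperature for _, temperature in readings], args.window
    ),
    "daily-extremes": lambda readings, args: daily_extremes(readings),
}


def build_parser():
    """Create the command-line parser; the analysis name is a required argument."""
    parser = argparse.ArgumentParser(description="Analyse temperature readings.")
    parser.add_argument("analysis", choices=sorted(ANALYSES))
    parser.add_argument("--window", type=int, default=3,
                        help="number of readings per running-mean window")
    return parser


def main(argv=None, readings=None):
    """Parse the arguments and run the chosen analysis on the readings."""
    args = build_parser().parse_args(argv)
    if readings is None:
        readings = EXAMPLE_READINGS
    return ANALYSES[args.analysis](readings, args)
```

Calling `main(["running-mean", "--window", "2"])` runs the running-mean analysis on the example readings; adding the usual `if __name__ == "__main__": print(main())` guard would turn the sketch into a script. Keeping the analyses in a dictionary means the main program does not change when a team contributes a new analysis function.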
### Design

https://github.com/UPPMAX/programming_formalisms/commit/1349dc589d9dfde13cc39087216ee3e77f99bfdf

**Answer**

First! We ask ourselves what the scope and magnitude of this project are.

- This is a small project that the team is expected to finish in about one week.
- The project will be driven and implemented by a small team of distributed developers.
- The aim of the project is to teach the team the SDLC process.

**Conclusion** - The project needs, and goes through, more formalized steps than a regular project of similar size. The project needs a minimal risk assessment and a needs determination.

Second! Determination of business and legal space.

The project operates inside an open, non-profit, open-source scope. A Data Protection Impact Assessment is not needed for this data set since it is open and readily available; it is non-sensitive, non-personal data with minimal economic impact. This project is conducted inside an educational setting in Sweden and therefore the legal space is simple.

With that out of the way (usually not conducted for small projects but included here for completeness):

Third! First iteration of needs gathering

The brief gives us

- make graph of running means
- other statistics
    - daily maximums/min
    - yearly maximums/min
- meta data extraction
    - name of station
    - dates, times and variable from which column
- presentation
    - header of graphs (name)
    - axis labels

Needs/requirements matrix:

Requirement ID | Requirement Description | Acceptance Criteria | Test Cases
---------------|-------------------------|---------------------|-----------
R1 | Open source development | The development process follows the principles of open source development | In each step, evaluate the accessibility and openness
R2 | Analysis | The program performs and presents some analysis related to temperature |
R3 | File IO | The software should be able to read and parse the data in the SMHI station data CSV files |

Fourth! Risks

Remember to think of both primary and secondary/derived risks.

We start by looking at the risks involved in the software project.

- Over-engineering
    - In this kind of small project, the risk of the project failing increases when the solutions are made significantly more complicated than they need to be
- risk of

Fifth! Second iteration of needs analysis
