The goal of this highly interactive 5-day course is for you to be able to develop academic software that you can trust to be 'good enough'. We assume you have written code 'that (sometimes) just works'.
The course follows a formal development process from start to finish, with a selection of topics and best practices we think are most important, with the goal of developing academic software that is actually good enough.
This full 5-day course aims to give scientists, bioinformaticians and other research engineers with some experience in programming and scripting an understanding of the underlying principles of software development, design, and programming. The course aims to strengthen the understanding of more advanced programming concepts, to improve the ability to produce reusable scripts through modular programming, and to enable a better understanding of how to evaluate the performance of a script or program.
We will cover a formal development process from start to finish. We use Test-Driven Development as a good example of a development process, together with requirements modeling, risk assessment and structured design.
Some of the topics covered are modular development and (code) reusability, testing and optimization.
We will pair theory with practical examples and applications that reinforce the understanding of the principles.
When: May 5-9, at 9:00-16:00
More information and registration (opens soon).
When: November 18-22, onboarding: November 15
For more information and registration, please visit: https://www.uu.se/en/centre/uppmax/study/courses-and-workshops/programming-formalisms.
Do some analysis on data from an Uppsala weather station.
Details
scipy
and numpy
alternatively writing the functions from scratch
possibility for different file names and arguments defining what the program should do.
modularity seems interesting
–> modularity session split and mentioned earlier in the project start
previous brief:
!!! info "The brief"
We want to see (approximate and clean-looking)
hourly daily temperature data from Uppsala,
using a long-running dataset that recorded the temperature
a couple of times per day.
It needs to be approximate, as there are only a few recordings
per day.
It needs to be clean-looking as [Björn takes over here!]
We want the average curve to be displayed to reduce noise in the visualization.
This project is based on the work of Bergström and Moberg
Bergström and Moberg 2002
The outline of the project …
- needs analysis and risk assessment
- uses Python and matplotlib to plot the data
- uses `scipy` and `numpy`, alternatively writing the functions from
scratch, depending on the needs in the Algorithms and TDD sections.
- uses modular design to be able to swap out the prediction functions.
- uses object-oriented design and development
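As a minimal sketch of what "swapping out" a function could look like, the example below keeps the interchangeable functions in a dictionary and selects one by name. It is written from scratch with the standard library only. All names (`running_mean`, `no_smoothing`, `SMOOTHERS`, `smooth`) are hypothetical, and a running mean stands in here for whatever prediction function the project ends up using.

```python
from typing import Callable, Dict, List, Sequence

# Any function with this signature can be plugged in.
Smoother = Callable[[Sequence[float]], List[float]]


def running_mean(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered running mean; near the edges only the available samples are averaged."""
    half = window // 2
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def no_smoothing(values: Sequence[float]) -> List[float]:
    """Return the values unchanged (as floats); useful as a baseline."""
    return [float(v) for v in values]


# Hypothetical registry: adding a new method means adding one entry here.
SMOOTHERS: Dict[str, Smoother] = {
    "running_mean": running_mean,
    "none": no_smoothing,
}


def smooth(values: Sequence[float], method: str = "running_mean") -> List[float]:
    """Apply the smoother registered under `method`."""
    try:
        smoother = SMOOTHERS[method]
    except KeyError:
        raise ValueError(f"unknown method: {method!r}") from None
    return smoother(values)
```

The calling code only depends on `smooth`, so a team can replace or add a function without touching the plotting code.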
opendata-download-metobs.smhi.se/
portal to get station data: https://www.smhi.se/data/meteorologi/ladda-ner-meteorologiska-observationer/airtemperatureInstant/97530
smhi-opendata_1_97530_20250122_131109.csv
Example from Uppsala airport with data and metadata in the same file. However, it is in Swedish.
Questions for team(s)
OK, we know that we should do something with the data (we give hints)
What data do we have?
How to read it?
What about quality flags? What to include?
Is reformatting needed?
How can we get information about station name and label our results accordingly?
Which functions are needed?
Modules that do different statistics?
Use many/all stations to find dependence on latitude or altitude?
Needed from us?
Translation of metadata, or have one Swedish speaker in each group
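To make the questions about reading, quality flags and station names more concrete, here is a hedged sketch of a parser for a file that mixes metadata and data. The layout of `SAMPLE` is an assumption and its values are invented for illustration: semicolon-separated fields, a few metadata lines first, then a data table whose header starts with `Datum;`. The column names and the flag value `G` are likewise assumptions that should be checked against the real file.

```python
import csv
from typing import List, Sequence, Tuple

# Invented sample in an assumed SMHI-like layout (not copied from a real file).
SAMPLE = """\
Stationsnamn;Stationsnummer
Uppsala Flygplats;97530

Datum;Tid (UTC);Lufttemperatur;Kvalitet
2025-01-01;06:00:00;-1.5;G
2025-01-01;12:00:00;0.5;Y
2025-01-01;18:00:00;-0.5;G
"""

Row = Tuple[str, str, float]  # (date, time, temperature)


def parse_station_file(
    text: str,
    header_prefix: str = "Datum;",          # assumed start of the data header
    accepted_flags: Sequence[str] = ("G",),  # assumed "good quality" flag
) -> Tuple[List[List[str]], List[Row]]:
    """Split a station file into metadata lines and accepted data rows."""
    lines = text.splitlines()
    for start, line in enumerate(lines):
        if line.startswith(header_prefix):
            break
    else:
        raise ValueError("no data header found")

    # Everything before the data header is treated as metadata.
    metadata = [line.split(";") for line in lines[:start] if line.strip()]

    rows: List[Row] = []
    for record in csv.DictReader(lines[start:], delimiter=";"):
        if record["Kvalitet"] in accepted_flags:
            rows.append(
                (record["Datum"], record["Tid (UTC)"], float(record["Lufttemperatur"]))
            )
    return metadata, rows
```

With this sample, `parse_station_file(SAMPLE)` returns the station name as `metadata[1][0]` and keeps only the two rows flagged `G`. Which flags to accept is exactly the design decision the teams need to make.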
Cons
filename does not tell the station name
ugly files with metadata in same file
Pros
meta content in same file
more possible analyses to be made with several stations
Suggestion from Björn
Many raw files (perhaps not all stations in Sweden) collected in a data folder
Different teams get different tasks focusing on different modules/analyses
In the end, build a main program which the user runs with arguments to perform the needed analysis
I am happy there is some love for the project, in our efforts to make a simpler student project.
However, I do wonder whether we are falling into the trap of making it complex again :-)
Many raw files (perhaps not all stations in Sweden) collected in a data folder
Seems great to me! We start with the simplest files :-)
Different teams get different tasks focusing on different modules/analyses
Sounds good. It is close to our use of issues 👍
In the end, build a main program which the user runs with arguments to perform the needed analysis
Can do. We do not discuss program arguments anywhere yet, and I will volunteer to put some time into that if it does not fit better elsewhere.
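A main program that is run with arguments could start from something like the sketch below, which uses `argparse` from the Python standard library. The option names (`--analysis`, `--window`) and the analysis names are hypothetical placeholders, and `main` only reports what it would do rather than running a real analysis.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Describe the command-line interface (all option names are placeholders)."""
    parser = argparse.ArgumentParser(
        description="Analyse station temperature data (sketch)."
    )
    parser.add_argument("files", nargs="+", help="one or more station data files")
    parser.add_argument(
        "--analysis",
        choices=["running_mean", "daily_extremes"],
        default="running_mean",
        help="which analysis to perform",
    )
    parser.add_argument(
        "--window", type=int, default=3, help="window size for the running mean"
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse the arguments and dispatch; here it only prints the plan."""
    args = build_parser().parse_args(argv)
    for name in args.files:
        print(f"would run {args.analysis} on {name}")
    return 0
```

Passing `argv` explicitly, as in `main(["station.csv", "--analysis", "daily_extremes"])`, makes the program easy to test; a script entry point would call `main()` without arguments so that `argparse` reads the real command line.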
https://github.com/UPPMAX/programming_formalisms/commit/1349dc589d9dfde13cc39087216ee3e77f99bfdf
!Answer
First! We ask ourselves what the scope and magnitude of this project are
- This is a small project that the team is expected to finish in about one week
- The project will be driven and implemented by a small team of distributed developers.
- The aim of the project is to teach the team the SDLC process
**Conclusion** - The project needs and goes through more formalized steps than a regular project of similar size.
The project needs a minimal risk assessment and a needs determination
Second! Determination of business and legal space.
The project operates inside a non-profit, open-source scope. A Data Protection Impact Assessment is not needed for this data set, since it is open and readily available, and it is non-sensitive, non-personal data with minimal economic impact.
This project is conducted inside an educational setting in Sweden and therefore the legal space is simple.
With that out of the way (this step is usually not conducted for a small project, but is included here for completeness)
Third! First iteration of needs gathering
The brief gives us
- make graph of running means
- other statistics
- daily maximums/min
- yearly maximums/min
- metadata extraction
- name of station
- dates, times and variable from which column
- presentation
- header of graphs (name)
- axis labels
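The "daily maximums/min" and "yearly maximums/min" needs can be sketched with one grouping helper. This is an illustration under the assumption that each reading is a pair of an ISO date string (`YYYY-MM-DD`) and a temperature; the function names are hypothetical and the readings in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]  # (ISO date "YYYY-MM-DD", temperature)


def extremes_by(
    readings: Iterable[Reading], key_length: int
) -> Dict[str, Tuple[float, float]]:
    """Group readings by the first `key_length` characters of the date
    and return (minimum, maximum) per group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for date, temperature in readings:
        groups[date[:key_length]].append(temperature)
    return {key: (min(temps), max(temps)) for key, temps in groups.items()}


def daily_extremes(readings: Iterable[Reading]) -> Dict[str, Tuple[float, float]]:
    """Minimum and maximum per day ("YYYY-MM-DD" is 10 characters)."""
    return extremes_by(readings, 10)


def yearly_extremes(readings: Iterable[Reading]) -> Dict[str, Tuple[float, float]]:
    """Minimum and maximum per year ("YYYY" is 4 characters)."""
    return extremes_by(readings, 4)
```

For example, with the made-up readings `[("2024-12-31", -3.0), ("2024-12-31", 1.0), ("2025-01-01", -1.5)]`, `daily_extremes` gives one `(min, max)` pair per date and `yearly_extremes` one pair per year. Keeping the grouping in a single helper means a monthly statistic would only need a different `key_length`.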
Needs/requirements matrix:
Requirement ID | Requirement Description | Acceptance Criteria | Test Cases
---------------|-------------------------|---------------------|-----------
R1 | Open source development | The development process follows the principles of open source development | In each step, evaluate the accessibility and openness
R2 | Analysis | The program performs and presents some analysis related to temperature |
R3 | File IO | The software should be able to read and parse the data in the SMHI station data CSV files |
Fourth! Risks
Remember to think of both primary and secondary/derived risks
We start by looking at the risks involved in the software project.
- Over-engineering - In this kind of small project, the risk of the project failing increases with the risk of making the solutions significantly more complicated than they need to be
-risk of
Fifth! Second iteration of needs analysis