Programming formalisms

Invitation

Living document Feb 12 version Richel

The goal of this highly-interactive 5-day course is to be able to develop academic software that you can trust to be 'good enough'. We assume you have written code 'that (sometimes) just works'.

The course follows a formal development process from start to finish, with a selection of topics and best practices we think are most important, with the goal of developing academic software that is actually good enough.

Living document Feb 12 (First update)

This full 5-day course aims to give scientists, bioinformaticians and other research engineers with some experience in programming and scripting an understanding of the underlying principles of software development, design, and programming. The course aims to strengthen the understanding of more advanced programming concepts, improve the ability to produce more reusable scripts through modular programming, and enable a better understanding of how to evaluate the performance of a script or program.

We will cover a formal development process from start to finish. We use Test-Driven Development as an example of a development process, and we also cover requirements modeling, risk assessment and structured design.

Some of the topics covered are modular development and (code) reusability, testing and optimization.

We will cover theory with bridging practical examples and applications to enhance the theoretical understanding of the principles.

When: May 5-9, at 9:00-16:00

More information and registration (opens soon).

Old

This full 5-day course aims to give scientists, bioinformaticians and other research engineers with some experience in programming and scripting an understanding of the underlying principles of software development, design, and programming. The course aims to strengthen the understanding of more advanced programming concepts, improve the ability to produce more reusable scripts through modular programming, and enable a better understanding of how to evaluate the performance of a script or program.

We will cover a formal development process from start to finish. We use Test-Driven Development as an example of a development process, and we also cover requirements modeling, risk assessment and structured design.

Some of the topics covered are modular development and (code) reusability, testing and optimization.

We will cover theory with bridging practical examples and applications to enhance the theoretical understanding of the principles.

When: November 18-22, onboarding: November 15

For more information and registration, please visit: https://www.uu.se/en/centre/uppmax/study/courses-and-workshops/programming-formalisms.

Working document for the project

Description

Brief

Do some analysis on data from an Uppsala weather station.

Details

  • You have a temperature data file (Uppsala)
  • Build a program that performs and presents some analysis
    • graph of running means
    • other statistics
      • daily maximums/min
      • yearly maximums/min
    • meta data extraction
      • name of station
      • which columns hold the dates, times and variable
    • presentation
      • header of graphs (name)
      • axis labels
  • When the program is run, the user should be able to decide which analysis to perform, for instance with arguments from the command line.
  • How?
    • modular design
    • Different teams focus on different tasks
    • Start with the Uppsala file
    • If time allows, make the program more general:
      • make it possible to read other files (defined by)
      • there is a document that connects filenames to stations
      • argument can be station name and the program reads the right file
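The requirement that the user decides which analysis to perform from the command line could be sketched with Python's standard `argparse` module. The analysis names, the `--file` option and its default `uppsala.csv` are assumptions made for illustration; none of them come from the brief.

```python
import argparse

# Hypothetical analysis names; the real set is up to the teams.
ANALYSES = ("running-mean", "daily-extremes", "yearly-extremes", "metadata")


def build_parser() -> argparse.ArgumentParser:
    """Create the command-line parser for the (hypothetical) weather program."""
    parser = argparse.ArgumentParser(
        description="Analyse temperature data from a weather station."
    )
    parser.add_argument(
        "analysis",
        choices=ANALYSES,
        help="which analysis to perform",
    )
    parser.add_argument(
        "--file",
        default="uppsala.csv",  # assumed file name, not from the brief
        help="path to the temperature data file",
    )
    return parser


def main(argv=None) -> argparse.Namespace:
    """Parse the arguments and report which analysis would be run."""
    args = build_parser().parse_args(argv)
    print(f"Would run '{args.analysis}' on {args.file}")
    return args


# Passing a list stands in for what the user would type on the command line.
args = main(["running-mean", "--file", "uppsala.csv"])
```

In a real program, each analysis name would dispatch to its own module, which keeps the main program small and lets the teams work independently.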

The outline of the project

  • needs analysis and risk assessment
  • uses Python and matplotlib to plot the data
    • uses scipy and numpy, or alternatively writes the functions from
      scratch, depending on the needs in the Algorithms and TDD sections.
    • uses modular design to be able to swap out the prediction functions.
    • uses object oriented design and development
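As a minimal sketch of the "numpy or from scratch" choice combined with swappable functions, the example below computes a running mean in two interchangeable ways. The function names, the shared `(values, window)` signature and the temperature values are assumptions for illustration only.

```python
import numpy as np


def running_mean_numpy(values, window):
    """Running mean using numpy; returns len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")


def running_mean_scratch(values, window):
    """The same running mean written without numpy."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def smooth(values, window, method=running_mean_numpy):
    """Apply any smoothing function with the signature (values, window)."""
    return method(values, window)


temperatures = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values, not station data
print(smooth(temperatures, 3))
print(smooth(temperatures, 3, method=running_mean_scratch))
```

Because both functions share one signature, the caller can swap the implementation without changing any other code, which is the point of the modular design item above.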

Technical requirements

Possibility for different file names and arguments that define what the program should do.
modularity seems interesting
> modularity session split and mentioned earlier in the project start
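One way to support different file names is a small lookup from station name to file name. The sketch below assumes the document that connects file names to stations is a CSV with the columns `station` and `filename`; that layout, the function names and the second entry are hypothetical.

```python
import csv
import io

# Made-up index content; the real document that connects file names
# to stations may look different.
INDEX_TEXT = """\
station,filename
Uppsala Flygplats,smhi-opendata_1_97530_20250122_131109.csv
Example Station,example_station.csv
"""


def load_station_index(text):
    """Return a dict that maps lower-cased station names to file names."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["station"].lower(): row["filename"] for row in reader}


def filename_for(station, index):
    """Look up the file for a station name, ignoring upper/lower case."""
    try:
        return index[station.lower()]
    except KeyError:
        raise ValueError(f"Unknown station: {station!r}") from None


index = load_station_index(INDEX_TEXT)
print(filename_for("uppsala flygplats", index))
```

Raising a clear error for an unknown station gives the user a readable message instead of a failed file read later on.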

previous brief:
!!! info "The brief"

    We want to see an approximate, clean-looking view of
    hourly daily temperature data from Uppsala,
    using a long-running dataset that recorded the temperature
    a couple of times per day.

    It needs to be approximate, as there are only a few recordings
    per day.
  • It needs to be clean-looking as [Björn takes over here!]

  • We want the average curve to be displayed to reduce noise in the visualization.

  • This project is based on the work of Bergström and Moberg
    (Bergström and Moberg 2002).

The outline of the project

- needs analysis and risk assessment
- uses Python and matplotlib to plot the data
- uses `scipy` and `numpy`, or alternatively writes the functions from
  scratch, depending on the needs in the Algorithms and TDD sections.
- uses modular design to be able to swap out the prediction functions.
- uses object-oriented design and development

opendata-download-metobs.smhi.se/
portal to get station data: https://www.smhi.se/data/meteorologi/ladda-ner-meteorologiska-observationer/airtemperatureInstant/97530

smhi-opendata_1_97530_20250122_131109.csv

Example from Uppsala airport, with data and metadata in the same file. However, it is in Swedish.
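A file that mixes metadata and data could be read by first locating the header row of the data table and treating everything above it as metadata. The sample text below is made up, and its layout (semicolon-separated, a data header starting with `Datum`, a quality column `Kvalitet` with flag `G` meaning accepted) is an assumption; the real SMHI file should be inspected before relying on any of it.

```python
import csv
from collections import defaultdict

# Made-up sample that imitates a file with metadata and data together.
SAMPLE = """\
Stationsnamn;Stationsnummer
Uppsala Flygplats;97530

Datum;Tid (UTC);Lufttemperatur;Kvalitet
2020-01-01;06:00:00;1.5;G
2020-01-01;12:00:00;3.0;G
2020-01-01;18:00:00;2.0;Y
2020-01-02;06:00:00;-1.0;G
"""


def parse_station_file(text, header_start="Datum"):
    """Split the text into a metadata dict and a list of data records."""
    lines = text.splitlines()
    # The data table is assumed to begin at the first line starting with header_start.
    start = next(i for i, line in enumerate(lines) if line.startswith(header_start))
    meta_rows = [row for row in csv.reader(lines[:start], delimiter=";") if row]
    metadata = dict(zip(meta_rows[0], meta_rows[1]))
    records = list(csv.DictReader(lines[start:], delimiter=";"))
    return metadata, records


def daily_extremes(records, accepted_flags=("G",)):
    """Return {date: (min, max)} using only rows with an accepted quality flag."""
    by_day = defaultdict(list)
    for row in records:
        if row["Kvalitet"] in accepted_flags:
            by_day[row["Datum"]].append(float(row["Lufttemperatur"]))
    return {day: (min(temps), max(temps)) for day, temps in by_day.items()}


metadata, records = parse_station_file(SAMPLE)
extremes = daily_extremes(records)
print(metadata["Stationsnamn"], extremes)
```

Keeping the station name in `metadata` makes it available later for graph headers, and the `accepted_flags` parameter leaves the decision about quality flags to the team.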

Questions for team(s)

OK, we know that we should do something with the data (we give hints)
What data do we have?
How do we read it?
What about quality flags? What should be included?
Is reformatting needed?
How can we get information about the station name and label our results accordingly?
Which functions are needed?
Modules that do different statistics?
Use many/all stations to find the dependence on latitude/altitude?

Needed from us?

Translation of the metadata, or one Swedish speaker in each group

Cons

the filename does not tell the station name
ugly files with metadata in the same file

Pros

metadata in the same file
more possible analyses to be made with several stations

Suggestion from Björn

Many raw files (perhaps not all stations in Sweden) collected in a data folder
Different teams get different tasks focusing on different modules/analyses
In the end, build a main program that the user runs with arguments to perform the needed analysis

I am happy there is some love for the project, in our efforts to make a simpler student project.

However, I do wonder whether we are falling into the trap of making it complex again :-)

> Many raw files (perhaps not all stations in Sweden) collected in a data folder

Seems great to me! We start with the simplest files :-)

> Different teams get different tasks focusing on different modules/analyses

Sounds good. It is close to our use of issues 👍

> In the end, build a main program that the user runs with arguments to perform the needed analysis

Can do. We do not discuss program arguments anywhere yet, and I volunteer to put some time into that if it does not fit better elsewhere.

Design

https://github.com/UPPMAX/programming_formalisms/commit/1349dc589d9dfde13cc39087216ee3e77f99bfdf

!Answer

    First: we ask ourselves what the scope and magnitude of this project are.
    - This is a small project that the team is expected to finish in about one week.
    - The project will be driven and implemented by a small team of distributed developers.
    - The aim of the project is to teach the team the SDLC (software development life cycle) process.

    **Conclusion:** The project goes through more formalized steps than a regular project of similar size.
    The project needs a minimal risk assessment and a needs determination.

    Second: determination of the business and legal space.
    The project operates inside a non-profit, open-source scope. A Data Protection Impact Assessment is not needed for this data set, since it is open and readily available, and it is non-sensitive, non-personal data with minimal economic impact.
    This project is conducted in an educational setting in Sweden, and therefore the legal space is simple.
    With that out of the way (this step is usually not conducted for small projects, but is included here for completeness):

    Third: first iteration of needs gathering.
    The brief gives us:
    - make a graph of running means
    - other statistics
        - daily maximums/minimums
        - yearly maximums/minimums
    - metadata extraction
        - name of station
        - which columns hold the dates, times and variable
    - presentation
        - header of graphs (name)
        - axis labels

    Needs/requirements matrix:

    Requirement ID | Requirement Description | Acceptance Criteria | Test Cases
    ---------------|-------------------------|---------------------|-----------
    R1 | Open source development | The development process follows the principles of open source development | In each step, evaluate the accessibility and openness
    R2 | Analysis | The program performs and presents some analysis related to temperature |
    R3 | File IO | The software should be able to read and parse the data in the SMHI station data CSV files |

    Fourth: risks.
    Remember to think of both primary and secondary/derived risks.
    We start by looking at the risks involved in the software project.
    - Over-engineering: in this kind of small project, the risk of the project failing increases with the risk of making the solutions significantly more complicated than they need to be.
    - risk of

    Fifth: second iteration of needs analysis.