Kickstart
simo.tuomisto@aalto.fi
Looking for contributors/co-authors: add your name if you contributed. All original text and images are under CC-BY 4.0.

As shown in the short introduction to scientific computing, a typical research project might look like this:
Figure 1 - A common pipeline of scientific computing
A single project can require a huge number of skills and tools:
Skills learned during one project are often reused in subsequent projects as well.
This brings an interesting problem:
I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
This means that even bad habits can be reused in subsequent projects.
Figure 2 - xkcd #1739: Fixing Problems
Thus you should always keep a critical eye on the tools you're using and on whether they are good tools for the job at hand.
In the next sections we look through some tools and skills that might help you on your journey.
However, remember that these are just suggestions: if your current tools work for you, you may already be using the right tool for the task.
A small journey is a situation where you want to get a quick glimpse of a possible project.
Maybe you want to explore a new dataset, to test out a new algorithm or to see if you can use a new programming language.
For these kinds of projects you should invest in the following:
There are many programming languages that can be used for general scientific computing. These kinds of languages can do (among other things):
Figure 3 - Arbitrary definition of a generic scientific programming language
Python is the most popular language for general scientific computing. The main features are:
Matlab is a commercial numerical computing language. It is quite popular in signal processing and in laboratories. The main features are:
When working with Matlab it is good to remember that it is a commercial product and not everyone has access to a licence.
Julia is a newer language that has been designed for high-performance computing. The main features are:
R is a language for statistical computing and data analysis. It is especially popular in the statistics and bioinformatics communities. The main features are:
There are plenty of good editors and IDEs. Here's a list of some of the most popular ones.
Generic IDEs:
Python IDEs:
Julia IDEs:
R IDEs:
Even when you're just experimenting, it is a good idea to write your code as a script or as a Jupyter notebook. This way your ideas will not disappear when you close your console.
Working like this might seem self-evident, but it is important to keep in mind what commands you have written. Saving your code is the first step of documenting your work.
IDEs commonly have an interactive console and an editor window where you can write your scripts. You can run commands in the console itself, but it is better practice to write your code as a script and then execute the code in the interactive console.
Figure 4 - Writing a script with an IDE
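For example, instead of typing commands directly into the console, you could save them into a small script. The sketch below is purely illustrative; the file name, data and plotting choices are made up:

```python
# analyze.py - a hypothetical example script
import numpy as np
import matplotlib.pyplot as plt

# Generate some example data instead of loading a real dataset
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)

# A simple calculation whose result we want to keep track of
print(f"Mean of y: {y.mean():.3f}")

# Save the plot so the exploration can be revisited later
plt.plot(x, y, label="noisy sine")
plt.legend()
plt.savefig("noisy_sine.png")
```

Running the script again later repeats exactly the same steps, which is something an interactive console session cannot guarantee.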
Jupyter notebooks work a bit differently from typical scripts. A notebook is split into cells that can contain code or documentation. Code cells are executed by a kernel, and kernels are available for many different languages.
Figure 5 - Writing a notebook
For more on Jupyter, see for example:
A new project is something that takes more than a few days to complete. Even a single weekend off or a switch to a different project can break your train of thought, so it is important to record your work.
Maybe your initial exploration looked promising and now you want to try it out for real.
For these kinds of projects you should invest in the following:
Version control is an invaluable tool whether you're doing scientific research or software engineering and Git is by far the most popular version control tool available.
Version control tools such as Git track changes on a line-by-line basis and record them as commits. This allows you to revert changes, merge commits made by collaborators and keep code up to date across multiple systems.
Thus, if you do not use version control yet, start learning Git immediately. There are plenty of good resources, such as:
There are many providers for centralized repository storage. For public projects:
Whenever you're starting a new project, it is a good idea to start by creating a new repository for the project.
Commenting your code and documenting your project will help you and your collaborators keep track of the different pieces of the project.
You might think that you can keep all of the moving pieces in your memory, but that puts unnecessary pressure on it. It's better to remember the big picture and, whenever you need the details, look at the comments to refresh your memory.
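In Python, for example, a comment or docstring right next to the function records the details so you do not have to remember them. This is a minimal sketch with a made-up function:

```python
def normalize(values):
    """Scale a sequence of numbers to the range [0, 1].

    Raises ValueError if all values are equal, because the range
    would then be zero.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize: all values are equal")
    # Keep the computation explicit so the intent stays readable
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([2, 4, 6, 8]))  # [0.0, 0.333..., 0.666..., 1.0]
```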
There are plenty of tools that help with commenting:
It is also good to remember that no one likes commenting and documenting.
All projects will utilize programs that are not part of the project. Keeping track of these requirements from the start is very important, as it allows you to:
There are various ways of keeping track of your requirements. Below are a few examples:

- Write an installation file (e.g. INSTALL.md) with instructions on how to install the program.
- Use a requirements file such as requirements.txt (pip) or environment.yml (conda) that works with a package manager, or keep a list of the packages you have installed.

The most important thing is to keep yourself up to date on what your code uses and to record it somewhere.
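For example, a requirements.txt for a Python project is simply a list of the packages (and, optionally, version constraints) your code needs. The packages and version pins below are purely illustrative:

```text
# requirements.txt (illustrative example)
numpy>=1.24
scipy>=1.10
matplotlib>=3.7
```

With pip, `pip install -r requirements.txt` installs everything listed, and `pip freeze` prints the exact versions currently installed so you can record them.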
If I have seen further it is by standing on the sholders [sic] of Giants.
No single person can know everything and no one has time to implement every feature themselves. Thus using existing packages and frameworks is imperative for effective scientific computing.
If you start writing your own function for, say, calculating an integral or doing a least-squares fit, ask yourself whether someone else has ever needed the same function. If the answer is yes, most likely someone has already implemented the functionality in a scientific computing package or framework.
By using packages made by others you avoid bugs in your own implementations and you make your code easier to read for other people.
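For example, instead of implementing numerical integration or least-squares fitting yourself, SciPy already provides well-tested routines. This is a minimal sketch; the integrand and data below are made up:

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: integrate sin(x) from 0 to pi (exact answer is 2)
value, error = integrate.quad(np.sin, 0, np.pi)
print(f"Integral: {value:.6f} (estimated error {error:.1e})")

# Least-squares fit: fit a line to some noisy, made-up data
def line(x, a, b):
    return a * x + b

x = np.linspace(0, 5, 20)
y = 2.0 * x + 1.0 + 0.2 * np.random.default_rng(0).standard_normal(20)
params, covariance = optimize.curve_fit(line, x, y)
print(f"Fitted slope {params[0]:.2f} and intercept {params[1]:.2f}")
```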
Below are a few examples of frameworks that might interest you:
When you're starting on a long-term project, you need to plan ahead for possible future needs.
Maybe you'll want to scale up your computations in an HPC environment. Maybe you'll want to share your project with the wider world.
Like building a house, you'll want to have the project on a solid foundation.
Keeping the following ideas at the back of your head will help you in creating such a foundation:
The command line (also known as a shell) can be used to run programs without a graphical interface. This might seem like an old way of running programs, but in many cases it is the most efficient way.
One of the main advantages of the command line is that it can be used on all kinds of different systems as long as they have the same shell, especially on ones that do not have a graphical interface.
Another advantage is that when you run a program through the command line, you do not need the IDE that was used to create it. This makes it easier to port the program to other systems and share it with other users.
These features are essential on high-performance computing (HPC) systems, which usually do not have a graphical interface and where you want to focus on running the code and nothing else.
Most popular Unix-style shells are:
To learn more on command line usage, see:
Most languages also have libraries or tools for creating easy command-line interfaces. Here are a few examples:
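In Python, for instance, the built-in argparse module turns a script into a command-line program with just a few lines. This is a minimal sketch; the script and argument names are made up:

```python
# cli_example.py - a hypothetical command-line interface built with argparse
import argparse

parser = argparse.ArgumentParser(description="Run a small analysis")
parser.add_argument("input_file", help="path to the input data file")
parser.add_argument("--iterations", type=int, default=10,
                    help="number of iterations to run (default: 10)")
args = parser.parse_args()

print(f"Would process {args.input_file} for {args.iterations} iterations")
```

Running `python cli_example.py data.csv --iterations 5` would then parse the arguments without any graphical interface.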
Typically, a project has some parts that are used all the time and others that are used only rarely. Most scientific programs share this structure: some parts of the code are called again and again, while other parts are called only once or twice.
Thus, when you reach a point where you want to do more, it is important to find out what actually takes the time and to focus your efforts there.
Figure 6: xkcd #1205: Is It Worth the Time?
All languages have tools for profiling. They will help you figure out where the bottlenecks might be. Some basic profilers are listed below.
After you have profiled which parts of your program take the most time, you can fix possible bugs in the code itself or use specialized libraries to optimize the code.
However, one should try to avoid premature optimization. If the code does not do what you want it to do, running it fast won't help.
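In Python, for example, the built-in cProfile module gives a quick overview of where the time goes. This is a minimal sketch with a deliberately naive, made-up function:

```python
import cProfile

def slow_sum(n):
    # Deliberately naive: sums numbers one by one in a Python loop
    total = 0
    for i in range(n):
        total += i
    return total

def main():
    for _ in range(200):
        slow_sum(100_000)

# Print per-function statistics sorted by cumulative time
cProfile.run("main()", sort="cumulative")
```

The output lists how many times each function was called and how much of the total time it accounted for, which tells you where optimization effort would actually pay off.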
What will happen to the project if I'm not going to update it any more? Will anyone else use this after me? Should I make my project public?
It is good to ask such questions when you start a new project. Knowing what the end goal of the project will be helps you make choices throughout the project.
In many cases, opening up the code in a shared repository and developing it publicly is the best option. This will help you design the code not only for yourself, but for others as well.
If fully public development is not an option, a private repository shared with your colleagues is a good alternative. Getting feedback from your colleagues and minimizing the burden of contributing is always a good idea.
This is a very broad topic and there is no single good answer, but for more information you can check out the following:
A single person cannot know everything about everything. Thus, whether you're starting a project, designing it, working on it or sharing it with others, you'll encounter questions you do not know the answers to.
Scientific computing is a field where sharing information is crucial:
At every step of the way, we work with other people, in one way or another. When you encounter a problem, someone else might already have a solution. When you find a solution, someone else might need it to solve their problem.
Sharing information and asking for help are probably the most important skills you might learn when doing scientific computing.
There are many sources of help/information available:
For help:
For information: