0. Welcome
0.1 Recap & motivation: why collaboration and best research software engineering practices in the first place?
0.2 Difference between "coding" and "research software engineering"
0.3 What you’ll learn
0.4 Target audience
0.5 Pre-requisites
1. Let's start! Introduction into the project & setting up the environment
1.1 The project
1.2 GitHub CodeSpaces
1.3 Integrated Development Environments
1.4 Git and GitHub
1.5 Creating virtual environments
2. Ensuring correctness of software at scale
2.1 Unit tests
2.2 Scaling up unit tests
2.3 Debugging code & code coverage
2.4 Continuous integration
3. Software design
3.1 Programming paradigms
3.2 Object-oriented programming
3.3 Functional programming
4. Writing software with - and for - others: workflows on GitHub, APIs, and code packages
4.1 GitHub pull requests
4.2 How users can use the program you write: application programming interfaces
4.3 Producing a code package
5. Collaborative research and sharing experiment results
5.1 Experiment analysis: custom training curve plotting
5.2 Tensorboard: a standard for training metrics plotting and export
5.3 DL Experiment Tracking and Management as a tool for collaborative research and result sharing
6. Wrap-up
7. Further resources
8. License
9. Original course
10. Acknowledgements
Different things can be meant by the term "software design":
Design patterns are, as Refactoring Guru puts it, typical solutions to commonly occurring problems in software design (at any of the domains/levels mentioned above).
Programming paradigms such as object-oriented or functional programming (we'll get to those in a minute!) are not so straightforward to place within the facets of software design mentioned above: a programming paradigm represents its own way of thinking about and structuring code, with pros and cons when used to solve particular types of problems.
Technical debt: if we don't follow best practices around code, including addressing design questions, we may build up too much technical debt - the implied cost of future refactoring incurred by choosing a quick-and-dirty solution now instead of a better approach that would initially have taken longer.
There are two major families that we can group the common programming paradigms into: Imperative and Declarative.
We will look into one major paradigm from each family that may be useful to you: object-oriented programming (imperative) and functional programming (declarative).
In object-oriented programming, objects encapsulate data in the form of attributes and code in the form of methods that manipulate the objects’ attributes and define how objects can behave (in interaction with each other).
A class is a template for a structure and a set of permissible behaviors that we want our data to conform to; thus, each time we create some data using a class, we can be certain that it has the same structure.
If you know about Python lists and dictionaries, you may recognize that they behave similarly to how we may define a class ourselves:
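For example, a Python list already bundles its data together with methods that operate on that data:

my_list = [1, 2, 3]          # data encapsulated in a list object
my_list.append(4)            # a method that manipulates the list's data
print(my_list)               # [1, 2, 3, 4]

my_dict = {'name': 'Alice'}  # data encapsulated in a dict object
my_dict.update({'age': 30})  # a method that manipulates the dict's data
print(my_dict)               # {'name': 'Alice', 'age': 30}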
Encapsulating data
Let's have a look at a simple class:
class Patient:
    def __init__(self, name):
        self.name = name
        self.observations = []

Alice = Patient('Alice')
print(Alice.name)
Output:
Alice
__init__ is the initialiser method, which sets up the initial values and structure of the data inside a new instance of the class. We call the __init__ method every time we create a new instance of the class, as in Patient('Alice'). The argument self refers to the instance on which we are calling the method and gets filled in automatically by Python whenever we instantiate a new class instance. We can then access an instance's attributes using dot notation (e.g., Alice.name).

Encapsulating behavior
Let's add a method to the above class which operates on the data that the class contains: adding a new observation to a Patient instance.
class Patient:
    """A patient in an inflammation study."""
    def __init__(self, name):
        self.name = name
        self.observations = []

    def add_observation(self, value, day=None):
        if day is None:
            try:
                day = self.observations[-1]['day'] + 1
            except IndexError:
                day = 0
        new_observation = {
            'day': day,
            'value': value,
        }
        self.observations.append(new_observation)
        return new_observation
Alice = Patient('Alice')
print(Alice)
observation = Alice.add_observation(3)
print(observation)
print(Alice.observations)
Output:
<__main__.Patient object at 0x7f67f424c190>
{'day': 0, 'value': 3}
[{'day': 0, 'value': 3}]
The first parameter of a method is self (using this name is not strictly necessary, but is a very strong convention). Similar to the initialiser method, when we call a method on an object, the value of self is automatically set to this object - hence the name. The method itself is called with dot notation on the instance (as in Alice.add_observation(3)).

Dunder Methods
The __init__ method begins and ends with a double underscore - it is a dunder method. Dunder methods (also called magic methods) are not meant to be invoked directly by you; the invocation happens internally from the class on a certain action. Built-in Python classes such as the int class define many magic methods.
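For instance, the + operator on integers internally invokes the int class's __add__ dunder method:

print(5 + 3)           # 8
print((5).__add__(3))  # 8 - what 5 + 3 invokes internally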
When we called print(Alice) earlier, it returned <__main__.Patient object at 0x7fd7e61b73d0>, which is the default string representation of the Alice object. Functions like print() or str() use __str__(). We can override the __str__ method within our class to display the object's name instead of the default string representation.
class Patient:
    """A patient in an inflammation study."""
    def __init__(self, name):
        self.name = name
        self.observations = []

    def add_observation(self, value, day=None):
        if day is None:
            try:
                day = self.observations[-1]['day'] + 1
            except IndexError:
                day = 0
        new_observation = {
            'day': day,
            'value': value,
        }
        self.observations.append(new_observation)
        return new_observation

    def __str__(self):
        return self.name
Alice = Patient('Alice')
print(Alice)
Output:
Alice
Relationships between classes
There are two fundamental types of object characteristics which also denote the relationships among classes:
Composition
In object-oriented programming, we can make things components of other things; e.g., we may want to say that a doctor has patients or that a patient has observations. In the way we have written our class so far, a patient already has observations - which is a case of composition.
Let's separate the two: we create a dedicated Observation class and make use of it in the Patient class.
class Observation:
    def __init__(self, day, value):
        self.day = day
        self.value = value

    def __str__(self):
        return str(self.value)

class Patient:
    """A patient in an inflammation study."""
    def __init__(self, name):
        self.name = name
        self.observations = []

    def add_observation(self, value, day=None):
        if day is None:
            try:
                day = self.observations[-1].day + 1
            except IndexError:
                day = 0
        new_observation = Observation(day, value)
        self.observations.append(new_observation)
        return new_observation

    def __str__(self):
        return self.name
Alice = Patient('Alice')
obs = Alice.add_observation(3, 3)
print(obs)
Output:
3
Inheritance
Inheritance is about data and behaviour that two or more classes share: if class X inherits from (is a) class Y, we say that Y is the superclass or parent class of X, or X is a subclass of Y - X gets all attributes and methods of Y.
If we want to extend the previous example to also manage people who aren't patients, we can add another class, Person. But Person will share some data and behaviour with Patient - in this case, both have a name and show that name when you print them. Since we expect all patients to be people (hopefully!), it makes sense to implement the behaviour in Person and then reuse it in Patient.
To write our class in Python, we used the class keyword, the name of the class, and then a block of the functions that belong to it. If the class inherits from another class, we include the parent class name in brackets.
class Observation:
    def __init__(self, day, value):
        self.day = day
        self.value = value

    def __str__(self):
        return str(self.value)

class Person:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

class Patient(Person):
    """A patient in an inflammation study."""
    def __init__(self, name):
        super().__init__(name)
        self.observations = []

    def add_observation(self, value, day=None):
        if day is None:
            try:
                day = self.observations[-1].day + 1
            except IndexError:
                day = 0
        new_observation = Observation(day, value)
        self.observations.append(new_observation)
        return new_observation
Inheritance is declared in the class definition (class Patient(Person)), and the parent's initialiser is reused in the child's initialiser (super().__init__(name)).
If we don't define an __init__ method for our subclass, Python will look for one on the parent class and use it automatically. This is true of all methods - if we call a method which doesn't exist directly on our class, Python will search for it among the parent classes. Also note that, since Person now stores the name, the line self.name = name in the Patient class becomes obsolete.

! QUESTION 1 ! What outputs do you expect here?
Alice = Patient('Alice')
print(Alice)
obs = Alice.add_observation(3)
print(obs)
Bob = Person('Bob')
print(Bob)
obs = Bob.add_observation(4)
print(obs)
! TASK 4 ! Write a Doctor class to hold the data representing a single doctor:
test_patient.py
.Final note: When deciding how to implement a model of your particular system, you often have a choice of either composition or inheritance, where there is no obviously correct choice - multiple implementations may be equally good. (See more on that in the The Composition Over Inheritance Principle.
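One possible sketch for Task 4, reusing Person through inheritance and holding patients through composition (the attribute and method names here are our own choice, not prescribed by the task):

class Doctor(Person):
    """A doctor in an inflammation study."""
    def __init__(self, name):
        super().__init__(name)
        self.patients = []

    def add_patient(self, new_patient):
        # avoid adding the same patient twice
        for patient in self.patients:
            if patient.name == new_patient.name:
                return
        self.patients.append(new_patient)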
Key points and benefits of OOP:
In functional programming, programs apply and compose/chain functions. It is based on the mathematical definition of a function f(), which does a transformation/mapping from input x to output f(x).
Contrary to imperative paradigms, it does not entail a sequence of steps during which the state of the program is updated to reach a final desired state; it describes the transformations to be done without producing such side effects.
The following two code examples implement the calculation of a factorial in procedural and functional styles, respectively. The factorial of a number n (denoted by n!) is calculated as the product of the integers from 1 to n.
Procedural style factorial function
def factorial(n):
    """Calculate the factorial of a given number.

    :param int n: The factorial to calculate
    :return: The resultant factorial
    """
    if n < 0:
        raise ValueError('Only use non-negative integers.')

    factorial = 1
    for i in range(1, n + 1):  # iterate from 1 to n
        # save intermediate value to use in the next iteration
        factorial = factorial * i

    return factorial
Here, we update the state of a variable (factorial in the for loop) and advance towards the result.

Functional style factorial function
def factorial(n):
    """Calculate the factorial of a given number.

    :param int n: The factorial to calculate
    :return: The resultant factorial
    """
    if n < 0:
        raise ValueError('Only use non-negative integers.')

    if n == 0 or n == 1:
        return 1  # exit from recursion, prevents infinite loops
    else:
        return n * factorial(n - 1)  # recursive call to the same function
A pure function has no side effects: it does not rely on data other than its input parameters (such as factorial in the above example), and it does not modify data that exists outside the current function, including the input data (e.g., by printing text, writing to a file, modifying the value of an input argument, or changing the value of a global variable).
Note that the recursive, functional implementation will hit Python's recursion depth limit for large values of n, while the procedural implementation runs faster. It is vital to consider your use case before choosing which kind of paradigm to use for your software.

! QUESTION 2 ! Which of these functions are pure?
def add_one(x):
    return x + 1

def say_hello(name):
    print('Hello', name)

def append_item_1(a_list, item):
    a_list += [item]
    return a_list

def append_item_2(a_list, item):
    result = a_list + [item]
    return result
3.3.1.1 Mapping
General syntax: map(f, C): apply function f to each element of the collection C and return the results as a new collection of the same size. (In Python 3, map() actually returns a lazy iterator, which is why we wrap the results in list() below to see the values.)
As an example, we are interested in getting the length of each name in the list ["Mary", "Isla", "Sam"]:
name_lengths = map(len, ["Mary", "Isla", "Sam"])
print(list(name_lengths))
# Output
[4, 4, 3]
Another example: let's write a one-liner that squares every number in the collection C = [0, 1, 2, 3, 4] using an anonymous lambda expression:
squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
print(list(squares))
# Output
[0, 1, 4, 9, 16]
A quick note on lambda expressions in Python: the general syntax is lambda x, y, z, ...: <expression>. A lambda takes any number of parameters x, y, z, ... and returns the value of <expression>; it is an anonymous equivalent of

def f(x, y, z, ...):
    return <expression>
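For example, the following two definitions behave identically:

add_one = lambda x: x + 1  # anonymous function bound to a name

def add_one_def(x):
    return x + 1

print(add_one(3), add_one_def(3))
# Output
4 4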
! TASK 6 ! Check Inflammation Patient Data Against A Threshold Using Map:
Write a new function named daily_above_threshold() in our inflammation models.py that determines whether or not each daily inflammation value for a given patient exceeds a given threshold.
Given a patient row number in our data, the patient dataset itself, and a given threshold, write the function to use map() to generate and return a list of booleans, with each value representing whether or not the daily inflammation value for that patient exceeded the given threshold.
def daily_above_threshold(patient_num, data, threshold):
    """Determine whether or not each daily inflammation value exceeds a given threshold for a given patient.

    :param patient_num: The patient row number
    :param data: A 2D data array with inflammation data
    :param threshold: An inflammation threshold to check each daily value against
    :returns: A boolean list representing whether or not each patient's daily inflammation exceeded the threshold
    """
    return list(map(lambda x: x > threshold, data[patient_num]))
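To see it in action, here is a quick check with a small, made-up dataset (a plain list of lists standing in for the real inflammation data):

data = [[0, 1, 2], [3, 4, 5]]  # two patients, three days of made-up values
print(daily_above_threshold(1, data, 3))
# Output
[False, True, True]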
3.3.1.2 Reduce
Conversely, reduce(f, C, initialiser) takes in a function f(), a collection C of data items (and an optional initialiser), and returns a single cumulative value that aggregates all values in the collection.
reduce() first applies f() to either a) the first two values in C, or b) the initialiser and the first value in C. It then repeatedly applies f() to the result of the previous operation and the next element in C, until the whole collection is exhausted. For a five-element collection, this amounts to:

f(f(f(f(C[0], C[1]), C[2]), C[3]), C[4])
# This leverages composability of functional programming
As an example, let's use from functools import reduce to compute the product of the sequence of numbers l = [1, 2, 3, 4]:
from functools import reduce

l = [1, 2, 3, 4]

def product(a, b):
    return a * b

print(reduce(product, l))

# The same reduction using a lambda function
print(reduce((lambda a, b: a * b), l))
# Output
24
24 # lambda version
! TASK 7 ! Calculate the sum of a sequence of numbers using reduce:
We aim to reproduce the behavior of Python's native sum() using reduce:
from functools import reduce

l = [1, 2, 3, 4]

def add(a, b):
    return a + b

print(reduce(add, l))

# The same reduction using a lambda function
print(reduce((lambda a, b: a + b), l))
# Output
10
10 # lambda version
3.3.1.3 Putting it all together: MapReduce
Let us now write a function sum_of_squares that calculates the sum of the squares of the values in a list using the MapReduce approach:
from functools import reduce

def sum_of_squares(l):
    squares = map(lambda x: x * x, l)  # use map
    # squares = [x * x for x in l]  # use a list comprehension for the mapping
    return reduce(lambda a, b: a + b, squares)
Test it with
print(sum_of_squares([0]))
print(sum_of_squares([1]))
print(sum_of_squares([1, 2, 3]))
print(sum_of_squares([-1]))
print(sum_of_squares([-1, -2, -3]))
and confirm the following output:
0
1
14
1
14
! TASK 8 ! Extend Inflammation Threshold Function Using Reduce:
Extend the daily_above_threshold() function you wrote previously to return a count of the number of days a patient's inflammation is over the threshold.
Use reduce() over the boolean array that was previously returned to generate the count, then return that value from the function. You may choose to define a separate function to pass to reduce(), or use an inline lambda expression to do it (which is a bit trickier!).
Some hints: you can pass the optional initialiser value to reduce() to help you start the counter, and you can use Python's inline conditional <value> if <condition> else <another_value> in the expression.
from functools import reduce

# [... other test code ...]

def daily_above_threshold(patient_num, data, threshold):
    """Count how many days a given patient's inflammation exceeds a given threshold.

    :param patient_num: The patient row number
    :param data: A 2D data array with inflammation data
    :param threshold: An inflammation threshold to check each daily value against
    :returns: An integer representing the number of days a patient's inflammation is over a given threshold
    """
    above_threshold = map(lambda x: x > threshold, data[patient_num])
    return reduce(lambda a, b: a + 1 if b else a, above_threshold, 0)
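With the same made-up dataset as before, the extended function now returns a count:

data = [[0, 1, 2], [3, 4, 5]]  # two patients, three days of made-up values
print(daily_above_threshold(1, data, 3))  # days with value > 3: the 4 and the 5
# Output
2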
As an example of composability, let's look at Python decorators: as seen in the episode on parametrising our unit tests, a decorator can take a function, modify/decorate it, then return the resulting function.
This is possible because in Python, functions can be passed around as normal data.
Here, we discuss decorators in more detail and learn how to write our own.
Let's look at the following code to see ways in which we can "decorate" functions.
# define the function to which additional functionality is to be added
def ordinary():
    print("I am an ordinary function")

# define the decorator, an outer function wrapping the first function
def decorate(func):
    # define the inner function
    def inner():
        # add some additional behavior to the original function
        print("I am a decorator")
        # call the original function
        func()
    # return the inner function
    return inner

# decorate the ordinary function
decorated_func = decorate(ordinary)

# call the decorated function
decorated_func()
Output:
I am a decorator
I am an ordinary function
Here, ordinary() is the function to be decorated, decorate(func) is the function that decorates another function, and decorate(ordinary) builds another function that adds functionality to ordinary().

Another way to use decorators is to add @decorate before the function to be decorated:
# define the decorator, an outer function wrapping the first function
def decorate(func):
    # define the inner function
    def inner():
        # add some additional behavior to the original function
        print("I am a decorator")
        # call the original function
        func()
    # return the inner function
    return inner

# define the function to which additional functionality is to be added
@decorate
def ordinary():
    print("I am an ordinary function")

# call the decorated function
ordinary()
Output:
I am a decorator
I am an ordinary function
! TASK 9 ! Write a decorator that measures the time taken to execute a particular function using the time.process_time_ns() function. You will need to import the time module, take a time stamp right before the function call using start = time.process_time_ns(), and get another time stamp once the calculation in question is done using end = time.process_time_ns().
def measure_me(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
import time

def profile(func):
    def inner(*args, **kwargs):
        start = time.process_time_ns()
        result = func(*args, **kwargs)
        stop = time.process_time_ns()
        print("Took {0} seconds".format((stop - start) / 1e9))
        return result
    return inner

@profile
def measure_me(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
print(measure_me(1000000))
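One refinement worth knowing about (not required by the task): a plain wrapper like inner hides the decorated function's name and docstring. The standard library's functools.wraps decorator copies them over:

import functools
import time

def profile(func):
    @functools.wraps(func)  # preserves func.__name__ and func.__doc__ on the wrapper
    def inner(*args, **kwargs):
        start = time.process_time_ns()
        result = func(*args, **kwargs)
        stop = time.process_time_ns()
        print("{0} took {1} seconds".format(func.__name__, (stop - start) / 1e9))
        return result
    return inner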
Pull requests let you tell others about changes you've pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch. Code review plays an essential role in this process.
Code review is one of the most important practices of collaborative software development that improves code quality and increases knowledge about the codebase across the team. Before contributions are merged into the main branch, code will need to be reviewed, e.g., by the maintainer(s) of the repository.
Although the role of code review can't be overstated, we will not go into the details here, as it's better suited for self-study compared to other building blocks in research software engineering that we touch upon in this tutorial. See, e.g., a guide on code review from Kimmo Brunfeldt here.
The way you and your team provide contributions to the shared codebase depends on the type of development model you use in your project. Two commonly used models are the following:
shared repository model: folks are granted push access to a single shared code repository, but feature branches for new developments are still created. This model is good for core contributors who may wish to have faster workflows in the testing and merging cycle.
fork and pull model: folks fork an existing repository (to create their copy of the project linked to the source) and push changes to their personal fork. A contributor can therefore work independently on their own fork as they do not need permissions on the source repository to push modifications to their own fork. The project maintainer can then pull the contributors' changes into the source repository on request and after a code review process.
! TASK 10 ! Make a PR to the original project, with a mock code review and a discussion around the process.

1. Starting from the main branch, create a new branch add-name:

git switch main
git switch -C add-name # creates and switches to the branch directly
2. Edit the README.md file by adding your name under the "Participants" section.
3. Track the changes and commit them with:
git add README.md
git commit -m "Added my name for PR exercise"
Keep an eye out from here: Git might prompt you on whether or not you would like to fork the repository into your account so that your changes can be tracked.
Here, GitHub Codespaces comes in very handy, as it will create a fork of the original project for you, since you probably don't have write permission to our original repository.
4. Push your newly created branch to your own fork:
git push -u origin add-name
5. Create a PR to the original repository
For the sake of simplicity, we create the PR in Github's web interface.
If the previous steps were followed precisely, the Code tab of your fork of the project should give you the option to create a PR for your changes in add-name to the branch of your choice on the original (upstream) repository.
6. From the maintainer's perspective
Once a PR is received, we usually perform a code review.
The "File changed" tab in the PR's interface is a very useful tool to gage the changes that a PR makes to its targeted branch.
Once the code is reviewed, the maintainer can either request changes, or proceed to merge the PR if it satisfies all the requirements.
In our case, since the changes are quite minimal, we just proceed to merge and close the PR.
Once that is done, your changes will be reflected in the corresponding branch of the project - in our case, the main branch.
Continuous integration using GitHub Actions can be used at the PR level to make sure incoming code is adequate and passes the relevant tests.
We will now have a look at inflammation-analysis.py which, in our example, is the entry point of our simple application - users will need to call it from a CLI, alongside a set of arguments:
python3 inflammation-analysis.py data/inflammation-03.csv
How to use the application and which arguments to specify can be accessed via
python3 inflammation-analysis.py --help
inflammation-analysis.py can be run in different ways - imported as a module, or run as the top-level script, in which case the global dunder variable __name__ will be set to "__main__".
Advanced API configuration tools
__name__

In inflammation-analysis.py, we see the following code:
# import modules

def main():
    # perform some actions
    ...

if __name__ == "__main__":
    # perform some actions before main()
    main()
__name__ is a special dunder variable which is set, along with a number of other special dunder variables, by the Python interpreter before the execution of any code in the source file. The value the interpreter gives to __name__ is determined by the way in which the file is loaded.
If you run the following command (i.e., run the file as a script), __name__ will be equal to "__main__", and everything after the if-statement will be executed:
$ python3 inflammation-analysis.py
If you instead import your file as a module, __name__ will be set to the module name, i.e., __name__ = "inflammation-analysis". (Note that, because of the hyphen in the file name, such an import cannot be written as a plain import statement and would in practice go through importlib.)
In other words, the global variable __name__ allows you to execute code when the file runs as a script, but not when it's imported as a module.
Python sets the global name of a module equal to "__main__" if the Python interpreter runs your code in the top-level code environment.
“Top-level code” is the first user-specified Python module that starts running. It’s “top-level” because it imports all other modules that the program needs.
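A minimal way to observe this behavior yourself (demo.py is a hypothetical file name):

# file: demo.py
print("__name__ is", __name__)

# $ python3 demo.py   -> prints: __name__ is __main__
# >>> import demo     -> prints: __name__ is demo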
To be able to run inflammation-analysis.py in the CLI, we need to enable Python to read command line arguments. The standard Python library for reading command line arguments passed to a script is argparse. Let's look into inflammation-analysis.py again.
import argparse

# we first initialise the argument parser class,
# passing an (optional) description of the program:
parser = argparse.ArgumentParser(
    description='A basic patient inflammation data system')

# we can now add the arguments that we want argparse
# to look out for; in our case, we only want to process
# the names of the file(s):
parser.add_argument(
    'infiles',
    nargs='+',
    help='Input CSV(s) containing inflammation series for each patient')

# we parse the arguments passed to the script:
args = parser.parse_args()
In add_argument, we specify the name of the argument (infiles), the number of arguments to be expected (nargs='+', where '+' indicates that there should be 1 or more arguments passed), and a help string for the user (help='Input CSV(s) containing inflammation series for each patient').

parser.parse_args() returns an object (called args) containing all the arguments requested. These can be accessed using the names that we have defined for each argument, e.g., args.infiles would return the filenames that were used as inputs.

If you now run python3 inflammation-analysis.py data/inflammation-03.csv, nothing will be shown in the terminal, as views.py uses matplotlib while our CLI outputs only text. But we could add another modality in views.py to be able to generate output that is shown in the CLI.

We will now look at how we can package software for release and distribution, using Poetry to manage our Python dependencies and produce a code package we can use with a Python package indexing service such as PyPI.
Here, we only marginally touch upon important factors to consider before publishing software, most of which have to do with documentation. Documentation is a foundational pillar in coding/writing software. While its significance can't be overstated, we omit this part in this tutorial, as it's better for self-study compared to other building blocks in research software engineering.
Documentation
Before releasing software for reuse, make sure you have
Marking a software release
There are different ways in which we can make a software release from our code in Git/on GitHub, one of which is tagging: we attach a human-readable label to a specific commit, e.g., "v1.0.0", and push the change to our remote repo:
! FOLLOW ALONG IN YOUR CODESPACE !
$ git tag -a v1.0.0 -m "Version 1.0.0"
$ git push origin v1.0.0
We will use Python's Poetry library, which we'll install in our virtual environment (make sure you're in the root directory when activating the virtual environment, and let's check afterwards that we installed Poetry within it):
! FOLLOW ALONG IN YOUR CODESPACE !
$ source venv/bin/activate # if not already
$ pip3 install poetry
$ which poetry
Poetry uses a pyproject.toml file to describe the build system and the requirements of the distributable package.
To create a pyproject.toml file for our code, we can use poetry init, which will guide us through the most important settings (for each prompt, we either enter our data or accept the default).
Below, you see the questions with the recommended responses, so do follow these (and use your own contact details).
We name the package inflammation, matching the directory containing our source code, so that Poetry can automatically find the code.

$ poetry init
Output:
This command will guide you through creating your pyproject.toml config.
Package name [example]: inflammation
Version [0.1.0]: 1.0.0
Description []: Analyse patient inflammation data
Author [None, n to skip]: Nadine Spychala <nadine.spychala@gmail.com>
License []: MIT
Compatible Python versions [^3.8]: ^3.8
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file
[tool.poetry]
name = "inflammation"
version = "1.0.0"
description = "Analyse patient inflammation data"
authors = ["Nadine Spychala <nadine.spychala@gmail.com>"]
license = "MIT"
[tool.poetry.dependencies]
python = "^3.8"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Do you confirm generation? (yes/no) [yes] yes
When we add a dependency using Poetry, Poetry will add it to the list of dependencies in the pyproject.toml file and automatically install the package into our virtual environment.
A project's dependencies come in two kinds: runtime dependencies, which our software needs in order to run, e.g., Matplotlib or NumPy, and development dependencies. The latter are dependencies which are needed/essential in order to develop code, but not required to run it, e.g., pylint or pytest.

$ poetry add matplotlib numpy
$ poetry add --dev pylint
$ poetry install
Let's build a distributable version of our software:
$ poetry build
This should produce two files for us in the dist directory, of which the most important one is the .whl or wheel file. This is the file that pip uses to distribute and install Python packages, so this is the file we'd need to share with other people who want to install our software.
If we gave this wheel file to someone else, they could install it using pip:
$ pip3 install dist/inflammation*.whl
If we need to publish an update, we just update the version number in the pyproject.toml file, then use Poetry to build and publish the new version. Any re-publishing of the package, no matter how small the changes, needs to come with a new version number.
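As a sketch of what publishing an update could look like (assuming you have credentials for PyPI configured for poetry publish):

$ poetry version patch   # bumps e.g. 1.0.0 to 1.0.1 in pyproject.toml
$ poetry build
$ poetry publish         # uploads the new version to PyPI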
So far, we have seen general principles for collaborative research software design.
In deep learning and its tangential fields, we usually train models and log various metrics along the way, which are then used to analyze performance and support the claims made in the accompanying paper.
This section aims to give a brief overview of different ways of handling and analyzing experiment results, culminating in a set of principles and tools for collaborative experiment management and analysis.
One way to analyze the results is to write manually designed plotting functions. While this allows very precise control over the types of graphs and analysis tools we can deploy, it comes at a relatively high engineering cost and is relatively rigid.
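As a minimal sketch of such a hand-rolled pipeline (assuming each run saved its metrics as a CSV with "step" and "return" columns; the file names are hypothetical):

import matplotlib.pyplot as plt
import pandas as pd

# hypothetical per-run CSV files with columns "step" and "return"
runs = ["run_seed0.csv", "run_seed1.csv"]

for run in runs:
    df = pd.read_csv(run)
    plt.plot(df["step"], df["return"], label=run)

plt.xlabel("Environment steps")
plt.ylabel("Episodic return")
plt.legend()
plt.savefig("training_curves.png")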
Some caveats
When publishing code and a paper, we should ideally also publish the datasets and raw experiment runs that were used in the paper, for better reproducibility.
Tensorboard is a visualization toolkit tailored for deep learning experiments.
Its main use is to track various metrics during the training of deep learning models.
It also provides a set of graphical user interfaces for quick assessment of the logged metrics and comparison between runs.
You are probably already familiar with this interface.
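As a minimal logging sketch using the SummaryWriter bundled with PyTorch (the metric values here are dummies):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/experiment_1")
for step in range(100):
    loss = 1.0 / (step + 1)  # dummy metric for illustration
    writer.add_scalar("train/loss", loss, step)
writer.close()

# inspect the logged metrics with: tensorboard --logdir runs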
While Tensorboard might be easy to set up and use, its analysis tools can be limited depending on one's use case and the type of analysis that needs to be performed. Anything more advanced thus requires falling back on a custom analysis pipeline.
Nowadays, there exist tools that handle ML experiment tracking like Tensorboard, while offering a lot of flexibility for plotting and analysis.
The prominent ones at the time of this writing are as follows:
A more exhaustive list, with comparisons, can be found on the neptune.ai site.
Let's walk through a few of the benefits of using those tools for collaborative analysis and publishing results, with an emphasis on WANDB (disclaimer: not a sponsored advertisement).
Benefits
Overall, this type of tool can improve productivity at an individual level, which also scales to teams of researchers, with a relatively low engineering cost.
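As a minimal sketch with WANDB (the project name is hypothetical; wandb.init and wandb.log are the library's core calls):

import wandb

run = wandb.init(project="inflammation-experiments",  # hypothetical project name
                 config={"learning_rate": 1e-3})
for step in range(100):
    wandb.log({"train/loss": 1.0 / (step + 1)}, step=step)  # dummy metric
run.finish()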
Finally, let's recapitulate the content of this tutorial into a few takeaway points:
- version control with git, and separation of active development into branches
- packaging your code so that users can pip install it and then run it with python -m my-program

A graphical summary of the concepts covered in this tutorial, from the original course material.
The present tutorial is a modified version of Nadine Spychala's Collaborative Research Software Tutorial Course.
Modifications Overview
This material is made available under the Creative Commons Attribution license. The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license.
You are free to:
Share - copy and redistribute the material in any medium or format
Adapt - remix, transform, and build upon the material
for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
This work is derived from work that is Copyright © Nadine Spychala, 2023.
Except where otherwise noted, the example programs and other software provided by Nadine Spychala are made available under the OSI-approved MIT license.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal with the Software without restriction (i.e., without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software), subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.