# Units in scientific writing

Author: Frank

---

## The units in our simulations

In any simulation we run, our code normally just works with numbers: it does not keep track of units. Unlike in experiments, where a distance is clearly measured in e.g. meters, nanometers, or kilometers, in a simulation the distance between two particles is simply stored as a number, like 1.2. When writing about your simulation results, it is then important to keep in mind what these numbers actually mean. Even in simulations, distances are measured in units of something, and you have made some sort of choice of what that unit is when implementing the simulation.

Let us say, for example, that we have a simple Monte Carlo code of a system of interacting particles. As an example, we can use the Lennard-Jones potential, where the pairwise potential is given by

$$u(r_{ij}) = 4\epsilon \left[\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6} \right].$$

Here, $r_{ij}$ is the distance between particles $i$ and $j$, $\sigma$ is a particle size (it is a length!) and $\epsilon$ indicates the strength of the interaction (and has dimensions of energy). So what kind of numbers would we use for $\sigma$ and $\epsilon$?

In principle, you could consider some realistic system you are trying to model, come up with a reasonable choice for the particle size $\sigma$ in SI units (i.e. meters), and use this in your simulation. But this typically leads to rather awkward numbers, and often we do not really have a specific system in mind anyway. So in most cases, this is not the ideal choice.

Instead, as the equation already suggests, it is practical in your simulation to "set $\sigma$ to 1", which basically means we use $\sigma$ as the unit of length. This has the nice effect that relevant distances are usually nice, easy numbers to deal with: interesting behavior tends to happen on length scales on the order of $\sigma$, not on the order of $10^9 \sigma$ or $10^{-9} \sigma$.

Then what about the energy?
Well, in this system, there are two logical energy scales: there is the $\epsilon$ in the interaction potential, and (as Monte Carlo simulations imply a system where a temperature plays a role) there is the thermal energy scale $k_B T$. In fact, since these are the *only* two energy scales that exist in the system, the only thing that matters for the behavior of the system is their ratio: $\epsilon / k_B T$.

Since we are doing Monte Carlo, the energy comes into play in the acceptance rule, where the acceptance probability is governed by the Boltzmann factor $\exp(-\beta U)$, with $\beta = 1/k_B T$ and $U$ the total energy. So if we keep $\epsilon$ and $k_B T$ as two separate variables in our code, we would get the same configurations regardless of whether we set ($k_B T = 1$ and $\epsilon = 2$) or ($k_B T = 0.5$ and $\epsilon = 1$), since both give $\epsilon / k_B T = 2$. To simplify this, we then have two options:

1) We can choose to use $\epsilon$ as our unit of energy. Hence, the function that calculates the interaction energy between two particles now actually calculates $u / \epsilon$. In the acceptance rule we get the correct Boltzmann factor by multiplying the total energy by $\epsilon / k_B T$ (since $(U / \epsilon) \cdot (\epsilon / k_B T) = \beta U$).
2) More commonly, we choose to use $k_B T$ as our unit of energy. Hence, the function that calculates the interaction energy between two particles now calculates $\beta u$, which is proportional to the dimensionless parameter $\beta \epsilon$.

Note that both of these cases are fully equivalent in terms of the system being simulated. However, when we measure something, it is likely that the output is given in the same units as the simulation (although you can choose to convert as desired). For example, when measuring the total potential energy of the system, a simulation using $\epsilon$ as the unit of energy will typically output $U / \epsilon$, while a simulation using $k_B T$ will typically output $\beta U$.
These can be easily converted between each other via the parameter $\beta \epsilon$, though, which is a simulation parameter in both cases.

## Units in writing

The next step is to consider the best way of reporting our data when writing a thesis or paper. The main goal is to make sure that when reporting any number (in text or in figures), it is always clear what the relevant units are.

There is some freedom of choice in how to deal with units in a thesis or paper, but in all cases it is important to be careful. In my view, there are two acceptable options:

1) Consistently report units and/or use explicit dimensionless quantities whenever numbers are reported.
2) Define dimensionless quantities at the start of the paper, and then use these throughout.

I will describe both of them below. (Note: my personal strong preference is for method 1!)

### 1) Consistently reporting units [Recommended!]

This approach is the most straightforward. Essentially, this means that whenever e.g. a length or energy is reported, the unit is included. You can choose on which side of the equals sign you put the unit:

* We perform a simulation at a density $\rho \sigma^3 = 0.8$, at a temperature $k_B T / \epsilon = 1.25$.
* We perform a simulation at a density $\rho = 0.8 \sigma^{-3}$, at a temperature $T = 1.25 \epsilon/k_B$.

Both of these are fine, although in the case of temperature $k_B T / \epsilon = 1.25$ is perhaps slightly easier to read. Similarly, you can report other quantities:

* A pressure of $\beta P \sigma^3 = 0.4$ (OR $P \sigma^3 / \epsilon = 0.6$, depending on what units are the most relevant for your work).
* The maximum interaction range is $2.5 \sigma$.
* The average energy per particle is $\langle U \rangle/N = -0.5 \epsilon$.

And in plots, you make sure that the axis labels are always dimensionless quantities ($\beta \epsilon$, $\rho \sigma^3$, etc.).

This approach has two advantages.
First, anyone looking at a figure or reading part of the text without first looking at the section where you define your units can intuitively understand them as long as they know or can guess what the symbols mean. Second, it means that every single equation in your work can be sanity-checked by making sure that both sides of the equation have the same units. This can be an easy way of spotting e.g. a missing factor of $\sigma$ in your equations.

The downside is that it may be slightly cumbersome to repeat these units, but this is a small price to pay. However, it may be convenient to define some more complex units up front. For example, when doing dynamical simulations (e.g. molecular dynamics (MD)) instead of Monte Carlo, you may end up with complex-looking time units. In MD, a natural time unit to use after you have defined a particle mass $m$ is $\sqrt{\beta m \sigma^2}$ (although $\sqrt{m \sigma^2 / \epsilon}$ is another option), similar to how a second can be written as $\sqrt{\mathrm{kg} \, \mathrm{m}^2/\mathrm{J}}$. Hence, a logical choice would be to define $\tau = \sqrt{\beta m \sigma^2}$ and then plot time-dependent quantities as a function of $t/\tau$.

### 2) Pre-defining dimensionless quantities

The second method involves defining dimensionless quantities up front, typically in your methods section. For example, one might define $\tilde{T} = k_B T / \epsilon$ and $\tilde{\rho} = \rho \sigma^3$, and then use $\tilde{T}$ and $\tilde{\rho}$ to specify numbers later:

* We perform a simulation at a density $\tilde{\rho} = 0.8$, at a temperature $\tilde{T} = 1.25$.

You can do the same for any other quantities you might need (e.g. pressure $\tilde{P}$). On any plots, you can directly use these quantities in the axis labels.
This looks a bit cleaner, but one may have to check back to see what these quantities are ("was $\tilde{P} = \beta P \sigma^3$ or $P\sigma^3 / \epsilon$?"), and it does not work as nicely for doing dimensional analysis on your equations, since in practice almost everything will be dimensionless by definition.

### Final notes

One thing that I would particularly like you to avoid (even if it does sometimes show up in published papers) is to write anything that looks like this:

* We set $\sigma = 1$ in our simulations
* In the following, we set the interaction energy $\epsilon$ to unity
* $u(r) = 1/r^{12}$
* We simulate at temperature $T = 0.5$

Whether in word or equation form, all of these are essentially stating an identity between two quantities of different units. The exception is when these symbols are first defined as dimensionless, but I would avoid doing that with the "standard" symbols, such as $T$, $u$, $P$, etc.

Also note that if you are working in "real" units for some reason (meters, Joules, etc.), then these can be substituted for the quantities used above. Still, lengths should then be given in meters (or nm, mm, etc.), and so on: make sure any equality still has matching units on the two sides!
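
To connect the two energy-unit choices from the first section to actual code, here is a minimal Python sketch. It assumes the standard Lennard-Jones form with the prefactor 4 and a plain Metropolis acceptance rule; all function and variable names (`lj_pair_energy_reduced`, `acceptance_epsilon_units`, `acceptance_kt_units`, `beta_eps`) are hypothetical and only meant as illustration, not as the way any particular code does it. The variable names spell out which dimensionless combination each number represents.

```python
import math


def lj_pair_energy_reduced(r_over_sigma: float) -> float:
    """Return u / epsilon for a pair at separation r / sigma.

    Assumes the standard Lennard-Jones form with prefactor 4.
    """
    inv6 = r_over_sigma ** -6
    return 4.0 * (inv6 * inv6 - inv6)


def acceptance_epsilon_units(delta_u_over_eps: float, beta_eps: float) -> float:
    """Option 1: energies are stored as U / epsilon.

    The acceptance rule multiplies by beta * epsilon to recover beta * Delta U.
    """
    delta_beta_u = beta_eps * delta_u_over_eps
    if delta_beta_u <= 0.0:
        return 1.0  # moves that lower the energy are always accepted
    return math.exp(-delta_beta_u)


def acceptance_kt_units(delta_beta_u: float) -> float:
    """Option 2: energies are already stored as beta * U."""
    if delta_beta_u <= 0.0:
        return 1.0
    return math.exp(-delta_beta_u)


# The single energy parameter of the simulation: beta * epsilon = epsilon / (k_B T).
# The value 2.0 corresponds to k_B T / epsilon = 0.5 (an arbitrary example).
beta_eps = 2.0

# Energy change, in units of epsilon, when a pair moves from r = 1.2 sigma
# to r = 1.5 sigma (an arbitrary example move).
delta_u_over_eps = lj_pair_energy_reduced(1.5) - lj_pair_energy_reduced(1.2)

# Converting between the two conventions is a single multiplication.
delta_beta_u = beta_eps * delta_u_over_eps

p_option_1 = acceptance_epsilon_units(delta_u_over_eps, beta_eps)
p_option_2 = acceptance_kt_units(delta_beta_u)

print(f"Delta U / epsilon = {delta_u_over_eps:.4f}")
print(f"beta Delta U      = {delta_beta_u:.4f}")
print(f"acceptance (option 1) = {p_option_1:.4f}")
print(f"acceptance (option 2) = {p_option_2:.4f}")
```

Both options give the same acceptance probability, so the sampled configurations are the same; only the units of the stored and printed energies differ. When reporting such output in text or figures, the labels should say which one it is ($U/\epsilon$ or $\beta U$).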