# DNA and integrases (with Jan and Willem)
###### tags: `PhD projects` `DNA`
---
### Owners (the only one with the permission to edit the main text, others can comment)
Marjolein, Laura
---
## To do
- [ ] ...
---
## Background and experimental numbers
We want to make a toy model for studying a system consisting of DNA strands and integrases.
Some useful experimental numbers:
- 1 DNA base pair (bp) is around 0.34 nm long.
- The persistence length of DNA is approximately 40 nm, which corresponds to 118 bp.
- The DNA strands they studied consist of 3400 bp, 4800 bp, and 9100 bp, which corresponds to approximately 28.9, 40.8, and 77.4 persistence lengths, respectively.
- The binding site for an integrase to the DNA strand is approximately 1/10 persistence length.
- The "rosettes" that form probably contain something like 100 intergrases in their centers.
Some notes for the toy model:
- We use a Monte Carlo simulation in the $NVT$ ensemble.
- We simulate the DNA strand as a string of "hard" beads (WCA potential) connected via springs. Additionally, we add some bending potential to tune the stiffness of the strand.
- We make the beads the same size as the binding site for the integrase, so 1/10 persistence length. This means that the DNA strands are 289, 408, and 833 beads long.
- The integrases will be simulated as "hard" spheres (WCA potential), which can bind some finite number of DNA beads via some binding interaction.
### The potentials
For the toy model we use four different (interaction) potentials:
1. **WCA potential** for the interaction between the DNA beads and the interaction between integrases
$$
\phi_\text{WCA}(r) = \begin{cases}
4\epsilon \;\left( \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^6 + 1/4 \right) &\quad\text{ for } r<2^{1/6}\sigma,\\
0 &\quad\text{ else,}
\end{cases}
$$
where $\epsilon$ is the interaction strength, and $\sigma$ is the diameter of the bead (or integrase).
2. **Finitely extensible nonlinear elastic (FENE) potential** for the springs connecting the DNA beads
$$
\phi_\text{FENE}(r) = \begin{cases}
-\frac{1}{2} k_\text{FENE}\; r_\text{max}^2 \log\left(1-\left(\frac{r}{r_\text{max}}\right)^2 \right) &\quad\text{ for } r<r_\text{max},\\
\infty &\quad\text{ else,}
\end{cases}
$$
where $k_\text{FENE}$ is the spring constant, and $r_\text{max}$ is the maximum extension of the FENE spring.
3. **Bending potential**
$$
\phi_\text{bend}(\theta) = k_\text{bend} (1-\cos\theta),
$$
where $k_\text{bend}$ is the spring constant for bending, and $\theta$ is the angle between successive bonds, i.e.
$$
\cos\theta = \frac{(\mathbf{r}_{i}-\mathbf{r}_{i-1}) \cdot (\mathbf{r}_{i+1}-\mathbf{r}_{i})}{|(\mathbf{r}_{i}-\mathbf{r}_{i-1}) | |(\mathbf{r}_{i+1}-\mathbf{r}_{i})|},
$$
where $\mathbf{r}_i$ is the position of bead $i$.
5. **Binding potential** between DNA beads and integrases
$$
\phi_\text{bind}(r) = \begin{cases}
4\epsilon_\text{bind} \;\left( \left(\frac{\sigma'}{r}\right)^{2n} - \left(\frac{\sigma'}{r}\right)^n - \left(\frac{\sigma'}{r_{cut}}\right)^{2n} + \left(\frac{\sigma'}{r_{cut}}\right)^n \right) &\quad\text{ for } r<r_\text{cut},\\
0 &\quad\text{ else,}
\end{cases}
$$
where $\epsilon_\text{bind}$ determines the depth of the attraction, $n$ determines the width of the attraction, and $r_\text{cut}$ is just a cutoff distance for the potential such that it has a finite range for simulating. Here $\sigma'=(\sigma+\sigma_I)/2$, with $\sigma$ and $\sigma_I$ the diameter of a DNA bead and an integrase, respectively.
Note that for $n=6$ the above potential is just the Lennard-Jones potential.
### A similar model from literature
In **[Ryu, Dekker et al. (2021)](https://www.science.org/doi/10.1126/sciadv.abe5905)** they studied the clustering (or phase separation) in a mixture of DNA strands and cohesin molecules. They concluded that the clustering is strongly dependent on the DNA length. Their model is very similar to our model, except the binding potential between the DNA beads and the integrase.
Some notes on their model:
- They use LAMMPS to simulate a MD simulation in the $NVT$ ensemble.
- They model DNA as a string of beads of size $\sigma$ = 2.5 nm (=7.3bp) with a persistence length $l_p=20\sigma$ (=50nm=150bp). The DNA beads interact via a WCA potential, with strength $\epsilon=0.7k_BT$. The beads are connected by FENE springs with a maximum extension of $r_\text{max}=1.5\sigma$ and spring constant $k_\text{FENE}=30\epsilon/\sigma^2$. The persistence length is introduced via a bending potential with bending constant $k_\text{bend}=20k_BT$ $(=28.6\epsilon)$.
- The cohesin molecules are modelled as spheres of size $\sigma_b$ = 25nm (also interacting via WCA with $\epsilon=0.7k_BT$) and they are decorated by 2 or 3 attractive patches, or are made fully attractive. The patches interact with the DNA beads via the Morse potential
$$
U_\text{Morse} (r) = D \;( e^{-2\alpha(r-r_0)} - 2e^{\alpha(r-r_0)} ) \quad\text{ for } r<r_c,
$$
where $D=40k_BT$, $\alpha=5\sigma^{-1}$, $r_0=0$, and $r_c=3\sigma$. This potential is strong and very short ranged, such that only one patch can bind to a specific DNA bead.
---
## Bare DNA
We start by making a model of a single DNA strand to tune the parameters $\epsilon$, $k_\text{FENE}$, $r_\text{max}$, and $k_\text{bend}$ such that we find a persistence length corresponding to the experimental system.
### Tuning the persistence length
To get a first feeling for the persistence length, I just used the parameters reported in **[Ryu, Dekker et al. (2021)](https://www.science.org/doi/10.1126/sciadv.abe5905)**, i.e. $\beta\epsilon=0.7$, $\beta k_\text{FENE}\sigma^2=21.0$, and $r_\text{max}/\sigma=1.5$, and tuned the bending potential.
However, before we take a look at the persistence length, let us first consider the average bond length $\langle b \rangle/\sigma$, which is the average center-to-center distance between adjacent beads. $\langle b \rangle/\sigma$ can be found from the minimum of the WCA + FENE potential. Underneath you can see the WCA, FENE, and total potential for $\beta\epsilon=0.7$, $\beta k_\text{FENE}\sigma^2=21.0$, and $r_\text{max}/\sigma=1.5$. The minimum of the dashed black line is located at $r/\sigma=0.9609$. Thus, eventhough the average bond length will be slightly affected by the bending potential, it will be approximately equal to this value.

Now, lets turn to the persistence length. Underneath you can see the persistence length $l_p/\langle b\rangle$ as a function of the bending constant $\beta k_\text{bend}$, as well a typical snapshot of the strand ($N=289$) for three different $\beta k_\text{bend}$. The average bond length $\langle b\rangle$ between the beads was $\langle b\rangle/\sigma=0.9742$ for all the simulations, with a maximum difference of 0.04% due to the effect of stiffness.

|  |  |  |
| -------- | -------- | -------- |
| $\beta k_\text{bend}=2.0$ | $\beta k_\text{bend}=11.0$ | $\beta k_{bend}=20.0$ |
To determine the above persistence lengths, I used strands consisting of both $N=289$ and $N=150$ bead to rule out any effect of string length. I ran 20 independent simulations for each $N$ and $\beta k_\text{bend}$. For each simulation, I initialized the strand as a random walk with a fixed stepsize equal to $\sigma$ and with a uniform distribution of $\theta \in [0,\theta_\text{max}]$, where $\theta_\text{max}$ is chosen in such a way that $\langle \cos\theta\rangle_{\theta_\text{max}}=e^{-\langle b\rangle/l_p}$, such that it already approximately matches the expected stiffness of the strand. I did all this to reduce the equilibration time of the simulations. Note that for this I just made the guess that $\langle b \rangle\approx \sigma$ and $l_p/\langle b \rangle\sim \beta k_\text{bend}$.
As an example, underneath I show for $\beta k_\text{bend}=12.0$ the mean-square of the end-to-end distance $\langle R^2\rangle$ as a function of the number of beads $n$ for $N=289$ and $N=150$. The dashed, black line is the fit to the equation
$$
\langle R^2(n) \rangle/\langle b\rangle^2 = 2n l_p/ \langle b\rangle \left( 1 - \frac{l_p}{n \langle b\rangle} \left[ 1 - e^{-n \langle b\rangle/l_p}\right] \right),
$$
with fitting parameter $l_p/\langle b\rangle$.

---
## DNA with integrases
Next, we add the integrases to the model.
### First test -- no restriction on # bonds
For a first test, we look at the situation where the integrases and DNA beads are of equal size, i.e. $\sigma_I=\sigma$ (consequently $\sigma'=(\sigma+\sigma_I)/2=\sigma$), and we do not restrict the number of DNA beads a single integrase can bond. I take the same values for the parameters as in the previous section, i.e. $\beta\epsilon=0.7$, $\beta k_\text{FENE}\sigma^2=21.0$, and $r_\text{max}/\sigma=1.5$, and take $\beta k_\text{bend}=11.0$ to get a persistence length of approximately 10 beads.
As brievely noted in the background section, the power $n$ in the binding potential tunes the width of the attraction of the bindin potential. To illustrate this, underneath you can see the binding potential for five different values of $n$, where $n=6$ denotes the regular Lennard-Jones (LJ) potential. The minimum of this potential is located at $r_\text{min}=2^{1/n}\sigma'$.

Since the regular LJ potential felt too "wide" to me, I decided on $n=12$ (with $r_{cut}=1.6\sigma'$) and $n=18$ (with $r_{cut}=1.3\sigma'$) to do some test simulations with. I start each simulation with an equilibrated DNA strand of $N$ beads and add $M$ integrases randomly to the system. To make sure that the binding is quite strong, I decided to use $\beta\epsilon_{bind}=4.0$ for these runs.
Underneath are some snapshots for $N=289$ and $N_I=500, 1000,$ or $1500$. The simulation box is $(75\sigma)^3$, so the concentration of integrases for these cases are $\rho\sigma^3=0.0012, 0.0024,$ and $0.0036$. In these snapshots, the DNA beads are colored light blue, the non-binded integrases red, and the binded integrases dark blue.
$n=12$ (with $r_\text{cut}=1.6\sigma'$):
| $N_I$ | Start | Mid | End |
| -------- | -------- | --- | -------- |
| 500 |  |  |  |
| 1000 |  |  | |
| 1500 | |  |  |
For all $N_I$ we see that the DNA strand first forms one or a couple big loops, these loops then close by "zipping up". Then they further "zip up" with other parts such that we end up with one big DNA-integrase blob (more like a rope). By increasing $N_I$ this process only goes faster. To better illustrate this rope-like structure, see the zoom-in of the final configuration for $N_I=1000$ underneath.

$n=18$ (with $r_\text{cut}=1.3\sigma'$):
| $N_I$ | Start | Mid | End |
| -------- | -------- | --- | -------- |
| 500 |  |  |  |
| 1000 | |  |  |
| 1500 |  |  |  |
Above we looked at situations with $n=18$, which has a more narrow binding potential than the previous case of $n=12$. Here we see that the system has more difficulty forming these DNA-integrase "zipper" parts. For the lowest concentration, the system does not even form a zipper within the relative short simulation time ($5\cdot10^6$ MC steps) of this first test run. It can very well be the case that, because the attraction of the binding potential is more narrow, the system just needs more time to "zip up". To check this hypothesis, I also ran some simulations for a total time of $5\cdot10^7$ MC steps. Within this simulation time, also the systems with the lowest concentration of integrases have enough time to "zip up". To get a feeling how much "time" a MC step is, the free diffusion times for these systems are around 100 MC steps.
*Note that, although I only show snapshots of one starting configuration here, I did all simulations for 8 different starting configurations of the DNA strand in total. All show the same behavior as the results shown above.*
### Restricting # bonds
In the previous section, we saw that the DNA eventually forms one big DNA-integrase blob/rope, where many integrases bond 8 or even more DNA beads. Hence, a possible reason for why we only observe this formation and not also stable loop formations could be the fact that the integrases are not restricted in the number of DNA beads that they bond. One can imagine that this blob/rope configuration maximizes the number of bonds per integrases and thus optimizes the energy gain from these bonds.
Thus, next we want to implement that we can restrict the number of bonds formed by a single integrase. To this end, we introduce the swapping potential
$$
\phi_\text{swap}(r) = \begin{cases}
-\phi_\text{bind}(r_\text{min}) &\qquad \text{ for } r<r_\text{min},\\
-\phi_\text{bind}(r) &\qquad \text{ else}.
\end{cases}
$$

This potential perfectly cancels out the energy gain of forming a bond without messing with the repulsive part. By using $\phi_\text{swap}$, we can ensure that (potential) bonds can freely swap, i.e. swap without any energy cost or gain. To illustrate how this works in practice, assume that you want to restrict the maximum number of bonds that each integrase can form to $n_\text{max}$. For each integrase we then only take into account the energy gain of the the $n_\text{max}$ shortest bonds. To do this, you have to find for each integrase all DNA beads within the distance $r_\text{cut}$ and sort them according to their center-to-center distance. The total energy of an integrase $i$ is then given by
$$
\sum_{j=1}^{n_\text{bond}} \phi_\text{bind}(r_{ij}) + \sum_{j=n_\text{max}+1}^{n_\text{bond}} \phi_\text{swap}(r_{ij}),
$$
where $\sum_{j=1}^{n_\text{bond}}$ is the sum of the sorted list of $n_\text{bond}$ DNA beads $j$ with $r_{ij}<r_\text{cut}$.
I did some test runs for $n_\text{max}=2,3,$ and 4 for a couple of DNA strands of $N=289$ beads and for $N_I=500$ and $N_I=1000$ integrases ($\rho\sigma^3=0.0012, 0.0024$) with $n=12$ (with $r_\text{cut}=1.6\sigma'$). Underneath are some typical snapshots for what happens. *These simulations ran for $5\cdot10^7$ MC steps.*
In general, I see that for $n_\text{max}=2$ the DNA strands are not really affected by the integrases, for $n_\text{max}=3$ the DNA strands sometimes forms some "loops", and for $n_\text{max}=4$ the DNA strands form a quite loose blob/rope structure.
| $n_\text{max}=2$ | $n_\text{max}=3$ | $n_\text{max}=4$ |
| -------- | -------- | -------- |
|  |  |  |
---
### "Good guesses" for integrase concentration and interaction potential
The integrase concentrations used in the experiments range from 0 nM to 2000 nM. Let's transform this to useful numbers for our simulations. For the simulations, we use the diameter of the DNA bead $\sigma=l_p/10\approx 4$ nm as a length scale. Then, if we, e.g., have an integrase concentration of $c=500$ nM, the correspoding number density in our simulations would be
$$ \rho\sigma^3 = c \cdot N_A \cdot \left(4 \cdot 10^{-8}\right)^3 \approx 1.9 \cdot 10^{-5}, $$
where $N_A$ the Avogrado constant.
To put this number density into perspective, for the snapshots shown in the previous section, the box had a length of $75\sigma$. A number density of $1.9 \cdot 10^{-5}$ would then result in a total of 8 integrases in the entire box... So the 500 and 1000 integrases that we used above were a bit of an overkill probably.
Next, we need to find good numbers for the binding interaction length, which is mediated by $n$, and the strength, which is directly related to $\beta\epsilon_\text{bind}$. We start with determining a good guess for $n$. In experiments, the maximum interaction distance for the binding interaction between DNA and integrase is about 0.5 nm, which for our DNA beads of diameter $\sigma\approx 4$ nm is approximately $0.12 \sigma$. The binding potential for some different $n$ is show below (left), where the vertical gray line indicates the maximum interaction distance from experiments, i.e. $r=1.12\sigma$. Looking at this figure, we see that something like $n=18$ or $n=24$ would be a good guess for $n$. This is already fine, but to get a more qualitative guess for $n$, I approximated the binding potential as a square well potential (SW) with depth $-\epsilon_\text{bind}$. I calculated the width $\Delta_\text{SW}$ of the square well potential by equating the area of the attractive part of both potentials. This results in
$$ \Delta_\text{SW}/\sigma = \frac{4 n}{(2n-1)(n-1)},
$$
which for $n=18$ is equal to $0.12$. In the figure below on the right, you can see the binding potential of $n=18$ and its corresponding square well approximation. Thus, to match with experiments, we will use $n=18$ and take $r_\text{cut}=1.4\sigma$.
|  |  |
| -------- | -------- |
| Binding potential for different $n$ | $n=18$ with square well approx. |
To determine a good guess for the interaction strength $\epsilon_\text{bind}$, we use that the dissociation constant $K_d$ of the binding interaction is around 500 nM for experiments. This is related to the free energy change $\Delta F$ associated with binding via
$$ K_d = \frac{1}{N_A V} \; e^{\beta\Delta F}.
$$
For a square well potential, this free energy change can be approximated as
$$ \beta\Delta F = -\beta\epsilon_\text{bind} - \ln\left(\frac{V_b}{V}\right),
$$
where $V_b$ is the so-called bonding volume, which for a square well potential is given by $V_b=\frac{4\pi\sigma^3}{3} \left((1+\Delta_\text{SW})^3 - 1 \right)$. Then solving for $\beta\epsilon_\text{bind}$, we find
$$ \beta\epsilon_\text{bind} = - \ln(K_d N_A V_b).
$$
Plugging in our numbers, we find $\beta\epsilon_\text{bind}\approx10$.
### Second test -- binding strength $\beta\epsilon_\text{bind}=10$
Using these newly determined numbers for the binding interaction, I started new simulations of DNA strands of $N=289$ beads (=3.4 kbp) and $N=408$ beads (=4.8 kbp) and used integrase concentrations {250, 500, 1000, 1500, 2000, 4000} nM. For all simulations I used a simulation box of $(220\sigma^3)$, such that the concentrations correspond to $N_I=$ {103, 205, 410, 616, 821, 1642}. This choice for the box size is sort of my compromise between fast enough simulations and having enough integrases such that the binding of some integrases to the DNA strand does not influence the global integrase concentration too much. Furthermore, I looked at the cases where the number of bonds per integrase is restricted to $n_\text{max}=3$, $n_\text{max}=4$, and no restrictions ($n_\text{max}=\infty$).
*These simulations ran for (at least) $10^8$ MC steps.*
For the low concentrations, i.e. 250 nM and 500 nM, I find that the DNA strands quickly "catches" all integrases in the box and wraps itself around them. For the higher concentrations, i.e. $\geq1000$ nM, I find that most DNA strands are just "coated" in integrases without being deformed much. Two examples of this are shown in the snapshots below for $n_\text{max}=\infty$, but $n_\text{max}=3$ and $n_\text{max}=4$ give similar results. The DNA beads are colored light blue, the free integrases red, and the bonded integrases dark blue.
Furthermore, notice that the binding of the integrases to the DNA strand just happens so fast, that the DNA strand does not even have time to change its configuration/shape due to thermal fluctuations. This is not what we saw before, i.e. with the previous tests on DNA-integrase systems, and it is also not something that we want. The reason for this change in behavior is the strong interaction strength. For the previous tests I used $\beta\epsilon_\text{bind}=4$, whereas here I used $\beta\epsilon_\text{bind}=10$. With $\beta\epsilon_\text{bind}=4$, integrases are still able to detache from the DNA strand, whereas with $\beta\epsilon_\text{bind}=10$, they do not detache. I, however, still think that a binding energy of $10k_BT$, as deduced in the previous section, is approximately okay. The important thing to note here, is that I mean the **total** binding energy. In our model, when an integrase binds to the DNA strand, it will always try (and almost always suceed) to bind right between two DNA beads, such that it binds to both. As a result, the integrase experiences a binding energy of $2\epsilon_\text{bind}$. Hence, the energy gain of binding to a single DNA bead should be $5k_BT$, and thus $\beta\epsilon_\text{bind}=5$.
| | Start | End |
| -------- | -------- | -------- |
| $N=408$, <br> $c_I=500$ nM, <br> $n_\text{max}=\infty$ |  |  |
| $N=408$, <br> $c_I=2000$ nM, <br> $n_\text{max}=\infty$ |  | |
### Third test -- binding strength $\beta\epsilon_\text{bind}=5$
In the previous section, I concluded that we should have used $\beta\epsilon_\text{bind}=5$ instead of $\beta\epsilon_\text{bind}=10$. Hence, I did the same simulations again, but now with $\beta\epsilon_\text{bind}=5$.
*These simulations ran for $5\cdot10^7$ MC steps.*
I do not observe a difference in the behavior of $N=289$ beads (=3.4 kbp) and $N=408$ beads (=4.8 kbp) strands, so I will not distinguish between these two cases when discussing the results. Underneath I discuss the behavior per concentrations and also point out the difference between no restrictions on the number of bonds per integrase ($n_\text{max}=\infty$) and restricting it to 4 bonds ($n_\text{max}=4$).
**250 nM**
For all simulations, both $n_\text{max}=\infty$ and $n_\text{max}=4$, "nothing" happens. With this I mean that sometimes integrases bind to DNA strand, but they relatively quickly unbind again and hence do not cause any clustering.
**500 nM**
For $n_\text{max}=4$, still nothing happens.
For $n_\text{max}=\infty$, in around half of the simulations, nothing happens, and in the other half a portion of the DNA strand froms a loop and then zips up. An example of this is shown below for a strand of $N=408$ beads. As you can see, first a loops is formed due to two bonded integrases. Then this loops starts to "zip up", i.e. more integrases bind to both sides of the loop. Finally, the loop is fully zipped up and we even see another loop being formed and started to zip up at a different part of the strand.
|  |  |  |
| -------- | -------- | -------- |
| Loop is formed | Starts zipping up | Zips up entirely |
**1000 nM**
For $n_\text{max}=4$, every once in a while, big loops are formed, but they do not have the opportunity to zip up before disappearing again. So they just come and go.
For $n_\text{max}=\infty$, same story as for 500 nM. But now all simulations show parts of the DNA strand that are being zipped up. Furthermore, we see that the DNA strand starts to wrap itself along the already zipped up part to form bigger blobs of zipped up DNA. An example of this is shown below for a strand of $N=289$ beads. From left to right and top to bottom, we see
1. an already zipped up part of DNA with the "free" end protruding from the bottom
2. the free end starts to wrap itself upwards along the zipped up part
3. the free end reaches the top of the zipped up part
4. the free end now starts its way down again
5. it reaches the bottom
6. and it starts its way up again
This process continues more times until it is fully wrapped up.
| |  |  |
| -------- | -------- | -------- |
|  |  |  |
**1500 nM**
For $n_\text{max}=\infty$, same behavior as for 1000 nM.
For $n_\text{max}=4$, for some simulations the behavior is the same as for 500 nM, i.e. loops forming and disappearing again such that there is no clustering in the end. For other simulations, some loops manage to survive long enough to sort of close up and cause clustering. The closing up and clustering, however, is not as dense as for $n_\text{max}=\infty$. An example of such a process is shown below ($N=408$). From left to right and top to bottom, we see
1. some first loops forming
2. the loops start to close up
3. the loops are closing up more and the protruding tail also starts to close up
4. the main cluster closes up more, but the tail opens up again
When you follow the simulation for longer, the main cluster keeps changing chape and actually also opens up a bit again (sort of like in 3.).
|  |  |
| -------- | -------- |
|  |  |
**2000 nM and 4000 nM**
For $n_\text{max}=\infty$, same behavior as for 1000 nM and 1500 nM.
For $n_\text{max}=4$, similar behavior as for 1500 nM, except now all simulations form loops that close up such that we end up with clusters in the end. An example of such a clustering process is shown below ($N=408$, 2000 nM). Again, read it from left to right and top to bottom.
|  |  |  |
| -------- | -------- | -------- |
|  |  |  |
**Conclusion**
Using $\beta\epsilon_\text{bind}=5$ instead of $\beta\epsilon_\text{bind}=10$ changed the behavior of the system back to something like is seen in experiments. For low concentrations of integrases, no clusters are formed, same as in experiments. For higher concentrations, we see a different in the behavior between no restrictions on the number of bonds per integrase ($n_\text{max}=\infty$) and restricting it to 4 bonds ($n_\text{max}=4$). For $n_\text{max}=\infty$, we see that the loops that are formed quickly zip up into very dense, and regular clusters of DNA and integrase. For $n_\text{max}=4$, we see that the loops often break up again before being able to zip/close up, especially for the middle range of concentrations (1000-1500 nM). When the loops do manage to close up, the system in the end form a very loose cluster. Especially 1500 nM is interesting, as the clusters that do form at that concentration often have some bare DNA loops protruding from it. This offers hope that, when we turn some attraction between the integrases, we will get similar "rosette" formation as in experiments.
Furthermore, based on these test simulations, I expect that we can use the binding strength $\beta\epsilon_\text{bind}$ to tune the concentration at which the different phase transitions, i.e. no clustering - rosette formation - full collapse, happen.
---
### Fourth test -- adding attraction between integrases
In the previous section, we saw that a binding strength of $\beta\epsilon_\text{bind}=5$ in combination with restricting the number of bonds per integrase to $n_\text{max}=4$ gives already promising results. In this section, we try adding some attraction between the integrases.
To add attraction between the integrases, I just changed the radial cutoff of the Lennard-Jones potential between the integrases from $2^{1/6}\sigma$, which just gives the repulsive WCA potential for the excluded volume interactions, to $3.0\sigma$. This makes sure that we keep the excluded volume interaction, but also adds some attraction. So far, I tried three different values for the attraction strength: $\beta\epsilon_I=0.7$ (which it already was for the WCA potential), $\beta\epsilon_I=1.0$, and $\beta\epsilon_I=1.5$.
To organize the results, I have decided to put my observations into a table for each attraction strenght, with each column indication a different DNA length, i.e. [150, 289, 408, 774] beads corresponding to [1765, 3400, 4800, 9100] base pairs, and each row indicating a different concentration of integrase, i.e. [250, 500, 750, 1000, 1250, 1750, 2000] nM.
In these tables you will find the following "codes" for my observations:
| Code | Meaning |
| -------- | -------- |
| <span style="padding: 6px; background-color: #c2beae"> N </span> | Nothing happens, the DNA strand is unaffected by the integrase. |
| <span style="padding: 6px; background-color: #c2beae"> N$^a$</span> | Nothing happens, but sometimes bridges are formed that break apart again. |
| <span style="padding: 6px; background-color: #ff9a57"> B </span> | Some bridges are formed, but not really a condesend cluster or rosette formation. |
| <span style="padding: 6px; background-color: #8cab55"> R </span> | Rossette formation. |
| <span style="padding: 6px; background-color: #729aad"> C </span> | Condensed cluster. |
| <span style="padding: 6px; background-color: #729aad"> C$^a$</span> | A cluster is formed, but it is not so condensed, but also not rosette-like. |
| <span style="padding: 6px; background-color: #729aad"> C$^b$</span> | Condensed cluster, but with a (long) bare DNA end still sticking out of it. |
| <span style="padding: 6px; background-color: #729aad"> C$^c$</span> | Two (or sometimes three) separate condensed clusters connected by a (long) bare DNA strand. |
**No attraction**
This is a summary of the results above, together with some more results.
| nM \ beads | 150 | 289 | 408 | 774 |
| ---- | --- | --- | --- | --- |
| **250** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> |
| **500** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> |
| **750** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> |
| **1000** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> |
| **1250** | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C$^a$ </span> |
| **1500** | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> | <span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C$^a$</span> | <span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C$^a$</span> | <span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C$^a$</span> |
| **1750** | <span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad"> C$^a$</span> | <span style="padding: 6px; background-color: #729aad"> C/C$^{a}$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$/C$^b$ </span> |
| **2000** | <span style="padding: 6px; background-color: #729aad"> C/C$^a$</span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$/C$^b$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$/C$^b$ </span> |
Some examples of these structures:
|  |  |  |
| -------- | -------- | -------- |
| 289 beads, 1250 nM ( B ) | 408 beads, 1250 nM ( C$^a$ ) | 774 beads, 1250 nM ( C$^a$ ) |
|  |  |  |
| 150 beads, 2000 nM ( C ) | 408 beads, 2000 nM ( C ) | 774 beads, 2000 nM ( C ) |
**Attraction of $\beta\epsilon_I=0.7$**
| nM \ beads | 150 | 289 | 408 | 774 |
| ---- | --- | --- | --- | --- |
| **250** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> |
| **500** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> |
| **750** | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span> |
| **1000** | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C/C$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad">C$^a$</span> | <span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad">C$^a$</span> |
| **1250** | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57"> B </span>/<span style="padding: 6px; background-color: #729aad">C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$/C$^b$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^a$/C$^b$ </span> | <span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad"> C$^b$/C$^c$ </span> |
| **1500** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> |
| **1750** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> |
| **2000** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> |
Some examples of these structures:
|  | |  |
| -------- | -------- | -------- |
| 289 beads, 1000 nM ( C ) | 408 beads, 1000 nM ( R ) | 774 beads, 1000 nM ( R ) |
|  |  |  |
| 289 beads, 1500 nM ( C$^b$ ) | 408 beads, 1500 nM ( C ) | 774 beads, 1500 nM ( C ) |
**Attraction of $\beta\epsilon_I=1.0$**
| nM \ beads | 150 | 289 | 408 | 774 |
| ---- | --- | --- | --- | --- |
| **250** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N </span> |
| **500** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> |
| **750** | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae">N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57">B</span>/<span style="padding: 6px; background-color: #8cab55">R</span>/<span style="padding: 6px; background-color: #729aad">C</span> | <span style="padding: 6px; background-color: #c2beae">N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57">B</span>/<span style="padding: 6px; background-color: #8cab55">R</span>/<span style="padding: 6px; background-color: #729aad">C$^b$</span> | <span style="padding: 6px; background-color: #c2beae">N$^a$</span>/<span style="padding: 6px; background-color: #ff9a57">B</span>/<span style="padding: 6px; background-color: #8cab55">R</span>/<span style="padding: 6px; background-color: #729aad">C$^b$</span> |
| **1000** | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #729aad">C/C$^b$ </span> | <span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad">C$^b$ </span> | <span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad">C$^b$ </span> |
| **1250** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> |
| **1500** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> |
| **1750** | <span style="padding: 6px; background-color: #729aad"> C </span> | - | - | - |
| **2000** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> |
Some examples of these structures:
|  |  |  |
| -------- | -------- | -------- |
| 289 beads, 750 nM ( R ) | 408 beads, 750 nM ( R ) | 774 beads, 750 nM ( R ) |
|  |  |  |
| 408 beads, 1000 nM ( R ) | 408 beads, 1000 nM ( C$^b$ ) | 774 beads, 1000 nM ( R ) |
|  |  |  |
| 289 beads, 1250 nM ( C ) | 408 beads, 1250 nM ( C$^c$ ) | 774 beads, 1250 nM ( C$^b$ ) |
**Attraction of $\beta\epsilon_I=1.5$**
| nM \ beads | 150 | 289 | 408 | 774 |
| ---- | --- | --- | --- | --- |
| **250** | <span style="padding: 6px; background-color: #c2beae"> N </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> |
| **500** | <span style="padding: 6px; background-color: #c2beae"> N$^a$ </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #8cab55"> R </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #8cab55"> R </span> | <span style="padding: 6px; background-color: #c2beae"> N$^a$</span>/<span style="padding: 6px; background-color: #8cab55"> R </span> |
| **750** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$ </span> | <span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad"> C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #8cab55"> R </span>/<span style="padding: 6px; background-color: #729aad"> C$^b$/C$^c$ </span> |
| **1000** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^b$/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C$^b$/C$^c$ </span> |
| **1250** | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C$^c$ </span> |
| **1500** | <span style="padding: 6px; background-color: #729aad"> C </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C$^c$ </span> |
| **1750** | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | - | - | - |
| **2000** | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C/C$^c$ </span> | <span style="padding: 6px; background-color: #729aad"> C$^c$ </span> |
Some examples of these structures:
|  |  |  |
| -------- | -------- | -------- |
| 289 beads, 500 nM ( R ) | 408 beads, 500 nM ( R ) | 774 beads, 500 nM ( R ) |
|  |  |  |
| 408 beads, 750 nM ( R ) | 408 beads, 750 nM ( C$^c$ ) | 774 beads, 750 nM ( R ) |
|  |  |  |
| 289 beads, 1000 nM ( C$^b$ ) | 408 beads, 1000 nM ( C ) | 774 beads, 1000 nM ( C$^c$ ) |
**Conclusion**
When there is no attraction between the integrases, we see no rosette formation and the condensed clusters that are formed at high concentration are quite "loose". When we add attraction between the integrases, we start to see rosette formation and the condensed clusters that form after full collapse are actually condensed and no longer "loose". Thus, for the system to make rosettes, the attraction between the integrases is needed. However, at which concentrations the rosette formation and full collapse happen depends on the attraction strength. Without attraction, the transtion to full collapse happens around 1250-1500 nM, which is the same as in the experiments. But adding attractions shifts this to lower concentrations, which is not entirely surprising. To better reproduce the experimental results, we will have to play around more with the binding strength and attraction strength.
Up to a concentration of 1500 nM, I already quickly checked what the combinations $(\beta\epsilon_\text{bind}, \beta\epsilon_I)=$ (4.0,1.5), (3.0,1.5), and (3.0,3.0) would give. All of these do not form any bridges or clusters.
<span style="color:red">
Note that the integrase concentration [IN] reported for the experiments is the monomer concentration, whereas the integrases bind as tetramers to the DNA. Hence, all above reported [IN] must be multiplied by a factor of 4 to match the experimental values.
</span>
### Binding strength from experiments
In order to have one less parameter we need to tune, Willem determined the binding probability of the integrase to DNA. He did this by counting the number of integrases bound to the 1000 bp DNA strands at a concentration of 100 nM and 200 nM integrases. He found a binding probability of $1.7\cdot10^{-4}$ and $2.7\cdot10^{-4}$ for 100 nM and 200 nM, respectively. The concentration of the DNA in these experiments was 0.77 nM for DNA of 1000 bp.
I did simulations on a similar system to determine the binding probabily as a function of the binding strength. A 1000 bp DNA strand corresponds to 85 beads, and to get a concentration of 0.77 nM I used a cubic box with sides of $324\sigma$. Underneath are the results for [IN]=100 nM and 200 nM, corresponding to [IN4]=25 nM (33 integraes) and 50 nM (66 integrases). The binding probability in simulations matches that of the experiments at a binding strength between 4.5 to 5.0 $k_BT$. This is very close to the binding strength that I have been using, i.e. $5.0k_BT$. I therefor decided to keep using a binding strength of $5.0k_BT$ for the future simulations.
Note that the measured binding probability flattens out for the highest binding strenghts. The reason for this is that the binding strenght is so strong that the integrases are irreversibly bound. Hence, the DNA strand binds most of the integrase in the box during the (short) simulation.

### Semi-grand canonical simulations
As we now need to do simulations at 4 times lower integrase concentrations then we were doing, the full condensation of the DNA strand will start to deplete the concentration of free integrase in simulation box significantly. We can work around this by simulating the integrase in the grand-canonical $\mu VT$ ensemble instead of the canonical $NVT$ ensemble. This requires the addition of integrase insertion and removal moves. To prevent these moves from messing with the integrases bound to the DNA strand, we limit the volume from which integrases can be removed or inserted into. More specifically, we exclude the spherical volume surrounding the center of mass of the DNA strand with a radius equal to the distance to the furthest DNA bead plus an additional "padding" of 5 times the binding distance.
Note: I checked with an ideal gas that excluding this varying sperical volume does not mess with the $\mu VT$.
I use an $NVT$ simulation of 500 integrases (no DNA) to determine the chemical potentials $\mu$ for the different integrase concentrations.
To illustrate that this semi-grand canonical set up works well, underneath I show for a DNA strand of 289 beads (3.4 kbp) and an integrase concentration of [IN]=1600 nM (so [IN4]=400 nM) the number of total, bonded, and free integrase during the simulation, as well as the concentration of free integrase (so the concentration in the box excluding that sphere around the DNA strand). This particular run formed a fully compacted cluster at the end of the simulation. One can clearly see that although the number of total and bonded integrase increases, the number of free integrases and their concentration stays fluctuates around a constant value during the entirety of the simulation.
|  |  |
| -------- | -------- |
|  |  |
### Fifth test -- using semi-$\mu VT$
Using the semi-grand canonical simulations, I performed simulations for different attraction strengths, i.e. $\beta\epsilon_I =[1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0]$, at the integrase concentrations [IN4]$=[25,150,250,400,500]$, which corresponds to [IN]$=[100,600,1000,1600,2000]$. Increasing the attraction strength shifts the transition to the fully collapsed state to lower [IN]. From the simulations it quickly became clear that an attractions strength of $2.5k_BT$ or larger shifted the fully collapsed state to too low [IN] values and is thus too strong. Hence, I focussed on the attraction strength of $1.5k_BT$ and $2.0k_BT$ to do more simulations at additional intermediate [IN].
I looked at the concentrations [IN4]$=[25,50,10,150,200,250,300,350,400,450,500,550]$ nM corresponding to [IN]$=[100,200,400,600,800,1000,1200,1400,1600,1800,2000,2200]$ nM. For all concentrations, I did 12 simulations for each of the three DNA lengths. Underneath are the resulting bar charts for the occurance of the different states of the DNA strand in the final configurations of these simulations. Blue indicates normal or free DNA, green indicates rosette formation, and red idicates a fully collapsed state.
We see that the attraction strength of $2.0k_BT$ shifts the transition to the fully collapsed state to a similar value as in the experiments, i.e. around 1200 to 1500 nM, and we find rosettes around 800 to 1200 nM. However, note that the figures below are just based on the final configuration of each simulation. If we look at the simulations as a function of time, we see that the DNA always goes via the rosette state to the fully collapsed state. Moreover, it looks like the rosette states observed in some of final configurations are stopped in the mids of this transition and are not stable as a state themselves. Hence, the rosette state is most likely a transient state between normal DNA and fully collapsed DNA and not a stable intermediate state.
| $\beta\epsilon_I$ | 289 beads <br> (3.4 kbp) | 408 beads <br> (4.9 kbp) | 774 beads <br> (9.1 kbp) |
| -------- | -------- | -------- | -------- |
| $1.5$ |  |  |  |
| $2.0$ |  |  |  |
| $2.5$| - |  | - |
Underneath the radius of gyration as a function of the integrase concentration for the three different DNA lengths for $\beta\epsilon_I=2.0$. The blue open circles indicate the indepent simulations and the black dashed line indicates the average per [IN]. Note that for high [IN], for 774 beads and also for some of the simulations of 408 beads the DNA strands forms two (and sometimes even more) separate condensed clusters connected by a strand of bare DNA. Due to this, the $R_g$ is "too large" for the highest concentrations.
| 289 beads <br> (3.4 kbp) | 408 beads <br> (4.9 kbp) | 774 beads <br> (9.1 kbp) |
| -------- | -------- | -------- |
|  |  |  |
Lastly, we can compare the above barcharts with one obtained from simulations with purely repulsive IN-IN interaction, see below. Here blue indicates normal DNA, orange indicates DNA with integrase bridges, and brown indicates loosely collapsed DNA strands (due to bridging). Note that the range of integrase concentrations on the x-axis below is much higher than that of the barchart above.

### Classification
To study the collapse of DNA and the transient rosette state, we need to develop a good order parameter. As briefly discussed above, the radius of gyration on its own is not sufficient in distinguishing the different states of collapse. Here we use principal component analysis (PCA) on a set of input parameters to (hopefully) find a sufficient classification of the different states.
I use 15 input parameters:
- The radius of gyration $R_g$, asphericity $b$ (normalized with $R_g^2$), and anisotropy $\kappa$ of the DNA strand. Note that these are calculated using just the DNA beads and do not include the bound integrase.
- The number of integrase bound to the DNA strand, fraction of occupied DNA beads, and the average and standard deviation of the number of integrase bound per DNA bead.
- The number of unoccupied DNA chunks and the average and standard deviation of the length of these unoccupied DNA chunks.
- The average and standard deviation of the bending angle along the DNA strand.
- The number of clusters of bound intrase and the average and standard devation of the size of these clusters. Note that two integrase belong to the same cluster when their center-of-mass separation is less than $3.0\sigma_I$ and that a cluster contains at least 2 integrases, i.e. a singular bound integrase does not count as a cluster.
As a start, I train the PCA model on the configurations of a DNA strand of 408 beads (i.e. 4.9 kbp), as I did the most simulations for this DNA length. I use both the simulations of the system with purely repulsive IN-IN interaction and of the system with an attractive IN-IN interaction (with attraction strength $\beta\epsilon_I=2.0$) as input of the PCA model. I train the model on almost 15k configurations and normalize the input parameters before feeding them to PCA. The resulting first three principal components (PCs) capture 41%, 19%, and 16%, respectively. Note that the first three PCs combined capture more than 75% of the variance and that the fourth PC captures only 6%. Hence, we assume that using the first three PCs should be enough to distinguish between different structures of the given configurations.
The barcharts underneath show the (absolute) weight of each input parameter in the first three PCs. The scatterplots show the resulting distribution of the training configurations in the PC1-PC2 and PC1-PC3 plane. The arrows indicate the contribution of each input parameter to the distribution.
|  |  |
| -------- | -------- |
|  |  |
Looking at the scatterplots we can already by eye distinguish different regions in the distributions of points. This is made even more pronounced by looking at the 2D density profile of the distributions, see below.
|  |  |
| -------- | -------- |
|||
We can even use a Guassian mixture model (GMM) to find clusters this 3D distribution in an unsupervised way. The resulting clustering for 7 clusters is shown below. Note that as we will later see, these clusters quite nicely correspond to:
- <font color=#1f78b4>blue</font> = normal
- <font color=#fe7e0e>orange</font> = bridging
- <font color=#8c564a>brown</font> = loosely condensed cluster (due to bridging)
- <font color=#2aa02d>green</font> = single rosette
- <font color=#e377c2>pink</font> = single condensed cluster with (long) bare tail
- <font color=#d42726>red</font> = single fully condensed cluster
- <font color=#9466bc>purple</font> = multiple (fully/partially) condensed clusters
This clustering is already fairly accurate, except for the <font color=#e377c2>pink</font> points on the far right. These are grouped together with the configurations containing a single condensed cluster with (long) bare tail, but they actually consist of multiple (often two) fully collapsed clusters separated by a single (small) piece of bare DNA.
Furthermore, note that PC1-PC2 is already sufficient in distinguishing the majority of the distinct DNA configurations, and that PC3 is mostly useful in distinguishing the mulitple condensed clusters (<font color=#9466bc>purple</font>) from the single rosette state (<font color=#2aa02d>green</font>) and single condensed cluster with bare DNA tail (<font color=#e377c2>pink</font>).
|  |  |
| -------- | -------- |
|||
To illustrate how well this classification works, underneath I show some expamples of full simulation trajectories. Lets start with the results of the purely repulsive IN-IN interactions.
#### Repulsive IN-IN, [IN]=1000 nM
In all simulations, the DNA strand remains "normal" for the entire simulation at this "low" concentration for the purely repulsive IN-IN interaction.
| Run 1 | |
| -------- | -------- |
|  |  |
#### Repulsive IN-IN, [IN]=3000 nM
In all simulations, the DNA strand remains mostly "normal", but you start to see a tiny bit of bridging (although these break up again).
| Run 1 | |
| -------- | -------- |
|  |  |
#### Repulsive IN-IN, [IN]=4000 nM
Bridging happens more often. Sometimes the bridges mostly break apart again (e.g. run 1), whereas other times it manages to condense a bit further (e.g. run 2). To illustrate, I also show two snapshots of the latter. Note that for visualization purposes I have not depicted the free integrases.
| Run 1 | |
| -------- | -------- |
|  |  |
| Run 2 | |
| -------- | -------- |
|  |  |
| Run 2: time = 125 ($\cdot10^6$ MC cycles) | Run 2: time = 495 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
#### Repulsive IN-IN, [IN]=5000 nM
In all simulations, the bridging results in a loosely condensed cluster.
| Run 4 | |
| -------- | -------- |
|  |  |
| Run 4: time = 150 ($\cdot10^6$ MC cycles) | Run 4: time = 500 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  | |
Lets now look at the attractive IN-IN interactions.
#### Attractive IN-IN, [IN]=1000 nM
In most simulations, nothing happens and the DNA strand just remains a normal DNA strand for the entirety of the simulation (e.g. run 1). However, in one simulation it did manage to form a rosette and subsequently a fully condensed cluster, see run 3.
| Run 1 | |
| -------- | -------- |
|  |  |
| Run 3 | |
| -------- | -------- |
|  |  |
| Run 3: time = 158 ($\cdot10^6$ MC cycles) | Run 3: time = 188 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
| **Run 3: time = 232 ($\cdot10^6$ MC cycles)** | **Run 3: time = 364 ($\cdot10^6$ MC cycles)** |
|  |  |
#### Attractive IN-IN, [IN]=1200 nM
Most simulations manage to form a rosette and subsequently a fully condensed cluster.
| Run 9 | |
| -------- | -------- |
|  |  |
| Run 9: time = 275 ($\cdot10^6$ MC cycles) | Run 9: time = 286 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
| **Run 9: time = 338 ($\cdot10^6$ MC cycles)** | **Run 9: time = 466 ($\cdot10^6$ MC cycles)** |
|  |  |
Classification of run 9 over time:

#### Attractive IN-IN, [IN]=1400 nM
All simulations manage to form a rosette and subsequently a fully condensed cluster, see e.g. run 1. However, a few simulations, i.e. run 2 and 9, form multiple clusters, of which run 2 even has the time to recombine to one fully condensed cluster.
| Run 1 | |
| -------- | -------- |
|  |  |
| Run 1: time = 218 ($\cdot10^6$ MC cycles) | Run 1: time = 236 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
| **Run 1: time = 274 ($\cdot10^6$ MC cycles)** | **Run 1: time = 350 ($\cdot10^6$ MC cycles)** |
|  |  |
| Run 9 | |
| -------- | -------- |
|  |  |
| Run 9: time = 352 ($\cdot10^6$ MC cycles) | Run 9: time = 400 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
| Run 2 | |
| -------- | -------- |
|  |  |
| Run 2: time = 206 ($\cdot10^6$ MC cycles) | Run 2: time = 282 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
| **Run 2: time = 368 ($\cdot10^6$ MC cycles)** | **Run 2: time = 449 ($\cdot10^6$ MC cycles)** |
|  |  |
Classification of run 2 over time:

#### Attractive IN-IN, [IN]=1600 nM (or higher)
All simulations very quickly form a rosette and subsequently collapse to a condensed cluster. However, more simulations form multiple condensed clusters.
| Run 1 | |
| -------- | -------- |
|  |  |
| Run 1: time = 92 ($\cdot10^6$ MC cycles) | Run 1: time = 92 ($\cdot10^6$ MC cycles) |
| -------- | -------- |
|  |  |
| **Run 1: time = 148 ($\cdot10^6$ MC cycles)** | **Run 1: time = 210 ($\cdot10^6$ MC cycles)** |
|  |  |
#### Classification as a function of time
Using the distribution obtained from PCA in combination with the GMM clustering, we can easily classify the structure of the DNA strand as a function of time. Underneath, I depict each individual simulation as a colored bar, where the color represents the classification at a specific time in the simuluation. I did 12 simulations for each integrase concentration and looked at three different attraction strength of the IN-IN interaction. Additionally, I did 10 simulations for a handful of integrase concentrations for a system with purely repulsive IN-IN interactions.
Meaning of the colors:
- <font color=#1f78b4>blue</font> = normal
- <font color=#fe7e0e>orange</font> = bridging
- <font color=#8c564a>brown</font> = loosely condensed cluster (due to bridging)
- <font color=#2aa02d>green</font> = single rosette
- <font color=#e377c2>pink</font> = single condensed cluster with (long) bare tail
- <font color=#d42726>red</font> = single fully condensed cluster
- <font color=#9466bc>purple</font> = multiple (fully/partially) condensed clusters
Note that when a bar is **white** at the end, it means that I stopped the simulation because it had reached a final(ish) configuration. I say "finalish" here, because for the simulations that were stopped in the <font color=#e377c2>pink</font> or <font color=#9466bc>purple</font> state, it would be wasted computational effort to simulate the last boring part of the rest of the compaction.
| [IN] (nM) | $\beta\epsilon_I=1.5$ | $\beta\epsilon_I=2.0$ | $\beta\epsilon_I=2.5$ |
| ---| --- | --- | --- |
| **100** |  |  |  |
| **200** |  |  | - |
| **400** |  |  | - |
| **600** |  |  |  |
| **800** |  |  | - |
| **1000** |  |  |  |
| **1200** |  |  | - |
| **1400** |  |  | - |
| **1600** |  |  |  |
| **1800** |  |  | - |
| **2000** |  |  |  |
| **2200** |  |  | - |
| [IN] (nM) | Repulsive IN-IN ($\beta\epsilon_I=0.7$) |
| ---| --- |
| **1000** |  |
| **3000** |  |
| **4000** |  |
| **5000** |  |
| **6000** |  |
#### Classifying the runs other DNA lengths
In order to classify the runs of the other DNA lengths together with the runs of the DNA strands of 408 beads (i.e. 4.8 kbp), I normalized the dataset of each DNA length separately and then combined the three datasets before feeding it to PCA. Note that I ommited the data of the runs with purely repulsive IN-IN interaction, as I did not have this data for the DNA strands of 289 beads (i.e. 3.4 kbp) and 774 beads (i.e. 9.1 kbp). The PCA model is trained on around 28k configurations and the resulting first two principal components (PCs) capture 46% and 18% of the variance, respectively.
The barcharts underneath show the (absolute) weight of each input parameter in the first two PCs. The scatterplots show the resulting distribution of the training configurations in the PC1-PC2 plane. In the left scatterplot the arrow indicate the contribution of each input parameter to the distribution and in the right scatterplot the color indicates the density profile of the distribution.
|  |  |
| -------- | -------- |
|  |  |
Looking at the scatterplots we can already by eye distinguish different regions in the distributions of points. With a GMM (Gaussian mixture model) we can easily cluster the distribution into 5 clusters. The resulting clustering is shown underneath, where once again <font color=#1f78b4>blue</font> are the normal configurations (i.e. no compaction), <font color=#2aa02d>green</font> are the configurations with a single rosette, <font color=#e377c2>pink</font> are the configurations with a single condensed cluster with (long) bare tail, <font color=#d42726>red</font> are the configurations with a single fully condensed cluster, and <font color=#9466bc>purple</font> are the configurations with multiple (fully/partially) condensed clusters.

Using this clustering, we can now classify all the runs of the three different DNA lengths. Underneath, I depict each individual simulation as a colored bar, where the color represents the classification at a specific time in the simuluation. Note that we are just interested in the pathway from no compaction to rosette to (full) compaction; hence, underneath I decided to group together the <font color=#9466bc>purple</font>, <font color=#e377c2>pink</font>, and <font color=#d42726>red</font> clusters, and color the single condensed cluster with (long) bare tail, single fully condensed cluster, and multiple (fully/partially) condensed clusters all as <font color=#d42726>red</font>. Furthermore, when a simulation is stopped early (as it reached a final(ish) configuration), I color the remainder of the bar with the classification of the final configuration. Note that this is unlike the barplots shown in the previous section above, for which I color a bar white at the end when the simulations was stopped early.
The results shown below are for an attraction strength of $2.0 k_BT$ between the IN. There are a few things to note/observe from these figures:
- Every collapse goes via the transient rosette state, even for very high concentrations of IN.
- The time spend in the normal state (i.e. before compaction starts) decreases with increasing [IN].
- The time spend in the rosette state decreases with increasing [IN].
- The longest DNA strand starts compacting at a lower [IN] than the other two DNA strands.
- The time spend in the rosette state seems to increase with the DNA length.
Especially the last two observations match with what is observed in the experiments, i.e. that transition to the rosette state shifts to lower [IN] for increasing DNA length and that the rosette state is not observed for the shortest DNA strand.
| 150 beads <br> (1.76 kbp) | 289 beads <br> (3.4 kbp) | 408 beads <br>(4.8 kbp) | 774 beads <br> (9.1 kbp) |
| --- | --- | --- | --- |
|  |  |  |  |
The results shown below are for an attraction strength of $1.5 k_BT$ between the IN. Again we see that the longest DNA strand starts compacting at a lower [IN] than the other two DNA strands and spends a longer time in the transient rosette state.
| 150 beads <br> (1.76 kbp) | 289 beads <br> (3.4 kbp) | 408 beads <br>(4.8 kbp) | 774 beads <br> (9.1 kbp) |
| --- | --- | --- | --- |
|  |  |  |  |
To better see the effect of increasing attraction strength, underneath I show for three different attraction strengths the results for the DNA strand of 408 beads (4.8 kbp).
| $\beta\epsilon_\text{IN}=1.5$ | $\beta\epsilon_\text{IN}=2.0$ | $\beta\epsilon_\text{IN}=2.5$ |
| --- | --- | --- |
|  |  |  |
### Trends in DNA length and IN concentration
Using the classified simulations above, we can calculate rough estimates for the rate of condensation as a function of DNA length and IN concentration. This rate of condenstaion is comparable to the nucleation rate in systems that show crystal nucleation; it tells you how quickly the system will start to nucleate or, in this case, condense. The rate $\lambda$ can be estimated using
$$ \lambda = \frac{m}{\sum_{i=1}^{n} t_i},
$$
where $n$ is the total number of simulations performed for a specific DNA length and [IN], $m$ is the number of simulations in which the DNA strand condensed, and $t_i$ is the time at which the condensation starts. Note that $t_i$ is basically the endpoint of the blue bar in the figures of the previous section. Furthermore, note that the sum is over all $n$ simulations, such that it also includes the time of the simulations in which the DNA strand did not condense.
The resulting rates for the different DNA lenghts are shown underneath as a function of the IN concentration. The first row gives the rates for an attraction strength of $2.0k_BT$ (left) and $1.5k_BT$ (right) for the IN-IN interaction. The second row shows all the rates in with linear axes (left) and with a logarithmic axis (right). The legend is the same as in the first row.
*Note that these rates are quite rough estimates as they are all based on just 12 simulations. Especially for the [IN] were only a few runs collapse is the error in estimated
rate large. I have not estimated the size of the errorbars yet.*
| Rate $(\beta\epsilon_{IN}=2.0)$ | Rate $(\beta\epsilon_{IN}=1.5)$ |
| -------- | -------- |
|  |  |
| **All rates** | **All rates log plot** |
|  |  |
Looking at the figures above, we clearly see that the rate increases with both [IN] and the DNA length. From the log plot, it even seems to be an (approximately) exponential dependence on [IN]. Moreover, the solid lines, i.e. those belonging to $\beta\epsilon_{IN}=2.0$, all seem to have (approximately) the same sloop in the log plot, as do the dashed lines, i.e. those belonging to $\beta\epsilon_{IN}=1.5$.
To further test this observation, underneath I plot the rates divided by the number of DNA beads, i.e. $N$. We see that dividing by $N$ causes the curves to approximately collapse onto a single (master) curve for $\beta\epsilon_{IN}=2.0$ and another single (master) curve for $\beta\epsilon_{IN}=1.5$.
| All rates divided by $N$ | All rates divided by $N$ (log plot) |
| -------- | -------- |
|  |  |
Next, we take a look at the average time the system spends in the rosette configuration, $\tau_\text{rosette}$. Underneath are the combined results for attraction strengths $1.5k_BT$ and $2.0k_BT$. There are a couple of things to note:
- For DNA strands of 150 beads (~1.76 kbp), $\tau_\text{rosette}$ seems to be independent of the concentration.
- For the other DNA lengths, $\tau_\text{rosette}$ sharply increases with decreasing [IN]. This effect is amplified by the DNA length.
- For the higher [IN], the effect of the DNA length is less pronounced (or even important).

#### Typical runs
Based on these computed rates and rosette times, I selected from each set of 12 simulations the most "typical" simulation. With this I mean the simulation of which the time at which condensation starts corresponds the best to the rate of condensations, and the time spend in the rosette state corresponds approximately to the average time spend in the rosette state.
| DNA length | $\beta\epsilon_\text{IN}=1.5$ | $\beta\epsilon_\text{IN}=2.0$ | $\beta\epsilon_\text{IN}=2.5$ |
| --- | --- | --- | --- |
| **150 beads** <br> (1.76 kbp) |  |  | |
| **289 beads** <br> (3.4 kbp) |  |  | |
| **408 beads** <br> (4.8 kbp) |  |  |  |
| **774 beads** <br> (9.1 kbp) |  |  | |
### Unraveling
For the DNA length of 408 beads and attraction of $2.0k_BT$, I picked a handful of rosette configurations and fully condensed configurations (both at [IN]=1200 nM and [IN]=2000 nM) and started new simulations with them, but now with the attractions between the IN turned off. All configurations very quickly unravel, even the fully condensed ones. This is not entirely unexpected as much higher IN concentrations (i.e. 3000 nM or higher) are needed for systems with purely repulsive IN-IN interaction to make the bridging last and lead to a collapsed state. However, it could have been possible that the high energy gain of the (fully) condensed state would "win" from the entropy gain of unraveling. But aparently entropy wins and all configurations unravel.
---
## Loop size distribution
To hopefully get an idea about the size of the integrase, I measured the loop size distribution for various integrase sizes and compared it to the experimental distribution. The experimental distribution was obtained for DNA strands consisting of 1000 bp, which corresponds to simulated DNA of 85 beads.
The resulting distributions are shown underneath. The experimental distribution is given by the gray bars and the distributions from the simulations are given by the colored data points. Although the distributions are not as smooth as I would have hoped, one can still clearly see a trend. For increasing integrase size, the loop size distribution broadens and its peak shifts to a slightly larger loop size. It is, however, difficult to say which distribution matches the experimental distribution best... I would say that the curves of $\sigma_\text{IN}/\sigma=[0.75,1.50]$ all fit reasonably.

To measure the probability distribution I ran 20 simulations each started from a different configuration of an equilibrated DNA strand of 85 beads. I first run each simulation for $10^6$ MC steps to obtain a different configuration from the starting configuration. Next, I randomly select a DNA bead to which an integrase will bind and insert an integrase into the simulation. I ensure that the integrase remains bonded to that bead by rejecting any trial moves of the MC simulation that break the bond. I let the simulation continue and stop/reset the simulation as soon as the integrase binds to another DNA bead making a loop of at least 5 DNA beads in size. By resetting the simulation each time a loop is formed, I can very quickly measure many loop sizes using just a single simulation.
Some notes:
- An integrase and a DNA bead are bonded when the distance between their spherical surfaces is less than $0.12\sigma$, which corresponds to approximately 0.5nm as $\sigma=4$nm.
- I decided on a minimun loop size of 5 DNA beads, as the integrase has the ability to bind to 4 DNA beads and thus easily binds to the neighboring beads of the "chosen" bead without forming an actual loop. However, this also means that often short loops are formed by first forming a loop of size 4 or smaller and then erlarging this loop by means of the integrase "sliding along" the DNA strand. Especially for the larger integrase sizes, this causes a huge artificial peak in the loop size distribution for loops smaller than 10 DNA beads. For this reason, I excluded loop sizes smaller than 10 (corresponding to 40 nm) for the loop size distribution shown above.
---
## 2D projection of DNA
For a worm-like-chain (WLC) with persistence length $l_p$ and total length $L=Nb$, where $N$ is the number of beads and $b$ is the average bond lenght, the end-to-end distance in 3D is given by
$$ \langle R^2 \rangle = 2 l_p N b \left[ 1 - \frac{l_p}{Nb}\left( 1 - e^{-Nb/l_p} \right) \right] .
$$
And the radius of gyration is given by
$$ R_g^2 = \langle R^2 \rangle /6.
$$
Indeed, when we calculate the average radius of gyration for our DNA strands of 150, 289, 408, and 774 beads (corresponding to 1765 bp, 3400 bp, 4800 bp, and 9100 bp), we find that it agrees with the theory.

Next, we project the DNA configurations on the xy-, xz-, and yz-plane, and compute the corresponding average radius of gyration of these projected configurations. Since, projecting onto a plane loses one degree of freedom, you would expect $\langle R^2 \rangle_\text{project}=\frac{2}{3}\langle R^2\rangle$. When we then plot the 2D projected $R_g$ as a function of the 3D $R_g$, we see that it indeed follows the line $y=\sqrt{2/3}x$.

We thus indeed find that the average 2D procted radius of gyration follows $R_g^2 = \langle R^2 \rangle/4$.

Yet, when we compare these numbers to the experimentally measured radii of gyration, they do not agree (see table below). I read the values for the experimental $R_g$ from the double hill plot figures at [IN]=0. Note that the diameter of a DNA bead, $\sigma$, equals 4nm.
| N | bp | 2D $R_g$ (nm) | EXP $R_g$ (nm) |
| ---- | ---- | ---- | --- |
| 150 | 1765 | 64 | - |
| 289 | 3400 | 91 | 58 |
| 408 | 4800 | 112 | 75 |
| 774 | 9100 | 153 | 90 |
---
## Future ideas
- Take a configuration in the rosette state and turn off attractions between the integrase and see what happens. Will the rosette unravel? Same for a fully collapsed state.
- ~~Change the size of the integrases to something like 5-10x as big as the DNA beads? This is what they do in **[Brackley, Marenduzzo et al. (2013)](https://www.pnas.org/doi/abs/10.1073/pnas.1302950110)**. For equal sized integrases they find the sort of zipping up that we also find when $n_\text{max}=\infty$, whereas for 5x larger integrases they find integrase clusters with DNA rosettes.~~
- ~~Change the size of the DNA beads to $l_p/20$ instead of $l_p/10$? This is what they use both in **[Ryu, Dekker et al. (2021)](https://www.science.org/doi/10.1126/sciadv.abe5905)** and **[Brackley, Marenduzzo et al. (2013)](https://www.pnas.org/doi/abs/10.1073/pnas.1302950110)**? We could look it this changes anything.~~
- ~~Restrict the number of bonds per DNA bead? This might make it more physically correct, but I think it will be difficult to implement this (without breaking detailed balance) on top of the restriction of the number of bonds per integrase.~~
---