# Predicting Networks

###### tags: `masters projects` `machine learning` `patchy particles` `glassy dynamics`

---

### Owners (the only ones with permission to edit the main text, others can comment)

Duncan, Frank, Laura

---

# Update week 2

<details>
<summary>Click to expand!</summary>

### Power outage edition

This week I worked on the following items:

* Finding the mistake in the plot of the number of bonds found via simulation compared to theory.
* Finding ways to import and work with the data file stored by the **managelibrary.c** code.
* Making my own visualization of the particles and bonds.
* Finding the temperature dependence of the bond breaking time.

I will elaborate on these items after asking two questions I had this week that I have not found the answer to yet. We can look at these questions in the next meeting.

## Questions

### Question one: Patch1/2

The code in **managelibrary.c** saves a file with data of a snapshot and the bond breaking and forming events of that snapshot. This file, however, also contains something called *patch1* and *patch2*, and I was wondering what these are. I have added an image of the code with a part of the data in the file.

![](https://i.imgur.com/8umSFnt.png)

> [name=Frank] I believe patch1 and patch2 refer to the index of the patch on each particle that is involved in the bond. So in a three-patch system, these can be 0, 1, or 2. Patches are arranged counter-clockwise along the particle edge.

### Question two: broken visualisation

I noticed that the first part of the data file created in **managelibrary.c** looked very similar to the **mov.init...** files. So I tried to use the **opengl** software to visualize the data by only inserting the similar part. However, I got very strange pictures, and I was wondering if you might have an idea what could be causing this. I included an image of a side-by-side comparison of the data and the images for the bonds and particles I get. I note that the number of digits differs for some of the data in the two files, but the numbers seem similar.

![](https://i.imgur.com/spw6en5.jpg)

> [name=Frank] Yes, in principle the first part of that file is again a snapshot of the system. However, the format of the file does not properly match the one expected by opengl from the extension. If you rename the file to ".patch" instead of ".pat", opengl should be able to read it. (You can also change the extension in the code to ensure this happens in the future, there is no need for it to be .pat.)

## Update

### Number of bonds

The theory for the number of bonds counts the unbound patches of the particles, so you get the wrong number if you divide the total number of bonds by two, since each bond uses up two particle patches. I changed the code so that each bond still uses up two patches and found a graph in good agreement with theory. I show the changes to the code and the resulting graph in the following image. After checking this I changed the code back to what it was, in case the variable *nb* is used in other parts of the code where the division by two is needed, even though I could not find any other uses of *nb*.

![](https://i.imgur.com/CLkHzpR.png)

> [name=Frank] Perfect. The differences at low $T$ are simply the result of the theory breaking down (and likely some phase separation at low $T$ and high $\rho$).

### Importing new datafiles

The data file produced by **managelibrary.c** consists of multiple columns with varying spaces between the columns, mixed with text. This gave me some trouble importing it into Python, since I was using the *numpy* importing functions for other files and these functions expect a consistent number of columns. I then tried *pandas*, which is a really nice package for data importing and manipulation, but I am not very familiar with it, so it took some time to get used to. Now I have gotten to a point where I am able to work with these data files. In order to make progress and have some variation in my work in the meantime, I temporarily used *Mathematica*, in which I was able to easily import the files and work with them.
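For future reference, a minimal way to handle the variable spacing in Python (the filename and number of header lines are placeholders, not the real file layout):

```python
import pandas as pd

# sep=r"\s+" treats any run of whitespace as a single delimiter, so the
# varying column spacing is not a problem. Hypothetical filename/header size.
data = pd.read_csv("bondevents.pat", sep=r"\s+", header=None, skiprows=1)
values = data.to_numpy()
```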
### Visualising the snapshots

Since I was not able to use the **opengl** program to visualize the new data files, and I wanted to get to know this data better, I made my own visualization of the snapshots in *Mathematica*. I visualized the particle positions and the bonds that are formed between particles, where I did not yet show the bonds that cross the periodic boundary. I also made a visualization of the number of breaking and forming events of bonds, which I overlay on the snapshot. The snapshot is at *epsilon = 5* and *density = 0.8*.

![](https://i.imgur.com/cMmSn2M.png)

I see now, as I am checking everything, that I rotated one of the bond forming images by 90 degrees.

> [name=Frank] Looks good! Mathematica should also be fine for this. Not sure what the best way in Python is to deal with variable spacing. Perhaps reading it in as a string, then trimming duplicate spaces, then converting to numbers? Alternatively, feel free to edit the code to always put just one space.

### Bond break times at various temperatures

Lastly I looked at the temperature dependence of the bond breaking time, and compared it to data found in the thesis of Simone Monaco. I did this for the same temperatures and show the histogram I found (right) next to the fit found in the thesis (left). Note that for a temperature of 0.12 no bonds had a breaking time of 100 or lower, so that temperature does not show up in my histogram. To me the results seem to agree relatively well. I would need to collect more data and perform a fit to make it look cleaner.

![](https://i.imgur.com/gEKWkUd.png)

> [name=Frank] Nice. Good agreement apart from simple statistics, from the looks of it.

#### Some remarks

I wanted to have a look at some possible properties to feed to a neural network and start finding a way to find and save these properties, but I have not yet gotten to this point.

This update has become a bit longer than I intended and does not look as clean as I would like. I need to use proper reduced units and add legends and labels to some of the figures. I also need to learn how to properly include figures and subfigures with captions and references in Markdown, like how it is done in LaTeX. I can toggle the spell check on or off, but I do not know how to get it to suggest the right spelling; it just underlines the mistakes. Also, some features in my HackMD are in Dutch for some reason while everything else is in English, and I cannot find any settings to change this.

> [name=Frank] Not sure on the language issues or spell check... (my spell check does not seem to work here at all). I've added you to our group HackMD site, you can take a look at some of the other project pages for formatting inspiration.
</details>

# Snapshot Parameters and Neural Networks for Bond Breaking Events

<details>
<summary>Click to expand!</summary>

This week the goal is to reproduce the parameter list used by Simone and create a first version of a neural network that tries to predict bond breaking times from a snapshot.

## Parameters

I managed to import the data in Python, so I continue in Python to find the parameters and visualize the particles.

### nn and nnn

To find the number of nearest neighbours and the number of nearest nearest neighbours I created a function that takes a particle number and returns the number of connected neighbours and the number of neighbours connected to those neighbours. It does this by iterating over all patches of the provided particle and tracking which particles are attached to the patches. Then it counts the number of particles attached to those particles, as well as the number of particles found in the first place.

### Angle deviation

To find the deviation of the angle between connected patches from $2\pi/3$ I created a function that takes three particles (connected by two bonds, where one particle shares the two bonds) and returns the angle between the bonds connecting the three particles. It does this by first creating two vectors $\bar{a}$ and $\bar{b}$, both having the center particle as one of their endpoints. Then it calculates $\theta = \arccos(\bar{a}\cdot\bar{b}/(|\bar{a}||\bar{b}|))$ and subtracts $2\pi/3$ from it.

### Nearby particles

The number of nearby particles, connected or not, is found with a function that takes a particle and a distance and returns the number of particles within that distance of the particle. It does this by first dividing the box into smaller boxes. This way, when the number of nearby particles is calculated for a particle, we do not have to calculate the distance to all other particles, only to the particles in nearby boxes. Then I iterate through all particles and find the number of particles that are closer than the provided distance in the small box the particle is in and all adjacent boxes. I do this with a small function that calculates the distance.

![](https://i.imgur.com/uj6I97o.png)

Figure of the particle positions where the box is divided into smaller boxes.

### Loops and convexity

The first parameters I extract describe loops. To do this I created a function that takes a start particle and an end particle that are connected, and returns the size of the loop and a binary number indicating convexity. It does this by taking the patch through which the first particle is connected to the last particle and going counterclockwise through the patches until one is connected. When it has to skip a patch, the concavity is set to one. At this point it has found the next particle in the loop and repeats the process. Once the next particle is the final particle it has found the full loop. After 16 steps the function stops and sets the size to 16, since there is either no loop or a very large one.

![](https://i.imgur.com/N9lrye4.jpg)

As can be seen in the figure there are still some bugs to be fixed, since some of the rings do not follow the shortest ring and some rings stop where there are no dead ends.

> [name=Duncan] Turns out I was displaying all existing bonds and the bonds that still had to form, so the algorithm that finds the rings does work.

### To do:

* [x] Fix ring finding function!
* [ ] Create a dictionary of loops for speed.
* [x] Improve angle calculation to find larger than $\pi$ solutions.
* [x] Remove dead ends from loop length.
* [ ] Make my own parameter list with the data I think might be relevant.
    * Clustering coefficient maybe.
    * Other distances (distance 1 results in 0 nearby particles). $g(r)$-like data maybe.
    * Choose what to do for the angle when there is no bond. Maybe add info on which neighbour does what.
    * Distance between the most nearby unbonded patches, for bond forming time predictions.
    * Add temperature as a variable. (Only when simulating, training and predicting at various temperatures.)
    * Maybe normalise some of the numbers; 16 for circles is large compared to some other numbers. (A neural network should be able to easily take care of this with weights, just a thought.)
    * Maybe look at how many convex/concave angles there are in a circle, not just whether there is at least one.

> [name=Frank] With respect to the distances, I think you are now calculating the number of particles in the vicinity of a **bond**, rather than a **particle**. (Mentioning this to avoid future confusion when reading this.)

## Predictive neural network

I used *pytorch* to create a neural network that takes the parameters described above and tries to predict a breaking time. I use 90% of the bond breaking events as training data and the other 10% as testing data. I find the breaking times from the data by dividing the simulation time by the number of breaking events. During training I keep track of the losses and accuracy of both the training data and the validation set. I plot this in the following figure.

![](https://i.imgur.com/JM7IbpN.png)

In the first plot we see the number of predicted breaking times that are within a margin of 5 of the actual breaking times, and in the second plot we see the sum of all absolute differences between the predicted and measured breaking times. We see this for both the training set and the validation set. Once I see that the accuracy and loss do not change much anymore, I predict the breaking times one last time and plot these against the measured breaking times. The points should lie on the line y = x if the model works well.

![](https://i.imgur.com/dP48HoQ.png)

As can be seen in the figure, the predicted data does not follow the line well. This can be quantified by the R2 score, which is a measure of the correlation between the measured and predicted data. The closer the R2 score is to one, 1 being full linear correlation, the better the model performed. The R2 score found in this instance is 0.10. This means the predicted data does not represent the real data well at all.

## Auto-encoder

In the seminar by Emanuele Boattini I heard about auto-encoders and thought this might be good practice. I created a neural network that takes the parameters found above as input and tries to output the same parameters. I gave one of the layers in the middle of the network fewer and fewer neurons until the loss of a trained network strayed away from zero. I provided 19 parameters, and with 8 neurons in the middle of the network it was not able to get close to reproducing the 19 parameters in the output.

![](https://i.imgur.com/Bz9I5do.png)

In this figure we see the result of an auto-encoder with a bottleneck of 8 neurons and of 9 neurons; with 8 neurons its output is not able to properly replicate its input. I can see that this can be caused by linear and nonlinear correlations between the parameters, but I would like to hear more about how auto-encoders can be useful to improve performance in neural networks. (I will also look for some ideas on this myself.)
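For reference, a minimal sketch of such a bottleneck auto-encoder in *pytorch*; the intermediate layer sizes here are illustrative assumptions, not the exact network I used:

```python
import torch.nn as nn

# Bottleneck auto-encoder sketch: 19 parameters in, compressed to
# `bottleneck` neurons, then reconstructed back to 19 outputs.
class AutoEncoder(nn.Module):
    def __init__(self, n_params=19, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_params, 14), nn.ReLU(),
                                     nn.Linear(14, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 14), nn.ReLU(),
                                     nn.Linear(14, n_params))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimises the reconstruction error, e.g. nn.MSELoss()(net(x), x).
```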
</details>

# Finishing Parameter list and Training Neural Networks

<details>
<summary>Click to expand!</summary>

I finished building my parameter list and got it working similarly to what Simone made. To check whether it works properly, and to think about some of the choices that can be made, I made visualizations for each parameter. I then trained my auto-encoder for 1000 epochs on the parameters with a bottleneck equal to the number of parameters. I did this for differently scaled parameters, to see whether rescaling the parameters has a large effect on the performance of a neural network. It performed best on rescaled data, so I rescaled all my parameters before using them to train a network predicting bond breaking times. I then ran simulations of 25 systems at bond strength $\epsilon = 5$ and density $\rho = 0.8$, or in proper dimensionless units $\beta\epsilon = 5$ and $\rho\sigma^2 = 0.8$. I then trained a neural network for 10000 epochs with various numbers of layers. The performance was poor in all cases. Finally I trained the deepest of the tried networks for 50000 epochs, which took 15 hours. The result is still poor.

## Parameter list

I tried to keep the parameters similar to what Simone made, to see if I can reproduce a working neural network. Once I have a network that is able to somewhat predict bond breaking times, I will try to make improvements to some of the parameter choices.

### Rings

We choose the length and concavity as parameters to describe rings in the system. To find the length and concavity of rings I made a function that takes a bond and a case number. It then determines a particle to start the ring from and a particle where the ring should end. The function then walks through the system away from the end particle through the start particle, making right turns until it is back at the end particle. After each step it adds the particle it is at to a list. If at any step the particle has no particle attached on the right, it tries to turn left and sets the concavity to 1. If it can also not go left, it has reached a dead end and turns back. Once it reaches the end particle it has found a closed loop. Then the dead ends are removed from the list of particles the walk went through and the length of the list is returned, unless the length is larger than 16, in which case 16 is returned as the size of the ring to prevent the sizes blowing up. If the function does not reach the end particle within 30 steps it has not found a closed loop and the size of the loop is set to 16, since finding no loop is similar to finding an infinitely large loop.

First I will show the code used to find loops and remove dead ends.

#### Code

Code that removes dead ends:

```python
def remove_dead_ends(inlist):
    inlength = len(inlist)
    inpos = 0
    counter = 0  # This counter prevents the program from getting stuck in the while loop when something goes wrong
    while inpos != inlength-4 and counter < 30 and inlength > 3:  # Keep going until all particles in the circle have been checked for dead ends
        for ini in range(0, inlength-3):
            if inlist[ini] == inlist[ini+3]:  # If two particles are enclosed by particles with the same id: [<-, 228, 330, 330, 228, ->]
                inlist.pop(ini+1)  # Remove the two inner particles
                inlist.pop(ini+1)
                inlength = len(inlist)
                break
            inpos = ini  # Keep track of how far along the circle we have checked
        counter += 1
        if counter > 25:
            print("Error removing dead ends: counter hit 25", inlist, inpos, ini, inlength)
            return inlist
    inlist = list(set(inlist))  # Remove duplicates from the list. To keep order use: list(OrderedDict.fromkeys(inlist))
    return inlist
```
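A toy check of my own, to show the expected input format: the walk appends a dead-end tip twice (see **find_ring** below), so a 1-2-3-6 ring with a dead end at 4 arrives as the pattern `[..., 3, 4, 4, 3, ...]`:

```python
print(remove_dead_ends([1, 2, 3, 4, 4, 3, 6, 1]))
# -> [1, 2, 3, 6]  (the final set() does not guarantee the ring order)
```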
> [name=Frank] I am slightly surprised that no recursion is needed... Does your code deal okay with a "branching" dead end? Something like 1 - 2 - 3 - 4 - 3 - 5 - 3 - 6 - 1?

Code that finds ring lists. I can still add a way to store circles to make the code faster; I have not finalized the code for this yet. It is not high on the priority list, since training a NN and creating data takes much longer.

```python
indexes = [0, 1, 2]  # List to make turning right easy

def find_ring(bond_id, loopnr):  # Takes the id of a bond and a number for which circle case you want
    if bondlist[bond_id][4] == 1:  # If the bond id of a bond forming event is given, the code will not run
        print("Forming")
        return
    if loopnr == 0:  # Set the starting and ending particle of the circle one is interested in
        startpoint = bondlist[bond_id][0]
        endpoint = bondlist[bond_id][1]
    elif loopnr == 1:
        startpoint = bondlist[bond_id][1]
        endpoint = bondlist[bond_id][0]
    elif loopnr == 2:
        startpoint = bondlist[bond_id][0]
        endpoint = int(partlist[bondlist[bond_id][0]][15+indexes[bondlist[bond_id][2]-2]])
    elif loopnr == 3:
        startpoint = bondlist[bond_id][1]
        endpoint = int(partlist[bondlist[bond_id][1]][15+indexes[bondlist[bond_id][3]-2]])
    else:
        print("Loopnr invalid")
        return
    circlelist = []  # List that keeps track of which particles are in the ring
    anglecount = 0   # Value for concavity
    circlelist.append(startpoint)
    lastpoint = endpoint
    currentpoint = startpoint
    nextpoint = currentpoint
    nextcounter = 1
    if endpoint == -1:  # For a circle involving only one of the bond particles, that particle may be connected to just one other particle, so there is no ring
        circlelist.append(-2)
        nextcounter = 16
        anglecount = 1
        #print("No end")
        return circlelist, nextcounter, anglecount
    while nextpoint != endpoint and nextcounter < 30:  # Walk through the system until it reaches the end or 30 steps are made
        location = 0
        for i in range(3):  # Find the patch connecting the last and current particle and select the patch counterclockwise from it
            if int(partlist[currentpoint][15+i]) == lastpoint:
                location = indexes[i-2]
                break
        if int(partlist[currentpoint][15+location]) == -1:  # Move left when you cannot move right
            location = indexes[location-2]
            anglecount = 1
            if int(partlist[currentpoint][15+location]) == -1:  # Turn around when you cannot go left either
                circlelist.append(currentpoint)  # To remove dead ends we want the last particle of the dead end in the list twice
                nextpoint = lastpoint
                anglecount = 1
            else:
                nextpoint = int(partlist[currentpoint][15+location])
        else:
            nextpoint = int(partlist[currentpoint][15+location])
        if (nextpoint == bondlist[bond_id][0] or nextpoint == bondlist[bond_id][1]) and loopnr > 1:  # In this case there is no case 3 or 4 circle, but a case 1 or 2 circle
            nextcounter = 16
            circlelist.append(nextpoint)
            circlelist = remove_dead_ends(circlelist)
            #print("Dead Line", circlelist)
            return circlelist, nextcounter, anglecount
        circlelist.append(nextpoint)  # We have made a step and add the position to the list of points we went through
        nextcounter += 1
        lastpoint = currentpoint
        currentpoint = nextpoint
    circlelist = remove_dead_ends(circlelist)  # Once we are finished walking through the system we remove the dead ends
    nextcounter = len(circlelist)  # Find the length of the circle
    #print("Circle or long line")
    if nextpoint != endpoint or nextcounter == 2 or nextcounter > 16:  # Test for dead ends and too-large loops
        nextcounter = 16
        anglecount = 1
    return circlelist, nextcounter, anglecount
```
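A hypothetical usage sketch, assuming `bondlist` and `partlist` are loaded for a snapshot; the four ring cases of one bond are collected by varying the case number:

```python
# For the (breaking) bond with id 0, collect length and concavity of all four cases.
for case in range(4):
    ring, length, concavity = find_ring(0, case)
    print(case, length, concavity, ring)
```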
For each bond there are four possible rings. Each of the four cases is visualized below.

#### Normal case

![](https://i.imgur.com/L2AL615.png)

Cases 1 and 2 are circles containing both particles involved in the bond, whereas cases 3 and 4 each contain only one particle that is involved in the bond. For this figure, case 1 has a length of 6 and is assigned a value of 0 for the concavity. Case 2 has len = 7, conv = 0. Case 3 has len = 6, conv = 0. Case 4 has a length of 12 and is assigned a value of 1 for the concavity, since it cannot take only right moves when going through the patches.

I also checked some special cases. I show them here with the values the program found for length and concavity, to show it works and to document the choices I made for the special cases.

#### Edge case

![](https://i.imgur.com/wnPSp92.png)

len = 6, 7, 9, 11 and conv = 0, 0, 1, 1

As can be seen, the lengths returned are correct for the case where the loop runs through the boundary to the other side.

#### Dead ends, longer than 16 loop and missing loop

![](https://i.imgur.com/EDmy1m3.png)

len = 16, 5, 16, 15 and conv = 1, 0, 1, 1

In this figure we see that the length of loop 1 is 17, so the function set it to 16. The dead end was also taken out of this loop. Case 4 also contained a dead end, this time one containing three particles. Here the length without the dead end is 15, which is what the function returned, so it seems to work with dead ends. One of the particles involved in the bond has particles connected to only two of its patches; thus there is no case 3 loop and it returns 16.

#### No loops

![](https://i.imgur.com/jxbpj8L.png)

len = 16, 16, 16, 16 and conv = 1, 1, 1, 1

In this case there are no closed loops, so everything is treated as an infinite loop. To me it feels like setting the length to 16 for all loops here is not the best choice, but for consistency I will keep it this way for now. Also, the neural network might be able to deal with these cases when trained enough, and there are so few of these cases at this bond strength and density that it should not be too large of an issue.

### Angles

We also look at the deviation of the angle between the bond and connected particles. This results in 4 angles that can be negative or positive. They are calculated by code that also takes boundary cases into account, shown in the following code and figures.

Code to calculate an angle:

```python
import numpy as np

def point_line(point1, point2):  # Find the positional difference between particles, taking the periodic boundary into account
    pdiff = point1 - point2
    if pdiff > boxx/2:
        pdiff -= boxx
    if pdiff < -boxx/2:
        pdiff += boxx
    return pdiff

def calc_angle(particle1, particle2, particle3):  # Calculates the angle between three points
    part1x = partlist[particle1][0]
    part1y = partlist[particle1][1]
    part2x = partlist[particle2][0]
    part2y = partlist[particle2][1]
    part3x = partlist[particle3][0]
    part3y = partlist[particle3][1]
    dxa = point_line(part1x, part2x)  # Convert points to vectors that take the boundary into account
    dya = point_line(part1y, part2y)
    dxb = point_line(part3x, part2x)
    dyb = point_line(part3y, part2y)
    a = np.array([dxa, dya])
    b = np.array([dxb, dyb])
    angle = np.arccos(np.dot(a, b)/(np.linalg.norm(a) * np.linalg.norm(b)))  # Angle calculation
    return angle
```

Code to find all angles of a bond:
```python
def find_angles(particle_i_id, particle_j_id):
    angles = [0, 0, 0, 0]
    neighbouri = []
    neighbourj = []
    for i in range(nr_patches):  # Find the neighbours of the bond, if they exist
        if int(partlist[particle_i_id][15+i]) != particle_j_id:
            neighbouri.append(int(partlist[particle_i_id][15+i]))
    for i in range(nr_patches):
        if int(partlist[particle_j_id][15+i]) != particle_i_id:
            neighbourj.append(int(partlist[particle_j_id][15+i]))
    for i in range(len(neighbouri)):  # Calculate all angles between the bond and its neighbours
        if neighbouri[i] != -1:
            angles[i] = calc_angle(particle_j_id, particle_i_id, neighbouri[i]) - 2/3 * np.pi
    for i in range(len(neighbourj)):
        if neighbourj[i] != -1:
            angles[i+2] = calc_angle(particle_i_id, particle_j_id, neighbourj[i]) - 2/3 * np.pi
    return angles
```

#### Positive angle

![](https://i.imgur.com/wLHw6pI.png)

The green and blue particles are part of the bond.

#### Negative angle

![](https://i.imgur.com/y8LxaD2.png)

#### Boundary case

![](https://i.imgur.com/hQ86H6B.png)

In this case the blue particle is actually positioned on the other side of the box, but we use the **point_line()** function to move it to a position relative to the green particle.

### Nearest neighbours and nearest nearest neighbours

> [name=Frank] "next-nearest"

To find the nearest neighbours and nearest nearest neighbours I made a function that takes a particle, goes through its patches and appends any neighbours to a list, *nnlist*, and then does the same for these neighbours, now appending all of their neighbours to a different list, *nnnlist*. All elements in this second list are the nearest nearest neighbours. I chose not to remove the duplicates from this list, and return the length of both lists. This does mean that the particles involved in the bond are also counted. The code is shown below.

```python
nr_patches = 3

def count_neigbours(particleid):
    nncounter = 0
    nnncounter = 0
    nnnlist = []
    for i in range(nr_patches):
        if int(partlist[particleid][15+i]) != -1:  # Count all neighbours
            nncounter += 1
            neighbourid = int(partlist[particleid][15+i])
            for j in range(nr_patches):
                if int(partlist[neighbourid][15+j]) != -1:  # Append all neighbours of the neighbours to a list
                    nnncounter += 1
                    nnnlist.append(int(partlist[neighbourid][15+j]))
    #nnnlist = list(set(nnnlist))  # Remove duplicates; use list(OrderedDict.fromkeys(nnnlist)) if you want to keep order
    return nncounter, len(nnnlist)
```

For one particle in a bond the nearest neighbours and nearest nearest neighbours are shown.

![](https://i.imgur.com/i8rQ9vd.png)

The bond particles are shown in green, the big red particles are nearest neighbours and the blue particles are nearest nearest neighbours. As you can see, the bond particles themselves are also included in the neighbour information. This should not be a problem if done consistently, but it is another choice to make.

> [name=Frank] Agree, this should be fine as a choice. No real information is lost/gained by removing the duplicates.

### Nearby particles

The last set of parameters I add are nearby particles: all particles within a certain distance of one of the particles involved in a bond. If a particle is within this distance of both, it is counted twice (another choice). To do this I use three functions. The first divides the whole box into smaller boxes, so we do not have to calculate the distance of each particle to all other particles, just to the particles in nearby boxes. The second function calculates the distance between two points, taking the periodic boundary condition into account. The last counts the nearby particles.
Code to divide the box into smaller boxes:

```python
def create_small_boxes():
    particle_boxes = []
    for i in range(10):  # Create a 10 x 10 empty matrix
        emp10 = []
        for j in range(10):
            emp10.append([])
        particle_boxes.append(emp10)
    small_boxx = boxx/10
    small_boxy = boxy/10
    for i in range(N_particles):  # Add the particles to the right small boxes
        xcor = int(partlist[i][0]/small_boxx)
        ycor = int(partlist[i][1]/small_boxy)
        particle_boxes[xcor][ycor].append(i)
    return small_boxx, small_boxy, particle_boxes

small_boxx, small_boxy, particle_boxes = create_small_boxes()
```

Code to calculate the distance between two points:

```python
def dis(pos1, pos2):
    dx2 = (pos1[0]-pos2[0])**2
    dy2 = (pos1[1]-pos2[1])**2
    if dx2 > (boxx/2)**2:  # Take the boundary condition into account
        dx2 = dx2 + boxx**2 - 2*boxx*np.sqrt(dx2)
    if dy2 > (boxy/2)**2:
        dy2 = dy2 + boxy**2 - 2*boxy*np.sqrt(dy2)
    sdis = np.sqrt(dx2 + dy2)
    return sdis
```

Code to find nearby particles:

```python
def ndn(ndis):
    part_neigh = np.zeros(N_particles)
    for pid in range(N_particles):  # Go through all the particles
        xpos1 = partlist[pid][0]
        ypos1 = partlist[pid][1]
        xcor1 = int(xpos1/small_boxx)
        ycor1 = int(ypos1/small_boxy)
        xneigbox = [xcor1-1, xcor1, xcor1-9]  # Indices of the boxes to check (negative indices wrap around the periodic boundary)
        yneigbox = [ycor1-1, ycor1, ycor1-9]
        for i in xneigbox:  # Check all particles in boxes where particles could be close
            for j in yneigbox:
                for k in particle_boxes[i][j]:
                    if k != pid:
                        tempdis = dis([xpos1, ypos1], [partlist[k][0], partlist[k][1]])  # Calculate the distance between the particles
                        if tempdis < ndis:  # Count particles that are within the distance
                            part_neigh[pid] += 1
    return part_neigh
```

In the figure I show one particle and all particles within a distance of 3.

![](https://i.imgur.com/yUBzIPo.png)

In creating this figure I found a mistake: I created the small boxes before the loop where I load the different snapshots, meaning the data I used for my longest neural network training session was wrong. This may be part of the explanation why the NN performed so poorly.

## Rescaling parameters

To see the effect of rescaling the parameters I used my auto-encoder with a bottleneck of the same number of neurons as the number of parameters. After training the network I plotted the actual parameters against the parameters the network found and calculated R2. For one run I trained and tested the network with unscaled parameters. The second run used the same parameters with the ring sizes divided by 16. In the last run I rescaled the ring sizes as before, divided the numbers of nearest neighbours and nearest nearest neighbours by their maximum values, and divided the number of nearby particles by 10.

![](https://i.imgur.com/saVuu8D.png)

Unscaled parameters: R2_train = 0.91, R2_test = 0.92. Slightly surprised that R2_test is higher.

![](https://i.imgur.com/pN6KOp3.png)

Scaled ring size parameters: R2_train = 0.92, R2_test = 0.93. Again slightly surprised that R2_test is higher.

![](https://i.imgur.com/EavQmAn.png)

Scaled parameters: R2_train = 0.98, R2_test = 0.98

As can be seen, the network performed best with rescaled parameters, so I will use the rescaled parameters to train the predictive neural network.
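For reference, a sketch of the scaling applied in the last run; the array names and toy values are placeholders, and the exact column layout of my parameter list may differ:

```python
import numpy as np

# Toy per-bond values; the real parameter list is assembled from the functions above.
ring_lengths = np.array([6.0, 7.0, 6.0, 12.0])   # the four ring cases
convexities  = np.array([0.0, 0.0, 0.0, 1.0])
nn_counts    = np.array([3.0, 2.0])              # one per bond particle
nearby       = np.array([14.0, 12.0])

ring_scaled   = ring_lengths / 16.0              # ring sizes are capped at 16
nn_scaled     = nn_counts / nn_counts.max()      # neighbour counts by their maximum
nearby_scaled = nearby / 10.0                    # nearby particles by 10
features = np.concatenate([ring_scaled, convexities, nn_scaled, nearby_scaled])
```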
## Predicting bond breaking time

To predict the breaking time I tried two neural networks of different depth. The code for the structure of the neural network looks like the following.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(19, 32)
        self.fc2 = nn.Linear(32, 32)
        self.fc3 = nn.Linear(32, 32)
        self.fc4 = nn.Linear(32, 32)
        self.fc5 = nn.Linear(32, 32)
        self.fc6 = nn.Linear(32, 32)
        self.fc7 = nn.Linear(32, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = F.relu(self.fc6(x))
        x = self.fc7(x)
        return x
```

In this code we define our layers in the `__init__` function. For our forward steps we use the rectified linear activation function. The loss function and optimizer I use are mean squared error loss and the Adam optimizer.

```python
import torch.optim as optim

net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)
loss_function = nn.MSELoss()
```

Then we split the data into a training set and a test set. I choose to take the first 80% as the training set and the last 20% as the test set. When splitting the data this way, if we have at least 5 snapshots, at least one snapshot will not be part of the training set. Once the data is split I make functions to pass it through the neural network. Every time I push the data through the neural network I return the loss and the R2 value.

```python
from sklearn.metrics import r2_score

def fwd_pass(X, y, train=False):
    if train:
        net.zero_grad()
    outputs = net(X)
    with torch.no_grad():  # Here I get errors when passing data of length < 2
        acc = r2_score(y, outputs)
    loss = loss_function(outputs, y)
    if train:
        loss.backward()
        optimizer.step()
    return acc, loss
```

I use a separate function to do a test run, where I take a random slice of data from the test set and return the R2 value and loss.

```python
import numpy as np

def test(size=32):
    random_start = np.random.randint(len(test_X) - size)  # Random start index so that the slice stays within the test set
    X, y = test_X[random_start:random_start+size], test_y[random_start:random_start+size]
    with torch.no_grad():
        val_acc, val_loss = fwd_pass(X.view(-1, 19), y)
    return val_acc, val_loss
```

Lastly I made a loop where I train on the data for a certain number of epochs and save the model. I do this by feeding batches of the training data through the neural network.

```python
import time
from tqdm import tqdm

MODEL_NAME = f"model-NNpredict"
PATH = "./modelaiManyRunsNoDeep.pth"  # Save of the parameter values

def train():
    BATCH_SIZE = 32
    EPOCHS = 1000
    with open("./modelManyRunsNoDeep.log", "w") as f:
        for epoch in tqdm(range(EPOCHS)):
            for i in range(0, len(train_X), BATCH_SIZE):
                batch_X = train_X[i:i+BATCH_SIZE].view(-1, 19)
                batch_y = train_y[i:i+BATCH_SIZE]
                # if len(batch_X)>2:
                acc, loss = fwd_pass(batch_X, batch_y, train=True)
                if int(i/BATCH_SIZE % 25) == 0:  # Every 25 batches I also test the neural network on the test set
                    val_acc, val_loss = test(size=32)
                    f.write(f"{MODEL_NAME},{round(time.time(), 3)},{round(float(acc), 2)},{round(float(loss), 4)},{round(float(val_acc), 2)},{round(float(val_loss), 4)}\n")
    torch.save(net.state_dict(), PATH)

retrain = False
if retrain:
    train()
else:
    net.load_state_dict(torch.load(PATH))
```

After training the network for 10000 epochs with the rescaled data of 25 snapshots at density 0.8 and bond strength 5, the results are poor, both for the neural network described above and for a shallower one.

Neural network 1: 6 hidden layers

![](https://i.imgur.com/dyeBrW0.png)

R2 train: 0.415, R2 test: -0.050

Neural network 2: 3 hidden layers

![](https://i.imgur.com/XGMEXs4.png)

R2 train: 0.323, R2 test: 0.066

One thing that caught my eye was that the test data does not contain some of the high values seen in the training data.
This could be chance, since the test set is only 20% and there are only 11 points of the training data above the highest point of the test data.

I should look at the plot of the loss and R2 over time for both the training data and the test data, to see whether training the network longer will work. Right now, however, the way I plot it is not very readable.

![](https://i.imgur.com/NWJs1RF.png)

This is a figure of how the deep neural network performed during its training. It seems like the value of R2 for the test set is not getting better. I can also plot one point out of every 100 points and plot the number of epochs on the x axis. I then get the following.

![](https://i.imgur.com/t3j1oUV.png)

Again I do not see any very promising trends. What I can try is a lower learning rate, a different number of hidden layers or a different layer width.

> [name=Frank] I now realize that the correlation coefficient calculated by Simone is not exactly the same as your r2score. His was simply a linear correlation coefficient (https://en.wikipedia.org/wiki/Pearson_correlation_coefficient), while I believe yours is the coefficient of determination (https://en.wikipedia.org/wiki/Coefficient_of_determination). Both are reasonable choices, but some care should be taken when comparing results.
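A small self-contained example of the distinction Frank points out: predictions can be perfectly linearly correlated with the truth (Pearson r = 1) while still having a low coefficient of determination.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = 0.5 * y_true + 1.0                # perfectly correlated, but biased
print(np.corrcoef(y_true, y_pred)[0, 1])   # Pearson r: 1.0
print(r2_score(y_true, y_pred))            # coefficient of determination: 0.7
```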
## Various networks

I did a set of tests on 5 networks where I trained them for 1000 epochs, created plots of the predicted vs real value of the breaking time and of the R2 score over time, and recorded the final R2 values. The models I tried had various numbers of hidden layers and layer widths:

One: 1 hidden layer of 32 neurons, R2 was: 0.22, 0.19
Narrow: 6 hidden layers of 20 neurons each, R2 was: 0.28, 0.11
Pyramid: 5 hidden layers of [45, 35, 25, 15, 5] neurons, R2 was: 0.33, 0.07
PyramidS: 5 hidden layers of [17, 15, 13, 11, 9] neurons, R2 was: 0.24, 0.14
Deep: 6 hidden layers of 32 neurons each, R2 was: 0.42, -0.14

The problem with these R2 values is that each network was trained for the same number of epochs, while they do not have the same number of neurons and connections. Some networks were thus allowed to change more weights than others, which might give them an edge in improving R2.

| | Model prediction | Training progress |
|---------|:---------------------------------:|:---------------------------------:|
|One|![](https://i.imgur.com/tyB8rKl.png)|![](https://i.imgur.com/Lea6kjz.png)|
|Narrow|![](https://i.imgur.com/O0zkZiX.png)|![](https://i.imgur.com/UHMKMv3.png)|
|Pyramid|![](https://i.imgur.com/6iCC5rq.png)|![](https://i.imgur.com/j0FhF21.png)|
|PyramidS|![](https://i.imgur.com/O0zkZiX.png)|![](https://i.imgur.com/UHMKMv3.png)|
|Deep|![](https://i.imgur.com/yYRaNGl.png)|![](https://i.imgur.com/Cn7TKAc.png)|

It should be possible to improve the neural network, since Simone was able to reach an R value of 0.62 with the following prediction plot.

![](https://i.imgur.com/TBEK7B5.png)

A difference that immediately stands out is that Simone works with relative values for the breaking time. I tried this on the Deep neural network. It does seem to lead to some improvement; now R2 is: 0.51, -0.09.

| | Model prediction | Training progress |
|---------|:---------------------------------:|:---------------------------------:|
|Deep|![](https://i.imgur.com/5I7xUcX.png)|![](https://i.imgur.com/z5GJvkG.png)|

Using the same learning rate and weight decay I get the following result. R2 is: 0.40, 0.07

| | Model prediction | Training progress |
|---------|:---------------------------------:|:---------------------------------:|
|Deep|![](https://i.imgur.com/nq8RohC.png)|![](https://i.imgur.com/2OpcBuj.png)|

### Comparison

![](https://i.imgur.com/TBEK7B5.png)
![](https://i.imgur.com/nq8RohC.png)

</details>

# Fixing the network and first graph network

<details>
<summary>Click to expand!</summary>

## Fix to particles at distance

Now I take particles a set distance away from the center of a bond instead of the center of each particle.

![](https://i.imgur.com/oCKWZzY.png)

The black dot is the center of the bond.

![](https://i.imgur.com/vE7UMbD.png)

I used this case to check whether the periodic boundary condition works.

## Comparing parameter distributions (still need to add labels)

To see where the differences lie in the choice of parameters, I made histograms of the values Simone and I find for our parameters.

Duncan

![](https://i.imgur.com/RZpF2ZU.png)

Simone

![](https://i.imgur.com/03zYKig.png)

The clearest difference I found was the difference in shape of the angle deviations. In my parameters the angles are all symmetric, while in Simone's parameters two of the angles differ from the other two. This could be because Simone chooses to put missing bonds in the first angle slot wherever a bond is missing. Also, Simone takes the cosine or sine of the angle instead of the raw angle value, I believe. Some of the values also differ: for instance, the occurrence of convex loops is higher in my parameters. I am not sure why this is the case. Lastly, the values of the number of nearby particles are different. Here too I have not been able to find the precise origin of the difference.

## Correlations

As in the thesis by Simone, I look at the linear correlations between the parameters and the breaking time.

![](https://i.imgur.com/McAWWpA.png)

I will not go through each of the parameters; I see no surprising results.

## New removal of dead ends

There was a bug in my code for removing dead ends: it would not properly remove dead ends with multiple pathways. This resulted in finding rings of length 3 in about 1 in every 5 snapshots.

```python
def remove_dead_ends(inlist):
    inlength = len(inlist)
    for k in range(inlength-2):
        for i in range(1, inlength-k-1):
            if inlist[k] == inlist[-i]:       # Two occurrences of the same particle enclose a dead end
                for j in range(inlength-k-i-1):
                    inlist.pop(k+1)           # Remove everything in between
                remove_dead_ends(inlist)      # Recurse in case of nested dead ends
                inlength = len(inlist)
                break
    inlist = list(set(inlist))
    return inlist
```

The code now more generally looks for two occurrences of the same particle in a list and removes everything in between.

## New neural network results

I am now able to reach R values of around 0.5, compared to the results in Simone's work of 0.6.

## Graph neural network

After reaching reasonable results I moved on to coding a graph neural network. I closely followed the code from Simone's project, changing it here and there to structures I am familiar with. Then I ran it with the test data. So far it seems to work. It does however take my computer about 3 to 4 hours to train the network, since my GPU does not support CUDA.

</details>

# Graph networks (week 6)

<details>
<summary>Click to expand!</summary>

## Bond order parameters

To use graph networks I need information on the nodes (particles) of the system. The parameters I describe each node with are the number of nearest neighbours and the real, imaginary and absolute values of the bond order parameters $q_3$, $q_4$ and $q_5$. I calculate $q_l$ of particle $j$ with the following equation:

$$q_l(j) = \frac{1}{nn(j)}\sum\limits_{k=1}^{nn(j)} \exp(i l \theta_{jk})$$

where $nn(j)$ is the number of nearest neighbours of particle $j$ and $\theta_{jk}$ is the angle between a fixed reference direction and the vector connecting particle $j$ to its neighbour $k$.
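A minimal, self-contained sketch of this calculation with a toy input of my own (in practice the neighbour vectors come from `partlist`, with periodic boundaries applied):

```python
import numpy as np

def q_l(bond_vectors, l):
    """Complex q_l for one particle, given the 2D vectors to its bonded neighbours."""
    thetas = np.arctan2(bond_vectors[:, 1], bond_vectors[:, 0])  # angles w.r.t. the x axis
    return np.mean(np.exp(1j * l * thetas))

# A particle whose three bonds sit at perfect 120-degree spacing has |q_3| = 1.
vecs = np.array([[1.0, 0.0], [-0.5, np.sqrt(3)/2], [-0.5, -np.sqrt(3)/2]])
print(abs(q_l(vecs, 3)))  # -> 1.0
```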
## Graph network performance

After finishing the code for the graph network I tested it both on data I generated myself with my own bond order parameters, and on a datafile that was provided. For my own data I generated new data with more snapshots that ran longer, giving better statistics on the bond breaking time and providing me with a larger training set. The first results were the following.

|R test value| Model prediction | Training progress |
|------------|:-------------------------:|:------------------------------:|
|0.62|![](https://i.imgur.com/eVcm0kx.png)|![](https://i.imgur.com/gpbRA7d.png)|
|0.74|![](https://i.imgur.com/20wNVKp.png)|![](https://i.imgur.com/mZk7r0S.png)|

The first result is from my own data and parameters at density 0.8, and the second result is from the provided data at density 0.75. As can be seen, the provided data performs much better than my own data. This could be because of the difference in density or in statistics (how long the simulation was, and thus how good the statistics on the bond breaking time were), but I think it is much more likely that it is caused by differences in parameters.

> [name=Frank] This is entirely possible. I think Simone also found at least somewhat better results at density 0.75 than 0.8 (but I'm not sure he tried Graph networks at 0.8). It also makes sense to some degree: if we go to densities that are too high, dynamics are more strongly affected by the repulsive interactions of the particle cores, rather than the bonding behavior. Since our order parameters are strongly focused on the bonding pattern, they are likely to have an easier time to predict dynamics at the slightly lower densities. Do you know how the simulation length compares? (I see you drew error lines on both, so I'm guessing you have an estimate at least.)
> [name=Duncan] The error bars are a mistake, I do not know how long his simulations were.

It is however very nice to see that, looking at the R value obtained from the provided data, the GN performs about as well as, if not better than, the GNs mentioned in Simone's paper.

> [name=Frank] Agree, that is excellent!

In any case, I tried to improve performance on my own data by comparing results when dividing the bond breaking time by the mean bond breaking time, changing the ring parameters to be one-hot vectors, and doing both. The results are shown below.

|R test value| Model prediction | Training progress |
|------------|:-------------------------:|:------------------------------:|
|0.621|![](https://i.imgur.com/fXeXmym.png)|![](https://i.imgur.com/mUDWyUb.png)|
|0.639|![](https://i.imgur.com/fmMTGUT.png)|![](https://i.imgur.com/c05JnLs.png)|
|0.614|![](https://i.imgur.com/WNN7jI9.png)|![](https://i.imgur.com/u3qMRql.png)|

As can be seen, normalising the breaking times does not seem to improve performance, but changing the ring data to one-hot vectors does improve the R value.

> [name=Frank] Are these differences significant? In other words, if you take one of the training settings and rerun it 5 times, what is the standard deviation?
> [name=Duncan] I have not checked this yet.
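As a reference for the one-hot encoding of the ring lengths used above, a minimal sketch; the value range 3–16 is an assumption, with 16 doubling as the "no ring / too large" bucket:

```python
import numpy as np

def one_hot_ring(length, min_len=3, max_len=16):
    vec = np.zeros(max_len - min_len + 1)
    vec[length - min_len] = 1.0
    return vec

print(one_hot_ring(6))  # a ring of length 6 -> 1 in the fourth slot
```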
At this point I had a one-hot vector representing the rings, followed by the ring value I had before, so in a sense the information on ring length was in the parameters twice. I also had the number of nearest neighbours in both the edge parameters and the node parameters, so this information was duplicated as well. I was interested in whether the graph network would perform just as well if I removed the duplicated information. First I just removed the ring length and left the one-hot vectors; then I also removed the number of nearest neighbours and the number of second nearest neighbours from the edge parameters and added both to the node parameters.

|R test value| Model prediction | Training progress |
|------------|:-------------------------:|:------------------------------:|
|0.639|![](https://i.imgur.com/Tfdmwjd.png)|![](https://i.imgur.com/Y8tucsF.png)|
|0.616|![](https://i.imgur.com/QroKzlx.png)|![](https://i.imgur.com/zVfTtQE.png)|

As can be seen, removing the integer for the ring length makes no difference. However, a little to my surprise, storing the number of (second) nearest neighbours only in the node information did make the GN perform worse. I thought this might be because we currently pass information from the original edge state to each of the edge updates, while we do not do this for the nodes. So I tried two things: first I wanted to see whether doing only two updates makes the GN perform better, since in that case the original parameter states influence the breaking time more; second, I tried also passing the original node information in each step.

|R test value| Model prediction | Training progress |
|------------|:-------------------------:|:------------------------------:|
|0.603|![](https://i.imgur.com/gFeU8on.png)|![](https://i.imgur.com/E7drJDB.png)|
|0.635|![](https://i.imgur.com/9odgnL5.png)|![](https://i.imgur.com/eDDZEdD.png)|

As can be seen, doing only two updates performs worse than before. However, passing node information does allow our system to get much closer to the result where the neighbour information is stored in the edges.

> [name=Frank] Top two images are swapped, I assume. Did you use only two graph layers in both scenarios here?
> How long does a training take on Perceval?
> [name=Duncan] I used 2 graph layers in the first and 5 in the second, where in the second I passed the original node information to each following update. I compared these both to the case where I have 5 layers and do not pass information through. (So 5 vs 2 layers and passing vs no passing.)
> Training on Perceval takes about 20 minutes for 4000 epochs.

</details>

# Exploring more state points

<details>
<summary>Click to expand!</summary>

The goal this week was to determine the performance of a graph neural network at various state points. Before generating data for many state points, however, data for one snapshot was produced on the cluster, and several ways of passing data to a GN as well as three different GNs were tested. This allowed us to choose a way of extracting parameters and a GN that we know works relatively well before generating data at more state points, and made sure many simulations would not be run for nothing if something did not work. Once the data and GN were tested, snapshots at various state points were produced on the cluster. They were then converted to parameters in the chosen way and fed to the chosen GN. Finally, the correlations of GNs trained at various state points were tested on a range of state points that included points they were not trained at.
## Network choice

As we saw last week, networks where integer data points were converted to one-hot vectors worked best. The networks also worked better when information on the number of (second) nearest neighbours was passed through the edge information as well. So we continued using this way of converting snapshot data to parameters and used the network that performed best. The results of last week are summarized in the table below.

![](https://i.imgur.com/yhpP7aY.png)

Performance of neural networks using different choices of parameter input or different network structure. The up arrows (^) mean that the entry is the same as the one above.

## State points

To test the GN at various state points, data was first generated on the cluster. At each state point, 200 snapshots (equilibrated for 500 time steps) of patchy particle systems were simulated for 9500 time steps. This was done at density 0.75 and a range of $\epsilon$ running from 2 to 10 in steps of 1. One state point in the phase separation regime was also simulated, with a density of 0.4 and $\epsilon$ of 5. The positions of the points in the phase diagram by Simone are displayed below.

![](https://i.imgur.com/ZYa4qOF.png)

On the left the phase diagram is shown, where the blue boxes show the regimes we studied. On the right the snapshots are shown with the bonds coloured according to their breaking time. The colour scale is not the same for each snapshot.

## Results

### Expectations

After gathering data and generating parameters, GNs were trained at each of the state points, and one GN was trained using data from all state points at density 0.75. For this last GN, the $\epsilon$ of the snapshot was also passed as a parameter for each edge. I expected the state points at low epsilon (relative to 5) to perform worse, since there are fewer bonds at this density and thus the training set would be smaller. To counteract this I could simulate more snapshots at the same state point. I also expected state points at high epsilon to perform worse, since it takes much longer for bonds to break at high epsilon and I simulated every state point for the same number of simulation steps; this means the uncertainty in the bond breaking time becomes much larger. I could solve this issue by simulating snapshots for a much longer time at high values of $\epsilon$. About the one snapshot from the phase separation regime I was not sure. I had a feeling it would perform worse, but I could not think of a good explanation why, so I thought I would just wait and see what the result was.

> [name=Frank] As a side note, I very much like the inclusion of an Expectations section in this "ongoing project" format, and am tempted to encourage other students to do this in the future. (I also agree with the expectations you wrote down.)

### Results

So now for the actual results. In the figure below, the correlation values R of all networks at all values of epsilon are shown. The larger circle is the point where the network was trained; the R values at these points are slightly unfair, since they are the R values of the whole set of snapshots at that point and not just the test set. The GN trained at density 0.4 is not tested on data from density 0.75, so this GN only has one point in the plot. The GN trained at all state points is also tested at all state points, where here the test and training data are properly separated.

![](https://i.imgur.com/0V1Iuu5.png)

As can be seen, the GN trained at all $\epsilon$ values performed best.
As I expected, it is hard for the GNs to make predictions at high values of $\epsilon$, since the breaking times there are very uncertain. The GNs trained at low $\epsilon$ perform better than I had expected on the point they were trained at, but they fall off in performance fast. The GN trained and tested at the phase separation point performs better than I had expected, almost as well as the GNs trained at a density of 0.75. So it seems that having slightly fewer bonds to train with and being in a regime of phase separation is not the biggest issue for the GN.

> [name=Frank] Nitpicking: why the $\epsilon / 10$ label on the horizontal axis? Also, ideally label axes with dimensionless quantities: in this case $\epsilon / k_B T$. (This is admittedly less important for intermediate results, but it's good to get into the habit.)

### Scatter plots

I also had a look at scatter plots of some of the GNs at some state points. They are shown with lines indicating 2 times the deviation in the bond breaking time. The first I look at is the state point at density 0.75 and $\epsilon$ = 2.

![](https://i.imgur.com/N8IhAxW.png)

Here it can be seen that the uncertainty in the bond breaking time is very small. This is because the standard deviation $\sigma_n$ of the number of breaks $n$ that happen for a bond during a simulation is estimated to be $\sqrt{n}$, so when bonds break often during a simulation (low bond breaking time) the relative deviation is small.

Then we look at the GN trained at density 0.75 and $\epsilon$ = 5.

![](https://i.imgur.com/jOTER2D.png)

This is a state point we have shown many times before. Most notable is that now the uncertainty starts to show.

> [name=Frank] Question: can we estimate a maximum expectation for our correlation based on statistical uncertainty? I can think of a few ways of doing this:
> 1) Numerically calculate the correlation coefficient between a dataset of breaking times (e.g. your full set of test data) and that same dataset but with a random Gaussian error added to each point with the same standard deviation as the one you calculate here.
> 2) Perhaps you can analytically calculate the correlation coefficient for a simple (Gaussian?) distribution of bond breaking times, or at least write down the relevant equations. I will also try to see if this is solvable...

Lastly I look at the state point at density 0.75 and $\epsilon$ = 10.

![](https://i.imgur.com/7ws6Iu0.png)

Here we see the uncertainty diverge. This is not a good sign, but it can be understood from how we define the uncertainty. As said before, the uncertainty in $n$ is defined as $\sqrt{n}$. The bond breaking time $\tau$ goes like $\tau \sim t_{sim}/n$. So if we draw lines at 2 times the deviation in $\tau$ we have $\sigma_{\tau\pm} \sim 2\,t_{sim}/(n\mp\sqrt{n})-t_{sim}/n$. This means that for bonds that only break once during the simulation, the upper deviation line in $\tau$ diverges. When the scatter plot is displayed without the region of uncertainty, the low number of breaks of some bonds is easily seen.

![](https://i.imgur.com/Y8YZPkT.png)

The simulations were for 9500 time steps, so if a bond broke once it is in the top row, twice the row below it, et cetera. This issue could easily be solved by running longer simulations.
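A short numerical sketch of my own for the error band drawn in these scatter plots, following the formula above:

```python
import numpy as np

t_sim = 9500
n = np.arange(1, 50, dtype=float)         # number of observed breaking events
tau = t_sim / n                           # estimated bond breaking time
with np.errstate(divide="ignore"):
    tau_upper = t_sim / (n - np.sqrt(n))  # diverges at n = 1, as in the plot
tau_lower = t_sim / (n + np.sqrt(n))
```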
> [name=Frank] Agree. Note that the expected scaling for bond breaking times is proportional to $\exp(\beta \epsilon)$ (see e.g. Kramers' escape rates for analytical predictions of crossing times for energy barriers). We would expect this to be true in the limit where lowering the temperature does not change the *structure* significantly anymore: in practice, this means when the number of bonds in the system is pretty much at its maximum already. If the structure is still changing, that can affect bond breaking rates as well, on top of the exponential scaling. Hence, we would expect that increasing $\beta \epsilon$ by 1 requires simulating $e$ times as long... so you'd need to simulate quite a bit more to get data at $\beta \epsilon = 10$ vs. 5.
> That said... there might be ways around this. Assuming we can properly equilibrate a snapshot at some low temperature (this will also take a long time), we could confirm this exponential scaling in our simulations. Take a snapshot equilibrated at low $T$ and measure the bond breaking times, but while measuring the bond breaking times, set $\epsilon$ to a lower value (let's call it $\epsilon_m$ for "measure"). Do this for different values of $\epsilon_m$ with the same snapshot, and then plot the average bond breaking time for a few specific bonds as a function of $\beta \epsilon_m$. My expectation is that the bond breaking time will scale as $\exp(\beta \epsilon_m)$ as long as $\beta \epsilon_m \gg 1$. If you know where you hit that regime, you can select that value as your $\epsilon_m$ in the future, and use it to rapidly collect statistics.

## Some ideas I had for next week

### Next week

- For next week I think it is a good idea for me to clean up some of my code and files, including things like fixing the legends and distributions of the colour plots of bond breaking times.
- I had also written down that I wanted to write this update, so I can check that one off.
- Make a "normal" neural network that predicts bond forming times, to identify which difficulties/problems bond forming events pose.

>[name=Frank] Agree with all of this. While doing this, I would keep some jobs on the CPU cluster running to gather data at the lower temperatures to collect statistics.

</details>

# Measuring error from simulation time

<details open>
<summary>Click to expand!</summary>

## Changed GN code

After receiving some tips regarding neural networks, I had a look at what some changes to the code would do to the performance and training time of the graph neural network I use. The first tip I got was to shuffle the training data and remove the last batch. The second tip was to pass initial values to the parameters of the graph network instead of initializing them randomly. And the last tip was to use a different command for resetting the gradients during optimization.

![](https://i.imgur.com/oJz6ILS.png)

The effects of the changes can be seen above, where each step to the right adds one of the suggested tips. As can be seen, training times do go down when using the tips and performance goes up. The result of the last change surprised me a bit, since from what I found it should not have changed the outcome; it should only change the outcome when you use different optimizers for different parts of the system. https://discuss.pytorch.org/t/zero-grad-optimizer-or-net/1887
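For reference, a minimal runnable sketch of the first and last tip, with toy tensors and a toy model standing in for the real GN code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real data and network.
train_X, train_y = torch.randn(100, 19), torch.randn(100, 1)
net = nn.Linear(19, 1)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_function = nn.MSELoss()

# Tip 1: reshuffle every epoch and drop the last, smaller batch.
loader = DataLoader(TensorDataset(train_X, train_y),
                    batch_size=32, shuffle=True, drop_last=True)
for batch_X, batch_y in loader:
    optimizer.zero_grad()  # Tip 3: reset the gradients via the optimizer
    loss = loss_function(net(batch_X), batch_y)
    loss.backward()
    optimizer.step()
```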
## More state points

### General GN

Then I took another look at the plot of last week.

![](https://i.imgur.com/gAbJPRd.png)

Here the blue line is the result of a network trained and tested at all temperatures. However, I did not realize that the correlation here will naturally be high: I pass the temperature to the network, and the temperature is strongly correlated with the breaking time, so the network can predict the variation in breaking times *between* temperatures. For systems at a single temperature, it might not be able to predict the relative breaking times between bonds. The light green line shows the results of the network tested on bonds at only a single temperature at a time. As can be seen, the network indeed performs much worse.

### Density 0.4

I then also did simulations at a density of 0.4.

![](https://i.imgur.com/7fKn77J.png)

The plot above shows the results of GN predictions on test sets at the points where they were trained. As can be seen, at this low density it has a hard time predicting breaking times at high temperatures.

## Runs of different length

The bond breaking time is calculated from how many times a bond breaks during a simulation, so if you simulate longer you get better statistics. To see the effect of the simulation time on the predictive capabilities of the GNs, I ran simulations starting from the same equilibrium snapshot for 4500 time steps each. I then added the numbers of times a bond broke in multiples of those simulations together, to get bond breaking times with different levels of statistics (a toy sketch of this bookkeeping is given at the end of this subsection).

![](https://i.imgur.com/dfbtDpt.png)

As can be seen, for low simulation times the quality of the predictions rises fast and then seems to flatten around 0.755. However, at the very highest simulation times the quality of the predictions seems to go up again. I do not quite understand this, so I might have to take another look at the variance in R when doing multiple training runs. Another factor that influences the quality of the predictions is the size of the training set. To test this, I trained a GN with 50, 100, 150 and 200 snapshots for various simulation times.

![](https://i.imgur.com/fi6aAlL.png)

As can be seen, shorter simulation times indeed generally perform worse, and fewer training snapshots usually perform worse as well. You can, however, combine the two to see what gives the best performance for the number of simulations you do, because every snapshot is one simulation and every 4500 time steps is also one simulation. To see this, I plotted the correlation coefficient R against the simulation time times the number of graphs.

![](https://i.imgur.com/FyRT13R.png)

The lines from different training runs seem to follow the same expected trend: more data gives better predictions. It is interesting, though, that using the largest number of graphs is not always best when trying to limit the number of simulations.
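The way break counts from repeated runs are combined amounts to the following (a toy sketch with made-up Poisson break counts; in the real data the counts come from the 16 repeated runs per snapshot):

```python
import numpy as np

rng = np.random.default_rng(0)
t_run = 4500                               # length of one simulation run
breaks = rng.poisson(3.0, size=(16, 500))  # toy break counts: 16 runs x 500 bonds

# combine the first k runs to get breaking times with improving statistics
for k in (1, 4, 16):
    n = breaks[:k].sum(axis=0)             # total breaks per bond over k runs
    t_b = np.where(n > 0, k * t_run / np.maximum(n, 1), np.inf)
    print(k, t_b[:5])
```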
## Theoretical correlation

For now I just want to take over what Frank showed me here, to go through the steps and have the formulas in Markdown/LaTeX format.

$$R = \frac{1}{N}\frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sigma_x\sigma_y}\tag{1}$$

where $x$ is the true value of the bond breaking time and $y$ is the measured value, i.e. the true value plus noise. The average bond breaking time is $\bar{x}=\bar{t_b}$, and if the noise is symmetric around the true breaking time we also have $\bar{y}=\bar{t_b}$. Now we define three standard deviations: $\sigma_{true} \equiv \sigma_t$, the spread caused by the different bonds being in different environments; $\sigma_{noise} \equiv \sigma_n$, the spread caused by measuring only a limited number of breaks per bond; and lastly

$$\sigma_{in} = \sqrt{\sigma_t^2+\sigma_n^2},$$

the spread in the breaking times we actually measure.

When we measure for long enough, $\sigma_n$ is small compared to $\sigma_t$, so we can say $\sigma_t\approx \sigma_{in}$. We approximate the distribution of the number of times a bond breaks by a Poisson distribution, for which $\sigma_{\lambda}^2=\bar{\lambda}$. We also use that the breaking time of a bond is calculated as $t_b=\frac{t_{sim}}{n}$, where $t_b$ is the bond breaking time, $t_{sim}$ is the time the bond was simulated and $n$ is the number of times the bond broke during the simulation. This gives $\langle n\rangle=\frac{t_{sim}}{t_b}$, so the spread in the count is $\sqrt{\langle n\rangle}=\sqrt{\frac{t_{sim}}{t_b}}$, with $\sqrt{\langle n\rangle}\ll\bar{n}$. Propagating this to the breaking time gives

$$\sigma_n=\sigma\left[\frac{t_{sim}}{\langle n\rangle}\right]=t_{sim}\frac{\sqrt{\langle n\rangle}}{\bar{n}^2}=t_{sim}\sqrt{\frac{t_{sim}}{t_b}}\left(\frac{t_b}{t_{sim}}\right)^2=\sqrt{\frac{t_b^3}{t_{sim}}}.$$

We then look at equation (1) and approximate the sums by integrals over continuous distributions:

$$R = \frac{1}{\sigma_x\sigma_y}\int dt\, dt'\, P(t)\, P_n(t')\,(t-\bar{t})(t+t'-\bar{t}),$$

where $P(t)$ is the real distribution of breaking times and $P_n(t')$ is the distribution of the noise. We rewrite this as

$$R = \frac{1}{\sigma_x\sigma_y}\int dt\, P(t)\,(t-\bar{t})\int dt'\,P_n(t')\,(t+t'-\bar{t}).$$

Using $\int dx\, x P(x)=\bar{x}$ and the fact that the noise has zero mean, the inner integral reduces to $t-\bar{t}$, so

$$R = \frac{1}{\sigma_x\sigma_y}\int dt\, P(t)\,(t-\bar{t})^2=\frac{\sigma_t^2}{\sigma_x\sigma_y}=\frac{\sigma_t}{\sigma_y},$$

where the last step uses $\sigma_x=\sigma_t$. So now we have

$$R = \frac{\sigma_t}{\sqrt{\sigma_t^2+\sigma_n^2(t_b)}}\approx \frac{\sigma_t}{\sqrt{\sigma_t^2+\sigma_n^2(\bar{t_b})}},\qquad \sigma_n(\bar{t_b})=\sqrt{\frac{\bar{t_b}^3}{t_{sim}}}.$$

If we then use the approximation that for long enough simulations $\sigma_t\approx\sigma_{in}$, we can easily compare this to our data.

At density 0.75 and $\beta\epsilon=5$, 200 snapshots were each simulated 16 times for $t_{sim} = 4500$. The data was combined to get the number of times a bond breaks in variable amounts of time, ranging from 4500 to $4500 \times 16$ time steps. Networks were trained on this data and their resulting correlations were plotted. From the breaking times, the standard deviation and the average breaking time were measured for the different simulation lengths. The standard deviation of the $4500 \times 16$ data was taken to be $\sigma_t$, and $\bar{t_b}$ was measured for all data sets with variable simulation time. This was used to calculate a theoretical $R$, which was then compared to the $R$ the networks achieved. The results are shown below.

![](https://i.imgur.com/SgfULCk.png)

As can be seen, theory predicts that much better results should be reachable. However, I am wary that the prediction is very close to 1, so I might have made a mistake somewhere.

> [name=Duncan] Is $\sigma_y=\sqrt{\sigma_t^2+\sigma_n^2}$ correct? Or should it be $\sigma_y=\sigma_{true}$? And is $\sigma_{noise}=\sqrt{\frac{t_{sim}}{t_b}}$ correct, or is it $\sigma_n=\sqrt{n}=\sqrt{\frac{t_{sim}}{t_b}}$, or are they the same?
> Also there is a step in the calculation I am not sure about, which I will post below.

![](https://i.imgur.com/Az1csXW.png)
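Frank's suggestion (1) can also be checked numerically in a few lines. A toy sketch (the lognormal input here is just a placeholder for the real test-set breaking times):

```python
import numpy as np

rng = np.random.default_rng(0)
t_sim = 4500 * 16

# stand-in for the measured breaking times, e.g. the full test set
t_b = rng.lognormal(mean=5.0, sigma=0.5, size=5000)

# add Gaussian noise with the standard deviation derived above:
# sigma_n(t_b) = sqrt(t_b^3 / t_sim)
noisy = t_b + rng.normal(0.0, np.sqrt(t_b**3 / t_sim))

# the correlation between true and noisy times estimates the maximum
# R any network could reach given the statistical uncertainty
print(np.corrcoef(t_b, noisy)[0, 1])
```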
</details>

# Bond forming events

## Testing GN on bonds with forming events

The first step I took towards getting a neural network to predict bond forming events was to use mostly the same parameters as before, only now without looking for rings. To each pair of particles that forms a bond at least once during the simulations, I assign those parameters. I then pass the parameters to the same GN as before. The result is an R value of 0.39 on the test set; however, when I look at the scatter plot, it does not seem too good yet.

![](https://i.imgur.com/7TR89Nc.png)

As can be seen, there are many cases where a bond only breaks a very small number of times (1, 2, 3, ...). This shows up as the "lines" in the measured breaking times, and it means the deviation relative to the actual breaking time is very big. I also see some points at breaking times that I would think are not possible: I only count integer numbers of breaks and use $t_b=\frac{t_{sim}}{n}$, so not every value of $t_b$ is possible. Also, when we zoom in on the points with a breaking time of 1000 or less, we see the model predicts negative breaking times.

![](https://i.imgur.com/nfML9rC.png)

## Choosing bonds to consider for forming events

If we want to build a GN that can predict bond forming times from a snapshot without knowing a priori which bonds will form, we have to give the GN a list of bonds to consider. To build this list, I find patches of particles that are not involved in a bond, and for each of these I find all other unbonded patches closer to it than a certain distance. I then save which particles and patches are involved in each potential bond, and set the number of bond events of all of these bonds to zero. Once I have the list of bonds I want to consider, I take the bond forming events I measured and compare them to the list: wherever a bond is in both lists, I copy the number of times the bond forms into the considered-bonds list. In the end I am left with a list of bonds that each have a number of times they form during the simulations, starting at 0. (A minimal sketch of this selection step is given at the end of this subsection.)

In making the list I have to choose the distance out to which I consider bonds. If I make this distance large, a large fraction of the considered bonds will have zero bond events, and this will make simulations take much longer. However, if I make the distance small, a lot of bonds that do undergo bond events are not considered. To see what distance I want to use, I plot the number of considered bonds with 0 events versus the distance, and the fraction of bonds with events that are considered versus the distance.

| Number of bonds considered without events | Fraction of bonds with events considered (about 1300 total bonds with events) |
|:-------------------------:|:------------------------------:|
|![](https://i.imgur.com/DUa28j0.png)|![](https://i.imgur.com/K0Pb85J.png)|

As can be seen, a large number of the considered bonds will have 0 events. This means that a trained network will have a tendency to prefer predicting that bonds have zero events. The distribution of the number of events is the following:

![](https://i.imgur.com/51upuIy.png)

In this plot I have cut off the x-axis to be able to see the line at 0. You can see that most bonds undergo a very small number of events, which means the deviation caused by the noise will be large. To test whether everything works at all, I trained a GN with this data and got an R value of 0.43, but looking at the scatter plot, we can again see that it does not perform well yet.

![](https://i.imgur.com/TniIdy0.png)

Most points lie at very small numbers of bond events.

![](https://i.imgur.com/SydFwYU.png)

When zoomed in on the region below 1000 on both axes, we can see that the predictions are all over the place.
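A minimal sketch of the candidate-bond selection described above (all names here are mine, and the toy positions ignore periodic boundaries and the question of which patches sit on the same particle):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy stand-ins for the snapshot data: patch positions and a flag
# marking patches that are already part of a bond
patch_pos = rng.uniform(0.0, 20.0, size=(300, 2))
is_bonded = rng.random(300) < 0.7
cutoff = 2.0

# every pair of unbonded patches closer than the cutoff is a potential
# bond; each candidate starts with zero recorded forming events
free = np.flatnonzero(~is_bonded)
candidates = {}
for idx, i in enumerate(free):
    for j in free[idx + 1:]:
        if np.linalg.norm(patch_pos[i] - patch_pos[j]) < cutoff:
            candidates[(i, j)] = 0

# copy the measured forming events onto candidates present in both
# lists (measured_events is a hypothetical {pair: count} mapping)
measured_events = {pair: 5 for pair in list(candidates)[:3]}
for pair, n_events in measured_events.items():
    if pair in candidates:
        candidates[pair] = n_events
```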
## Checking bond forming statistics

The statistics of the bond forming events seem to be correct. The only slightly misleading thing was the bin size again... If I build up the histogram of bond forming events for one seed and one time run, I get 111 bonds that form at least once; when I add 15 more time runs of the same snapshot, this increases to 117 bonds, but with more bond events per bond. This is what I would expect.

![](https://i.imgur.com/K4Tbv0S.png)
Number of bonds with fewer than 100 bond events, for one snapshot and one time run ($t_{sim}=4500$).

![](https://i.imgur.com/FfmBLjR.png)
Number of bonds y with number of bond events x, for one snapshot and one time run ($t_{sim}=4500$).

![](https://i.imgur.com/yWpeiqS.png)
Number of bonds with fewer than 1600 bond events, for one snapshot and 16 time runs ($t_{sim}=4500 \times 16$).

![](https://i.imgur.com/N7USKeW.png)
Number of bonds y with number of bond events x, for one snapshot and 16 time runs ($t_{sim}=4500 \times 16$).

If I then increase the number of snapshots, the number of bonds increases in the expected way. And finally, I see that there are many bonds with a small number of events.

![](https://i.imgur.com/aYtRi3O.png)
Number of bonds with fewer than 100 bond events, for 200 snapshots and 16 time runs ($t_{sim}=4500 \times 16$).

![](https://i.imgur.com/HsHkq2l.png)
Number of bonds with fewer than 1600 bond events, for 200 snapshots and 16 time runs ($t_{sim}=4500 \times 16$).

![](https://i.imgur.com/DEOXrTD.png)
Number of bonds y with number of bond events x, for 200 snapshots and 16 time runs ($t_{sim}=4500 \times 16$). I chose large bins here and limited the y-range to (0,10) to show where the bonds with a high number of bond events lie.

I also visualized some of these bonds.

![](https://i.imgur.com/DBdXZLh.png)
Typical bond (yellow) that only forms once during the simulations.

![](https://i.imgur.com/TBWZRLp.png)
Typical bond (yellow) that forms many times (10010) during the simulations.

![](https://i.imgur.com/iDtU61z.png)
Bonds (yellow) that all form fewer than 20 times during the simulation.

![](https://i.imgur.com/KDghI30.png)
Bonds (yellow) that all form more than 1000 times during a simulation of $t_{sim}=4500$.

![](https://i.imgur.com/yVFbhMA.png)
Bonds (yellow) that all form more than 1000 times during a simulation of $t_{sim}=4500 \times 16$.

![](https://i.imgur.com/wUtWHZK.png)
Zoom-in on some of the bonds shown above.

I also checked, for one of the simulations, whether the number of events for each bond is built up in the expected way. I would expect that bonds early in the list have a higher chance to be bonds that form often, and so will have a higher number of bond events. This seems to roughly hold.

![](https://i.imgur.com/nWuE9XJ.png)
Here y is the number of bond events and x is the position in the list of bond forming events of one simulation.

## Distance histograms of included and excluded bonds

I then took another look at the histogram of the distances of particles that were included and not included, and made a change to the code that revealed something slightly worrying. Before the change I found the following, and thought the first peak was wrong.

![](https://i.imgur.com/rUHhSuH.png)

What I did here was load a snapshot of a system and then make a list of information on bonds whose patches were closer than 2 units of distance apart and not already part of a bond.
Then I went through the recorded list of bond forming events of all simulations corresponding to that snapshot, and added the number of forming events to the created list if the bond was in both. I recorded the distances of all particles of recorded bonds and added them to one of two lists, depending on whether that bond was in the created list of potential bonds. The distance, however, I calculated from the snapshot of the simulation that the recorded bond was in, since the snapshots should be the same anyway. Finally I made a histogram of all these distances. I then changed the code so that I calculate the distances using only the snapshot from which I create the potential bond list, and the outcome is slightly different.

![](https://i.imgur.com/oHlUOSj.png)
![](https://i.imgur.com/SUV39hy.png)

So something is going wrong in the time runs, where I do not look at the same snapshot each time but at a very slightly different one. If I plot the snapshots over each other, you can see the difference.

![](https://i.imgur.com/xR1iIpE.png)

# GN code

## Full GN code

In the box below I posted my full GN code, in case anyone wants to check out what code I am running.

<details>
<summary>GN code. Click to expand!</summary>

```python
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 2 10:42:40 2020

@author: Duncan
"""
#%%
# Importing relevant modules
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
from matplotlib.lines import Line2D
import pandas as pd
import os
from tqdm import tqdm
import pickle

import torch
from torch.nn import Sequential as Seq, Linear as Lin, ReLU, MSELoss, ModuleList
import torch.nn.functional as F
import torch.utils.data as Data
from torch.autograd import Variable
import torch.optim as optim
import torch_geometric.transforms as T
import torch_geometric as tg
from torch_geometric.nn import SplineConv
from torch_geometric.nn import MetaLayer
from torch_scatter import scatter_mean, scatter_sum, scatter_std

#%%
# Read savename, datafile and cuda device from text file
init_file = open("init.txt", "r")
savename = init_file.readline()[:-1]
data_file_name = init_file.readline()[:-1]
cuda_nr = init_file.readline()[:-1]

#%%
# String that is appended to the name of each saved file to easily save different models
# savename = "Clus75_200"
print("saving files under name: ", savename)

BATCH_SIZE = 20
EPOCHS = 2000
retrain = True
number_of_graph_layers = 5

device = torch.device(cuda_nr if torch.cuda.is_available() else 'cpu')
print("Running on: ", device)

#%%
# Loading datafile containing snapshot information
# data_file_name = "OnePar_e500r75.pickle"
data_file = open(data_file_name, "rb")
node_feat, edge_pair, edge_feat, break_time = pickle.load(data_file)
data_file.close()

nr_snapshots = len(node_feat)
nr_feat_per_node = len(node_feat[0][0])
nr_feat_per_edge = len(edge_feat[0][0])
print("Number of snapshots: ", str(nr_snapshots))
print("Number of node features: ", str(nr_feat_per_node))
print("Number of edge features: ", str(nr_feat_per_edge))
print(len(node_feat[0]), len(edge_pair[0]), len(edge_feat[0]), len(break_time[0]))
print(len(node_feat[0][0]), len(edge_pair[0][0]), len(edge_feat[0][0]), len(break_time[0]))

#%%
# Parameters for the network converting input data to graph input
edge_encoder_layers = [16, 16]
node_encoder_layers = [16]
edge_decoder_layers = [16, 16]

# Parameters for the graph networks
edge_layers = [16, 16, 16]
node_layers = [32, 16, 16]
node_pool_layers = [16, 16]

edge_net_out = 16
node_pool_net_out = 16
node_net_out = 16

edge_net_in = 2*node_net_out + 2*edge_net_out
node_pool_net_in = node_net_out + edge_net_out
node_net_in = 2*node_pool_net_out + node_net_out

print("nodes:", nr_feat_per_node, "->", node_net_out, "->", 0)
print("edges:", nr_feat_per_edge, "->", edge_net_out, "->", 1)

#%%
with open("InfoModel"+savename+".txt", "w") as infof:
    infof.write(f"Model: {savename}\nData file: {data_file_name}\nBATCH_SIZE: {BATCH_SIZE}\nnumber_of_graph_layers: {number_of_graph_layers}\n")
    infof.write(f"edge_encoder_layers: {edge_encoder_layers}\nnode_encoder_layers: {node_encoder_layers}\nedge_decoder_layers: {edge_decoder_layers}\n")
    infof.write(f"edge_layers: {edge_layers}\nnode_layers: {node_layers}\nnode_pool_layers: {node_pool_layers}\n")
    infof.write(f"edge_net_out: {edge_net_out}\nnode_pool_net_out: {node_pool_net_out}\nnode_net_out: {node_net_out}\n")

#%%
# Turn data for a snapshot into a graph object
def makegraph(node_features, edge_features, edge_pairs, edge_targets):
    return tg.data.Data(
        x=torch.tensor(node_features).type(torch.FloatTensor),
        edge_attr=torch.tensor(edge_features).type(torch.FloatTensor),
        edge_index=torch.t(torch.tensor(edge_pairs)),
        y=torch.tensor(edge_targets).type(torch.FloatTensor),
    )

#%%
graphs = []
for snapid in range(nr_snapshots):
    graphs.append(makegraph(node_feat[snapid], edge_feat[snapid], edge_pair[snapid], break_time[snapid]))
# Test-set consists of the first snapshot

#%%
# Defining the graph layer
class EdgeModel(torch.nn.Module):
    def __init__(self):
        super(EdgeModel, self).__init__()
        self.edge_mpl = ModuleList()
        hlold = edge_net_in
        for hl in edge_layers:
            self.edge_mpl.append(Lin(hlold, hl))
            hlold = hl
        self.edge_mpl.append(Lin(hlold, edge_net_out))

    def forward(self, src, dest, edge_attr, u, batch):
        out = torch.cat([src, dest, edge_attr], 1)
        for layer in self.edge_mpl[:-1]:
            out = F.elu(layer(out))
        out = self.edge_mpl[-1](out)
        return out


class NodeModel(torch.nn.Module):
    def __init__(self):
        super(NodeModel, self).__init__()
        self.node_mpl1 = ModuleList()
        self.node_mpl2 = ModuleList()
        hlold = node_pool_net_in
        for hl in node_pool_layers:
            self.node_mpl1.append(Lin(hlold, hl))
            hlold = hl
        self.node_mpl1.append(Lin(hlold, node_pool_net_out))
        hlold = node_net_in
        for hl in node_layers:
            self.node_mpl2.append(Lin(hlold, hl))
            hlold = hl
        self.node_mpl2.append(Lin(hlold, node_net_out))

    def forward(self, x, edge_index, edge_attr, u, batch):
        row, col = edge_index
        out = torch.cat([x[row], edge_attr], dim=1)
        for layer in self.node_mpl1[:-1]:
            out = F.elu(layer(out))
        out = self.node_mpl1[-1](out)
        out = torch.cat([x,
                         scatter_sum(out, col, dim=0, dim_size=x.size(0)),
                         scatter_mean(out, col, dim=0, dim_size=x.size(0))], dim=1)
        # ,scatter_std(out, col, dim=0, dim_size=x.size(0))
        for layer in self.node_mpl2[:-1]:
            out = F.elu(layer(out))
        out = self.node_mpl2[-1](out)
        return out

# A graph layer is now written as: MetaLayer(EdgeModel(), NodeModel())

#%%
# Defining the total network
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        if number_of_graph_layers >= 1:
            self.graphnet1 = MetaLayer(EdgeModel(), NodeModel())
        if number_of_graph_layers >= 2:
            self.graphnet2 = MetaLayer(EdgeModel(), NodeModel())
        if number_of_graph_layers >= 3:
            self.graphnet3 = MetaLayer(EdgeModel(), NodeModel())
        if number_of_graph_layers >= 4:
            self.graphnet4 = MetaLayer(EdgeModel(), NodeModel())
        if number_of_graph_layers >= 5:
            self.graphnet5 = MetaLayer(EdgeModel(), NodeModel())
        if number_of_graph_layers >= 6:
            self.graphnet6 = MetaLayer(EdgeModel(), NodeModel())

        self.node_encoder = ModuleList()
        hlold = nr_feat_per_node
        for hl in node_encoder_layers:
            self.node_encoder.append(Lin(hlold, hl))
            hlold = hl
        self.node_encoder.append(Lin(hlold, node_net_out))

        self.edge_encoder = ModuleList()
        hlold = nr_feat_per_edge
        for hl in edge_encoder_layers:
            self.edge_encoder.append(Lin(hlold, hl))
            hlold = hl
        self.edge_encoder.append(Lin(hlold, edge_net_out))

        self.edge_decoder = ModuleList()
        hlold = edge_net_out
        for hl in edge_decoder_layers:
            self.edge_decoder.append(Lin(hlold, hl))
            hlold = hl
        self.edge_decoder.append(Lin(hlold, 1))

    def forward(self, data):
        node_attr, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr
        for layer in self.node_encoder[:-1]:
            node_attr = F.elu(layer(node_attr))
        node_attr = self.node_encoder[-1](node_attr)
        for layer in self.edge_encoder[:-1]:
            edge_attr = F.elu(layer(edge_attr))
        edge_attr = self.edge_encoder[-1](edge_attr)
        # keep a copy of the encoded edge features to re-inject at every layer
        edge_attr_enc = 1 * edge_attr
        if number_of_graph_layers >= 1:
            edge_attr = torch.cat([edge_attr, edge_attr_enc], dim=1)
            node_attr, edge_attr, u = self.graphnet1(node_attr, edge_index, edge_attr)
        if number_of_graph_layers >= 2:
            edge_attr = torch.cat([edge_attr, edge_attr_enc], dim=1)
            node_attr, edge_attr, u = self.graphnet2(node_attr, edge_index, edge_attr)
        if number_of_graph_layers >= 3:
            edge_attr = torch.cat([edge_attr, edge_attr_enc], dim=1)
            node_attr, edge_attr, u = self.graphnet3(node_attr, edge_index, edge_attr)
        if number_of_graph_layers >= 4:
            edge_attr = torch.cat([edge_attr, edge_attr_enc], dim=1)
            node_attr, edge_attr, u = self.graphnet4(node_attr, edge_index, edge_attr)
        if number_of_graph_layers >= 5:
            edge_attr = torch.cat([edge_attr, edge_attr_enc], dim=1)
            node_attr, edge_attr, u = self.graphnet5(node_attr, edge_index, edge_attr)
        if number_of_graph_layers >= 6:
            edge_attr = torch.cat([edge_attr, edge_attr_enc], dim=1)
            node_attr, edge_attr, u = self.graphnet6(node_attr, edge_index, edge_attr)
        for layer in self.edge_decoder[:-1]:
            edge_attr = F.elu(layer(edge_attr))
        edge_attr = self.edge_decoder[-1](edge_attr)
        return edge_attr[:, 0]

#%%
def fwd_pass(X, y, train=False):
    # X = X.to(device)
    # y = y.to(device)
    if train:
        optimizer.zero_grad()
    outputs = net(X)
    loss = loss_function(outputs, y)
    with torch.no_grad():
        # detach before handing the tensor to numpy
        acc = np.corrcoef(outputs.detach().cpu(), y.cpu())[0, 1]
    if train:
        loss.backward()
        torch.nn.utils.clip_grad_norm_(net.parameters(), 1)
        optimizer.step()
    return acc, loss

#%%
def test(X, y):
    with torch.no_grad():
        val_acc, val_loss = fwd_pass(X, y)
    return val_acc, val_loss

#%%
MODEL_NAME = f"model-Duncan"
print(MODEL_NAME)
PATH = "GraphNetAI"+savename+".pth"
PATH = "GraphNetAI_run_r75_TimeRun16.pth"  # overrides the savename-based path for this particular run

def train():
    bestR = 0
    with open("model"+savename+".log", "w") as f:
        for epoch in tqdm(range(EPOCHS)):
            avacc = 0
            avloss = 0
            avcounter = 0
            for data in loader:
                acc, loss = fwd_pass(data, data.y, train=True)
                avcounter += 1
                avacc += acc
                avloss += float(loss)  # take the value only, so the graph can be freed
            avacc /= avcounter
            avloss /= avcounter
            avval_loss = 0
            avval_acc = 0
            avvalcounter = 0
            for testdata in testloader:
                val_acc, val_loss = test(testdata, testdata.y)
                avvalcounter += 1
                avval_loss += float(val_loss)
                avval_acc += val_acc
            avval_loss /= avvalcounter
            avval_acc /= avvalcounter
            f.write(f"{MODEL_NAME},{epoch},{round(float(avacc), 2)},{round(float(avloss), 4)},{round(float(avval_acc), 2)},{round(float(avval_loss), 4)}\n")
            if val_acc > bestR or epoch == 0:
                bestR = val_acc
                torch.save(net.state_dict(), PATH)

#%%
def split_data(test_slice):
    slice_int = int(test_slice*len(graphs))
    testgraphs = [x.to(device) for x in graphs[:slice_int]]
    global testgraph
    testgraph = testgraphs[0]
    testgraph.to(device)
    traingraphs = [x.to(device) for x in graphs[slice_int:]]
    global traingraph
    traingraph = traingraphs[0]
    traingraph.to(device)
    loader = tg.data.DataLoader(traingraphs, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
    testloader = tg.data.DataLoader(testgraphs, batch_size=BATCH_SIZE, shuffle=False, drop_last=False)
    return loader, testloader

loader, testloader = split_data(0.1)  # use 10% of the snapshots as the test set

def init_weights(m):
    if type(m) == Lin:
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = Net().to(device)
net.apply(init_weights)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_function = MSELoss()
# net(testgraph)  # test if it works: this will give errors if any of the network sizes are not matching up

#%%
if retrain:
    train()
    net.load_state_dict(torch.load(PATH))
else:
    net.load_state_dict(torch.load(PATH, map_location=device))

#%%
xscattertrain = []
yscattertrain = []
xscattertest = []
yscattertest = []
with torch.no_grad():
    for data in loader:
        xscattertrain = xscattertrain + net(data).tolist()
        yscattertrain = yscattertrain + data.y.tolist()
    for testdata in testloader:
        xscattertest = xscattertest + net(testdata).tolist()
        yscattertest = yscattertest + testdata.y.tolist()

fileCor = open("CorScat"+savename+".pickle", "wb")
pickle.dump((xscattertrain, yscattertrain, xscattertest, yscattertest), fileCor)
fileCor.close()

print("Train R: ", np.corrcoef(xscattertrain, yscattertrain)[0, 1])
print("Test R: ", np.corrcoef(xscattertest, yscattertest)[0, 1])
```

</details>

# Bond forming events revised

## Adding ring statistics and taking the log(n)

After seeing how poorly the bond forming events were predicted last time, I added ring statistics to the parameters I pass for bond forming events. Right now I only do this for rings that already exist before bonding and that involve the particles that will bond; this means there are two possible bonds. I could later also include information on rings that will exist after bonding. I also take the log of the number of events, to pass the GN more reasonable numbers. Doing these two things and training a network results in a much better correlation between the predicted and measured bond forming times:

| Description | Correlation |
|-----------|-----------|
| Before: parameters without ring information and log | R = 0.39 |
| Now: parameters with ring information and log | R = 0.76 |

The scatter plots of the two cases look like this:

---

Scatter plot without ring statistics and log:

![](https://i.imgur.com/kbJ0Q9g.png)

---

Scatter plot with ring statistics and log:

![](https://i.imgur.com/Dwejj5F.png)

---

This is a very big improvement, but... when you transform the times back from log to their actual values, you get a correlation of R = 0.35. So I might want to try something different.
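The drop in R when going from the log scale back to linear values can be reproduced with a toy example: exponentiating lets the largest values dominate the correlation. (The numbers below are made up and only illustrate the effect.)

```python
import numpy as np

rng = np.random.default_rng(2)

# toy stand-in: log event counts predicted with a modest error
log_true = rng.normal(0.0, 2.0, size=2000)
log_pred = log_true + rng.normal(0.0, 1.0, size=2000)

print(np.corrcoef(log_pred, log_true)[0, 1])                  # high
print(np.corrcoef(np.exp(log_pred), np.exp(log_true))[0, 1])  # noticeably lower
```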
## Something different

Something different is exactly what I did. To overcome the large differences in event statistics between different bond forming events, and to also get forming statistics of bonds that have already formed, I changed the way we gather data from the particle simulations. Before, we would let the system of patchy particles reach equilibrium and then stop any bonds from being formed or broken, while still recording every break or form event that would have happened. This means the structure of the system always stayed the same and we got many events for the same bonds. However, it also means that if a bond is very likely to form, chances are it has already formed during equilibration. To avoid this, I used a new approach.

Now, instead of disallowing all bonds from forming or breaking, I take one bond that has already formed during equilibration and allow the patches involved in this bond to break their bond and re-form it until this has happened 50 times. While this one bond is breaking and forming, I record how long it spends in the broken state and how long in the formed state, which gives me a measure for the breaking time and the forming time of that bond. After 50 forming events I disallow this bond from breaking and forming again, and do the same for a different bond. I can repeat this process multiple times to get statistics on many bonds in the system. If we colour the forming times, we get the following.

![](https://i.imgur.com/sab5I4U.png)

Here I only look at 9 bonds, and the colours indicate how long it takes, relatively, for a bond to re-form after it breaks. The reason I only look at 9 bonds is that I only look at one bond at a time while the whole system is undergoing the simulation, which takes some time (about 10 minutes) before I can move to the next one. Looking at many bonds in a single simulation would take a long time; however, if I run multiple simulations at the same time, where each individual simulation only looks at a small number of bonds, it does not take too long. I also show two systems where some of the inspected bonds are not involved in a ring, and you can see that these seem to have a higher forming time:

![](https://i.imgur.com/RBgzolG.png)

Here the bond with the highest forming time is also the only bond not involved in a ring.

![](https://i.imgur.com/BpOyOKF.png)

Here the bonds involved in rings have lower forming times than the bonds not involved in rings. This new way of gathering bond event data can hopefully make it easier to train our GN.

## Training more networks

The first obvious network to train is one using the data from our new way of gathering simulation data described above. To do this I generated a set of parameters for the systems and bonds for which I did the new simulations. I also generated a set of parameters for bond breaking events using simulation data from before the changes. I set the parameters of each bond and node to 1, to only pass information on the structure of the system. For a graph network this should still give some level of predictability; however, from results seen before I expect a correlation value of no higher than R = 0.25. Lastly I made a set of parameters with only order parameters, where I have 10 sets of parameters, each with only one order parameter, ranging from the 3rd up to the 10th order parameter.
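The structure-only input can be built from the existing graph objects by overwriting the features. A minimal sketch (the function name is mine), reusing the `tg.data.Data` layout from the full GN code above:

```python
import torch
import torch_geometric as tg

def structure_only_graph(graph):
    """Copy of a snapshot graph with every node and edge feature set to 1,
    so the network can only exploit the connectivity of the bond network."""
    return tg.data.Data(
        x=torch.ones(graph.x.size(0), 1),
        edge_attr=torch.ones(graph.edge_attr.size(0), 1),
        edge_index=graph.edge_index,
        y=graph.y,
    )

# e.g.: structure_graphs = [structure_only_graph(g) for g in graphs]
```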