# Frequently Used Python Codes
This page documents some frequently used Python snippets for office 420, MFM.
All code has been tested on Python 3.8.x.
[TOC]
## File processing
### find current directory
```python=
import os

if __name__ == "__main__":
    path = os.getcwd()
```
### create / delete folder
```python=
import shutil

work_path = os.path.join(path, 'new_folder_name')
if os.path.exists(work_path):
    shutil.rmtree(work_path)  # if the path exists, delete the folder
os.makedirs(work_path)
```
## Text processing
### Search a sub-string in a string
```python=
def is_in(full_str, sub_str):
    '''
    Input:
    - full_str: (type: string) the string where the sub-string might be located
    - sub_str: (type: string) the targeted sub-string
    Output:
    - True if sub_str is in full_str, False otherwise
    '''
    try:
        full_str.index(sub_str)
        return True
    except ValueError:
        return False
```
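For most use cases, the built-in `in` operator performs the same check in one line, so the helper above is mainly useful where the explicit function form is preferred:
```python=
# equivalent one-liner using the built-in membership operator
print('sub' in 'full string with sub inside')
# True
```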
### Separate a string into a list using member function `split()`
```python=
# Syntax for using split():
# string.split('separator', maxsplit)
# Examples
s = 'this is string example...fxxking raw!!'
list1 = s.split()  # default: split on whitespace
print(list1)
# ['this', 'is', 'string', 'example...fxxking', 'raw!!']
list2 = s.split('i')  # take 'i' as delimiter
print(list2)
# ['th', 's ', 's str', 'ng example...fxxk', 'ng raw!!']
list3 = s.split('i', 1)  # split only at the first 'i'
print(list3)
# ['th', 's is string example...fxxking raw!!']
```
This is useful when well-structured filenames are used, e.g., `20-100_l4_h5.csv` following the syntax `<power>-<speed>_<layer>_<clip_width>.csv`; the following code can then be used to locate/filter the files:
```python=
# work_path is the folder containing the .csv files
# obtain the file list under work_path
file_list = os.listdir(work_path)
# filter the .csv files
csv_list = [f for f in file_list if f.split('.')[-1] == 'csv']
# filter on layer == 4
# notice here f.split('_') will return
# ['20-100', 'l4', 'h5.csv']
layer = 4
csv_list = [f for f in file_list if (f.split('.')[-1] == 'csv') and
            (f.split('_')[1] == 'l%i' % layer)]
```
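Going one step further, the same `split()` calls can recover the individual fields of such a filename; a minimal sketch following the syntax above:
```python=
fname = '20-100_l4_h5.csv'
stem = fname.rsplit('.', 1)[0]          # strip the extension -> '20-100_l4_h5'
power_speed, layer, clip_width = stem.split('_')
power, speed = power_speed.split('-')
print(power, speed, layer, clip_width)
# 20 100 l4 h5
```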
### Combining a list into a string using `join()`
```python=
words = ['This', 'is', 'an', 'example', 'of', 'a', 'list']
string = "'%s'" % ' '.join('%s' % word for word in words)  # notice the <space> in ' ' before .join()
print(string)
# 'This is an example of a list'
string = "%s" % '_'.join('%s' % word for word in words)
print(string)
# This_is_an_example_of_a_list
```
### Sorting a list using `sort()`
```python=
nums = [23, 1, 4, 5, 9]
nums.sort()  # sorts in place -> [1, 4, 5, 9, 23]
```
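`sort()` works in place; the built-in `sorted()` returns a new list instead and accepts a `key` function, which is handy for the structured filenames above (a sketch with made-up names):
```python=
csv_list = ['20-100_l4_h5.csv', '20-100_l2_h5.csv', '20-100_l10_h5.csv']
# sort by the layer number encoded in the second field ('l2' -> 2, ...)
csv_list = sorted(csv_list, key=lambda f: int(f.split('_')[1][1:]))
print(csv_list)
# ['20-100_l2_h5.csv', '20-100_l4_h5.csv', '20-100_l10_h5.csv']
```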
## Data processing (using `Numpy` and `Scipy`)
### Output `dict` as `.npy` files
This method is very useful for saving metadata encountered during work into a readable/sharable file. The `.npy` format from `numpy` is more flexible and extendable than `.json` in many aspects (e.g., allowed data types), though it is less general and universal than `.json`. In that sense, we can make use of `.npy` in our own work (inward), while sharing `.json` for outward cooperation.
**Save**
```python=
import numpy as np

d = {'type': 'FeNi', 'fraction': np.linspace(0, 1, 100), 'c': 3}
np.save('dict.npy', d)
```
**Read**
```python=
load_dict = np.load('dict.npy', allow_pickle=True).item()
print(load_dict)
```
Remember that the option `allow_pickle` must be `True` here to allow the necessary parsing. If it is `False` or left at its default (not set), there will be an error message.
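For comparison, writing the same `dict` to `.json` for outward sharing requires converting the `ndarray` to a plain list first, since the standard `json` module does not accept NumPy types; a minimal sketch:
```python=
import json
import numpy as np

d = {'type': 'FeNi', 'fraction': np.linspace(0, 1, 100), 'c': 3}
d['fraction'] = d['fraction'].tolist()  # ndarray is not JSON-serializable
with open('dict.json', 'w') as f:
    json.dump(d, f)
```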
### fitting with Spline
```python=
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

# x: numpy.ndarray (the data points)
# y: numpy.ndarray
#
# create spline
spline = InterpolatedUnivariateSpline(x, y, k=1)  # k (1 <= k <= 5) is the degree of the spline
# afterwards, spline can be called like a function
# plotting over a dense x-space
x0 = np.linspace(x.min(), x.max(), 200)
plt.plot(x0, spline(x0))
```
For a detailed example, see "Compacted Examples > Fitting a given set of data using Spline".
### interpolating an array as a piecewise-linear function
```python=
from scipy.interpolate import interp1d
E_intp = interp1d(T, E, kind='linear')
```
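A short usage sketch with made-up data; the returned object can be called like a function at any point inside the original range:
```python=
import numpy as np
from scipy.interpolate import interp1d

T = np.array([300., 400., 500.])
E = np.array([1.0, 1.8, 3.1])
E_intp = interp1d(T, E, kind='linear')
print(E_intp(450.))  # halfway between 1.8 and 3.1 -> 2.45
```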
## Table processing (using `Pandas`)
### load all sorts of files as `DataFrame`
```python=
import pandas as pd # load Pandas as alias 'pd'
pd.read_csv(filename)
# load from a .csv
pd.read_table(filename, sep=<delimiter>)
# load from a delimited text file. e.g.,
# sep=',' for ',' as delimiter
# sep='\t' for tab as delimiter
pd.read_excel(filename)
# load from an Excel file (.xls/.xlsx)
pd.read_sql(query, connection_object)
# load from a SQL database
pd.read_json(json_string)
# load from a string in JSON format
pd.read_html(url)
# Parse URLs, strings or HTML files and extract tables from them
pd.read_clipboard() # Get the content from your clipboard and pass it to read_table()
pd.DataFrame(dict)
# load from a dictionary object, Key is the column name, Value is the data
```
### saving a `DataFrame` to various formats
```python=
df.to_csv('output.csv', index=False)
# Saves the DataFrame as a .csv file
df.to_clipboard(sep=',', index=False)
# Copies the DataFrame to the system clipboard, separated by commas (,)
df.to_excel("output.xlsx")
# Saves the DataFrame as an Excel file
df.to_latex(index=False)
# Converts the DataFrame to LaTeX input
```
### data selection / slice
```python=
'''
for DataFrame
'''
df[col]
# Select a single column by name and return it as a Series
df[[col1, col2]]
# Return multiple columns as a DataFrame
df.iloc[0,:]
# Return the first row
df.iloc[0,0]
# Return the first element of the first column
df.values[:,:-1]
# Return all data for all columns except the last column
df.query('[1, 2] not in c')
# Return the rows whose column c contains neither 1 nor 2
'''
For Series
'''
s.iloc[0]
# Select data by position
s.loc['index_one']
# Select data by index
```
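The `query()` call above can equivalently be written as plain boolean indexing, which some may find easier to read:
```python=
# same rows as df.query('[1, 2] not in c')
df[~df['c'].isin([1, 2])]
```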
### Shift column by index
```python=
df.shift(periods=-1)
# Shifts the whole DataFrame up by one period; by default the last row becomes NaN
df.shift(periods=3, fill_value=0)
# Shifts the whole DataFrame down by three periods and fills the vacated cells with the given value
df['col2'] = df['col1'].shift(periods=1)
# Creates a new column col2 from col1 shifted down by one value;
# by default, NaN is assigned to the first cell of the new column
```
### Append a `DataFrame` to the end of another
```python=
df = df.append(dfn)
# both df and dfn are DataFrames
# similar to list.append; df can start out as an empty DataFrame
```
It is worth noting that this method can also be used to iteratively collect data via `dict` objects. In that case, the `ignore_index` option should be turned on:
```python=
df = pd.DataFrame()
for T in Tspace:
    X = func(T)
    df = df.append({'T': T, 'X': X}, ignore_index=True)
```
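Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, the recommended pattern is to collect the rows first and build the `DataFrame` once (or use `pd.concat`):
```python=
rows = []
for T in Tspace:
    X = func(T)
    rows.append({'T': T, 'X': X})
df = pd.DataFrame(rows)  # build the DataFrame once at the end
```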
### Rename column titles by `dict`
```python=
df = df.rename(columns={'a': 'one', 'c': 'three'})
# column 'a' is renamed to 'one' and column 'c' to 'three';
# columns not listed in the dict keep their names
```
### Merge two `DataFrame` by matching column values
```python=
# rows of df1 and df2 are matched wherever both their
# 'Power' and 'Speed' columns agree (an inner join by default)
df = pd.merge(df1, df2, on=['Power', 'Speed'])
```
## General plot setups (using `matplotlib`)
These setups can be used together with the `plot_essentials` module from `MEMER`.
### make the logarithmic scale
```python=
ax.set_xscale('log')
ax.set_yscale('log')
```
### make the scientific notation
```python=
## Notice here 'scilimits' gives the range that will NOT
## be written in scientific notation, e.g., (-1, 2) means only
## data beyond 10^2 or below 10^-1 will be written that way.
ax1.ticklabel_format(style='sci', scilimits=(-1,2), axis='y')
```
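A minimal self-contained sketch combining both settings with made-up data:
```python=
import numpy as np
import matplotlib.pyplot as plt

x = np.logspace(0, 3, 50)
fig, (ax, ax1) = plt.subplots(1, 2)
ax.plot(x, x**2)
ax.set_xscale('log')
ax.set_yscale('log')
ax1.plot(x, x**2)
ax1.ticklabel_format(style='sci', scilimits=(-1, 2), axis='y')
plt.show()
```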
# Compacted Examples
## Numerical applications
### make the backward finite difference
$$
f'(a)\approx\frac{f(a)-f(a-h)}{h}
$$
where shifted copies of the columns 'a' and 'f' are required.

Now shift the columns 'a' and 'f' upward to create the new columns 'a_n' and 'f_n', so that each row holds its own values next to those of the following row:
```python=
import pandas as pd
df = pd.read_csv('table.csv') # read a .csv file named 'table.csv'
df['a_n'] = df['a'].shift(-1)  # a_n holds the next row's value of a
df['f_n'] = df['f'].shift(-1)  # f_n holds the next row's value of f
df.head()
```

***Notice*** after the shift the last row is filled with `NaN` and should be removed using
```python=
df = df.dropna(axis=0,how='any')
```
Then compute the difference quotient between neighbouring rows using the member function `apply()`:
```python=
df['d_f'] = df[['a', 'f', 'a_n', 'f_n']].apply(lambda x: (x['f_n'] - x['f']) / (x['a_n'] - x['a']), axis=1)
df.head()
```
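The same difference quotients can also be computed directly with NumPy, skipping the shifted columns (applied to the original, unmodified `df`); a sketch:
```python=
import numpy as np
# np.diff takes differences of consecutive elements, so this
# reproduces (f_n - f) / (a_n - a) for each pair of neighbouring rows
d_f = np.diff(df['f'].values) / np.diff(df['a'].values)
```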

### Finding intersection between two lines
This code uses `numpy.linalg.solve()` to find where two lines intersect, using only the coordinates of their endpoints.
```python=
import numpy as np
# Give the endpoints coordinates
# Line 1 passing through points p1 (x1,y1) and p2 (x2,y2)
p1 = [0, 0]
p2 = [1, 1]
# Line 2 passing through points p3 (x3,y3) and p4 (x4,y4)
p3 = [0, 1]
p4 = [1, 0]
# Line 1 dy, dx and determinant
a11 = (p1[1] - p2[1])
a12 = (p2[0] - p1[0])
b1 = (p1[0]*p2[1] - p2[0]*p1[1])
# Line 2 dy, dx and determinant
a21 = (p3[1] - p4[1])
a22 = (p4[0] - p3[0])
b2 = (p3[0]*p4[1] - p4[0]*p3[1])
# Construction of the linear system A @ [x, y] = -b
# coefficient matrix
A = np.array([[a11, a12],
              [a21, a22]])
# right-hand-side vector
b = -np.array([b1,
               b2])
# solve
try:
    intersection_point = np.linalg.solve(A, b)
    print('Intersection point detected at:', intersection_point)
except np.linalg.LinAlgError:
    print('No single intersection point detected')
```
The example above gives the output:
```
Intersection point detected at: [0.5 0.5]
```
The answer can also be confirmed graphically by making a simple plot of lines 1 and 2:
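A minimal sketch of such a plot, reusing the endpoint lists and the solution from above:
```python=
import matplotlib.pyplot as plt
plt.plot([p1[0], p2[0]], [p1[1], p2[1]], label='Line 1')
plt.plot([p3[0], p4[0]], [p3[1], p4[1]], label='Line 2')
plt.scatter(*intersection_point, color='red', zorder=3)  # the solved point
plt.legend()
plt.show()
```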

### Fitting a given set of data using Spline
This code uses `InterpolatedUnivariateSpline` from `scipy` to fit a given set of data points. Either the whole range of the data or some part(s) of it can be fitted.
```python=
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline
# your data
df = pd.read_csv("quadratic_data.csv") # read a .csv file named 'quadratic_data.csv'
# x-space for the spline
x0 = np.linspace(0, 4)
# create spline
s2 = InterpolatedUnivariateSpline(df['x'], df['y'], k=1)  # k (1 <= k <= 5) is the degree of the spline
# plottings
plt.scatter(df['x'], df['y'])
plt.plot(x0, s2(x0))
```
Example: the scattered points to be fitted:

Fitted graph for k = 1:

Fitted graph for k = 2:
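To reproduce such comparison plots, the spline degree `k` can simply be varied in a loop over the code above; a sketch:
```python=
for k in (1, 2):
    s = InterpolatedUnivariateSpline(df['x'], df['y'], k=k)
    plt.plot(x0, s(x0), label='k = %i' % k)
plt.scatter(df['x'], df['y'])
plt.legend()
plt.show()
```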

### Applying a function on a `DataFrame` by iterating the columns with an exception
```python=
import pandas as pd

labels = ['Time', 'avg_T', 'avg_vm', 'avg_peeq']
df = pd.read_csv('table.csv')
# exceptions using `if` and `continue`
for l in labels:
    if l == 'Time':
        continue
    # rest of the loop code, for example
    df[l] = df.apply(lambda x: x[l] / x['avg(c)'], axis=1)
```
### Grouping and calculation of average values of `DataFrame`
This takes a `DataFrame` and collects the values of the other columns into lists, grouped by the keys of one specific column.
```python=
import pandas as pd
import numpy as np
df = pd.DataFrame({'column1': ['key1', 'key1', 'key2', 'key2'],
                   'column2': [1, 6, 23, 2],
                   'column3': ['value11', 'value11', 'value22', 'value22'],
                   'column4': ['value44', 'value44', 'value55', 'value55']})
display(df)
df1 = pd.DataFrame()
df2 = pd.DataFrame()
df1['grouped1'] = df.groupby('column1')['column2'].apply(list)
df1=df1.reset_index()
display(df1)
df2['grouped2'] = df.groupby('column1')['column4'].apply(list)
display(df2)
```
In a second step, the average of the values collected for each key is calculated:
```python=
df1['avg1'] = df1.apply(lambda x: np.mean(x['grouped1']), axis=1)  # row-wise apply
df1['avg2'] = df1['grouped1'].apply(lambda x: np.mean(x))          # equivalent, column-wise apply
display(df1)
```
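The two steps can also be collapsed into a single `groupby` aggregation, since `mean()` is available directly on the grouped column; a sketch:
```python=
# average of column2 per key of column1, in one call
df.groupby('column1')['column2'].mean().reset_index()
```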
## HPC application
### Generating a quick `batch.sh` for running multiple simulations in series from the sub-folders
This is very useful for $\mathrm{MuMax^3}$-ish simulations, which take relatively little time each but come in large numbers. Because of the limited number of GPUs, such simulations rely heavily on running in series.
The following code quickly writes a `batch.sh` that runs all simulations from the sub-folders in series.
This could also be achieved with `bash` alone; nonetheless, the provided code can be integrated into other `python` projects, say, a batched input-file generator or so.
```python=
import os, io

# find current directory
if __name__ == "__main__":
    path = os.getcwd()
    input_name = 'input.mx3'
    command = 'mumax3'
    work_path = path  # can be changed to the folder you want to work with
    # keep only sub-folders (os.listdir also returns plain files)
    folders = [f for f in os.listdir(path)
               if os.path.isdir(os.path.join(path, f))]
    main_str = str()
    for f in folders:
        main_str += 'cd %s \n' % (os.path.join(work_path, f))
        main_str += '%s %s \n' % (command, input_name)
        main_str += 'cd %s \n' % (work_path)
        main_str += '\n'
    print(main_str)
    bash_path = os.path.join(work_path, 'batch.sh')
    if os.path.exists(bash_path):
        os.remove(bash_path)
    outf = io.open(bash_path, 'w', newline='\n')
    outf.write(main_str)
    outf.close()
```
The generated `batch.sh` then contains, for each sub-folder, a `cd` into that folder, the `mumax3 input.mx3` call, and a `cd` back to `work_path`.