# Nucleosome Research Project
Some resources
---
:::info
- [x] PDB files for the 'news' systems
- [**file link**](https://www.icloud.com/iclouddrive/091dHjzuzOBdpxD2H_GgmQlYw#Cases-octa)
- [x] Some structural fitting and analysis methods
- [AAMD paper, check their **Analysis Method**](https://www.icloud.com/iclouddrive/0f6eHHXLn5-mnfvb3HLZBRHKA#nature-comm-2021-aamd-paper)
- [**do_x3dna**](https://do-x3dna.readthedocs.io/en/latest/index.html); [paper link](https://europepmc.org/article/med/25838463)
- [x] Some papers about coarse-grained modeling of nucleosomes
- [**a nice review paper**](https://www.icloud.com/iclouddrive/030VVX4GqKlRlNfOtnRSe9uSw#review-2020)
- [ ] [**Nucleosome Structure Analysis paper**](https://www.nature.com/articles/s41598-018-19875-0)
- github repository: **https://github.com/xinmeng2020/nucleosome-analysis.git**
MD protocol
---
- Fix missing residues using Modeller: https://salilab.org/modeller/wiki/Missing_residues
:::
Meetings
---
:::warning
- [ ] week 35, 30.08.2022 at 2pm, Michele and Manuel
:::
Simulations
---
:::info
- Files
- [ ] EM with DNA extension
:::
Analysis
---
:::info
- How to...?
- do principal component analysis (considering only the C-alpha atoms of the histones)
- opening mode
- analyze helical parameters
:::
## Raw Protocols
### prepare the initial pdb and fasta
- system: **raw** protein + **full** DNA
- illustration: H3; Mira can finish the CA case
$ mkdir handmake-h3-rawProtein-fullDNA
$ cd handmake-h3-rawProtein-fullDNA
$ cp ../h3fixDNA/H3octasome-with465-renamechain-fullDNA.pdb .
$ cp ../h3fixDNA/h3.fasta .
// Now the PDB file is correct for our purpose, but the FASTA file contains the full sequence of the protein chains
// So we have to deliberately 'chop' the FASTA file so that it does not include the protein tails
// First we have to figure out which sequence is contained in the PDB file; then we can compare it with the FASTA file and discard the part of the FASTA file that is not needed
:question: How can we quickly get the sequence contained in a PDB file?
**Solution**
Use a service or script to convert the PDB file to FASTA sequences, e.g. https://zhanggroup.org/pdb2fasta/
:question: In the PDB file, start copying from the ATOM lines
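The conversion can also be done locally. The following is a minimal Python sketch, not the code behind the web service: it reads only the `ATOM` records (so residues listed solely in REMARK 465 are not reported) and assumes standard fixed-column PDB formatting and the 20 standard three-letter residue names. The function names `pdb_to_sequences` and `format_fasta` are made up for this illustration; non-standard names (e.g. force-field histidine variants) and DNA residues are skipped.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from the ATOM records.

    Assumes fixed-column PDB format: residue name in columns 18-20,
    chain ID in column 22, residue number in columns 23-26,
    insertion code in column 27.
    """
    sequences = {}
    last_seen = {}  # chain_id -> (residue number, insertion code) seen last
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()
        if res_name not in THREE_TO_ONE:
            continue  # skips DNA and any non-standard residue name
        chain_id = line[21:22]
        res_key = (line[22:26], line[26:27])
        # Append one letter per residue, not one per atom.
        if last_seen.get(chain_id) != res_key:
            sequences[chain_id] = sequences.get(chain_id, "") + THREE_TO_ONE[res_name]
            last_seen[chain_id] = res_key
    return sequences


def format_fasta(sequences):
    """Format the sequences with '>pdb:X' headers, one record per chain."""
    return "\n".join(f">pdb:{chain}\n{seq}" for chain, seq in sequences.items())
```

Typical use would be `print(format_fasta(pdb_to_sequences(open(path))))`, where `path` points to the PDB file of interest.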
Cross-check the results
Results:
>pdb:A
PHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGER
>pdb:B
NIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
>pdb:C
PHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA
>pdb:D
RKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
>pdb:E
PHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGER
>pdb:F
NIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
>pdb:G
PHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA
>pdb:H
RKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
Now we can search for the obtained sequence (the PDB file converted to FASTA) in the full FASTA sequence: find the match and remove the rest.
***Here we may make chains A, C, E, and G identical, i.e. keep the last A (Ala) residue, which is missing in chains A and E.***
Chains B and F are identical; chains D and H have additional residues. We must therefore split the common chains B, D, F, H into two groups (B, F and D, H).
For chain A, that means keeping ALA 135 in the PDB file.
Create a new PDB file and remove the REMARK 465 residues that were also deleted in the FASTA file. That is, the new PDB file and the FASTA file contain the same residues (and the same number of residues).
Chain A: deleted res 0-37 keep res ALA 135
Chain B: deleted res 0-24
Chain C: deleted res 0-37
Chain D: deleted res 0-18
Chain E: deleted res 0-37 keep res ALA 135
Chain F: deleted res 0-24
Chain G: deleted res 0-37
Chain H: deleted res 0-18
Then we can save the fasta file.
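The search-and-trim step described above can be sketched in Python. This is an illustration under assumptions, not the procedure actually used for these files: `locate_core` and `trim_to_core` are hypothetical helper names, and `keep_c_term` is a hypothetical parameter that covers the case where one extra C-terminal residue (such as the final Ala) is kept even though it has no coordinates in the PDB file.

```python
def locate_core(full_seq, core_seq):
    """Return (n_leading, n_trailing): residues of full_seq outside core_seq.

    Raises ValueError if core_seq is absent or matches more than once,
    because the trimming would then be ambiguous.
    """
    start = full_seq.find(core_seq)
    if start == -1:
        raise ValueError("PDB sequence not found in the full FASTA sequence")
    if full_seq.find(core_seq, start + 1) != -1:
        raise ValueError("PDB sequence matches the FASTA sequence more than once")
    n_trailing = len(full_seq) - start - len(core_seq)
    return start, n_trailing


def trim_to_core(full_seq, core_seq, keep_c_term=0):
    """Trim full_seq to core_seq, optionally keeping extra C-terminal residues."""
    n_leading, n_trailing = locate_core(full_seq, core_seq)
    end = n_leading + len(core_seq) + min(keep_c_term, n_trailing)
    return full_seq[n_leading:end]
```

The returned `n_leading` is the number of N-terminal tail residues to delete from the FASTA record, which can be compared against the per-chain deletions listed above.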
:warning: The DNA is not processed here (that is, the entire DNA extension is kept). We also have a protocol for dealing with the DNA; it is skipped for now.
So once we have the PDB and FASTA files, we can proceed with our simulation protocols.
For CA
$ mkdir handmake-ca-rawProtein-fullDNA
$ cd handmake-ca-rawProtein-fullDNA/
$ cp ../cafixDNA/CA-octasome-with465-renamechain-fullDNA.pdb .
$ cp ../cafixDNA/ca-chainrename.fasta .
>pdb:A
RRRQGWLKEIRKLQKSTHLLIRKLPFSRLAREICVKFTRGVDFNWQAQALLALQEAAEAFLVHLFEDAYLLTLHAGRVTLFPKDVQLARRIRGLEEGLG
>pdb:B
RDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
>pdb:C
RRRQGWLKEIRKLQKSTHLLIRKLPFSRLAREICVKFTRGVDFNWQAQALLALQEAAEAFLVHLFEDAYLLTLHAGRVTLFPKDVQLARRIRGLEEGLG
>pdb:D
RDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
>pdb:E
RRRQGWLKEIRKLQKSTHLLIRKLPFSRLAREICVKFTRGVDFNWQAQALLALQEAAEAFLVHLFEDAYLLTLHAGRVTLFPKDVQLARRIRGLEEGLG
>pdb:F
RDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
>pdb:G
RRRQGWLKEIRKLQKSTHLLIRKLPFSRLAREICVKFTRGVDFNWQAQALLALQEAAEAFLVHLFEDAYLLTLHAGRVTLFPKDVQLARRIRGLEEGLG
>pdb:H
RDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
Chains A, C, E, G are all identical, as are chains B, D, F, H.
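Whether chains share the same sequence can be checked programmatically rather than by eye. A minimal Python sketch, assuming the per-chain sequences are already available as a dictionary mapping chain ID to one-letter sequence (the function name `group_identical_chains` is hypothetical):

```python
from collections import defaultdict


def group_identical_chains(sequences):
    """Group chain IDs that share exactly the same sequence.

    sequences: {chain_id: one-letter sequence}
    Returns a sorted list of chain-ID lists, one list per distinct sequence.
    """
    groups = defaultdict(list)
    for chain_id, seq in sequences.items():
        groups[seq].append(chain_id)
    return sorted(sorted(chains) for chains in groups.values())
```

Two groups of four chains each would correspond to the situation described here; more groups indicate chains that differ by at least one residue.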
Create new pdb file and remove REMARK 465 residues that also were deleted in the fasta file. That is, the new pdb file and the fasta file contain the same residues (also concerning the number of residues).
Chain A: deleted res 1-41
Chain B: deleted res 0-22
Chain C: deleted res 1-41
Chain D: deleted res 0-22
Chain E: deleted res 1-42
Chain F: deleted res 0-22
Chain G: deleted res 1-41
Chain H: deleted res 0-22
:warning: Make sure to delete all remaining REMARK 465 entries if there are no missing residues at all
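The requirement that the new PDB file and the FASTA file contain the same residues can be verified with a short check. The following Python sketch is an assumption-laden illustration: `parse_fasta` and `compare_sequences` are hypothetical names, and the PDB-derived sequences are assumed to be available as a dictionary keyed by the same headers as the FASTA records (e.g. `pdb:A`).

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}; the leading '>' is dropped."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def compare_sequences(pdb_seqs, fasta_seqs):
    """Return a list of mismatch messages; an empty list means consistent."""
    problems = []
    for key in sorted(set(pdb_seqs) | set(fasta_seqs)):
        pdb_seq = pdb_seqs.get(key)
        fasta_seq = fasta_seqs.get(key)
        if pdb_seq is None:
            problems.append(f"{key}: missing in PDB")
        elif fasta_seq is None:
            problems.append(f"{key}: missing in FASTA")
        elif pdb_seq != fasta_seq:
            problems.append(
                f"{key}: sequences differ "
                f"(PDB {len(pdb_seq)} residues, FASTA {len(fasta_seq)} residues)"
            )
    return problems
```

If residues kept from REMARK 465 (such as a final Ala without coordinates) are part of the FASTA record, they have to be added to the PDB-derived sequence before comparing, since they do not appear in the `ATOM` records.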
## Rename Chains
Pre-processing of the PDB file was done in Sublime Text
* Alphabetically order chains
* Rename chains from A to J
For caRawProteinRawDNA
Chain E to C delete 817 atoms
Chain F to D
Chain K to E delete 817 atoms
Chain L to F delete 639 atoms
Chain O to G 817
Chain P to H 639
h3RawProteinRawDNA
Chain E to C delete 804 atoms
Chain F to D 674
Chain K to E 798
Chain L to F 620
Chain O to G 804
Chain P to H 674
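The renaming itself can be scripted instead of done by hand in an editor. Below is a minimal Python sketch, assuming standard fixed-column PDB records with the chain ID in column 22; the function name `rename_chains` and the constant `CHAIN_MAPPING` are made up for this illustration, and the mapping is copied from the list above. It only rewrites the chain ID and does not perform the atom deletions noted next to each chain.

```python
# Old chain ID -> new chain ID, as listed in the notes above.
CHAIN_MAPPING = {"E": "C", "F": "D", "K": "E", "L": "F", "O": "G", "P": "H"}

# Coordinate-section records that carry a chain ID in column 22.
RECORDS_WITH_CHAIN = ("ATOM", "HETATM", "TER", "ANISOU")


def rename_chains(pdb_lines, mapping):
    """Return a copy of pdb_lines with chain IDs replaced according to mapping.

    Each line is looked up exactly once, so overlapping renames
    (E -> C together with K -> E) do not cascade into each other.
    Chains not present in mapping are left unchanged.
    """
    renamed = []
    for line in pdb_lines:
        if line.startswith(RECORDS_WITH_CHAIN) and len(line) > 21:
            old_id = line[21]
            line = line[:21] + mapping.get(old_id, old_id) + line[22:]
        renamed.append(line)
    return renamed
```

The single lookup per line is the reason for not doing this with sequential search-and-replace: replacing E with C first and K with E afterwards is safe, but the reverse order would merge both chains into C. Other records that mention chain IDs (for example SEQRES or REMARK 465) are not touched by this sketch and would need separate handling.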
### Task
- [ ] get the consistent PDB and FASTA files for the H3 and CA cases
### Reference
**Fasta sequence**