# RepeatModeler2 / RepeatMasker
###### tags: `program commands`

## RepeatModeler2
[full documentation](http://repeatmasker.org/RepeatModeler/)

RepeatModeler2 (RM2) identifies repeats and assembles consensus sequences from a genome assembly. It attempts a basic classification based on the Dfam database.

`genome.fa` (genome sequence) --> RM2 --> `genome-families.fa` (TE consensus sequences)

Example for a given genome called "genome":

#### 1. Build database for RM2
```
<RepeatModelerPath>/BuildDatabase -name genome genome.fa
```
#### 2. Run RM2
```
nohup <RepeatModelerPath>/RepeatModeler -database genome -pa 20 -LTRStruct >& run.out &
```
> `-LTRStruct`: enables the LTR structural discovery module of RM2

#### 3. Output file to keep is `genome-families.fa`

## RepeatMasker
Full documentation: see the RepeatMasker website.

RepeatMasker identifies repeats in the genome using the library built and classified by RepeatModeler2 (`genome-families.fa`). The default search engine is rmblastn (a modified version of blastn for RepeatMasker).

```
nohup <RepeatMaskerPath>/RepeatMasker -pa 15 -a -s -gccalc -gff -cutoff 200 -no_is -lib genome-families.fa genome.fa
```
> `-pa`: number of parallel jobs. **WARNING**: with `rmblastn`, RepeatMasker multiplies this by 4, so `-pa 15` can use up to 60 CPUs!
> `-a`: writes the alignments to a `.align` file (needed for TE landscapes)
> `-s`: "slow" search mode, more sensitive (recommended)
> `-gccalc`: computes the GC content of the input sequence
> `-gff`: produces a GFF track
> `-cutoff 200`: minimum alignment score to keep a hit when using `-lib` (default: 225)
> `-no_is`: don't look for insertion sequences (prokaryotic TEs)
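
Because of the `-pa` warning above, it can help to work out the `-pa` value from the number of cores you are actually allowed to use. The snippet below is only a sketch of that arithmetic: `pa_for_cores` and `THREADS_PER_JOB` are hypothetical names introduced here, and the factor of 4 is taken from the warning above (it is an assumption that it holds for your RepeatMasker/rmblastn version).

```shell
# Assumption: each parallel job started with rmblastn uses 4 threads,
# as stated in the -pa warning above.
THREADS_PER_JOB=4

# pa_for_cores (hypothetical helper): given a core budget, print the
# largest -pa value that stays within it (never less than 1).
pa_for_cores() {
    cores=$1
    pa=$(( cores / THREADS_PER_JOB ))
    if [ "$pa" -lt 1 ]; then
        pa=1
    fi
    echo "$pa"
}

# Example: a 60-core budget corresponds to -pa 15, as in the command above.
pa_for_cores 60   # prints 15
```

The result can then be passed on the command line, e.g. `-pa "$(pa_for_cores 60)"`, instead of a hard-coded number.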