Prokka lab report

## Exercise questions:

1. --cpus [N]: This option sets the number of CPUs used for the annotation process. The default value is '8'. Changing it can speed up or slow down annotation, depending on the computational resources available. For example, a higher CPU count on a multi-core machine can significantly reduce the processing time.

   --kingdom [X]: This option sets the annotation mode according to the organism's kingdom: Archaea, Bacteria, Mitochondria, or Viruses. The default value is 'Bacteria'. Changing it is crucial when annotating genomes from other kingdoms, because it influences which gene prediction models and databases are used, giving results that are more accurate for that kingdom.

2. This document provides basic information for each suspected CDS (coding DNA sequence), such as its location in bp on the genome and, for known proteins, the UniProt ID (e.g., the product Protein 7a with UniProt ID P59635). The label "hypothetical" is used when a predicted protein-coding region (CDS) has no known or well-characterized function: the sequence appears to encode a protein, but there is insufficient evidence to determine its function confidently. I confirmed my prediction by searching the UniProt database for the IDs of these CDSs, and I found them there.

3. I think it's because RNA is transcribed from the 3' to 5' DNA template strand, so the RNA runs 5' to 3' and reads the same as the (+) strand. Translation then works along this 5' to 3' RNA. Therefore, the annotation is on the (+) strand, which corresponds to the RNA direction.

4. blastx is used here because it is the tool designed to search a database of known protein sequences with a DNA query: it translates the nucleotide sequence in all six reading frames and compares the translations against the protein database.

5. The blastx search result contains some additional annotations, such as the envelope protein between the membrane glycoprotein and the ORF3A protein.

6. The Prokka databases are bundled with the tool, so I don't need to upload any additional databases to analyze my sequence.

## Part 2

![WeChat image 20231207121525](https://hackmd.io/_uploads/ByuEdjJL6.png)

The annotation from GeneMark is nearly the same as the result we got from Prokka, except that it lacks the names and functions of each gene. The BLAST result is more detailed than the other two: each protein is covered by several segments, and there are some additional proteins, such as the product protein 7a I mentioned before. Overall, though, the three annotations are nearly the same.
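The comparison in Part 2 was made by eye. One way to make it more concrete is to match predicted genes from two tools by their coordinates. The sketch below is only an illustration: the coordinate lists are made up (they are not taken from the real Prokka or GeneMark output), and the function names and the 90% overlap threshold are assumptions chosen for the example.

```python
# Hypothetical (start, end) coordinates in bp, inclusive.
# These are NOT the real Prokka or GeneMark predictions.
prokka_cds = [(100, 400), (450, 900), (1000, 1300)]
genemark_cds = [(100, 400), (460, 900), (2000, 2200)]


def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter interval."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    overlap = max(0, end - start + 1)
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return overlap / shorter


def match_predictions(first, second, min_fraction=0.9):
    """Pair genes from two prediction sets that overlap strongly.

    Returns (matched pairs, genes only in first, genes only in second).
    """
    matched, only_first = [], []
    for gene in first:
        partner = next(
            (g for g in second if overlap_fraction(gene, g) >= min_fraction),
            None,
        )
        if partner is None:
            only_first.append(gene)
        else:
            matched.append((gene, partner))
    matched_partners = {partner for _, partner in matched}
    only_second = [g for g in second if g not in matched_partners]
    return matched, only_first, only_second


matched, only_prokka, only_genemark = match_predictions(prokka_cds, genemark_cds)
print("shared predictions:", matched)
print("only in first set:", only_prokka)
print("only in second set:", only_genemark)
```

With real data, the two lists would be filled from the start and end columns of each tool's output, and the genes left unmatched would be the "additional" annotations discussed above.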