Kirsten Pastore
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # Project Packet ## Background Information Burkholderia sensu lato contains pathogenic, phytopathogenic, symbiotic, and non-symbiotic strains (Estrado-de los Santos, 2018). Burkholderia sensu lato is a collection of closely related genera within the family Burkholderiaceae, and includes the genera Burkholderia, Paraburkholderia, Trinickia, Caballeronia, Mycetohabitans, Robbsia and Pararobbsia (Mullins et al. 2021). Our goal is to examine differences between pathogenic and non-pathogenic species within the family Burkholderiaceae. #### What Strains We Chose / Why In order to explore pathogenicity in Burkholderia sensu lato, or within the Burkholderiaceae family, we selected strains from five pathogenic species and five nonpathogenic species across genuses. The pathogenicity was determined using a maximum-likelihood phylogenetic reconstruction by Eberl et. al (2016). The genuses represented in our species include Burkholderia, Ralstonia, Ceballeronia, Trinickia, and Parabulkholderia. We chose to encorporate species from across genuses because we anticipated that species from within the same genus would be more genetically similar than species across genuses, and this would potentially confound genes relating to pathogenicity. With our selection, we could find patterns in pathogenicity across genuses. Furthermore, three of our selected genuses (Bulkholderia, Cabelleronia, and Trinickia) contain pathogenic and nonpathogenic species. This gave us the opportunity to potentially isolate genes relating to pathogenicity between species that were more closely related (within the same genus). | Pathogenic species | Strain | Genus | NCBI ID | | ------------------ | --------- | ------------ | --------------- | | B. pseudomallei | MSHR346 | Bulkholderia | GCF_000182585.2 | | R. pickettii | OR214 | Ralstonia | GCF_000372665.1 | | B. cepacia | AU41368 | Burkholderia | GCF_020419785.1 | | C. concitans | LMG 29315 | Ceballeronia | GCF_001544615.1 | | T. caryophylli | DSM 50341 | Trinickia | GCF_007097545.1 | | Nonpathogenic species | Strain | Genus | NCBI ID | | --------------------- | ---------- | ---------------- | --------------- | | B. thailandensis | 2002721643 | Bulkholderia | GCF_000959425.1 | | P. xenovorans | LB400 | Parabulkholderia | GCF_000013645.1 | | C. sordidicola | LMG 22029 | Caballeronia | GCF_001544455.2 | | T. symbiotica | JPY 581 | Trinickia | GCF_002879935.1 | | P. sprentiae | WSM5005 | Parabulkholderia | GCF_001865575.2 | ## Data Collection ### 1) Download .fna files for all genomes from NCBI First, download NCBI files onto desktop. In finder, use command+K and enter "smb://filer.colby.edu". Find your personal folder (or the folder being used as your working directory) under BI278 and upload the .fna files from your desktop into the folder. Rename the files with the format (genus initial).(species name)(strain).fna so that you can identify each strain in terminal. SN - these files can be found in `/courses/bi278/cgsnow25/project/fna_files` ## GC% and GC-Count #### Why GC% (Benchmark study) Genome reduction is associated with bacterial pathogenicity, and decreased GC% may be a consequence of “host-restricted” lifestyle. Therefore, this can be can be an indicator of pathogenicity. Pathogenic species were found to have a 65-69% GC content, while non-pathogenic species were found to have a lower GC content of 61-65%(Eberl et. al) #### Shell script info 1) Create shell script a) `nano` b) The lines used in the shell script are as follows: - ``#!/bin/bash`` - ``grep -v ">" /courses/bi278/cgsnow25/fna_files/$1 | tr -d -c ATGCatgc | wc $`` - ``grep -v ">" /courses/bi278/cgsnow25/fna_files/$1 | tr -d -c GCgc | wc -c `` 2) Run shell script a)Control+X to exit, name the file ``GC%_genomesize.sh`` b) sh `GC%_genomesize.sh fasta_filename` to run 3) Manually found GC% by importing data into excel and dividing GC content by genome size and applying the equation to the entire series #### GC% Results (Table) | File | Genome Size | GC Content | GC% | | ----------------------------- | ----------- | --------- | ------ | | b.cepaciaAU41368.fna* | 8195038 | 5487852 | 66.965 | | b.pseudomalleiMSHR346.fna* | 7348945 | 4989882 | 67.899 | | b.thailandensis2002721643.fna | 6950257 | 4691788 | 67.505 | | c.concitansLMG_29315.fna* | 6166171 | 3898219 | 63.219 | | c.sordidicolaLMG22029.fna | 6874511 | 4134842 | 60.147 | | p.strentiaeWSM5005.fna | 7840207 | 4956019 | 63.213 | | p.xenovoransLB400.fna | 9731138 | 6094399 | 62.678 | | r.pickettiiOR214.fna* | 5480587 | 3475413 | 63.413 | | t.caryophylli.fna* | 6657033 | 4307218 | 64.717 | | t.symbioticaJPY581.fna | 6752106 | 4251698 | 62.968 | "*" denotes pathogenic species ## GC Content Results/Analysis The GC content does not necessarily support the information described in our benchmark study (65-69 for pethogenic, 61-65 for non-pathogenic.) However, once the average of the GC content is determined the pathogenic species had a 3.015 percent higher average GC content (65.242%) than the nonpathogenic species (63.304%) This would support a potential way that pathogenicity can be indicated through the genome content. Additionally, these averages both fall within the aforementioned predicted ranges. In order to analyze the GC content further, we used an online program called GC Profile to help illustrate the GC level across the genome. GC-Profile provides a quantitative and qualitative view of genome organization and based on the obtained results, the relationships between the G+C content and other genomic features can be deduced. Once the FASTA files for the chromosomal information from each species are downloaded, they can be uploaded to GC Profile and then graphed. Pathogenic species ![](https://i.imgur.com/sdvhxLW.png) ![](https://i.imgur.com/XxjN2qi.png) Non-Pathogenic species ![](https://i.imgur.com/wSQkhiP.png) ![](https://i.imgur.com/7a3jiv1.png) This provides a more visual model to support that the GC content for the pathogenic species is higher than the nonpathogenic species. The X-axis shows the length of the sequence and the Y-axis shows the GC content. ## PROKKA Annotation on Nucleotide FASTA Files #### Why We Are Doing a PROKKA Annotation / How Leads to Objective ### 1) Make New Directory for PROKKA Output mkdir ``PATH``/PROKKA ### 2) Run PROKKA Annotation prokka --force --outdir ``PATH/``PROKKA/``bcepacia`` --locustag ``AU41368`` --genus ``Burkholderia`` --species ``cepacia`` --strain ``au41368`` ``PATH/b.cepaciaAU41368.fna`` **This returns files**: **.faa:** Protein FASTA file of the translated CDS sequences **.ffn:** Nucleotide FASTA file of all the prediction transcripts **.fna:** Nucleotide FASTA file of the input contig sequences **.gff:** This is the master annotation in GFF3 formation, containing both sequences and annotations **.txt:** Statistics relating to the annotated features **.tsv:** Tab-separated file of all features: locus_tag, ftype, len_bp, gene, EC_number, COG, product ### 3) Look at Statistics relating to annotated features cat ``PATH``/PROKKA/*/*.txt #### Visualization in R ## OrthoFinder #### Why OrthoFinder / What Does a Species Tree Tell Us OrthoFinder detects homolgous groups of genes that are descended from a single ancestor gene. The species tree which Orthofinder generates is a STAG species tree inferred from all orthogroups. STAG (Species Tree inference from All Genes) is an algorithm for inferring a species tree from sets of multi-copy gene trees (https://www.researchgate.net/publication/345678178_STAG_Species_Tree_Inference_from_All_Genes) ### 1) Make directory and move all .faa files into folder (10 Files) mkdir ``PATH/``faa_files (in ``PATH/``PROKKA/) mv *.faa ``PATH``/faa_files/ SN - these files can be found in `/courses/bi278/cgsnow25/project/faa_files` ### 2) Run OrthoFinder within this directory orthofinder -f ./ -X SN - results can be found in `/courses/bi278/cgsnow25/project/OrthoFinder/Results_Nov16` ### 3) Create Species Tree cat``PATH``/OrthoFinder/Results_*/Species_Tree/SpeciesTree_rooted_node_labels_txt copy and paste - visualize in iToL ![](https://i.imgur.com/WvV01wF.png) The species tree generated presents a hypothesis for the evolutionary relationships between our selected organisms. The longer branch for R.pickettii depicts a more distant phylogenetic distance from the other organisms. B. pseudomallei and B. thailandensis are more closely related to eachother than either is to B. cepacia despite the former being a pathogen and the latter being non-pathogenic. ## BLAST - looking for T3SS #### What is the VFMD Type III Secretion Systems (T3SSs) are critical virulence factors in Gram negative pathogens of plants (Vander Broek et al. 2017). The T3SS acts as a protein delivery meachine for intra-cellular secretion into eukaryotic cells (Wallner et al. 2021). T3SS are generally considered as a markeer of pathogenicity. Sct proteins are highly conserved across T3SS types. First, we create a blast database composed of our ten organisms. ### 1a) Create a New Directory mkdir ``PATH``/myblastdb SN - this directory with all databases already made is here `/courses/bi278/cgsnow25/project/faa_files/myblastdb` ### 1b) Add Directory to $BLASTDB export BLASTDB="``PATH``/myblastdb" ### 2) Make Protein Databases makeblastdb -in ``PATH/faa_files/bcepacia.faa`` -dbtype prot -title ``bcep_prot`` -out ``myblastdb/bcep_prot`` -parse_seqids *note: Example using B. cepacia, this was performed for each of the species. ### 3) Combine BLAST Databases of the Same Type blast aliastook -dbtype prot -title project_prot -out myblastdb/project_prot -dblist "``bcep_prot btha_prot bpse_prot pxen_prot pspr_prot rpic_prot ccon_prot csor_prot tcar_prot tsym_prot``" ### 4) Subset VFDB to just proteins that contain "Sct" awk 'BEGIN {RS=">"} /Sct/ {print">"$0}' ORS="" VFDB.faa > sct_prot.faa SN - this file and other intermediary files and scripts for the rest of this section are here `/courses/bi278/cgsnow25/project/` ### 5) Use Sct Protein File to Run BLASTP blastp -query ~/project_folder/sct_prot.faa -db project_prot -evalue 0.00001 -outfmt 6 > sct_table.table ### 6) Get List of All Subject Genes awk '{print $2}' sct_table.table > listOfGenes.txt ### 7) Create (& Use) Counter Shell Script The shell script takes the list of genes generated above and generates a tab-separated output describing how many genes for each species are subjects of the blast search. SN - the `wc -c` has been corrected to `wc -l` in the script (same folder as above) ![](https://i.imgur.com/2H0OSee.png) sh count_types.sh listOfGenes.txt > sct_tsvfile.tsv ### 8) Add Pathogenicity Status to Output TSV File ### 8a) Make / Use Pathogenicity Shell Script ![](https://i.imgur.com/0OFSkMT.png) sh NP_or_P.sh > Pathogenicity.tsv ### 8b) Add Row to Gene Count File awk -F "\t" 'NR==FNR{a[$1]=$2;next} {print $1, $2, a[$1]}' OFS="\t" Pathogenic.tsv sct_tsvfile.tsv > sct_output_counts.tsv ### 9) Read Table Into R sct_table <- read.table(file = "/home2/klpast23/project_folder/sct_output_counts.tsv", sep = "\t", header = TRUE) ### 10) Change the Column Names colnames(sct_table) <- c("Species", "Gene Count", "Pathogenicity") ### 11) Assign the Variables to a Dataframe Species <- sct_table$species Gene_Count <- sct_table$'Gene Count' Pathogenicity <- sct_table$Pathogenicity ### 12) Make the Bar Graph sct_bar <- ggplot(data = sct_table, aes(x = species, y = genes, fill = Pathogenicity)) + geom_bar(stat = "identity") ![](https://i.imgur.com/tLIqoSg.png) The bar graph demonstrates that sct genes exist in pathogenic and non-pathogenic species. That being said, T. caryophylli and B. pseudomallei both has the highest count of sct genes and are pathogenic. ## Using COGs to find Orthologous Genes In order to compare our ten genomes against each other, we used rpsblast to compare the protein files against a database of protein profiles (a reverse position-specific blast). We used COGs (clusters of orthologous groups), which defines orthologous groups of proteins. We used a COG database to identify COGs and their functions within our protein files. Ultimately, by creating a list of COGs and their frequencies within the protein file for each species, we are able to compare the frequencies of proteins across the species genomes. By visualizing the data in R, we hope to find a pattern between the frequency of COGs of certain functions as related to pathogenicity. ### 1) Establish location to look for BLAST databases In a folder with all of the protein fasta files (.faa), export BLAST DB (our folder was called "faa_files") with: ``export BLASTDB="/courses/bi278/Course_Materials/blastdb"`` ``echo $BLASTDB`` ### 2) Run rpsblast Run rpsblast on the protein fasta file for a given species. As an example, we used bcepacia (bcepacia.faa). ``rpsblast -query bcepacia.faa -db /courses/bi278/Course_Materials/blastdb/Cog -evalue 0.01 -max_target_seqs 25 -outfmt "6 qseqid sseqid evalue qcovs stitle" > bcepacia.rpsblast.table`` ### 3) Narrow down results We want to keep only the most significant COG matches for each query sequence, so we're only using matches with a query coverage of at least 70. ``awk -F'[\t,]' '!x[$1]++ && $4>=70 {print $1,$5}' OFS="\t" bcepacia.rpsblast.table > bcepacia.cog.table`` ### 4) Categorize COGS #### a) Look up functional categories in ``cognames2003-2014.tab`` ``awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($2 in a){print $1, $2, a[$2]} else {print $0}}' OFS="\t" /courses/bi278/Course_Materials/blastdb/cognames2003-2014.tab bcepacia.cog.table > temp`` #### b) Keep first (primary) category for each COG number ``awk -F "\t" '{if ( length($3)>1 ) { $3 = substr($3, 0, 1) } else { $3 = $3 }; print}' OFS="\t" temp > temp2`` #### c) Add full description of the functional category in ``fun2003-2014.tab`` ``awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($3 in a){print $0, a[$3]} else {print $0}}' OFS="\t" /courses/bi278/Course_Materials/blastdb/fun2003-2014.tab temp2 > bcepacia.cog.categorized`` ### 5) Remove temp files ``rm temp temp2`` ### 6) Repeat for other .faa files ### 7) Create frequency graph in excel - In finder, open each file and copy and paste each into own tab in excel - In excel, use COUNTIF (ex. ``=COUNTIF(C:C,"A")`` for the COG function associated with "A", RNA processing and modification) to find the frequency of each COG in each file, and make a table in each subsheet - Make a subsheet that compiles the frequency counts for each COG (using ``=VALUE('sheetname'!cell)``) - Add a column classifying each species as pathogenic or nonpathogenic - Save this final subsheet as a .txt file and then to a .tsv file. This is what our final sheet looked like: ![](https://i.imgur.com/hiiDidS.png) SN - the resulting file is this one `/courses/bi278/cgsnow25/project/COG_frequencies3.txt` ## Visualization in R ### 1) Create Workspace Set working directory to ``courses/bi278/cgsnow25/project`` ### 2) Launch the following libraries ``` library(dendextend) library(ggplot2) library(vegan) library(reshape2) library(ggrepel) x<-c("dendextend", "ggplot2", "vegan", "reshape2", "ggrepel") lapply(x, require, character.only = TRUE) ``` ### 3) Upload file Read file as cdata ``cdata <- read.table("COG_frequencies3.txt", h=T)`` and ``head(cdata)`` ### 4) Convert data from wide to long ``` cdata <- read.table("multivariate_data.txt", h=T) path <- c(rep("green",5), rep("blue",4)) cdata$path <- path cdata.long <- melt(cdata[,1:11]) head(cdata.long) ``` ### 6) Create box plots ``ggplot(cdata.long, aes(x=Pathogenicity, y=value, fill=Pathogenicity)) + geom_boxplot() + facet_wrap(.~variable, scales="free")`` The final box plots looked like this: ![](https://i.imgur.com/4k49fDW.jpg) This data reflects significant differences in the frequency of COG functions between pathogenic and nonpathogenic species, especially in G (carbohydrate transport and metabolism), U (intracellular trafficking, secretion, and vesicular transport), and V (defense mechanisms), for example. ## Citations https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116057/ https://www.frontiersin.org/articles/10.3389/fmicb.2021.726847/full https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5471309/ https://www.frontiersin.org/articles/10.3389/fmicb.2021.761215/full https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4882756/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538862/

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully