# **Network biology approach to study ECF sigma factor regulatory network in _Mycobacterium tuberculosis_**

###### tags: `science` `Mycobacterium` `thesis` `Systems biology`

### Abstract

Extra Cytoplasmic Function (ECF) sigma factors help pathogenic bacteria to overcome stress, thereby aiding survival and adaptation under hostile conditions. This thesis attempts to reconstruct and analyse the regulatory network of the ECF sigma factors in _Mycobacterium tuberculosis_, together with the components that mainly control their cellular concentration: the transcription factors (TFs) that regulate their transcription and the genes that the sigma factors help to transcribe. Such a network can offer a glimpse of the master regulators that help _Mycobacterium tuberculosis_ survive by activating stress response genes. To date, little effort has been made to combat bypass mechanisms, and systems biology may provide new insight into this problem. Targeting these sigma factors would most likely disrupt the adaptive mechanisms of the bacterium and thus its survival in the host.

---

### Introduction

The battle against tuberculosis (TB), caused by the bacterium _Mycobacterium tuberculosis_, dates back to 1720, when the English physician Benjamin Marten first hypothesised an infectious origin for the disease (Barberis et al, 2017). TB is among the ten most lethal diseases caused by a single infectious agent (Gosce et al, 2019). Ending the TB epidemic by 2030 is one of the targets of the United Nations Sustainable Development Goals (Lönnroth & Raviglione, 2016). Hence, effective research strategies are needed to eliminate the disease.

In bacteria, a major mode of regulation at the transcriptional level is the use of an alternative sigma factor in place of the primary sigma subunit of RNA polymerase (RNAP). This redirects RNAP towards the promoters of stress-specific genes, thereby rapidly modulating gene expression.
Most bacterial species have several groups of alternative sigma factors; the largest of these is the Extra Cytoplasmic Function (ECF) family (Sineva et al, 2017). ECF sigma factors are so named because bacteria use them to regulate their extracytoplasmic functions. Pathogenic bacteria are compelled to adapt to the changing environment within the host during infection and to counter the adaptive response of the host (Reddick et al, 2014). A distinctive feature of _M. tuberculosis_ is its ability to persist in the host as a latent infection, in which it lies dormant and thereby escapes detection by the host immune system (Gengenbacher et al, 2012). Studies of this survival strategy have identified the ECF sigma factors as key players for _Mycobacterium_ in the host (Flentie et al, 2016).

ECF sigma factors are alternative sigma factors: small regulatory proteins whose sequences diverge from those of most other sigma factors (Helmann, 2002). They were first recognised as a distinct subclass of σ70-like factors in 1994 (Sineva et al, 2017). They play key roles in coordinating the transcription of genes involved in sensing and responding to changes in the bacterial periplasm and extracellular environment (Kazmierczak et al, 2005). _M. tuberculosis_ has 10 ECF sigma factors, namely Sigma C (Sun et al, 2004), Sigma D (Raman et al, 2004), Sigma E (Manganelli et al, 2001), Sigma G (Lee et al, 2008), Sigma H (Manganelli et al, 2002), Sigma I (Gupta et al, 2020), Sigma J (Goutam et al, 2017; Homerova et al, 2008), Sigma K (Veyrier et al, 2008), Sigma L (Hahn et al, 2005) and Sigma M (Agrawal et al, 2007).

Three major themes are common to all of the ECF sigma factors:

* They respond to, and also regulate, extracytoplasmic functions.
* They are regulated by anti-sigma factors (which regulate the ECF sigma factors) and/or anti-anti-sigma factors (which regulate the anti-sigma factors of the ECFs).
* Most of them control a relatively small regulon (Bashyam & Hasnain, 2004).

The roles reported for the individual ECF sigma factors are:

* Sigma C is involved in information pathways (Lew et al, 2011) and modulates virulence-associated genes (Sun et al, 2004).
* Sigma D has a role in information pathways, the stringent response and starvation (Sachdeva et al, 2010).
* Sigma E has a role in the SDS and heat shock responses (Bashyam & Hasnain, 2004).
* Sigma G is required for survival of the pathogen in macrophages (Cappelli et al, 2006).
* Sigma H is required in the oxidative stress response (Sachdeva et al, 2010).
* Sigma I contributes to the cold shock adaptation of the bacterium (Sachdeva et al, 2010).
* Sigma J is required for survival in macrophages (Cappelli et al, 2006).
* Sigma L has an important role in the regulation of polyketide synthases and secreted proteins (Hahn et al, 2005).
* Sigma M helps the bacterium adapt to specific host environments over the long term (Raman et al, 2006).

Transcription factors regulate gene expression by binding to specific regions of the DNA and either promoting or blocking the formation of the transcription complex. In other words, they switch genes on or off and provide the cell with the transcriptome it needs. Groups of transcription factors often act in a coordinated fashion to produce proteins involved in vital cellular processes such as cell division, growth and death. Transcription factors work by recruiting RNAP to transcribe DNA into mRNA. They possess a DNA-binding domain that binds specifically to a short region of the DNA (an enhancer or promoter), ensuring that only the required genes are transcribed. They also possess an activation domain that provides binding sites for various transcription co-regulators.

---

### Structure

Because up-to-date research on _Mycobacterium tuberculosis_ is limited, few experimental datasets and database submissions are available, whether as interaction data or as structural data. The aim is therefore to integrate all the resources at hand, including raw data, and devise a Systems Biology approach to study the sigma regulatory network of _Mycobacterium tuberculosis._

![](https://i.imgur.com/5u33v4G.png)

In all, 13 sigma factors (sigA, sigB, sigC, sigD, sigE, sigF, sigG, sigH, sigI, sigJ, sigK, sigL and sigM) were retrieved from the resources available to date, and the network was reconstructed from the literature using the Gephi Data Laboratory.

* ***Construction of TF-Sigma factor regulatory network***

  For this step, TFs were gathered from literature-based sources and the ChIP experiments conducted to date. They were integrated into the already reconstructed sigma regulatory network by preparing fresh nodes and edges tables and importing them into the Gephi Data Laboratory.

* ***Construction of Sigma factor-gene regulatory network***

  The network was then enriched with the genes that the sigma factors help to transcribe. The data were gathered mainly from literature sources with confirmed ChIP experimental datasets and integrated into the previous TF-sigma regulatory network by preparing fresh nodes and edges tables and importing them into the Gephi Data Laboratory.

* ***Integration of TFs, Sigma factors and Genes into a single network***

  After the steps above, we obtained a TF-sigma-gene regulatory network for _Mycobacterium tuberculosis._

---

### Results and Discussion

The sigma regulatory network of _Mycobacterium tuberculosis_ can be represented as a graph G = (V, E), where V is the set of vertices and E is the set of edges, as in graph-theoretical approaches.
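
As a rough illustration of this representation, the sketch below reads an edges table into a directed graph and counts the in-degree and out-degree of every node. It is a minimal sketch only: the node names (`tfA`, `sigX`, `gene1`, ...) and the edges are hypothetical placeholders rather than the thesis data, and plain Python stands in for the Gephi Data Laboratory.

```python
# Hypothetical edges table: (source, target) means "source regulates target".
# These names are placeholders and do not come from the thesis dataset.
EDGES = [
    ("tfA", "sigX"),
    ("tfB", "sigX"),
    ("sigX", "sigY"),
    ("sigX", "gene1"),
    ("sigY", "gene1"),
    ("sigY", "gene2"),
    ("sigY", "sigY"),  # auto-regulation appears as a self-loop
]


def build_graph(edges):
    """Return the node set V plus successor and predecessor maps built from E."""
    nodes = set()
    succ = {}
    pred = {}
    for src, dst in edges:
        nodes.update((src, dst))
        succ.setdefault(src, set()).add(dst)
        pred.setdefault(dst, set()).add(src)
    return nodes, succ, pred


def degrees(nodes, succ, pred):
    """Map each node to (in_degree, out_degree)."""
    return {
        n: (len(pred.get(n, ())), len(succ.get(n, ())))
        for n in nodes
    }


nodes, succ, pred = build_graph(EDGES)
deg = degrees(nodes, succ, pred)

# List nodes by decreasing total degree, as in a degree-ordered layout.
for name in sorted(nodes, key=lambda n: (-sum(deg[n]), n)):
    print(name, "in =", deg[name][0], "out =", deg[name][1])
```

A high out-degree marks a regulator with many downstream targets, while a high in-degree marks a node targeted by many regulators.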
The network reconstructed here from various literature sources is a directed regulatory graph of sigma factors (nodes marked as 1, 2, 3, ...), in which an edge between two adjacent nodes denotes direct regulatory control.

![](https://i.imgur.com/1fW2rxL.png)

Incorporating TFs and genes is crucial to complete the ECF network, because they determine both the expression of the ECF sigma factors themselves and the genes that the ECF sigma factors express in response to stress.

![](https://i.imgur.com/5wqG1BT.png)

***Figure:*** A circular layout showing the complete TF-sigma-gene regulatory network of _Mycobacterium tuberculosis_, with sigma factors highlighted in orange and the TFs and genes in blue, in order of decreasing nodal degree.

The _Mycobacterium tuberculosis_ TF-sigma-gene regulatory network (M) was converted to an equivalent Cytoscape network, and various network topological parameters were calculated using the Network Analyzer plugin. Scatter plots of the following parameters were generated: (a) closeness centrality, (b) betweenness centrality, (c) average clustering coefficient, (d) stress centrality and (e) average connectivity. These were fitted with a power law, with the corresponding R-squared values, as shown.

(a) ![](https://i.imgur.com/boKenhc.png)

(b) ![](https://i.imgur.com/v7mgJXX.png)

(c) ![](https://i.imgur.com/sIQAXjM.png)

(d) ![](https://i.imgur.com/2kXEeTV.png)

(e) ![](https://i.imgur.com/pr32C8P.png)

***Figure:*** Analysis of the sigma regulatory network using the Network Analyzer of Cytoscape (v7). **(a)** Closeness centrality **(b)** Betweenness centrality **(c)** Average clustering coefficient **(d)** Stress centrality **(e)** Average connectivity

FAG-EC, a fast hierarchical agglomerative algorithm, was used to find clusters, with the default strictest parameter settings given by ClusterViz. The complex size threshold was set to 2.
This gave two tight clusters in the TF-sigma-gene regulatory network (M), as shown in the figure below. The two clusters had modularity values of 3.66 and 3.16, respectively.

(a) ![](https://i.imgur.com/lUUXVI7.png)

(b) ![](https://i.imgur.com/3Jz54cT.png)

***Figure:*** Strongly connected clusters found using ClusterViz. **(a)** Cluster 1 **(b)** Cluster 2

The MCODE algorithm generates modules with a given number of nodes. Using MCODE, two modules were found. The first is a feed-forward loop comprising sigma factors 3 and 10 and TF Rv0405. The second comprises four nodes: sigma factors 3, 10 and 1 and TF Rv0405, as shown below.

![](https://i.imgur.com/stDQezu.png)

---

### Discussions

* **Thirteen sigma factors** were found to regulate each other, forming a tightly regulated, robust network. These included three “**housekeeping**” sigma factors and **10 ECF sigma factors.** Several sigma factors not only regulated others but were also auto-regulatory. We created a **directed network** of these sigma factors to elucidate their regulatory network in **_Mycobacterium tuberculosis._**
* **Sigma factors C, H and I** were found to exert **downstream control** over the largest number of genes and transcription factors. Sigma factor E was found to be targeted by the largest number of transcription factors. Both groups are therefore recognised as key nodes of the network.
* Network topological analysis was performed on this regulatory network using Network Analyzer to compute measures such as **betweenness centrality, clustering coefficient and closeness centrality.** The network showed an average clustering coefficient of **0.1** and a parabolic plot in which only a few nodes had a high clustering coefficient. These nodes tend to form tight clusters or ‘**communities**’ in the network.
* **_In silico_ deletion** of each sigma factor in the network showed that the network is indeed **robust**: even the deletion of highly clustered nodes did not fragment the network enough to inactivate the system of ECF sigma factors. Although the parameter values changed slightly, the network as a whole did not break up, which reflects a vital characteristic of the ‘**bypass**’ mechanisms of **survival** in micro-organisms.
* A **ClusterViz** analysis using the **FAG-EC** algorithm to find strongly connected communities gave two sub-networks, as shown in Fig. 5. These clusters had 80 and 70 nodes, 115 and 83 edges, and modularity values of 3.633 and 3.16, respectively. This means that there exist some edges, or combinations of edges, whose removal separates these two communities, but a large number of deletions would be needed to break them into smaller parts.
* MCODE analysis resulted in **two FFL motifs** in the network, involving combinations of TFs and sigma factors with 3 and 4 nodes, respectively. These nodes involved **sigma factors 1, 3 and 10 and TF Rv0405.**

---
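
The _in silico_ deletion and feed-forward loop (FFL) points above can be sketched on a toy network. The block below is a sketch under stated assumptions, not the method used in the thesis: the edges and node names are hypothetical, robustness is measured (as an assumed, simple proxy) by the size of the largest weakly connected component after removing each node, and FFLs are found by direct enumeration of triples rather than by MCODE.

```python
from itertools import permutations

# Hypothetical regulatory edges (source regulates target); not thesis data.
EDGES = [
    ("tfA", "sigX"), ("tfA", "sigY"), ("sigX", "sigY"),
    ("sigX", "gene1"), ("sigY", "gene2"), ("tfB", "sigY"),
]


def nodes_of(edges):
    """Collect every node that appears in the edge list."""
    return {n for edge in edges for n in edge}


def largest_component(edges, nodes):
    """Size of the largest weakly connected component (direction ignored)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first traversal
            cur = stack.pop()
            size += 1
            for nb in adj[cur] - seen:
                seen.add(nb)
                stack.append(nb)
        best = max(best, size)
    return best


def deletion_scan(edges):
    """Largest-component size after deleting each node in turn."""
    nodes = nodes_of(edges)
    result = {}
    for n in sorted(nodes):
        kept = [(a, b) for a, b in edges if n not in (a, b)]
        result[n] = largest_component(kept, nodes - {n})
    return result


def feed_forward_loops(edges):
    """All triples (a, b, c) with edges a->b, b->c and a->c."""
    es = {(a, b) for a, b in edges if a != b}  # self-loops are ignored
    return sorted(
        (a, b, c)
        for a, b, c in permutations(nodes_of(es), 3)
        if (a, b) in es and (b, c) in es and (a, c) in es
    )


scan = deletion_scan(EDGES)
ffls = feed_forward_loops(EDGES)
print(scan)
print(ffls)
```

A node whose deletion leaves the largest component almost unchanged is one the network can bypass. A sharp drop would point to a node that holds the network together.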