Titus Brown, titus@idyll.org
(Linked to CompareM/issues/16.)
(You'll need homebrew installed already.)
brew install prodigal
brew install diamond
brew install coreutils
Note, comparem is not (yet) Python 3 compatible.
pip install comparem
Mac OS X has issues with regular sort
and so I am using GNU sort installed with homebrew coreutils (above) as gsort
.
Also, sed
is different on Mac OS X so that has to be fixed too. Whatevs.
diff --git a/comparem/similarity_search.py b/comparem/similarity_search.py
index 6a4b579..33b3559 100644
--- a/comparem/similarity_search.py
+++ b/comparem/similarity_search.py
@@ -90,8 +90,8 @@ class SimilaritySearch(object):
"""
self.logger.info('Sorting table with hits (be patient!).')
- os.system("LC_ALL=C sed -i 's/~/\t/g' %s" % input_hit_table)
- os.system("LC_ALL=C sort --parallel=8 -o %s -k1,1 -k3,3 %s" % (input_hit_table, input_hit_table))
+ os.system("LC_ALL=C sed -i '' 's/~/\t/g' %s" % input_hit_table)
+ os.system("LC_ALL=C gsort --parallel=8 -o %s -k1,1 -k3,3 %s" % (input_hit_table, input_hit_table))
os.system('mv %s %s' % (input_hit_table, output_hit_table))
def _run_self_blastp(self, query_gene_file,
git clone https://github.com/dib-lab/sourmash.git
cd sourmash/data
mkdir comparem
cd comparem
cp ../GCF*.fna.gz .
gunzip *.fna.gz
Now, put all of the genomes in a list file:
ls -1 *.fna > comparem.list
and run comparem aai_wf
:
comparem aai_wf comparem.list out
I see the following output:
[2018-02-19 11:52:47] INFO: CompareM v0.0.23
[2018-02-19 11:52:47] INFO: comparem aai_wf comparem.list out
[2018-02-19 11:52:47] INFO: Identifying genes within genomes:
Finished processing 3 of 3 (100.00%) genomes.
[2018-02-19 11:54:36] INFO: Identified genes written to: out/genes
[2018-02-19 11:54:36] INFO: Appending genome identifiers to query genes.
[2018-02-19 11:54:36] INFO: Creating DIAMOND database (be patient!).
[2018-02-19 11:54:37] INFO: Performing self similarity sequence between genomes (be patient!).
[2018-02-19 11:54:55] INFO: Sorting table with hits (be patient!).
[2018-02-19 11:54:55] INFO: Sequence similarity results written to: out/similarity
[2018-02-19 11:54:55] INFO: Calculating length of genes.
[2018-02-19 11:54:55] INFO: Indexing sorted hit table.
[2018-02-19 11:54:55] INFO: Calculating AAI between all 3 pairs of genomes:
Finished processing 3 of 3 (100.00%) pairs.
[2018-02-19 11:54:55] INFO: Summarizing AAI results.
[2018-02-19 11:54:55] INFO: AAI between genomes written to: out/aai/aai_summary.tsv
Then (assuming it's all successful…) display things with tabs lined up nicely:
column -t -s $'\\t' out/aai/aai_summary.tsv
…or load into Excel or R or Python or something…
The output is (N * (N-1)) / 2 rows, i.e. the upper half of a distance matrix.
sourmash compare
-style numpy matrix#! /usr/bin/env python
from __future__ import print_function, division
import sys
import numpy
import argparse
def main():
parser = argparse.ArgumentParser()
parser.add_argument('aai_summary_tsv')
parser.add_argument('-o', '--output')
args = parser.parse_args()
assert args.output, 'please specify --output'
indices = {}
with open(args.aai_summary_tsv, 'rt') as fp:
lines = fp.readlines()
lines = [ x.strip().split('\t') for x in lines ]
lines = lines[1:]
# assign unique indices to the thing
filenum = 0
for row in lines:
g1 = row[0]
g2 = row[2]
if g1 not in indices:
indices[g1] = filenum
filenum += 1
if g2 not in indices:
indices[g2] = filenum
filenum += 1
print('...got {} genomes.'.format(filenum))
D = numpy.zeros((filenum, filenum))
for row in lines:
g1 = row[0]
n1 = indices[g1]
g2 = row[2]
n2 = indices[g2]
sim = row[5]
D[n1,n2] = D[n2,n1] = float(sim) / 100.
for i in range(filenum):
D[i,i] = 1.0
x = []
for k, v in indices.items():
x.append((v, k))
x.sort()
labeloutname = args.output + '.labels.txt'
print('saving labels to: {}'.format(labeloutname))
with open(labeloutname, 'w') as fp:
fp.write("\n".join([ tup[1] for tup in x ]))
print('saving distance matrix to: {}'.format(args.output))
with open(args.output, 'wb') as fp:
numpy.save(fp, D)
if __name__ == '__main__':
main()
or
or
By clicking below, you agree to our terms of service.
New to HackMD? Sign up
Syntax | Example | Reference | |
---|---|---|---|
# Header | Header | 基本排版 | |
- Unordered List |
|
||
1. Ordered List |
|
||
- [ ] Todo List |
|
||
> Blockquote | Blockquote |
||
**Bold font** | Bold font | ||
*Italics font* | Italics font | ||
~~Strikethrough~~ | |||
19^th^ | 19th | ||
H~2~O | H2O | ||
++Inserted text++ | Inserted text | ||
==Marked text== | Marked text | ||
[link text](https:// "title") | Link | ||
 | Image | ||
`Code` | Code |
在筆記中貼入程式碼 | |
```javascript var i = 0; ``` |
|
||
:smile: | ![]() |
Emoji list | |
{%youtube youtube_id %} | Externals | ||
$L^aT_eX$ | LaTeX | ||
:::info This is a alert area. ::: |
This is a alert area. |
On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?
Please give us some advice and help us improve HackMD.
Do you want to remove this version name and description?
Syncing