Slicing and dicing with k-mers¶
(Note, this won’t work with amplified data.)
Extra resources:
—
At the command line, create a new directory and extract some data:
cd /mnt
mkdir slice
cd slice
We’re going to work with half the read data set for speed reasons –
gunzip -c ../mapping/SRR1976948.abundtrim.subset.pe.fq.gz | \
head -6000000 > SRR1976948.half.fq
In a Jupyter Notebook (go to ‘http://‘ + machine name + ‘:8000’), password ‘davis’, create new Python notebook “conda root”, run:
cd /mnt/slice
and then in another cell:
!load-into-counting.py -M 4e9 -k 31 SRR1976948.kh SRR1976948.half.fq
and in another cell:
!abundance-dist.py SRR1976948.kh SRR1976948.half.fq SRR1976948.dist
and in yet another cell:
%matplotlib inline
import numpy
from pylab import *
dist1 = numpy.loadtxt('SRR1976948.dist', skiprows=1, delimiter=',')
plot(dist1[:,0], dist1[:,1])
axis(ymax=10000, xmax=1000)
Then:
python2 ~/khmer/sandbox/calc-median-distribution.py SRR1976948.kh \
SRR1976948.half.fq SRR1976948.readdist
And:
python2 ~/khmer/sandbox/slice-reads-by-coverage.py SRR1976948.kh SRR1976948.half.fq slice.fq -m 0 -M 60
Assemble the slice¶
(Re)install megahit:
cd
git clone https://github.com/voutcn/megahit.git
cd megahit
make
Go back to the slice directory and extract paired ends:
cd /mnt/slice
extract-paired-ends.py slice.fq
Assemble!
~/megahit/megahit --12 slice.fq.pe -o slice
The contigs will be in slice/final.contigs.fa
.
LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.