SpaRC: Scalable Sequence Clustering using Apache Spark

Rich Brueckner

Posted on 2018-02-26 15:40

Zhong Wang from the Genome Institute at LBNL gave this talk at the Stanford HPC Conference. "Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1,000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. Here we describe an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization."
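The talk summary does not say how SpaRC decides which reads share a molecule of origin. Here is a minimal single-machine sketch of one common idea, offered only as an illustration. It assumes that two reads sharing an exact k-mer come from the same molecule, links such reads, and returns the connected components of that read graph as clusters. The function names `kmers` and `cluster_reads` and the tiny default `k=5` are hypothetical. This union-find version is not SpaRC's Spark implementation.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq (none if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def cluster_reads(reads, k=5):
    """Group read indices into clusters of reads linked by shared k-mers.

    Illustrative assumption: reads sharing at least one exact k-mer are
    treated as coming from the same molecule. Clusters are the connected
    components of that read graph, found with union-find.
    """
    parent = list(range(len(reads)))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Remember the first read containing each k-mer; link later reads to it.
    first_seen = {}
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            if km in first_seen:
                union(first_seen[km], idx)
            else:
                first_seen[km] = idx

    # Collect read indices by their cluster root.
    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    reads = ["ACGTACGTTT", "CGTTTGGA", "TTTTCCCCAA", "CCCCAAGG", "GGGGG"]
    print(cluster_reads(reads, k=5))  # [[0, 1], [2, 3], [4]]
```

Linking is transitive, so two reads with no k-mer in common still land in the same cluster when a third read overlaps both. At the 100 GB to 1,000 GB scale quoted above, the k-mer-to-read grouping is the step one would expect to distribute across a cluster. That is an assumption about the design, not a description of SpaRC.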

The post SpaRC: Scalable Sequence Clustering using Apache Spark appeared first on insideHPC.
