SpaRC: Scalable Sequence Clustering using Apache Spark

by Rich Brueckner, from High-Performance Computing News Analysis | insideHPC

Zhong Wang from the Genome Institute at LBNL gave this talk at the Stanford HPC Conference. "Whole genome shotgun based next generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) sequence data derived from tens of thousands of different genes or microbial species. Here we describe an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC) that partitions reads based on their molecule of origin to enable downstream assembly optimization."
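The summary above does not spell out how reads are grouped, so the following is only a single-machine sketch of one common idea behind read clustering: treat two reads as related when they share a k-mer, and take the connected components of that relation as clusters. The function names, the value of k, and the toy reads are all illustrative assumptions; this is not SpaRC's actual code, it ignores reverse complements and sequencing errors, and it makes no attempt to reproduce the Spark-based distribution the talk describes.

```python
from collections import defaultdict

def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}

def cluster_reads(reads, k=4):
    """Group reads linked, directly or transitively, by a shared k-mer.

    Hypothetical helper for illustration only, not part of SpaRC.
    """
    parent = list(range(len(reads)))  # union-find forest over read indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            if km in first_seen:
                union(idx, first_seen[km])
            else:
                first_seen[km] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())

# Toy reads (made up): the first two overlap, as do the last two.
reads = ["ACGTACGG", "GTACGGTT", "TTTTCCCC", "TCCCCAAA"]
clusters = cluster_reads(reads, k=4)
print(clusters)
```

At the data volumes quoted above, an in-memory dictionary of k-mers would not fit on one machine, which is presumably the motivation for building the application on a distributed framework such as Apache Spark.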
