SpaRC: Scalable Sequence Clustering using Apache Spark
by Rich Brueckner from High-Performance Computing News Analysis | insideHPC
Zhong Wang from the Genome Institute at LBNL gave this talk at the Stanford HPC Conference. "Whole genome shotgun based next generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. Here we describe an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization."
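To make "partitions reads based on their molecule of origin" more concrete, here is a minimal single-machine sketch. It assumes that reads sharing at least one k-mer come from the same molecule and takes the connected components of the resulting read graph as clusters. The function names, the choice of k = 5, and the union-find approach are illustrative assumptions, not SpaRC's published algorithm or API; the sketch also ignores reverse complements, sequencing errors, and repeat k-mers that a real pipeline would likely need to handle.

```python
from __future__ import annotations

from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class UnionFind:
    """Disjoint-set structure used to merge reads into clusters."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_reads(reads: list[str], k: int = 5) -> list[list[int]]:
    """Hypothetical helper: group read indices so reads sharing any k-mer end up together."""
    # Step 1: index reads by the k-mers they contain.
    kmer_to_reads: dict[str, set[int]] = defaultdict(set)
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            kmer_to_reads[km].add(idx)

    # Step 2: link every pair of reads that share a k-mer (edges of a read graph).
    uf = UnionFind(len(reads))
    for members in kmer_to_reads.values():
        ordered = sorted(members)
        for other in ordered[1:]:
            uf.union(ordered[0], other)

    # Step 3: connected components of the read graph are the clusters.
    clusters: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(reads)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    reads = ["ACGTACGTTG", "CGTTGCAAT", "TTTTGGGGCC", "GGGGCCAAA"]
    print(cluster_reads(reads, k=5))  # [[0, 1], [2, 3]]
```

Reads 0 and 1 share the 5-mer `CGTTG`, and reads 2 and 3 share `GGGGC`, so the four reads fall into two clusters. In a distributed setting, the k-mer indexing step maps naturally onto a keyed shuffle (for example an RDD `groupByKey` in Spark), which is presumably where a Spark-based design gains its scalability on 100 GB to 1 TB inputs.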
The post SpaRC: Scalable Sequence Clustering using Apache Spark appeared first on insideHPC.