Arvados: A Free Platform for Big Data Science
This talk will introduce the Arvados (http://arvados.org) platform for data science. Arvados is a software system for managing compute clusters. It is built around three components: a scale-out, content-addressed distributed file system for storage (Arvados Keep), a cluster job queuing system designed for reproducibility (Arvados Crunch), and a user and group permission system for controlling and sharing access to those resources. Arvados provides web-based and command-line tools for transferring, managing, sharing, and computing on very large data sets.
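As a rough illustration of what "content-addressed" means, the sketch below stores each block under a hash of its own bytes, so identical data always gets the same name and corruption can be detected on read. It is a toy in-memory model and not the Arvados Keep API: the class name `ContentStore`, its `put`/`get` methods, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not Arvados Keep)."""

    def __init__(self):
        self._blocks = {}  # locator (hex digest) -> raw bytes

    def put(self, data: bytes) -> str:
        # The locator is derived only from the content, so storing the
        # same bytes twice yields the same locator and a single copy.
        locator = hashlib.sha256(data).hexdigest()
        self._blocks[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        data = self._blocks[locator]
        # Re-hash on read to check the block still matches its name.
        if hashlib.sha256(data).hexdigest() != locator:
            raise ValueError("block content does not match its locator")
        return data


store = ContentStore()
first = store.put(b"ACGTACGT")
second = store.put(b"ACGTACGT")  # duplicate content, same locator
```

Because the name is a function of the content, two users who upload the same data refer to it by the same locator, which is one reason this design suits reproducible computation.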
In working with a diverse set of researchers, physicians, and patients who are all examining sequencing data, we have identified a need for a consistent naming scheme for parts of the genome. As an application within the Arvados platform, we invented tiling, a technique that divides the genome into about 10 million overlapping, variable-length sequences, or "tiles", each with a unique 24-base tag at each end. We use examples from public data to show that tiling supports simple and consistent names, annotation, queries, machine learning, and clinical screening. We support tiling with Arvados Lightning, software that will scale to millions of genomes on a few racks of off-the-shelf hardware.
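The tiling idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Arvados Lightning implementation: the function name `tile_sequence`, the `tag_starts` parameter, and the toy 4-base tags are all hypothetical, and the example assumes the tag positions are already known. Each tile runs from the start of one tag to the end of the next, so adjacent tiles overlap by exactly one tag.

```python
def tile_sequence(seq, tag_starts, tag_len=24):
    """Split seq into overlapping tiles bounded by tags (illustrative sketch).

    tag_starts: sorted start offsets of the tags within seq (assumed given).
    Each tile begins with one tag and ends with the next one.
    """
    tags = [seq[start:start + tag_len] for start in tag_starts]
    if len(set(tags)) != len(tags):
        raise ValueError("tags must be unique")

    tiles = []
    for left, right in zip(tag_starts, tag_starts[1:]):
        # Include the right-hand tag, so this tile's last tag_len bases
        # are the first tag_len bases of the following tile.
        tiles.append(seq[left:right + tag_len])
    return tiles


# Toy example with 4-base tags instead of 24 to keep it readable.
toy_seq = "AAAA" + "CC" + "GGGG" + "TAT" + "CCCC"
toy_tiles = tile_sequence(toy_seq, tag_starts=[0, 6, 13], tag_len=4)
```

Here the tiles vary in length because the stretches between tags differ, which mirrors the "variable-length" property in the description.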
Speakers: Alexander Wait Zaranek and Jonathan Steffi, Curoverse
Register at weblink
Room: S360
Thursday, 03/10/16
Cost: Free
