Arvados: A Free Platform for Big Data Science

This talk will introduce the Arvados (http://arvados.org) platform for data science. Arvados is a software system for managing compute clusters. It is built around three components: a scale-out, content-addressed distributed file system for storage (Arvados Keep), a cluster job queuing system designed for reproducibility (Arvados Crunch), and a user and group permission system for controlling and sharing access to those resources. Arvados provides web-based and command-line tools for transferring, managing, sharing, and computing on very large data sets.

In working with a diverse set of researchers, physicians, and patients who are all examining sequencing data, we have identified a need for a consistent naming scheme for parts of the genome. As an application within the Arvados platform, we invented tiling, a technique that divides the genome into about 10 million overlapping, variable-length sequences, or "tiles", each with a unique 24-base tag at each end. We use examples from public data to show that tiling supports simple and consistent names, annotation, queries, machine learning, and clinical screening. We support tiling with Arvados Lightning, software that will scale to millions of genomes in a few racks of off-the-shelf hardware.
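
As a rough illustration of the tiling idea described above, the sketch below cuts a sequence into overlapping tiles, where each tile runs from the start of one tag to the end of the next, so neighboring tiles share a tag. This is not Arvados Lightning code: the function name `tile_sequence`, the toy sequence, and the tag positions are hypothetical, and how tags are actually chosen so that each is unique is not covered here.

```python
from typing import List, Tuple

TAG_LEN = 24  # tag length given in the abstract


def tile_sequence(
    seq: str, tag_starts: List[int], tag_len: int = TAG_LEN
) -> List[Tuple[int, int, str]]:
    """Return (start, end, bases) for each tile.

    A tile spans from the start of one tag to the end of the next tag,
    so consecutive tiles overlap by exactly one tag.
    tag_starts is assumed to be supplied by the caller (hypothetical input).
    """
    starts = sorted(tag_starts)
    for a, b in zip(starts, starts[1:]):
        if b - a < tag_len:
            raise ValueError("tags must not overlap each other")
    if starts and starts[-1] + tag_len > len(seq):
        raise ValueError("last tag runs past the end of the sequence")

    tiles = []
    for a, b in zip(starts, starts[1:]):
        end = b + tag_len  # include the closing tag in the tile
        tiles.append((a, end, seq[a:end]))
    return tiles


# Toy 100-base sequence with three assumed tag positions.
toy_seq = "ACGTACGTAC" * 10
toy_tiles = tile_sequence(toy_seq, [0, 30, 76])
for start, end, bases in toy_tiles:
    print(start, end, len(bases))
```

Because every tile begins and ends with a tag, a tile can be named by its position in the tag order, and variable tile lengths simply follow from uneven tag spacing.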

Speakers: Alexander Wait Zaranek and Jonathan Steffi, Curoverse

Register at weblink

Room: S360

Thursday, March 10, 2016

Cost: Free

James H. Clark Center (Bldg 340)

Stanford University
318 Campus Dr
Stanford, CA 94305