GraphLab: Machine Learning for Big Data in the Cloud

Carlos Guestrin

Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "Big Data."

Unfortunately, implementing efficient parallel ML algorithms is challenging. Existing high-level parallel abstractions such as MapReduce and Pregel are insufficiently expressive to achieve the desired performance, while low-level tools such as MPI are difficult to use, leaving ML experts repeatedly solving the same design challenges.

In this talk, I will describe the GraphLab framework, which naturally expresses the asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab effectively addresses many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received thousands of downloads, and it is being actively used by a number of startups, companies, research labs, and universities.

This talk represents joint work with Yucheng Low, Joey Gonzalez, Aapo Kyrola, Jay Gu, Danny Bickson, and Joseph Bradley.

Speaker: Carlos Guestrin, Univ. of Washington

Room 306

Wednesday, 11/14/12

Cost: Free

UC Berkeley

Soda Hall
Berkeley, CA 94720