Scalable Machine Learning at Yahoo

At Yahoo, our Hadoop clusters manage almost all of the data about our users and content. Machine learning (ML) techniques are applied to this data, whether stored on HDFS or arriving through event pipelines, to build mathematical models for search ranking, ad click prediction, and many other applications.
Recently, Yahoo developed a distributed server for scalable ML on Hadoop grids, handling billions of training examples and input parameters. It provides several built-in ML operations, such as MPI-style AllReduce, and allows custom operations to be defined in Scala or Java. It also supports MapReduce for parameter analysis and model conversion.
Yahoo has implemented several massively scalable ML algorithms, including Decision Trees, Logistic Regression, and Ad-Query Vectors. These algorithms now train ML models in minutes, even with billions of parameters. In this talk, we will provide a technical overview of Yahoo's scalable ML solutions with use cases, and share our experience leveraging the strengths of Hadoop and Spark for machine learning.
Speaker: Andy Feng, Yahoo
Monday, 09/28/15
Cost: Free
