Serving a large number of ML models at low latency
Serving machine learning models is a scalability challenge at many companies. Most applications need only a small number of models (often fewer than 100) to serve predictions. Cloud platforms that support model serving, by contrast, can host hundreds of thousands of models, but they provision separate hardware for each customer. Salesforce faces a challenge that very few companies share: it must run hundreds of thousands of models on infrastructure shared across multiple tenants to remain cost-effective.
In this talk, we will explain how Salesforce hosts hundreds of thousands of models on a multi-tenant infrastructure to support low-latency predictions.
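One common pattern for serving far more models than fit in memory is to load models on demand from shared storage and keep only the most recently used ones cached per serving node. The sketch below illustrates that idea with a simple LRU cache keyed by (tenant, model); it is an assumption-based illustration of the general technique, not Salesforce's actual design, and `loader` stands in for a hypothetical function that fetches a model from shared storage.

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` models in memory; load others on demand.

    Illustrative sketch only: in a real multi-tenant serving system the
    loader would fetch serialized models from shared storage (e.g. a blob
    store), and eviction policy, warm-up, and concurrency would matter.
    """

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # hypothetical: loads a model by (tenant_id, model_id)
        self.cache = OrderedDict()    # ordered oldest -> newest for LRU eviction

    def get(self, tenant_id, model_id):
        key = (tenant_id, model_id)
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        model = self.loader(key)          # cache miss: load from shared storage
        self.cache[key] = model
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used model
        return model
```

With a capacity of 2, requesting models for tenants t1, t2, then t3 evicts whichever of the first two was touched least recently, so tail latency depends on keeping hot models resident.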
Agenda:
11:40 am - 11:50 am Arrival and socializing
11:50 am - 12:00 pm Opening
12:00 pm - 1:50 pm Manoj Agarwal, "Serving a large number of ML models at low latency"
1:50 pm - 2:00 pm Q&A
Speaker: Manoj Agarwal is a Software Architect in the Einstein Platform team at Salesforce.
Webinar ID: 811 6053 3641
Register at weblink.
Tuesday, 03/23/21
Contact:
Enes
Phone: +1 (408) 475-4348
Cost: Free