
Massive Speedups for Policy Simulation with applications to Inventory Management

Vivek Farias

We consider the task of producing a single trajectory of a dynamical system under some state-dependent policy. This ‘policy simulation’ task is often the core computational bottleneck in modern Reinforcement Learning algorithms. The multiple, inherently serial, policy evaluations that must be performed in one such simulation constitute the bulk of this bottleneck. As a concrete example, simulating a fulfillment optimization policy on a month’s worth of demand at a moderately large retailer can take several hours, rendering granular RL infeasible at scale.

We present a class of iterative algorithms we dub Picard Iteration. Our scheme carefully allocates policy evaluation tasks across independent GPU ‘processes’. Within each iteration, a single process only evaluates the policy on its assigned tasks while assuming a certain ‘cached’ evaluation for other tasks. This cache is updated at the end of the iteration. A single iteration is ideally suited for the type of ‘single program multiple data’ parallelism offered by a GPU. We prove that the structure afforded by many inventory management problems allows Picard iteration to converge in a small number of iterations independent of the horizon. As one practical consequence, we demonstrate a 500x speedup in policy simulation for large-scale fulfillment optimization. Picard iteration offers a blueprint for similar speedups in related policy simulation and sequential inference tasks.
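The fixed-point idea behind the scheme can be illustrated with a minimal sketch. The dynamics, policy, and convergence check below are hypothetical stand-ins (not the speaker's actual fulfillment model): the point is that, given a cached trajectory guess, every per-step policy evaluation becomes independent and can be batched, and the cache is refined once per iteration.

```python
import numpy as np

# Hypothetical one-dimensional dynamics and policy, for illustration only.
def step(x, u):
    return 0.5 * x + u

def policy(x):
    return np.tanh(x)  # an arbitrary state-dependent policy

def serial_rollout(x0, T):
    # The baseline: T inherently serial policy evaluations.
    xs = [x0]
    for _ in range(T):
        xs.append(step(xs[-1], policy(xs[-1])))
    return np.array(xs)

def picard_rollout(x0, T, max_iters=100, tol=1e-10):
    # Cached trajectory guess: initialize every entry to x0.
    xs = np.full(T + 1, float(x0))
    for it in range(1, max_iters + 1):
        # Given the cache, all T policy evaluations are independent,
        # so this line maps to one batched (SPMD-style) GPU call.
        us = policy(xs[:-1])
        new = np.empty_like(xs)
        new[0] = x0
        new[1:] = step(xs[:-1], us)
        if np.max(np.abs(new - xs)) < tol:
            return new, it
        xs = new  # update the cache at the end of the iteration
    return xs, max_iters
```

In this sketch, information propagates at least one time step per iteration, so the iterate matches the serial rollout after at most T iterations; the abstract's claim is that problem structure often yields convergence in far fewer, independent of the horizon.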

Speaker: Vivek Farias, Massachusetts Institute of Technology

Monday, 03/17/25


Cost: Free


Etcheverry Hall

UC Berkeley
Berkeley, CA 94720
