Interactive Language Agents: Training, Evaluation, and Interface

The increasing capability of Large Language Models (LLMs) makes them appealing for adoption in labor-intensive human tasks. For example, significant efforts have recently focused on developing agents -- systems that map observations and instructions to executable actions -- and their benchmarks in real-world tasks like web navigation. In this talk, I will discuss recent work in training and improving such models through interactions with human users, and developing better evaluations for these agents, which in turn can be used to automatically improve agent performance without requiring any demonstration data or human annotation. However, in developing systems like this, and in applying LLMs and other large pre-trained models to real-world problems, we should be aware of their fundamental limitations; for example, their sensitivity to design considerations like prompt formatting. I will detail recent work where we find that LLMs can be incredibly sensitive to arbitrary design decisions, like choices of separators or multiple choice labels.
Speaker: Alane Suhr, UC Berkeley
Thursday, 03/06/25
Cost: Free
Sonoma State Dept. of Engineering Science
Cerent Engineering Science Complex, Salazar Hall Room #2009A
Rohnert Park, CA 94928
Phone: (707) 664-2030
