Excavating “Ground Truth” in AI: Epistemologies and Politics in Training Data - Livestream

The last decade has seen a dramatic capture of digital material for machine learning production. This data is the basis for sense-making in AI, not as classical representations of the world with individual meaning, but as mass collections: ground truth for machine abstractions and operations. OpenAI’s GPT-3 language model is trained on a corpus of hundreds of billions of words, ImageNet contains over 14 million images, and Tencent’s ML-Images contains more than 17.5 million annotated images, predominantly scraped from the internet. Training datasets shape the epistemic boundaries governing how machine learning operates, and thus are an essential part of understanding socially significant questions about AI. But when we closely investigate the benchmark training sets widely used in NLP and computer vision systems, we find complex social, political, and epistemological challenges. What happens when data is seen as an aggregate, stripped of context, meaning, and specificity? In what ways does training data limit what and how machine learning systems interpret the world? And most importantly, what forms of power do these approaches enhance and enable? In this lecture, Kate Crawford will share new work that reflects on what’s at stake in the architecture and contents of training sets, and how they are increasingly part of our urban, legal, logistical, and commercial infrastructures.
Speaker: Kate Crawford, USC Annenberg and Microsoft Research
Thursday, March 3, 2022
Cost: Free
