Our bi-monthly DiPLab online seminar will welcome Milagros Miceli (PhD candidate at the Weizenbaum Institute, TU Berlin) and Julian Posada (PhD candidate, Massey College, University of Toronto) for a special session on a Foucauldian approach to online data annotation. The seminar will be held in hybrid form, both at Telecom Paris and online on the platform BigBlueButton. To receive the link, please register by sending an email with your full name and affiliation to contact@diplab.eu.

Milagros Miceli is a doctoral researcher at the Weizenbaum Institute and TU Berlin. She investigates work practices of data creation and focuses on power dynamics and their effects on machine learning datasets. She is interested in questions of meaning creation and symbolic power encoded in training data. Her work comprises ethnographic fieldwork with data annotators, collectors, and scientists in different sites around the world. In 2020, she received the Best Paper Award at the CSCW conference.

Julian Posada is a PhD candidate at the Faculty of Information, University of Toronto, and a Fellow at the Schwartz Reisman Institute and Massey College. His research investigates the experiences of workers in Latin America who annotate data for artificial intelligence through digital labour platforms. Previously, he worked for the French National Centre for Scientific Research (CNRS). He holds degrees from EHESS Paris and Sorbonne University.

Title: The Data-Production Dispositif

Abstract: Machine learning (ML) depends on data to train and verify models. Very often, companies and research institutions outsource processes related to data work, i.e., collecting and annotating data as well as evaluating algorithmic outputs, through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America. We study three platforms present in Venezuela and a BPO located in Argentina. We lean on the Foucauldian notion of dispositif to define the data-production dispositif as an ensemble of discourses, actions, and objects strategically disposed to (re)produce power/knowledge relations. Our dispositif analysis comprises the examination of 210 data-work instruction documents, 55 interviews with data workers, managers, and requesters, and participant observation. Our findings show that specific discourses encoded in task instructions reproduce and normalize the worldviews of requesters. Precarized working conditions and economic dependency alienate workers, making them obedient to instructions. Furthermore, discourses and social contexts materialize in artifacts, such as interfaces and performance metrics, constituting barriers for workers to voice concerns and ensuring the normalization of specific ways of interpreting data and the realities they comprise. We conclude by stressing the importance of counteracting the data-production dispositif by fighting the alienation and precarization of outsourced data workers, and empowering them to become assets in the quest for high-quality data.
