Our bi-monthly DiPLab online seminar will welcome Gemma Newlands (PhD candidate, University of Amsterdam) for a session entitled “Discovering Dataset Provenance: Social Sustainability in the AI-as-a-Service (AIaaS) Supply Chain”.

The seminar will be held in hybrid form, both at Telecom Paris and online on the platform Big Blue Button. To receive the link, please register by sending an email with your full name and affiliation to contact@diplab.eu.

Discovering Dataset Provenance: Social Sustainability in the AI-as-a-Service (AIaaS) Supply Chain

Researchers have identified the often haphazard methods of generating and using datasets for AI training, referring to a ‘laissez-faire’ attitude towards data collection which trades rigour for speed and accessibility. As with global outsourcing elsewhere, the labour-intensive activity of cleaning, curating, and annotating datasets is outsourced to countries with lower labour costs, and thus takes place outside the organisation’s boundaries. Because of this, the exact provenance of training datasets often remains unknown, both to the organisation and to its external stakeholders. This problem is exacerbated when organisations do not train their own AI but instead use AI-as-a-Service (AIaaS). AIaaS enables individuals and organisations to access AI on demand, in either tailored or ‘off-the-shelf’ forms. However, institutional separation between development, training, and deployment can lead to critical opacities, such as obscuring the level of human effort necessary to produce and train AI services.

AI dataset supply chains can be conceptualised as the chain of collection, curation, and custody of data from source to model. Taking a supply-chain perspective broadens our understanding of AI production as a global production network reliant on low-paid workers predominantly situated in the Global South. Supply chains are increasingly globally ‘disaggregated’, with few organisations able to identify and engage with sub-suppliers. Yet, as ongoing research into sustainable supply chains has demonstrated, sub-suppliers are subject to the least oversight, while simultaneously being responsible for the most egregious environmental and social impacts. In this presentation I will introduce AIaaS, discuss the sustainability issues inherent in AIaaS production, and present four mechanisms by which organisations can try to discover the provenance of AIaaS datasets and whether AI is being developed in a socially sustainable manner: Dataset Documentation, Multi-Stakeholder Initiatives, Informal Governance, and Market Incentives.

Gemma Newlands is a PhD candidate at the University of Amsterdam and a Doctoral Stipendiary Fellow at the Nordic Centre for Internet and Society, BI Norwegian Business School. As an organisational sociologist, she is currently investigating the social valuation and occupational prestige of work in the digital economy, as well as the social sustainability of AI production. Gemma is currently an Assistant Editor at Big Data & Society, and her research has been published in leading journals including Organization Studies, Sociology, Environment and Planning A, Big Data & Society, New Media & Society, New Technology, Work and Employment, and Research in the Sociology of Organizations.