Ethics in Data Science MSDS course
Class
When: Thursdays, 10-11:50am, March 19-May 17
Where: online
Instructor: Rachel Thomas
Feel free to email me at rachel@fast.ai, or to post questions on the forums (if you have a question, it’s likely that someone else is wondering the same thing and will be interested in the answer too).
Focus
As data ethics covers a broad and varied group of topics, there is no way that an 8-week (or even full semester length) course can be comprehensive. A meta-analysis of over 100 syllabi on tech ethics, What Do We Teach When We Teach Tech Ethics? A Syllabi Analysis, found: “a great deal of variability across tech ethics courses in terms of the content taught. However, the lack of consistency of content is not surprising considering the lack of standards in this space, and the disciplinary breadth of the syllabi we covered. This is not a bad thing.” (Fiesler et al 2019)
In this course, we will focus on topics that are both urgent and practical (impacting real people right now). In keeping with my teaching philosophy, we will begin with two active, real-world areas (disinformation and bias) to provide context and motivation, before stepping back in Unit 3 to dig into foundations of data ethics.
Communication: forums.fast.ai
I will primarily use forums.fast.ai/c/data-ethics (you will not be able to view the link until you have created an account on forums.fast.ai and been added to the private data ethics group) for class communication: to share updates to the readings and for all of us to keep the discussion going throughout the week.
Please create an account here and then add your username to this spreadsheet.
Weekly Assignment
There will be a short weekly assignment, usually consisting of reflective questions about that week’s reading.
Final Exam: MSDS 633 Ethics in Data Science
Tuesday, May 12, 10:00am – 12:00pm
Optional: Blog post on a topic of your choosing. You can go deeper on a topic we covered in class, or research and write about something outside the scope of what we cover. If you are interested, I will give you feedback on a draft before you post it. I encourage everyone to consider blogging and recommend the posts below:
- Why you (yes, you) should blog by Rachel Thomas
- Advice for Better Blog Posts by Rachel Thomas: slightly more advanced advice, based on reviewing lots of student blog posts
- Your own blog with GitHub Pages and fast_template (4 part tutorial) by Jeremy Howard: a free solution by my co-founder for creating a blog in which you don’t need to code or use the command line, and you retain all ownership of your material
Outcomes
The proposed learning outcomes of this course are:
- Understand the impacts of data misuse, including unjust bias, surveillance, disinformation, and feedback loops. Understand the contributing factors to these impacts. Identify different types of bias.
- Develop literacy in investigating how data and data-powered algorithms shape, constrain, and manipulate our commercial, civic, and personal experiences.
- Analyze new scenarios and potential products to try to identify and mitigate potential risks.
- Have a toolkit of ethical techniques and practices to implement in your workplace.
Tentative Syllabus
Week 1: Disinformation
From deepfakes being used to harass women, to worries about the role disinformation could play in the 2020 election, to news of extensive foreign influence operations, disinformation is frequently in the news and is an urgent issue. It is also indicative of the complexity and interdisciplinary nature of so many data ethics issues: disinformation involves tech design choices, bad actors, human psychology, misaligned financial incentives, and more.
Required Reading:
- Will Oremus, The Simplest Way to Spot Coronavirus Misinformation on Social Media
- Guillaume Chaslot, How Algorithms Can Learn to Discredit the Media: Chaslot is a former Google/YouTube engineer and founder of the non-profit watch group AlgoTransparency. He has done a lot to bring attention to issues with recommendation systems. For a counter view on the role of recommendation systems, see Rebecca Lewis’s work below.
- Renee DiResta, Mediating Consent: DiResta is a top expert on computational propaganda, who led one of the two teams that analyzed the dataset about Russian interference in the 2016 election for the Senate Intelligence Committee, and now works at the Stanford Internet Observatory
Optional Reading:
- Stanford Internet Observatory Evidence of Russia-Linked Influence Operations in Africa
- Rachelle Hampton, The Black Feminists Who Saw the Alt-Right Threat Coming
- Rebecca Lewis, “This Is What the News Won’t Show You”: YouTube Creators and the Reactionary Politics of Micro-celebrity
- Gordon Pennycook et al, Understanding and reducing the spread of misinformation online
- Manuel Velasquez et al, “What is ethics?”: We will talk more about the foundations of ethics in week 3, after we’ve seen some case studies, but I wanted to share this now.
Optional Lab for Coders:
Intro to Language Modeling & Text Generation: video lecture and Jupyter notebook (from my NLP course)
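If you are curious what the lab is about before watching, here is a minimal sketch of the text-generation idea using a character-level bigram model (this toy example is my illustration only; the actual lecture uses neural language models and the fastai library):

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each character follows each other character."""
    model = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        model[a][b] += 1
    return model

def generate(model, start, n, seed=0):
    """Sample up to n characters, each conditioned on the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        options = model.get(out[-1])
        if not options:
            break  # no character was ever seen following this one
        chars, counts = zip(*options.items())
        out.append(rng.choices(chars, weights=counts)[0])
    return "".join(out)

model = train_bigram("the theory of the thing")
print(generate(model, "t", 10))
```

Neural language models replace the bigram counts with a learned probability distribution over the next token, but the generate-one-step-at-a-time loop is the same idea.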
Week 2: Bias & Fairness
Unjust bias is an increasingly discussed issue in machine learning and has even spawned its own field, the primary focus of the conference on Fairness, Accountability, and Transparency (FAT*). We will go beyond a surface-level discussion and cover questions of how fairness is defined, different types of bias, and steps toward mitigating it.
Required Reading/Watching:
- Arvind Narayanan, 21 Definitions of Fairness
- Timnit Gebru et al, Datasheets for Datasets
- Harini Suresh and John Guttag, A Framework for Understanding Unintended Consequences of Machine Learning
- Samir Passi and Solon Barocas, Problem Formulation and Fairness
Optional Reading:
- Ulrich Aivodji et al, Fairwashing: the risk of rationalization
- Alice Xiang and Deborah Raji, On the Legal Compatibility of Fairness Definitions
Optional Lab (involves code, but was geared to an audience that included beginners):
Word Embeddings, Bias in ML, Why You Don’t Like Math, & Why AI Needs You and the Jupyter notebooks
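As a small taste of what “how fairness is defined” means in practice, here is a sketch of two of the competing definitions from the Week 2 readings, computed directly from predictions: demographic parity (equal selection rates across groups) and equal opportunity (equal true positive rates). The group data below is entirely made up for illustration:

```python
def positive_rate(preds):
    """Fraction of individuals receiving the positive prediction."""
    return sum(preds) / len(preds)

def true_positive_rate(preds, labels):
    """Fraction of truly positive individuals correctly predicted positive."""
    pos_preds = [p for p, y in zip(preds, labels) if y == 1]
    return sum(pos_preds) / len(pos_preds)

# Hypothetical predictions and true labels for two groups, A and B.
preds_a, labels_a = [1, 1, 0, 0], [1, 0, 1, 0]
preds_b, labels_b = [1, 0, 0, 0], [1, 0, 0, 0]

# Demographic parity compares selection rates: 0.5 vs 0.25 (violated).
print(positive_rate(preds_a), positive_rate(preds_b))
# Equal opportunity compares TPRs: 0.5 vs 1.0 (also violated).
print(true_positive_rate(preds_a, labels_a), true_positive_rate(preds_b, labels_b))
```

Note that the two definitions can disagree with each other, which is exactly the tension the Narayanan talk explores.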
Week 3: Ethical Foundations & Practical Tools
Now that we’ve seen a number of concrete, real-world examples of ethical issues that arise with data, we will step back and learn about some ethical philosophies and lenses through which to evaluate ethics, as well as consider how ethical questions are chosen. We will also cover the Markkula Center’s Tech Ethics Toolkit, a set of concrete practices to be implemented in the workplace.
Required Reading:
- Shannon Vallor et al, Conceptual Frameworks in Technology and Engineering Practice: Ethical Lenses to Look Through
- Ian Bogost, Enough With the Trolley Problem
- Zeynep Tufekci, Sociological Storytelling vs. Psychological Storytelling
- Langdon Winner, Do Artifacts Have Politics?
- Shannon Vallor, An Ethical Toolkit for Engineering/Design Practice
Optional Reading:
- Meg Young et al, Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents
- Margaret Mitchell et al, Model Cards for Model Reporting
- Eric P. S. Baumer and M. Six Silberman, When the Implication Is Not to Design (Technology)
- Mark White, Superhuman Ethics Class With The Avengers Prime
Week 4: Privacy and Surveillance
The huge amounts of data collected by the apps we use, as well as the growing use of facial recognition and tracking data, have made privacy and surveillance particularly relevant issues right now.
Required Reading:
- Jennifer Valentino-DeVries et al (NYT), Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret
- Phillip Rogaway, The Moral Character of Cryptographic Work
- Alvaro Bedoya, Privacy and Civil Rights in the Age of Facebook, ICE, and the NSA
- Maciej Ceglowski, The New Wilderness
Optional Reading:
- Chris Gilliard, Caught in the Spotlight
- Forget about “privacy”: Julia Angwin and Trevor Paglen on our data crisis
- Lindsey Barrett, Our collective privacy problem is not your fault
- Zeynep Tufekci, The Latest Data Privacy Debacle
- Tim Wu, How Capitalism Betrayed Privacy
Week 5: How did we get here? Our Ecosystem
News stories understandably often focus on one particular ethics issue of one particular company. Here, I want us to step back and consider some of the broader trends and factors that have resulted in the types of issues we are seeing. These include our over-emphasis on metrics, the inherent design of many of the platforms, venture capital’s focus on hypergrowth, and more.
Required Reading:
- Zeynep Tufekci, How social media took us from Tahrir Square to Donald Trump
- James Grimmelman, The Platform is the Message
- Rachel Thomas, The Problem with Metrics
- Tim O’Reilly, The fundamental problem with Silicon Valley’s favorite growth strategy
- Ali Alkhatib, Anthropological/Artificial Intelligence & the Institute for Human-centered AI
Week 6: Algorithmic Colonialism and Next Steps
When corporations from one country develop and deploy technology in many other countries, extracting data and profits, often with little awareness of local cultural issues, a number of ethical issues can arise. Here we will explore algorithmic colonialism. We will also consider next steps for how students can continue to engage around data ethics and take what they’ve learned back to their workplaces.
Required Reading:
- Abeba Birhane, The Algorithmic Colonization of Africa
- Amy Maxmen (Nature), Can tracking people through phone-call data improve lives?
- Adrienne Lafrance, Facebook and the New Colonialism
Optional Reading:
- Joe Parkinson et al, Huawei Technicians Helped African Governments Spy on Political Opponents
- Davey Alba, How Duterte Used Facebook To Fuel The Philippine Drug War
- Rumman Chowdhury, Algorithmic Colonialism
- Daniel Greene et al, Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning
- Jess Whittlestone et al, The role and limits of principles in AI ethics: towards a focus on tensions
- Sareeta Amrute, Tech Colonialism Today