Class Policies and Syllabus

Ethics in Data Science MSDS course


When: Thursdays, 10-11:50am, March 19-May 17

Where: online

Instructor: Rachel Thomas

Feel free to email me at, or to post questions on the forums (if you have a question, it’s likely that someone else is wondering the same thing and will be interested in the answer too).


As data ethics covers a broad and varied group of topics, no 8-week (or even full-semester) course can be comprehensive. An analysis of over 100 tech ethics syllabi, “What Do We Teach When We Teach Tech Ethics? A Syllabi Analysis,” found “a great deal of variability across tech ethics courses in terms of the content taught. However, the lack of consistency of content is not surprising considering the lack of standards in this space, and the disciplinary breadth of the syllabi we covered. This is not a bad thing.” (Fiesler et al, 2019)

In this course, we will focus on topics that are both urgent and practical (impacting real people right now). In keeping with my teaching philosophy, we will begin with two active, real-world areas (disinformation and bias) to provide context and motivation, before stepping back in Week 3 to dig into the foundations of data ethics.


I will primarily use (you will not be able to view the link until you have created an account on and been added to the private data ethics group) for class communication: to share updates to the readings and to keep our discussion going throughout the week.

Please create an account here and then add your username to this spreadsheet.

Weekly Assignment

There will be a short weekly assignment, usually consisting of reflective questions about that week’s reading.

Final Exam: MSDS 633 Ethics in Data Science

Tuesday, May 12, 10:00am – 12:00pm

Optional blog post on a topic of your choosing: you can go deeper on a topic we covered in class, or research and write about something outside the scope of what we cover. If you are interested, I will give you feedback on a draft before you post it. I encourage everyone to consider blogging and recommend the posts below:


The proposed learning outcomes of this course are:

  1. Understand the impacts of data misuse, including unjust bias, surveillance, disinformation, and feedback loops. Understand the contributing factors to these impacts. Identify different types of bias.
  2. Develop literacy in investigating how data and data-powered algorithms shape, constrain, and manipulate our commercial, civic, and personal experiences.
  3. Analyze new scenarios and potential products to try to identify and mitigate potential risks.
  4. Have a toolkit of ethical techniques and practices to implement in the workplace.

Tentative Syllabus

Week 1: Disinformation

From deepfakes used to harass women, to worries about the role disinformation could play in the 2020 election, to news of extensive foreign influence operations, disinformation is frequently in the news and is an urgent issue. It also illustrates the complexity and interdisciplinary nature of so many data ethics issues: disinformation involves tech design choices, bad actors, human psychology, misaligned financial incentives, and more.

Required Reading:

  • Will Oremus, The Simplest Way to Spot Coronavirus Misinformation on Social Media
  • Guillaume Chaslot, How Algorithms Can Learn to Discredit the Media: Chaslot is a former Google/YouTube engineer and founder of the non-profit watch group AlgoTransparency. He has done a lot to bring attention to issues with recommendation systems. For a counter view on the role of recommendation systems, see Rebecca Lewis’s work below.
  • Renee DiResta, Mediating Consent: DiResta is a top expert on computational propaganda who led one of the two teams that analyzed the dataset on Russian interference in the 2016 election for the Senate Intelligence Committee, and she now works at the Stanford Internet Observatory.

Optional Reading:

Optional Lab for Coders:

Intro to Language Modeling & Text Generation: video lecture and Jupyter notebook (from my NLP course)

Week 2: Bias & Fairness

Unjust bias is an increasingly discussed issue in machine learning and has even spawned its own field: it is a primary focus of Fairness, Accountability, and Transparency (FAT*). We will go beyond a surface-level discussion to cover how fairness is defined, different types of bias, and steps toward mitigating it.

Required Reading/Watching:

Arvind Narayanan, 21 Definitions of Fairness

Timnit Gebru et al, Datasheets for Datasets

Harini Suresh and John Guttag, A Framework for Understanding Unintended Consequences of Machine Learning

Samir Passi and Solon Barocas, Problem Formulation and Fairness

Optional Reading:

Ulrich Aivodji et al, Fairwashing: the risk of rationalization

Alice Xiang and Deborah Raji, On the Legal Compatibility of Fairness Definitions

Optional Lab (involves code, but was geared to an audience that included beginners):

Word Embeddings, Bias in ML, Why You Don’t Like Math, & Why AI Needs You, and the accompanying Jupyter notebooks

Week 3: Ethical Foundations & Practical Tools

Now that we’ve seen a number of concrete, real-world examples of ethical issues that arise with data, we will step back to learn about some ethical philosophies and lenses through which to evaluate ethics, and to consider how ethical questions are chosen. We will also cover the Markkula Center’s Tech Ethics Toolkit, a set of concrete practices to implement in the workplace.

Required Reading:

Shannon Vallor et al, Conceptual Frameworks in Technology and Engineering Practice: Ethical Lenses to Look Through

Ian Bogost, Enough With the Trolley Problem

Zeynep Tufekci, Sociological Storytelling vs. Psychological Storytelling

Langdon Winner, Do Artifacts Have Politics?

Shannon Vallor, An Ethical Toolkit for Engineering/Design Practice

Optional Reading:

Meg Young et al, Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents

Margaret Mitchell et al, Model Cards for Model Reporting

Eric P. S. Baumer and M. Six Silberman, When the Implication Is Not to Design (Technology)

Mark White, Superhuman Ethics Class With The Avengers Prime

Week 4: Privacy and Surveillance

The huge amounts of data collected by the apps we use, as well as the growing use of facial recognition and tracking data, have made privacy and surveillance particularly relevant issues right now.

Required Reading:

Jennifer Valentino-DeVries et al (NYT), Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret

Phillip Rogaway, The Moral Character of Cryptographic Work

Alvaro Bedoya, Privacy and Civil Rights in the Age of Facebook, ICE, and the NSA

Maciej Ceglowski, The New Wilderness

Optional Reading:

Chris Gilliard, Caught in the Spotlight

Forget about “privacy”: Julia Angwin and Trevor Paglen on our data crisis

Lindsey Barrett, Our collective privacy problem is not your fault

Zeynep Tufekci, The Latest Data Privacy Debacle

Tim Wu, How Capitalism Betrayed Privacy

Week 5: How did we get here? Our Ecosystem

News stories understandably often focus on a single ethics issue at a single company. Here, I want us to step back and consider some of the broader trends and factors that have produced the types of issues we are seeing. These include our overemphasis on metrics, the inherent design of many platforms, venture capital’s focus on hypergrowth, and more.

Required Reading:

Zeynep Tufekci, How social media took us from Tahrir Square to Donald Trump

James Grimmelmann, The Platform is the Message

Rachel Thomas, The Problem with Metrics

Tim O’Reilly, The fundamental problem with Silicon Valley’s favorite growth strategy

Ali Alkhatib, Anthropological/Artificial Intelligence & the Institute for Human-centered AI

Week 6: Algorithmic Colonialism and Next Steps

When corporations from one country develop and deploy technology in many other countries, extracting data and profits, often with little awareness of the local cultural context, a number of ethical problems can arise. Here we will explore algorithmic colonialism. We will also consider next steps for how students can continue to engage with data ethics and take what they’ve learned back to their workplaces.

Required Reading:

Abeba Birhane, The Algorithmic Colonization of Africa

Amy Maxmen (Nature), Can tracking people through phone-call data improve lives?

Adrienne LaFrance, Facebook and the New Colonialism

Optional Reading:

Joe Parkinson et al, Huawei Technicians Helped African Governments Spy on Political Opponents

Davey Alba, How Duterte Used Facebook To Fuel The Philippine Drug War

Rumman Chowdhury, Algorithmic Colonialism

Daniel Greene et al, Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning

Jess Whittlestone et al, The role and limits of principles in AI ethics: towards a focus on tensions

Sareeta Amrute, Tech Colonialism Today

Weeks 7 & 8: TBA


Hi @rachel, I think in the last class you mentioned that the Google Doc was up to date, but while the summary at the front is indeed up to date, the actual readings still have weeks 4 and 5 flipped.

(The readings are okay, I think it’s just the title?)

Hi @rearaujovillagra, I think both the titles and the readings are correct for week 4 (privacy) and week 5 (ecosystem). I will create the thread with the week 4 readings just to make sure it is clear.
