Fastbook Chapter 3 questionnaire solutions (wiki)

Here are the questions:

  1. Does ethics provide a list of “right answers”?

There is no list of do’s and don’ts. Ethics is complicated, and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. In this chapter, our goal is to provide some signposts to help you on that journey.

  1. How can working with people of different backgrounds help when considering ethical questions?

Different people’s backgrounds will help them to see things which may not be obvious to you. Working with a team is helpful for many “muscle building” activities, including this one.

  1. What was the role of IBM in Nazi Germany? Why did the company participate as they did? Why did the workers participate?

IBM supplied the Nazis with data tabulation products necessary to track the extermination of Jews and other groups on a massive scale. This was driven from the top of the company, with marketing to Hitler and his leadership team. Company President Thomas Watson personally approved the 1939 release of special IBM alphabetizing machines to help organize the deportation of Polish Jews. Hitler awarded Watson a special “Service to the Reich” medal in 1937.

But it also happened throughout the organization. IBM and its subsidiaries provided regular training and maintenance on-site at the concentration camps: printing off cards, configuring machines, and repairing them as they broke frequently. IBM set up categorizations on their punch card system for the way that each person was killed, which group they were assigned to, and the logistical information necessary to track them through the vast Holocaust system. IBM’s code for Jews in the concentration camps was 8; around 6,000,000 Jews were killed. Its code for Romani people was 12 (they were labeled by the Nazis as “asocials”, with over 300,000 killed in the Zigeunerlager, or “Gypsy camp”). General executions were coded as 4, death in the gas chambers as 6.

The marketers were just doing what they could to meet their business development goals. Edwin Black, author of “IBM and the Holocaust”, said: “To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM’s technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world.”

  1. What was the role of the first person jailed in the VW diesel scandal?

It was one of the engineers, James Liang, who just did what he was told.

  1. What was the problem with a database of suspected gang members maintained by California law enforcement officials?

A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as “admitting to being gang members”). In this case, there was no process in place for correcting mistakes or removing people once they’d been added.

  1. Why did YouTube’s recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google programmed this feature?

The problem here is the centrality of metrics in driving a financially important system. YouTube’s recommendation algorithm was built to optimise watch time, and in doing so it discovered and amplified viewing patterns that no employee intended. When an algorithm has a metric to optimise, as you have seen, it will do everything it can to optimise that number. This tends to lead to all kinds of edge cases, and humans interacting with the system will search for, find, and exploit these edge cases and feedback loops to their advantage.

  1. What are the problems with the centrality of metrics?

When an algorithm has a metric to optimise, as you have seen, it will do everything it can to optimise that number. This tends to lead to all kinds of edge cases, and humans interacting with a system will search for, find, and exploit these edge cases and feedback loops to their advantage.
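
To make this dynamic concrete, here is a toy simulation (purely illustrative: the category names, click probabilities, and the epsilon-greedy `recommend` helper are all invented for this sketch) of a recommender that optimises a single engagement metric and ends up showing users almost nothing but the most-clicked category:

```python
import random

# Hypothetical content categories and the chance a shown item gets clicked.
# All numbers are invented purely to illustrate the feedback loop.
click_prob = {"news": 0.10, "music": 0.12, "conspiracy": 0.30}

estimated_engagement = {cat: 0.5 for cat in click_prob}  # optimistic start
shown_counts = {cat: 0 for cat in click_prob}

def recommend(eps=0.05):
    # Mostly exploit the category with the best engagement estimate,
    # occasionally explore another one at random.
    if random.random() < eps:
        return random.choice(list(click_prob))
    return max(estimated_engagement, key=estimated_engagement.get)

for _ in range(10_000):
    cat = recommend()
    shown_counts[cat] += 1
    clicked = random.random() < click_prob[cat]
    # Running average of observed clicks for this category.
    estimated_engagement[cat] += (clicked - estimated_engagement[cat]) / shown_counts[cat]

print(shown_counts)  # the most-clicked category ends up dominating recommendations
```

Nothing in the loop says anything about what the content actually is; the metric alone decides what gets amplified.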

  1. Why did Meetup.com not include gender in their recommendation system for tech meetups?

Meetup had observed that men expressed more interest than women in attending tech meetups.

They were concerned that including gender in the recommendation algorithm would create a self-reinforcing feedback loop in which it recommended tech meetups mainly to men.

To avoid this situation, and to continue recommending tech meetups to their users regardless of gender, they simply decided not to include gender in the recommendation algorithm.
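
In code, that decision amounts to simply not passing the sensitive column to the model. A minimal sketch (the DataFrame, column names, and model choice below are hypothetical, not Meetup’s actual system) assuming scikit-learn:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical interaction data; the column names are invented for illustration.
df = pd.DataFrame({
    "past_tech_rsvps": [5, 0, 2, 7, 1, 0],
    "distance_km":     [3.0, 12.5, 1.2, 4.4, 8.0, 2.1],
    "gender":          ["m", "f", "f", "m", "f", "m"],
    "attended":        [1, 0, 1, 1, 0, 0],
})

# Deliberately leave `gender` out of the feature set, so the model cannot
# learn (and then amplify) the existing gender imbalance in attendance.
features = ["past_tech_rsvps", "distance_km"]

model = LogisticRegression()
model.fit(df[features], df["attended"])

print(model.predict_proba(df[features])[:, 1])  # scores driven by behaviour, not gender
```

Note that dropping a column only removes the explicit signal; correlated proxy features can still leak the same information, which is why this is a design decision rather than a complete fix.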

  1. What are the six types of bias in machine learning, according to Suresh and Guttag?

Historical bias
Bias that our datasets and models inherit from the real world. People are biased, processes are biased and society in general is biased.

Representation bias
When the model emphasizes some property of the data because it seemingly has the closest correlation with the prediction, even though that might not reflect reality.

An example is the gender property in the occupation prediction model, where the model predicted only 11.6% of surgeons to be women whereas the real figure was 14.6%.

Measurement bias
When we measure the wrong thing, measure it in the wrong way, or incorporate the measurement into the model inappropriately.

An example is the stroke prediction model that used whether a person had recently been to a doctor as part of its prediction of whether the patient had had a stroke.

Aggregation bias
When data is aggregated to the extent that it no longer takes the differences within a heterogeneous population into account.

An example is that the effectiveness of medical treatments for some diseases differs by gender and ethnicity, but those parameters are not present in the training data because they have been “aggregated away” (see the numeric sketch after this list).

Deployment bias
When there is a mismatch between the problem a model is intended to solve and the way it is actually used once deployed.

Evaluation bias
When the benchmark or test data used to evaluate a model does not represent the population the model will actually be used on.
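
To make aggregation bias concrete, here is a small numeric sketch (the two groups, slopes, and noise levels are made up for illustration) showing how a single model fitted on pooled data recovers a relationship that is wrong for both subgroups:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical subgroups in which the marker-to-outcome relationship differs.
x_a = rng.uniform(0, 10, 200)
y_a = 2.0 * x_a + 5 + rng.normal(0, 1, 200)   # group A: slope 2.0
x_b = rng.uniform(0, 10, 200)
y_b = 0.5 * x_b + 20 + rng.normal(0, 1, 200)  # group B: slope 0.5

# One model fitted on the pooled ("aggregated") data.
x_all, y_all = np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b])

print(f"group A slope: {np.polyfit(x_a, y_a, 1)[0]:.2f}")     # ~2.0
print(f"group B slope: {np.polyfit(x_b, y_b, 1)[0]:.2f}")     # ~0.5
print(f"pooled slope:  {np.polyfit(x_all, y_all, 1)[0]:.2f}")  # ~1.25, wrong for both groups
```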

  1. Give two examples of historical race bias in the US

When doctors were shown identical files, they were much less likely to recommend cardiac catheterization (a helpful procedure) to Black patients.

An all-white jury was 16% more likely to convict a Black defendant than a white one, but when a jury had at least one Black member, it convicted both at the same rate.

  1. Where are most images in ImageNet from?

The US and other Western countries.

This leads to models trained on the ImageNet dataset performing worse for other countries and cultures that don’t have as much representation in the dataset.

  1. In the paper “Does Machine Learning Automate Moral Hazard and Error” why is sinusitis found to be predictive of a stroke?

The model is not really measuring who had a stroke; it is measuring who had symptoms, went to a doctor, received the appropriate tests, and got a diagnosis of stroke. People who visit the doctor often (for example, for sinusitis) are more likely to have a stroke recorded in their data, so the model partly learns healthcare utilisation rather than stroke itself. This is an example of measurement bias.

  1. What is representation bias?

When the model emphasizes some property of the data because it seemingly has the closest correlation with the prediction, even though that might not reflect reality.

An example is the gender property in the occupation prediction model, where the model predicted only 11.6% of surgeons to be women whereas the real figure was 14.6%.

  1. How are machines and people different, in terms of their use for making decisions?

People and algorithms are used differently in practice when their outputs inform decisions:

People are likely to assume algorithms are objective and/or error-free.

Algorithms are more likely to be implemented with no appeals process in place.

Algorithms are often used at scale.

Algorithmic systems are cheap.

  1. Is disinformation the same as “fake news”?

Disinformation has a history stretching back hundreds or even thousands of years. It is not necessarily about getting someone to believe something false, but rather often used to sow disharmony and uncertainty, and to get people to give up on seeking the truth.

To do that, disinformation often contains exaggerations, seeds of truth, or half-truths taken out of context, rather than just “fake news”.

  1. Why is disinformation through auto-generated text a particularly significant issue?

Disinformation through auto-generated text is a particularly significant issue because deep learning has greatly increased the capability to produce plausible, context-appropriate text at scale, making disinformation campaigns far cheaper to run.

  1. What are the five ethical lenses described by the Markkula Center?

The objective of looking through different ethical lenses when making a decision is to uncover concrete issues with the different options. The lenses are:

The rights approach
Which option best respects the rights of all who have a stake?

The justice approach
Which option treats people equally or proportionally?

The utilitarian approach
Which option will produce the most good and do the least harm?

The common good approach
Which option best serves the community as a whole, not just some members?

The virtue approach
Which option leads me to act as the sort of person I want to be?

  1. Where is policy an appropriate tool for addressing data ethics issues?

Policy is an appropriate tool for addressing data ethics issues when it is likely that design fixes, self-regulation, and technical approaches to ethical problems in machine learning are not working.

While such measures can be useful, they will not be sufficient to address the underlying problems that have led to our current state. For example, as long as it is incredibly profitable to create addictive technology, companies will continue to do so, regardless of whether this has the side effect of promoting conspiracy theories and polluting our information ecosystem. While individual designers may try to tweak product designs, we will not see substantial changes until the underlying profit incentives change.

Because of this, it is almost certain that government policy will have to be created to address these issues.

ref: 03_ethics - Jupyter Notebook


@muellerzr Please wiki-fy :slight_smile:


Added answers to 8, 9, 10, 11, 13, 14, 15, 16, 17