
Submitting suggestion for new data set. #228

Closed
wants to merge 4 commits


@OwenG7 OwenG7 commented Aug 19, 2019

Could be turned into a template in the future.

@OwenG7 OwenG7 requested a review from dalelane as a code owner August 19, 2019 14:40

kevinrobinson commented Aug 19, 2019

This seems like it's an example for classifying young people by gender based on image examples. If so, when introducing this concept to young people it might make sense to also introduce them to issues related to subgroup inaccuracy in binary gender classifiers, particularly related to skin color. http://gendershades.org/ has a good five-minute video that illustrates the issues, and has worked well with middle school groups. Alternately, this binary classification might not be appropriate for folks in communities where non-binary gender identities are an important consideration. Here's more on that perspective as well, if you're curious to learn more in tweet or essay form. Or if there's some scenario where it would make sense to classify the gender of young children under 13, maybe you could share that with the example and students could explore the ethical and sociotechnical dimensions as part of the scenario.

Regardless, I think it's important for us to teach young people awareness that these kinds of uses of machine learning are politically contested. I'm from the US, but in my state, a recent ACLU survey found that 67% of people were concerned about facial recognition surveillance technology that classifies people based on their perceived gender, skin color or age:

[screenshot: ACLU survey results]

Some other municipal governments like San Francisco have passed ordinances banning the use of this kind of technology by police systems, and some US states have proposed legislation requiring audits of ML systems doing this kind of gender classification:

[screenshot: proposed state legislation requiring audits of gender classification systems]

So if this is a template for future examples, I'd suggest broadening the perspective to include the sociotechnical considerations in designing AI systems as well. I think those are important for engaging and teaching all young folks about ML. For digging into sociotechnical issues, there are some other examples of possible datasets here: #201

For avoiding sociotechnical issues, using plant or bird or dog breed classifiers might be a better first example.

dalelane (Member) commented

@kevinrobinson Hiya.

@OwenG7 had discussed this with me before submitting the pull request, which is why he didn't put much detail into the PR notes.

This dataset is in support of a project worksheet he's working on that is explicitly looking to demonstrate the limitations and challenges in doing gender classification. The dataset is intentionally biased and flawed. At a very high level, the project he's working on would be to give students a biased, flawed dataset as a starting point, and guide them through identifying what is wrong with it, and experiment with ways to improve it.

kevinrobinson commented

@dalelane @OwenG7 got it, thanks for sharing! That sounds super interesting, and I'll be excited to check it out.

FWIW, I'd also be really excited to see work that broadens the scope beyond just "the classifier is not accurate." When I read the work of researchers who have studied this, of people who have engaged politically as above, or even the AI principles published by companies like IBM, it's clear that sociotechnical issues are far more complex and that low accuracy isn't the sole issue. It's also about how systems are designed and used.

In that spirit, I myself would struggle to come up with a realistic scenario where we'd want to model for young people that it's a good idea to make ML systems that classify the gender of young people based on their photos, just as long as the dataset is balanced and the classifier is accurate for various subgroups.

For me, teaching young people to make a system like that would be quite clearly unethical unless it focused on the consent procedures for collecting those training images, and centered questions about how the system would be used and what kinds of decisions it would influence downstream. Accuracy issues are important, but they are a small detail in the broader sociotechnical context.

I think you hear me though :) so I'll just look forward to seeing what comes next, and thanks for listening and for sharing your work in the open as always. 👍


OwenG7 commented Aug 22, 2019

This is a preliminary version of the worksheet to go along with this dataset: worksheet-boyorgirl.docx

I would be interested in hearing your thoughts, @kevinrobinson, and any possible additions which could improve the exercise. I have tried to introduce the larger issues faced when designing a model like this, whilst keeping it simple enough to make the exercise accessible.


kevinrobinson commented Aug 22, 2019

Hi @OwenG7! Sure, I'm happy to share some feedback, although I think it will be a bit similar to what I've already written above. My assumption is that you might be busy and not able to engage with some of the short videos, tweets or longer essays above, so I'll try to adjust the feedback style to be really specific and concrete. Let me know if this is helpful or if there are other ways I can try to help as you're working through what makes sense :)

Also, I think this lesson is really great at getting at "algorithms as opinions." Personifying that in a story or scenario could help further: give a name to the person who created the dataset, and give a real role to the person creating the ML system, in a relationship with the person who collected the dataset. I also think that doing that could make this a great lesson in and of itself, if it were about something less complicated than classifying gender from an image. For example, you could adapt it to "cute dogs" to highlight the influence of collecting the dataset, and the subjective nature of the resulting system.

Assuming you want to work through the complexity of gender classification, here's my best attempt at some specific suggestions below!

1. Include some realistic contexts or situations where someone would create a system to classify someone else's gender from an image

To use IBM's AI principles as one example, its first area of focus highlights that:

A tech-centric focus that solely revolves around improving the capabilities of an intelligent system doesn’t sufficiently consider human needs. An ethical, human-centric AI must be designed and developed in a manner that is aligned with the values and ethical principles of a society or the community it affects.

The lesson here starts with "In this project you will make a Python project that learns to sort photos." Are there ways to reframe that to focus on why students can benefit from learning to make systems like this, and what responsibilities they might have to consider when making systems like this? I understand the desire towards simplicity - it's so great! - but here I struggle to come up with a scenario where someone would make and use a system like this (versus allowing the other person to classify themselves).

It's hard for me to come up with a concrete suggestion here, but maybe you could find examples from law enforcement or other surveillance systems? Schools often have video surveillance systems, so maybe there are some scenarios there where a system like this could make sense? Students might engage differently if the system were being used on them, or on people they knew.

Oh! Actually this tweet got me thinking of the Geena Davis Inclusion Quotient, which used a classification system like this to do an empirical analysis of gender representation in film. Perhaps a scenario could involve asking students to design a small part of such a system in the ML for Kids lesson. Then the final part of the lesson can 'reveal' a chart or two of data that the actual study found.

Specific suggestion: Frame the lesson as being about using ML tools to detect issues of gender representation, then close it by asking students to guess what came up in the study above, and then showing them the results in a chart.
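
To make the 'reveal' concrete, here's a rough sketch of the kind of tally such a lesson could end with. Everything in it is hypothetical: it assumes a trained model object with a `classify(path)` method and a folder of film stills, neither of which comes from the worksheet or the ML for Kids API.

```python
from collections import Counter
from pathlib import Path


def representation_tally(model, stills_dir):
    """Count how often each predicted label appears across a folder of film stills.

    `model` is assumed to expose a `classify(path)` method that returns a
    label string - a stand-in for whatever classifier the students have
    trained, not part of any real API.
    """
    counts = Counter()
    for image_path in Path(stills_dir).glob("*.jpg"):
        counts[model.classify(image_path)] += 1

    total = sum(counts.values())
    for label, count in counts.most_common():
        print(f"{label}: {count} stills ({100 * count / total:.1f}%)")
    return counts
```

Students could then compare the percentages their own classifier produces against the published figures from the study, which also opens up a discussion of how much the classifier's own errors distort that comparison.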

2. Include the awareness that gender classification systems based on appearance have a long and ethically troubled history

See http://gendershades.org/ for a five-minute video, which has previously worked well in middle school contexts (see more about those lessons here). You mention hair length here, but classification accuracy often differs across more concerning dimensions like race and skin color (a rough sketch of how to measure that per group is included after the suggestion below). For more, AI Now also has a report on this with more historical context, including other recent examples of AI systems created in the last few years for unethical purposes like detecting sexuality or criminality from face images:

The histories of ‘race science’ are a grim reminder that race and gender classification based on appearance is scientifically flawed and easily abused. Systems that use physical appearance as a proxy for character or interior states are deeply suspect, including AI tools that claim to detect sexuality from headshots, predict ‘criminality’ based on facial features, or assess worker competence via ‘micro-expressions.’ Such systems are replicating patterns of racial and gender bias in ways that can deepen and justify historical inequality.

Specific suggestion: Add the five minute video above with one discussion question.
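
To make the accuracy differences mentioned above something students can measure rather than just hear about, here is a minimal sketch of a per-group accuracy check. The field names (`predicted`, `actual`, `group`) are made up for illustration; this isn't code from the worksheet or from the Gender Shades study.

```python
from collections import defaultdict


def accuracy_by_group(audit_rows):
    """Report overall and per-group accuracy from a small hand-labelled audit set.

    Each row is a dict with hypothetical keys (not from the worksheet):
      'predicted' - the label the classifier produced
      'actual'    - the label the person chose for themselves
      'group'     - the subgroup being audited, e.g. a skin-type category
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in audit_rows:
        total[row["group"]] += 1
        if row["predicted"] == row["actual"]:
            correct[row["group"]] += 1

    overall = sum(correct.values()) / sum(total.values())
    print(f"overall: {100 * overall:.1f}%")
    for group in sorted(total):
        print(f"{group}: {100 * correct[group] / total[group]:.1f}% "
              f"({correct[group]}/{total[group]})")
```

A single overall accuracy number can look fine while one of the per-group rows is much worse, which is exactly the pattern the five-minute video walks through.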

3. Include that use of this technology is politically contested

I'm most familiar with this in the US state where I live, where surveys show most people are concerned about these kinds of systems, and there is proposed legislation specifically about gender classification systems like this. In the city I live in, the municipality passed an ordinance prohibiting the city government from using a system like the one created in this lesson. Of course ML for Kids should engage with a diversity of opinions and perspectives, but these kinds of concerns don't seem represented in the lesson as it is now.

One idea is similar to this tweet about doing this with middle schoolers in the US:

What made them care even more was watching jovialjoy testify before Congress about biases in facial recognition technology and come to the realization that they were more of experts in these topics than their congresspeople. #ai #ethics #ed

Specific suggestion: Add a short clip showing Joy from the previous video testifying about this to the US Congress (or something similar in the UK or elsewhere).

4. Consider including a scenario of how classification systems like this impact folks who are not so neatly classified

I've excerpted the most relevant two paragraphs from the essay I linked above. Something like this might help students visualize and understand how a gender classification system can impact someone even if it's "accurate" in a certain frame. This has direct and emotional language, but it's important to what the person is saying:

The TSA agent motions me to step into the millimeter wave scanner. I raise my arms and place my hands in a triangle shape, palms facing forward, above my head. The scanner spins around my body, and then the agent signals for me to step forward out of the machine and wait with my feet on the pad just past the scanner exit. I glance to the left, where a screen displays an abstracted outline of a human body. As I expected, bright fluorescent yellow blocks on the diagram highlight my chest and groin areas. You see, when I entered the scanner, the TSA operator on the other side was prompted by the UI to select ‘Male’ or ‘Female.’ Since my gender presentation is nonbinary femme, usually the operator selects ‘female.’ However, the three dimensional contours of my body, at millimeter resolution, differ from the statistical norm of ‘female bodies’ as understood by the dataset and risk algorithm designed by the manufacturer of the millimeter wave scanner (and its subcontractors), and as trained by a small army of clickworkers tasked with labelling and classification (as scholars Lilly Irani and Nick Dyer-Witheford, among others, remind us). If the agent selects ‘male,’ my breasts are large enough, statistically speaking, in comparison to the normative ‘male’ body-shape construct in the database, to trigger an anomalous warning and a highlight around my chest area. If they select ‘female,’ my groin area deviates enough from the statistical ‘female’ norm to trigger the risk alert, and bright yellow pixels highlight my groin, as visible on the flat panel display. In other words, I can’t win. I’m sure to be marked as ‘risky,’ and that will trigger an escalation to the next level in the TSA security protocol.  

This is, in fact, what happens: I’ve been flagged, the screen shows a flourescent yellow highlight around my groin. Next, the agent asks me to step aside, and (as usual) asks for my consent to a physical body search. Typically at this point, once I am close enough to the agent, they become confused about my gender. This presents a problem, because the next step in the security protocol is for either a male or female TSA agent to conduct a body search by running their hands across my arms and armpits, chest, hips and legs, and inner thighs. The agent is supposed to be male or female, depending on whether I am male or female. As a nonbinary trans femme, I present a problem not easily resolved by the algorithm of the security protocol. Sometimes, the agent will assume I prefer to be searched by a female agent; sometimes, male. Occasionally, they ask whether I prefer a search by a male or female agent. Unfortunately, ‘neither’ is an honest but not an acceptable response. Today, I’m particularly unlucky: a nearby male agent, observing the interaction, loudly states “I’ll do it!” and strides over to me. I say “Aren’t you going to ask me what I prefer?” He pauses, seems angry, and begins to move towards me again, but the female agent stops him. She asks me what I would prefer. Now I’m standing in public, surrounded by two TSA agents, with a line of curious travelers watching the whole interaction. Ultimately, the aggressive male agent backs off and the female agent searches me, making a face as if she’s as uncomfortable as I am, and I’m cleared to continue on to my gate.

The two-tweet version is this and this, and I imagine folks could help revise the language here to keep the emotional impact and feeling of vulnerability, while minimizing the awkwardness for young people in talking about specific body parts by name. Alternately, we could also find other stories of how principles of "data minimization" apply to people working in healthcare, clinical and other care settings - a general guideline is "don't ask if you don't have an immediate clinical reason to know" since the design of binary gender classification systems excludes some people in ways that they find particularly troubling.

Specific suggestion: If using the scenario in #1, add this story at the end to show how using ML for advocacy can inadvertently harm other people as well based on how that advocacy is framed.

Thanks for listening, and for doing the hard work to figure out how to help engage young people with the full complexity of designing and creating ML systems! 👍

dalelane (Member) commented

Closing as this was reworked into https://github.com/IBM/taxinomitis-docs/blob/master/project-worksheets/pdf/worksheet-fooled.pdf

@dalelane dalelane closed this Feb 21, 2020
kevinrobinson commented

FWIW, related news on this topic, and a discussion on other ways to audit for gender bias.
