Conflating poor performance with structural bias #8
I agree with this - there is a difference between systems that have been designed for malign purposes and those that have weaknesses that can be exploited, or are subject to unintended consequences. Both Tay and the Google image classification algorithm are examples of the latter - their designers were not aware of the shortcomings of the model or learning system that would generate these outcomes (neither the designers of Tay nor the Google folk set out to create discrimination). So it may be worth making this distinction.
That's a very good point regarding Tay! I'm not quite sure, however, whether this distinction is really that easy to make for many other cases on the list (intended vs. non-intended). HireVue, for example, claims to prevent algorithmic bias, and the company's stated goal is to remove human biases from the hiring process (PredPol makes similar claims). However, algorithmic bias is not completely preventable; better performance alone, for example, would not have fixed Google's auto-tagging system. So it really depends on what the goal of this list should be. Personally, I have two possible goals in mind:
Feel free to comment and criticise!
I agree that it's hard to come up with hard definitions here. At the same time, there should be a minimum technical and organizational complexity to make it onto the list: any random guy on the internet can come up with a conversational model running in a Twitter bot that insults random people according to their race, sexual preferences and so on. Should we care about it? Probably not, if it's really just an idiot on the internet. If it's a released product of a company that has an economic interest in it, then we probably should care, because the system has the potential to spread and scale, together with its impact. The public should be aware of this, and the list would help funnel this information. Does Tay fall into this category? I don't think so: it was a failed PR stunt from some R&D team at Microsoft, which immediately pulled the bot off the internet and republished it weeks later with a better profanity filter.
I like your point, but I'm not sure we should remove Tay on the premise of a "failed PR stunt". Tay, without any doubt, had an impact on how we think about AI accountability. It is a very clear and visible example of the dangers of sexist and racist chatbot technology, and it helped raise awareness of accountability within Microsoft and elsewhere.
Not only poor performance - I don't think programs that reveal uncomfortable truths should be included at all. A program that predicts the risk of recidivism isn't biased against blacks; it's just that they statistically have a higher risk of recidivism, so the program reveals that. A program that screens people crossing the EU border is predicted to have an 85% accuracy rate - why does the page state that it's "likely going to have a high number of false positives"? And so on.
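For context on the quoted claim: even a tool that is 85% accurate can flag mostly innocent people when the behaviour it screens for is rare. Below is a minimal sketch of that base-rate arithmetic, using invented numbers purely for illustration (the prevalence and error rates are assumptions, not figures from the border-screening project):

```python
# Illustrative base-rate calculation. All numbers are assumptions,
# not figures from any real deployment.
travellers = 100_000
prevalence = 0.01      # assumed share of travellers who are true positives
sensitivity = 0.85     # assumed true-positive rate ("85% accurate")
specificity = 0.85     # assumed true-negative rate

true_pos = travellers * prevalence * sensitivity                 # correctly flagged
false_pos = travellers * (1 - prevalence) * (1 - specificity)    # wrongly flagged

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"share of flags that are wrong: {false_pos / (true_pos + false_pos):.0%}")
```

With these assumed numbers, roughly 95% of the people flagged are flagged wrongly, which is presumably what the page's "high number of false positives" wording refers to.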
@nukeop I strongly disagree! As someone who has worked in AI & statistics for a while: a training distribution is in general never equal to the true distribution. You are calculating statistics not on the real world but on what is observed through a human-biased collector. And what's worse, in my opinion, is that such a training distribution can in fact be collected in extremely socially unjust and discriminating ways. A model that is then trained on a discriminating training set will reflect this bias (when not properly taken care of). The underlying motivation of this awful list is the following: a model is not an omniscient oracle that we should blindly trust, but simply a reflection of data collected by imperfect humans. That's why we need to fix algorithmic discrimination just as we have to fix social discrimination. Someone building and working with an AI application that scales and influences possibly millions of people, such as predictive policing, should always keep that in mind. Discriminating models can influence not only the behaviour but also the world view of the people working with them. Police officers might mistakenly interpret discriminatory AI predictions as truths. This is dangerous.
However, I agree with you that we need a set of formal, community-curated criteria for being included in the list.
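To make the point above about training distributions more concrete, here is a minimal synthetic sketch of selection bias in data collection. The scenario, numbers, and group labels are all invented for illustration; this is not PredPol's data or method. Two groups have an identical true outcome rate, but positive outcomes in one group are recorded far more often, and a model fitted to those records learns very different risks for the two groups:

```python
# Synthetic illustration only: biased data collection, not biased reality.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 100_000

# Ground truth: group membership is 50/50 and the true outcome rate
# is 10% in BOTH groups.
group = rng.integers(0, 2, size=n)
outcome = rng.random(n) < 0.10

# Biased collection: positive outcomes in group 1 are recorded with
# probability 0.9 (e.g. heavier policing), everything else with 0.18.
record_prob = np.where((group == 1) & outcome, 0.9, 0.18)
recorded = rng.random(n) < record_prob

X = group[recorded].reshape(-1, 1)
y = outcome[recorded]

model = LogisticRegression().fit(X, y)
print("learned risk for group 0:", round(model.predict_proba([[0]])[0, 1], 3))
print("learned risk for group 1:", round(model.predict_proba([[1]])[0, 1], 3))
```

On this toy data the model assigns group 1 roughly three to four times the risk of group 0, purely as an artefact of how the records were collected; improving the model's accuracy on the recorded data would not remove that skew.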
In the list I see examples like Tay mixed together with systems that have problems at a design and structural level (PredPol, for example). Tay was just a bad marketing idea, probably from some manager very confused about how the internet works and about what happens when you let it influence your product/model without proper counter-measures. A lot of the articles on that event were fear-mongering against AI as a whole, and I think this list would serve its purpose better by steering clear of non-news like Tay and focusing instead on problematic uses of machine learning systems.