How to create a racist AI, without even trying. Part 1

The other day, in a discussion sparked by another article on racism in speech recognition, I took part in a heated argument about who is to blame. Some people were sure it was a conspiracy of programmers. In fact, the cause lies in the data the AI is trained on. I decided to run an experiment to demonstrate this clearly, and it turned out that Rob Speer had already done all the work for me.

I want to share with you a translation of his article, which clearly shows that even the most default AI setup ends up thoroughly steeped in racism. In this first part we will run the experiment; in the second we will try to figure out how to tame the monster we have created.

Maybe you have heard about Tay, the experimental chatbot that Microsoft launched on Twitter. Within a single day its posts became so offensive that Microsoft had to shut the bot down and never mention it again. You probably think this could not happen to you, because you are not doing anything strange (in particular, you are not letting random strangers on Twitter train your AI).
In this tutorial I want to show the following: even if you use the most standard natural language processing algorithms, popular datasets, and common methods, the result can be a racist classifier that should never have existed.

The good news is that this can be avoided. Keeping racist behavior out of your classifier takes a little extra effort, and the corrected version may even turn out to be more accurate. But to fix the problem you have to know what it is, and not just grab the first thing that works.

Let's make a sentiment classifier!

Sentiment analysis, that is, determining the tonality (emotional tone) of a text, is a very common NLP task, which is not surprising. A system that can tell whether a person has left a positive or a negative comment has many uses in business. Such systems are used to monitor social media posts, track customer reviews, and even trade securities (for example, bots that bought Berkshire Hathaway shares after actress Anne Hathaway received good reviews from critics).

This is a simplified (sometimes oversimplified) approach, but it is one of the easiest ways to get quantitative estimates from human-written text. In just a few steps you can build a system that processes texts and outputs positive and negative scores, without having to deal with complex data representations such as parse trees or entity diagrams.

We will now build a classifier that will look familiar to any NLP specialist, choosing the easiest option to implement at every stage. A model of this kind is described, for example, in the paper Deep Averaging Networks. The model is not the main subject of that paper, where it appears simply as an example of a well-known way of using word vectors, so this reference should not be read as criticism of its results.

Here is our plan of action:

1. Get some widely used word vectors.
2. Take training and test data containing the most standard positive and negative words.
3. Train a classifier with gradient descent to recognize other positive and negative words.
4. Use this classifier to compute sentiment scores for sentences of text.
5. Be terrified by the monster we have created.

After that, you will know how to unintentionally make a racist AI.

I would rather not end on that note, so afterwards we will do the following:

1. Assess the problem statistically, so that we can recognize it in the future.
2. Improve the data to get a more accurate and less racist semantic model.

Required software

This tutorial is written in Python; all the libraries it uses are listed below.

    import numpy as np
    import pandas as pd
    import matplotlib
    import seaborn
    import re
    import statsmodels.formula.api

    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    %matplotlib inline
    seaborn.set_context('notebook', rc={'figure.figsize': (10, 6)}, font_scale=1.5)

You can replace scikit-learn with TensorFlow, Keras, or any other component that contains a gradient descent algorithm.

Step 1. Vector word representations

Word vectors are often used to convert words into a format that machine learning systems can process conveniently. Words are represented as vectors in a multidimensional space: the smaller the distance between two vectors, the closer the meanings of the corresponding words. Word vectors make it possible to compare words not by their spelling, but by their (approximate) meaning.
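To make the idea of "closeness" concrete, here is a minimal sketch using made-up 3-dimensional vectors (real GloVe vectors have 300 dimensions, and the numbers below are invented purely for illustration). It uses cosine similarity, which is one common way of comparing word vectors; the article itself does not prescribe a particular measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for this example (not taken from GloVe).
toy_vectors = {
    "good":  [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "table": [0.1, 0.9, 0.0],
}

sim_close = cosine_similarity(toy_vectors["good"], toy_vectors["great"])
sim_far = cosine_similarity(toy_vectors["good"], toy_vectors["table"])
print(sim_close > sim_far)  # True: the two similar words score higher
```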

To get good vector word representations, you need to process hundreds of gigabytes of text. Fortunately, many groups of machine learning experts have already done this work and shared the finished materials.

There are two well-known sets of word vectors: word2vec (trained on Google News data) and GloVe (trained on web pages from the Common Crawl). The end results will be similar for both. GloVe is based on a more transparent data source, so we will use it.

Three GloVe archives are available for download, trained on 6, 42, and 840 billion tokens. 840 billion is a lot, but extracting more value from that archive than from the 42 billion one requires complex post-processing. The 42 billion version works well and has a manageable vocabulary of about 1.9 million words. We are taking the path of least resistance, so we will use the 42 billion version.

So, we download the glove.42B.300d.zip archive from the GloVe website and unpack it into data/glove.42B.300d.txt. Next, we need a function that reads word vectors in a simple text format.

    def load_embeddings(filename):
        """
        Load a DataFrame from the generalized text format used by word2vec, GloVe,
        fastText, and ConceptNet Numberbatch. The main point where they differ is
        whether there is an initial line with the dimensions of the matrix.
        """
        labels = []
        rows = []
        with open(filename, encoding='utf-8') as infile:
            for i, line in enumerate(infile):
                items = line.rstrip().split(' ')
                if len(items) == 2:
                    # This is a header row giving the shape of the matrix
                    continue
                labels.append(items[0])
                values = np.array([float(x) for x in items[1:]], 'f')
                rows.append(values)

        arr = np.vstack(rows)
        return pd.DataFrame(arr, index=labels, dtype='f')

    embeddings = load_embeddings('data/glove.42B.300d.txt')
    embeddings.shape
    # (1917494, 300)
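The text format that load_embeddings expects can be illustrated on a tiny in-memory sample. The three-line "file" below is invented for this sketch (real GloVe rows have 300 values), and parse_embedding_lines is a hypothetical helper that mirrors the loop above using only the standard library.

```python
import io

def parse_embedding_lines(lines):
    """Parse 'word v1 v2 ...' rows into a dict, skipping a 2-item header row."""
    vectors = {}
    for line in lines:
        items = line.rstrip().split(' ')
        if len(items) == 2:
            # Header row such as "2 3" (rows, dimensions): not a word vector.
            continue
        vectors[items[0]] = [float(x) for x in items[1:]]
    return vectors

# Invented sample: a word2vec-style header followed by two 3-dimensional rows.
sample = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
toy = parse_embedding_lines(sample)
print(sorted(toy))       # ['cat', 'dog']
print(len(toy["cat"]))   # 3
```

A GloVe file simply has no header row, so every line is parsed as a word followed by its vector.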

Step 2. A standard sentiment lexicon

We need a source of information about which words are positive and which are negative. There are many sentiment lexicons, but, as usual, we will choose one of the simplest. Download the archive from Bing Liu's website and extract the lexicon files, data/positive-words.txt and data/negative-words.txt.

Next, we need a function that reads these files, and we load their contents into the variables pos_words and neg_words.

    def load_lexicon(filename):
        """
        Load a file from Bing Liu's sentiment lexicon
        (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), containing
        English words in Latin-1 encoding.

        One file contains a list of positive words, and the other contains
        a list of negative words. The files contain comment lines starting
        with ';' and blank lines, which should be skipped.
        """
        lexicon = []
        with open(filename, encoding='latin-1') as infile:
            for line in infile:
                line = line.rstrip()
                if line and not line.startswith(';'):
                    lexicon.append(line)
        return lexicon

    pos_words = load_lexicon('data/positive-words.txt')
    neg_words = load_lexicon('data/negative-words.txt')
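The filtering rule in load_lexicon is easy to check on a few lines of sample input. The sample below is invented to match the layout described in the docstring, and parse_lexicon_lines is a hypothetical stand-in that applies the same rule to a list of lines instead of a file.

```python
def parse_lexicon_lines(lines):
    """Keep one word per line, skipping blank lines and ';' comment lines."""
    return [line.rstrip() for line in lines
            if line.rstrip() and not line.startswith(';')]

# Invented sample in the same layout as the lexicon files.
sample_lines = [
    "; Opinion lexicon: positive words\n",
    ";\n",
    "\n",
    "admire\n",
    "brilliant\n",
]
print(parse_lexicon_lines(sample_lines))  # ['admire', 'brilliant']
```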

Step 3. Training a model to predict word sentiment

Some words are missing from the GloVe vocabulary. Looking up a word that has no vector gives a row of NaN values; we remove such rows.

    # reindex() returns NaN rows for missing words; recent pandas versions
    # raise a KeyError if .loc is given labels that are not in the index.
    pos_vectors = embeddings.reindex(pos_words).dropna()
    neg_vectors = embeddings.reindex(neg_words).dropna()

Next, we create arrays of the desired inputs and outputs. The inputs are the word vectors; the outputs are 1 for positive words and -1 for negative ones. We also keep the words themselves so that we can interpret the results.

    vectors = pd.concat([pos_vectors, neg_vectors])
    targets = np.array([1 for entry in pos_vectors.index] + [-1 for entry in neg_vectors.index])
    labels = list(pos_vectors.index) + list(neg_vectors.index)

Wait a second! Some words are neutral and carry no sentiment at all. Don't we need a third class for neutral words?

I think examples of neutral words would be quite useful, especially since the problems we are about to see come from assigning sentiment to neutral words. If we could reliably identify neutral words, the added complexity of a third class would be justified. But for that we would need a source of neutral examples, because the lexicon we have chosen contains only positive and negative words.

So I created a separate version of this notebook in which I added 800 neutral words as examples and gave a large weight to predicting words as neutral. The results, however, were almost identical to those presented below.

How did the authors of the list divide words into positive and negative? Doesn't sentiment depend on context?

Good question. General sentiment analysis is not as simple as it seems, and the boundary we are trying to find is not always clear-cut. In the list we chose, the word “impudent” is marked as bad and “ambitious” as good. “Comical” is bad, “funny” is good. “Reimbursement” is good, although situations in which you demand a reimbursement, or someone demands one from you, are rarely pleasant.

I think everyone understands that a word's sentiment depends on context, but when we take a simple approach to sentiment analysis, we assume that averaging the sentiment of individual words will give a broadly correct answer even without considering context.

We split the input vectors, output values, and labels into training and test sets, holding out 10% of the data for testing.

    train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
        train_test_split(vectors, targets, labels, test_size=0.1, random_state=0)

Next, we build a classifier and train it for 100 passes over the training vectors. We use the logistic function as the loss, so that the classifier can output the probability that a given word is positive or negative.

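    # Recent scikit-learn versions call the logistic loss 'log_loss' (formerly 'log')
    # and take max_iter instead of n_iter; tol=None keeps all 100 passes.
    model = SGDClassifier(loss='log_loss', random_state=0, max_iter=100, tol=None)
    model.fit(train_vectors, train_targets)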
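For readers who want to see what "gradient descent with a logistic loss" amounts to, here is a minimal pure-Python sketch on invented one-dimensional data. It does not reproduce SGDClassifier in detail (there is no regularization, the learning rate is fixed and chosen arbitrarily, and the labels are 0/1 rather than -1/1); it only illustrates the idea.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_sgd(xs, ys, epochs=100, learning_rate=0.5):
    """Fit p(y=1|x) = sigmoid(w*x + b) by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # Gradient of the log loss with respect to the raw score w*x + b.
            error = sigmoid(w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Invented toy data: negative inputs are class 0, positive inputs are class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_logistic_sgd(xs, ys)

p_pos = sigmoid(w * 1.5 + b)    # probability of class 1 for a positive input
p_neg = sigmoid(w * -1.5 + b)   # probability of class 1 for a negative input
print(p_pos > 0.5, p_neg < 0.5)
```

In the article's setting the input is a 300-dimensional word vector instead of a single number, but each update has the same shape.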

Now let's check the classifier on the test vectors. It turns out that it correctly predicts the sentiment of words outside the training set in 95% of cases. Not bad at all.

    accuracy_score(model.predict(test_vectors), test_targets)
    # 0.95022624434389136

We also define a function that shows the sentiment the classifier predicts for individual words. The classifier can score words that were not in the training set.

    def vecs_to_sentiment(vecs):
        # predict_log_proba gives the log probability for each class
        predictions = model.predict_log_proba(vecs)

        # To see an overall positive vs. negative classification in one number,
        # we take the log probability of positive sentiment minus the log
        # probability of negative sentiment.
        return predictions[:, 1] - predictions[:, 0]


    def words_to_sentiment(words):
        # reindex() yields NaN rows for unknown words, which dropna() removes
        vecs = embeddings.reindex(words).dropna()
        log_odds = vecs_to_sentiment(vecs)
        return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)
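The quantity returned by vecs_to_sentiment is the log-odds of the positive class. As a small self-contained check (plain Python, no model involved): for a probability p of the positive class, log p minus log(1 - p) is zero when the classifier is undecided, positive when it leans positive, and for a logistic model it recovers the raw score. The helper name log_odds is introduced here only for illustration.

```python
import math

def log_odds(p_positive):
    """log P(positive) - log P(negative) for a two-class prediction."""
    return math.log(p_positive) - math.log(1.0 - p_positive)

print(log_odds(0.5))        # 0.0: the classifier is undecided
print(log_odds(0.9) > 0)    # True: leaning positive
print(log_odds(0.1) < 0)    # True: leaning negative

# For a logistic model, p = sigmoid(z), and the log-odds give back z.
z = 2.0
p = 1.0 / (1.0 + math.exp(-z))
print(abs(log_odds(p) - z) < 1e-9)  # True
```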

Step 4. Get a sentiment score for text

There are many ways to combine the sentiment of individual word vectors into a score for a whole text. We will keep following the path of least resistance and simply average them.

    import re
    TOKEN_RE = re.compile(r"\w.*?\b")
    # The regex above finds tokens that start with a word-like character (\w), and continues
    # matching characters (.*?) until the next word break (\b). It's a relatively simple
    # expression that manages to extract something very much like words from text.


    def text_to_sentiment(text):
        tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
        sentiments = words_to_sentiment(tokens)
        return sentiments['sentiment'].mean()
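To see what this regular expression actually produces, it can be run on its own. The sketch below repeats the pattern from the article and applies it to two sample strings; tokenize is a hypothetical helper name. Note how an apostrophe splits a word in two.

```python
import re

TOKEN_RE = re.compile(r"\w.*?\b")  # same pattern as in the article

def tokenize(text):
    """Lowercased tokens found by TOKEN_RE (hypothetical helper name)."""
    return [token.casefold() for token in TOKEN_RE.findall(text)]

print(tokenize("Let's go get Italian food"))
# ['let', 's', 'go', 'get', 'italian', 'food']
print(tokenize("meh, this example sucks"))
# ['meh', 'this', 'example', 'sucks']
```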

What can be improved here?

- Weight words in inverse proportion to their frequency, so that the most common words (for example, "the" or "I") do not have a strong effect on the sentiment score.
- Adjust the averaging formula so that short sentences do not end up with the most extreme scores (the largest in absolute value).
- Take context into account, that is, the phrase as a whole.
- Use a more capable tokenizer that handles apostrophes correctly.
- Handle negation, so that phrases such as "not happy" are scored correctly.
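As an illustration of the first item in the list above, here is a minimal sketch of a frequency-weighted average. The per-word scores and frequencies are invented for the example, and the weighting scheme a / (a + frequency) with a = 0.001 is just one possible choice, not something the article specifies.

```python
def weighted_mean_sentiment(tokens, scores, frequencies, a=0.001):
    """Average per-word scores, down-weighting frequent words.

    Each word gets weight a / (a + frequency), so very common words
    contribute little. Words without a score are skipped.
    """
    total, weight_sum = 0.0, 0.0
    for token in tokens:
        if token not in scores:
            continue
        weight = a / (a + frequencies.get(token, 0.0))
        total += weight * scores[token]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

# Invented numbers: scores are log-odds-like values, frequencies are
# relative frequencies of each word in some hypothetical corpus.
scores = {"the": 1.0, "food": 0.5, "terrible": -3.0}
frequencies = {"the": 0.05, "food": 0.0005, "terrible": 0.00005}
tokens = ["the", "food", "terrible"]

plain = sum(scores[t] for t in tokens) / len(tokens)
weighted = weighted_mean_sentiment(tokens, scores, frequencies)
print(weighted < plain)  # True: "the" no longer dilutes "terrible"
```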

All of this, however, requires extra code and would not fundamentally change the results below. At the very least, we can now roughly compare the relative sentiment of different sentences.

    text_to_sentiment("this example is pretty cool")
    # 3.889968926086298

    text_to_sentiment("this example is okay")
    # 2.7997773492425186

    text_to_sentiment("meh, this example sucks")
    # -1.1774475917460698

Step 5. Fear the monster we created

Not every sentence contains words with obvious sentiment. Let's see how our system handles a few variations of the same neutral sentence.

    text_to_sentiment("Let's go get Italian food")
    # 2.0429166109408983

    text_to_sentiment("Let's go get Chinese food")
    # 1.4094033658140972

    text_to_sentiment("Let's go get Mexican food")
    # 0.38801985560121732

Much the same thing happened to me in other experiments, in which I analyzed restaurant reviews using word vectors. It turned out that all Mexican restaurants received lower sentiment scores for no objective reason.

Word vectors can capture subtle nuances of meaning simply from the contexts in which words appear. So they can also capture less subtle things, such as social prejudices.

Here are some more neutral sentences.

    text_to_sentiment("My name is Emily")
    # 2.2286179364745311

    text_to_sentiment("My name is Heather")
    # 1.3976291151079159

    text_to_sentiment("My name is Yvette")
    # 0.98463802132985556

    text_to_sentiment("My name is Shaniqua")
    # -0.47048131775890656

Well well.

Changing nothing but the name substantially changes the sentiment score the system produces. This and many other examples show that names associated with white people get, on average, a more positive predicted sentiment than names stereotypically associated with Black people.

So, having seen that even the most basic AI implementation is terribly biased, I suggest we pause for a moment to think about it. In the second article we will return to the topic and fix the mistakes of our unintelligent AI.
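The size of the effect can be read directly off the numbers above. The sketch below simply copies the four scores reported in this article into a dictionary and computes the spread between the highest and the lowest; it does not re-run the model.

```python
# Scores copied from the article's output for "My name is <name>".
name_scores = {
    "Emily": 2.2286179364745311,
    "Heather": 1.3976291151079159,
    "Yvette": 0.98463802132985556,
    "Shaniqua": -0.47048131775890656,
}

highest = max(name_scores, key=name_scores.get)
lowest = min(name_scores, key=name_scores.get)
gap = name_scores[highest] - name_scores[lowest]
print(highest, lowest, round(gap, 2))  # Emily Shaniqua 2.7
```

For comparison, the gap between "this example is pretty cool" and "this example is okay" in Step 4 is only about 1.1.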

Source: https://habr.com/ru/post/336358/
