SAN FRANCISCO – Last fall, Google unveiled a breakthrough artificial intelligence technology called BERT that changed the way scientists build systems that learn how people write and talk.
But BERT, which is now being deployed in services like Google's internet search engine, has a problem: it can pick up biases the way a child mimics a parent's bad behavior.
BERT is one of a number of A.I. systems that learn from lots and lots of digitized information, as varied as old books, Wikipedia entries and news articles. Decades and even centuries of bias are baked into all that material.
BERT and its peers are more likely to associate men with computer programming, for example, and generally don't give women enough credit. One program decided that almost everything written about President Trump was negative, even if the actual content was flattering.
As this new, more powerful A.I. moves into a wider array of products, including online ad services and business software as well as talking digital assistants like Apple's Siri and Amazon's Alexa, technology companies will be under pressure to guard against the unexpected biases being discovered.
But scientists are still learning how technology like BERT, called a "universal language model," works. And they are often surprised by the mistakes their new A.I. makes.
Recently in San Francisco, while researching a book on artificial intelligence, the computer scientist Robert Munro fed 100 English words into BERT: "jewelry," "baby," "horses," "house," "money," "action." In 99 of the 100 cases, BERT was more likely to associate the words with men than with women. The word "mom" was the exception.
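A probe like Dr. Munro's can be sketched in a few lines. The `mlm_score` function below is a hypothetical stand-in with toy numbers (the real experiment queried BERT's masked-word scores, which would require downloading the model); the point is the shape of the test: score the same sentence with "his" and with "her," and tally which version the model prefers.

```python
# Sketch of a Munro-style association probe. `mlm_score` is a hypothetical
# stand-in for a BERT masked-language-model score, rigged with toy numbers
# so the tallying logic runs without a model download.

def mlm_score(sentence: str) -> float:
    """Hypothetical stand-in: pretend the model prefers 'his' for most
    nouns but 'her' when the sentence mentions 'mom'."""
    return 0.9 if ("his" in sentence) != ("mom" in sentence) else 0.1

def preferred_pronoun(noun: str) -> str:
    """Score the same template with each pronoun and compare."""
    template = "[PRONOUN] {} was there.".format(noun)
    his_score = mlm_score(template.replace("[PRONOUN]", "his"))
    her_score = mlm_score(template.replace("[PRONOUN]", "her"))
    return "male" if his_score > her_score else "female"

words = ["jewelry", "baby", "horses", "house", "money", "action", "mom"]
tally = {"male": 0, "female": 0}
for w in words:
    tally[preferred_pronoun(w)] += 1
print(tally)  # with these toy scores: 6 male-associated words, 1 female
```

A real audit would replace `mlm_score` with actual model scores and run the loop over the full 100-word list.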
"This is the same historical inequity we have always seen," said Dr. Munro, who holds a Ph.D. in computational linguistics and previously oversaw natural language and translation technology at Amazon Web Services. "Now, with something like BERT, this bias can continue."
In a blog post this week, Dr. Munro also described how he examined cloud-computing services from Google and Amazon Web Services that help other businesses add language skills to new applications. Both services failed to recognize the word "hers" as a pronoun, though they correctly identified "his."
"We are aware of the problem and are taking the necessary steps to resolve it," a Google spokesman said. "Mitigating bias from our systems is one of our A.I. principles and is a top priority." Amazon said in a statement that it "dedicates significant resources to ensuring our technology is highly accurate and reduces bias, including rigorous benchmarking, testing and investing in diverse training data."
Researchers have long warned of bias in A.I. systems that learn from large amounts of data, including the facial recognition systems used by police departments and other government agencies as well as popular internet services from tech giants like Google and Facebook. In 2015, for example, the Google Photos app was caught labeling African-Americans as "gorillas." The services Dr. Munro examined also showed bias against women and people of color.
BERT and systems like it are far more complex, too complex for anyone to predict what they will ultimately do.
"Even the people building these systems don't understand how they behave," said Emily Bender, a professor at the University of Washington who specializes in computational linguistics.
BERT is one of many universal language models used in industry and academia. Others are called ELMo, ERNIE and GPT-2. As a kind of inside joke among A.I. researchers, they are often named after Sesame Street characters. (BERT is short for Bidirectional Encoder Representations from Transformers.)
They learn the nuances of language by analyzing enormous amounts of text. A system built by OpenAI, an artificial intelligence lab in San Francisco, analyzed thousands of self-published books, including romance novels, mysteries and science fiction. BERT analyzed the same library of books along with thousands of Wikipedia articles.
In analyzing all that text, each system learned a specific task. The OpenAI system learned to predict the next word in a sentence. BERT learned to identify the missing word in a sentence (for example, "I want to ____ this car because it's cheap").
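The fill-in-the-blank task can be illustrated with a toy model. The sketch below replaces BERT's neural network with simple counting over a tiny, made-up corpus; the scale and mechanics are nothing like the real system, but the task itself, guessing a masked word from its surrounding context, is the same in spirit.

```python
# Toy illustration of BERT's fill-in-the-blank training objective.
# Real BERT learns from billions of words with a neural network; here a
# tiny made-up corpus and simple counting stand in, just to show the task.
from collections import Counter

corpus = [
    "i want to buy this car because it is cheap",
    "i want to buy this house because it is big",
    "i want to sell this car because it is old",
    "i want to buy this boat because it is cheap",
]

def predict_masked(sentence: str) -> str:
    """Guess the word at the [MASK] position by counting which words
    appear in that slot in corpus sentences with matching context."""
    tokens = sentence.split()
    mask_idx = tokens.index("[MASK]")
    counts = Counter()
    for line in corpus:
        words = line.split()
        if len(words) != len(tokens):
            continue
        # every position except the masked one must match
        if all(w == t for i, (w, t) in enumerate(zip(words, tokens))
               if i != mask_idx):
            counts[words[mask_idx]] += 1
    return counts.most_common(1)[0][0]

print(predict_masked("i want to [MASK] this car because it is cheap"))
# prints "buy"
```

BERT does the same kind of gap-filling, but with learned word representations that generalize far beyond sentences it has seen verbatim.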
Through these tasks, BERT came to understand in a general way how people put words together. Then it can learn other tasks by analyzing more data, which allows A.I. applications to improve at a rate that was not previously possible.
"BERT completely changed everything," said John Bohannon, director of science at Primer, a San Francisco start-up that specializes in natural language technology. "You can teach one pony all the tricks."
Google itself has used BERT to improve its search engine. Previously, if you typed "Do estheticians stand a lot at work?" into the search engine, Google didn't quite understand what you were asking. Words like "stand" and "work" can have multiple meanings, serving as either nouns or verbs. But now, thanks to BERT, Google correctly answers the question with a link describing the physical demands of life in the skin care industry.
But tools like BERT pick up bias, according to a recent research paper from a team of computer scientists at Carnegie Mellon University. The paper showed, for example, that BERT is more likely to associate the word "developer" with men than with women. Language bias can be a particularly difficult problem in conversational systems.
As these new technologies proliferate, bias can appear almost anywhere. At Primer, Dr. Bohannon and his engineers recently used BERT to build a system that lets businesses automatically judge the sentiment of headlines, tweets and other streams of online media. Businesses use such tools to inform stock trades and other high-stakes decisions.
But after training his tool, Dr. Bohannon noticed a consistent bias: if a tweet or headline contained the word "Trump," the tool almost always judged it to be negative, no matter how positive the sentiment.
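One common way to surface this kind of bias is a counterfactual "name swap" test: score the same sentence twice, changing only the name, so any difference in the score is attributable to the name itself. The `sentiment` function below is a deliberately biased hypothetical stand-in, not Primer's model, used only to show how the check works.

```python
# Sketch of a counterfactual "name swap" bias check. `sentiment` is a
# hypothetical, deliberately biased stand-in for a trained classifier;
# a real audit would call the actual model here instead.

def sentiment(text: str) -> float:
    """Hypothetical scorer in [-1, 1] that drags any text mentioning
    'Trump' toward negative regardless of its actual content."""
    score = 0.8 if "great" in text else 0.0
    if "Trump" in text:
        score -= 1.0  # the learned bias we want to detect
    return score

def bias_gap(template: str, name_a: str, name_b: str) -> float:
    """Swap only the name; any score change is driven by the name."""
    return sentiment(template.format(name_a)) - sentiment(template.format(name_b))

gap = bias_gap("{} gave a great speech today.", "Trump", "Smith")
print(gap)  # a large negative gap flags name-driven bias
```

Run over many templates and many names, this kind of sweep gives auditors a systematic way to find the "obvious" biases before a tool reaches production.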
"This is hard. You need a lot of time and care," he said. "We found an obvious bias. But how many others are in there?"
Dr. Bohannon said computer scientists need to develop the skills of a biologist. Just as a biologist strives to understand how a cell works, software engineers must find ways to understand systems like BERT.
In unveiling the new version of its search engine last month, Google executives acknowledged the phenomenon and said they were testing their systems extensively to weed out any bias.
Researchers are only beginning to understand the effects of bias in systems like BERT. But as Dr. Munro showed, companies can be slow to notice even obvious bias in their systems. After Dr. Munro pointed out the problem, Amazon corrected it. Google said it was working on a fix.
Sean Gourley, Primer's chief executive, said vetting the behavior of this new technology would become so important that it would spawn a whole new industry, in which companies pay specialists to audit their algorithms for all kinds of bias and other unexpected behavior. "This is probably a billion-dollar industry," he said.