An artificial intelligence can decode words and sentences from brain activity with surprising but still limited accuracy. Using just a few seconds of brain activity data, the AI guesses what a person has heard. It lists the correct answer among its top 10 possibilities up to 73 percent of the time, researchers found in a preliminary study.
The “performance of the AI was above what many people thought was possible at this stage,” says Giovanni Di Liberto, a computer scientist at Trinity College Dublin who was not involved in the research.
Developed at Facebook’s parent company Meta, the AI could eventually be used to help thousands of people around the world who are unable to communicate through speech, writing, or gestures, the researchers report August 25 at arXiv.org. That includes many patients in minimally conscious, locked-in, or “vegetative” states, a condition now generally known as unresponsive wakefulness syndrome (SN: 8/2/19).
Most of the existing technologies to help such patients communicate require risky brain surgeries to implant electrodes. This new approach “could provide a viable path to help patients with communication deficits… without the use of invasive methods,” says neuroscientist Jean-Rémi King, a Meta AI researcher currently at the École Normale Supérieure in Paris.
King and his colleagues trained a computational tool to detect words and sentences in 56,000 hours of speech recordings in 53 languages. The tool, also known as a language model, learned to recognize specific features of language both at a fine-grained level (think letters or syllables) and at a broader level, such as a word or a sentence.
The team applied an AI with this language model to databases from four institutions that included brain activity from 169 volunteers. In these databases, participants listened to stories and sentences from, for example, Ernest Hemingway’s The Old Man and the Sea and Lewis Carroll’s Alice’s Adventures in Wonderland, while their brains were scanned using magnetoencephalography or electroencephalography. Those techniques measure the magnetic or electrical component of brain signals.
Then, with the help of a computational method that helps account for physical differences among real brains, the team tried to decode what participants had heard using just three seconds of brain activity data from each person. The team instructed the AI to align the speech sounds from the story recordings with the patterns of brain activity that the AI computed as corresponding to what people were hearing. It then made predictions about what the person might have been hearing during that short window, given more than 1,000 possibilities.
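To make the matching step concrete, here is a minimal, illustrative sketch of how such a decoder can rank candidate speech segments against a short window of brain activity. This is not Meta’s actual model: the encoders below are stand-in random projections, and the dimensions, variable names, and cosine-similarity scoring are assumptions chosen only to show the ranking logic.

```python
# Illustrative sketch (not Meta's actual model): rank candidate speech
# segments against one short window of brain activity by cosine similarity
# in a shared embedding space. The "encoders" are random projections
# standing in for learned speech and brain networks.
import numpy as np

rng = np.random.default_rng(0)

N_CANDIDATES = 1000   # pool of possible speech segments (as in the study)
SPEECH_DIM = 128      # dimensionality of speech features (assumed)
BRAIN_DIM = 273       # e.g., number of MEG sensor channels (assumed)
EMBED_DIM = 64        # shared embedding space (assumed)

# Stand-in encoders: in a real system these would be trained networks.
W_speech = rng.normal(size=(SPEECH_DIM, EMBED_DIM))
W_brain = rng.normal(size=(BRAIN_DIM, EMBED_DIM))

def embed(x, W):
    """Project features into the shared space and L2-normalize them."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Toy data: candidate speech segments and one brain-activity window.
speech_feats = rng.normal(size=(N_CANDIDATES, SPEECH_DIM))
brain_window = rng.normal(size=(BRAIN_DIM,))

speech_emb = embed(speech_feats, W_speech)          # shape (1000, 64)
brain_emb = embed(brain_window[None, :], W_brain)   # shape (1, 64)

# Cosine similarity between the brain window and every candidate segment.
scores = (speech_emb @ brain_emb.T).ravel()

# The decoder's output is the ranking of candidates by similarity.
top10 = np.argsort(scores)[::-1][:10]
print("Top-10 candidate segment indices:", top10)
```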
Using magnetoencephalography, or MEG, the correct answer was among the AI’s top 10 guesses up to 73 percent of the time, the researchers found. With electroencephalography, that value dropped to no more than 30 percent. “[That MEG] performance is very good,” says Di Liberto, but he is less optimistic about its practical use. “What can we do with it? Nothing. Absolutely nothing.”
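Those 73 percent and 30 percent figures are top-10 accuracies: a trial counts as a success if the segment the person actually heard appears anywhere among the decoder’s 10 highest-ranked guesses. A small sketch of that metric, using made-up scores and a hypothetical helper function, might look like this:

```python
# Sketch of the top-10 accuracy metric: a trial is correct if the true
# segment appears anywhere in the decoder's 10 highest-ranked guesses.
import numpy as np

def top_k_accuracy(score_matrix, true_idx, k=10):
    """score_matrix: (n_trials, n_candidates) similarity scores;
    true_idx: (n_trials,) index of the segment actually heard."""
    ranked = np.argsort(score_matrix, axis=1)[:, ::-1][:, :k]
    hits = (ranked == true_idx[:, None]).any(axis=1)
    return hits.mean()

# Toy example with random scores over 1,000 candidate segments.
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 1000))
truth = rng.integers(0, 1000, size=200)
print(f"Top-10 accuracy: {top_k_accuracy(scores, truth):.1%}")
```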
The reason, he says, is that MEG requires a bulky and expensive machine. Bringing this technology into clinics will require scientific innovations that make the machines cheaper and easier to use.
It’s also important to understand what “decoding” really means in this study, says Jonathan Brennan, a linguist at the University of Michigan in Ann Arbor. The word is often used to describe the process of deciphering information directly from a source, in this case, speech from brain activity. But the AI was able to do this only because it was given a finite list of possible correct answers to make its guesses.
“With language, that’s not enough if we want to scale for practical use, because language is infinite,” says Brennan.
Also, Di Liberto says, the AI decoded information from participants passively listening to the audio, which is not directly relevant to nonverbal patients. For it to become a meaningful communication tool, scientists will need to learn to decipher from brain activity what these patients are trying to say, including expressions of hunger, discomfort, or a simple “yes” or “no.”
The new study concerns “the decoding of speech perception, not production,” King agrees. Though speech production is the ultimate goal, for now, “we’re quite far away.”