Deb Roy
Associate Professor of Media Arts and Sciences, MIT
Talking Like a Human
Deb Roy has been building robots since the age of six. At first they were just cosmetic, primitive beings constructed from a young boy's imagination. Later, they would become complicated, entrusted with robot 'brains' that listen and recognize how words relate to one another and to the external world.
As Director of the MIT Media Lab's Cognitive Machines group, Roy designs machines that learn to communicate with people and creates new tools to study how children learn to communicate. He's also fundamentally challenging and extending our notions of what it means to be human.
Roy received his PhD in cognitive science from MIT in 1999. His first research robot, a toucan named Toco, was built to explore whether a machine could learn words from sights and sounds just as infants do. He programmed the robot to learn from "show and tell": when a person uttered simple phrases like "look at the red ball" while showing the robot a ball, Toco would learn to associate the spoken words with their visual meanings.
Roy's work then took an unusual turn when he embarked on an experiment to use the robot's learning system as an instrument for studying mother-child language acquisition. He fed recorded interactions between mothers and their infants into Toco and, surprisingly, the robot was able to learn from those as well. What Toco couldn't do was grasp the deeper social concepts that are the bedrock of child-caregiver interaction. "There's a social reality out there to which symbols relate and Toco couldn't relate," says Roy. Human beings, it seems, are a complicated species.
To teach a robot to converse with people, Roy knew he first needed to figure out how human beings acquire language, and, fortuitously, the birth of his son provided just the right opportunity for study. The Human Speechome Project is Roy's three-year experiment to collect video and audio recordings of his son's language development.
Through a network of ceiling-mounted video cameras and microphones, the project is generating approximately 200 gigabytes of observational data each day. The study is documenting almost every sound Roy's son makes, from the babble of infancy to the formation of words, from building a vocabulary to the dance of sentence structure. At the same time, Roy's team is studying the social and physical learning contexts that shape his son.
Roy says one of the most fascinating findings of the project has been observing the rich interaction among people that underlies language acquisition. He uses the example of what he calls "the cross-cultural phenomenon" of Peek-a-Boo to illustrate the strong link between words and actions. "You don't need language to play it because it's about objects being hidden – something every child understands pre-linguistically. But interestingly, the hider tends to talk a lot by describing what's happening. Where did it go? Oh—there it is!" Parents are hardwired to communicate through words, Roy explains, so the game of Peek-a-Boo enables children to figure out what different speech acts mean. The child can learn the meaning of words because the words are embedded in meaningful contexts.
"We take that kind of analysis and apply it everywhere," he says. "We think of life as a series of multiple, overlapping games and try to discover how meaning emerges out of this unbelievably rich mesh of everyday activity."
One of the ultimate goals of the project is to expose a machine to the same natural environment in which Roy's son began to talk. If the machine can learn language from this data, it opens the door to more socially intelligent learning machines. Yet the implications of his research go beyond teaching robots to talk. "There are some interesting trajectories emerging from the data analysis," he says. "From retail behavioral analysis to understanding learning disorders such as autism, I envision game-changing new approaches that translate the Speechome technologies into other areas and applications." In his new role as founding Director of the MIT Center for Future Banking, Roy is in a unique position to drive innovation in the financial services sector, in part by leveraging this research.
As Roy gradually homes in on the most critical aspects of human communication, he says that the more data he collects, the greater the possibilities become. "We talk by bringing together the realms of symbols and physics," he says. "It's like Peek-a-Boo – it's all about interaction."