Picture yourself going to an unfamiliar supermarket for the first time. If you are a person who can see, you can simply look around to guide yourself and identify objects and obstacles. However, blind people must use other senses to find their way through a new space.
Soon, blind people might have some navigational help, thanks to Caltech researchers who have combined augmented reality hardware and computer vision algorithms to develop software that enables objects to "talk." Worn as a portable headset, the device translates the visual world into plain English audio. It could one day be made available in banks, grocery stores, museums, and other locations to help blind people make their way through unfamiliar spaces.
The work was done in the laboratory of Markus Meister (PhD '87), Anne P. and Benjamin F. Biaggini Professor of Biological Sciences and executive officer for neurobiology, and is described in a paper appearing in the November 27 issue of the journal eLife. Meister is an affiliated faculty member of the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech.
"Imagine you are in a world where all the objects around you have voices and can speak to you," says Meister. "Wherever you point your gaze, the different objects you focus on are activated and speak their names to you. Could you imagine getting around in such a world, performing some of the many tasks that we normally use our visual system for? That is what we have done here—given voices to objects."
Led by graduate student Yang Liu, the team of scientists developed a system they call CARA, or the Cognitive Augmented Reality Assistant. CARA was built for Microsoft's HoloLens, a wearable headset computer that can scan a user's environment and identify individual objects such as a laptop or a table. With CARA, each object in the environment is given a voice and will "say" its name upon the user's command. The system uses so-called spatialized sound, which makes objects sound different depending on their location within a room. If an object is, for example, far to the user's left, its voice will sound as if it is coming from the left. Additionally, the closer the object, the higher the pitch of its "voice."
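As a rough illustration of the mapping the article describes, the sketch below converts an object's position into a direction and a pitch for its voice. This is not CARA's actual code; the function name, the coordinate conventions, and the linear distance-to-pitch mapping are all assumptions made for the example.

```python
import math

def spatial_voice_params(user_pos, user_heading_deg, obj_pos,
                         base_pitch=1.0, max_distance_m=10.0):
    """Map an object's position to audio cues, as the article describes:
    the voice comes from the object's direction, and closer objects
    speak at a higher pitch. The exact mapping CARA uses is not public;
    this linear scheme is a hypothetical stand-in."""
    dx = obj_pos[0] - user_pos[0]
    dy = obj_pos[1] - user_pos[1]
    distance = math.hypot(dx, dy)

    # Signed angle from the user's heading to the object, measured
    # clockwise, so negative azimuth is to the user's left and positive
    # to the right (the usual stereo-panning convention).
    bearing = math.degrees(math.atan2(dy, dx))
    azimuth = (user_heading_deg - bearing + 180.0) % 360.0 - 180.0

    # Closer objects get a proportionally higher pitch (up to 2x here).
    closeness = 1.0 - min(distance, max_distance_m) / max_distance_m
    pitch = base_pitch * (1.0 + closeness)

    return azimuth, pitch, distance
```

For instance, an object two meters away and directly in the user's line of travel would return an azimuth near zero and a pitch well above the baseline, while the same object far off to the left would return a strongly negative azimuth.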
To avoid a cacophony of objects speaking at once, Liu and his team programmed CARA with several different modes. In the first, called spotlight mode, an object says its name only when the user is facing it directly. As the user turns their head to face various objects, each object says its name, and the pitch of its voice provides an auditory cue about its relative distance from the user. In this way, a vision-impaired user can "look around" to explore their environment. In the second mode, called scan mode, the environment is scanned from left to right, with objects saying their names in that order. The third mode, target mode, lets the user select a single object to speak exclusively and use its voice as a guide for navigation.
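Continuing the same hypothetical sketch, the snippet below shows how the spotlight and scan modes described above might be arbitrated. The 15-degree cone, the object list format, and the speak() text-to-speech call are illustrative assumptions, not details taken from the paper.

```python
SPOTLIGHT_CONE_DEG = 15.0  # hypothetical half-angle for "facing directly"

def spotlight_mode(objects, user_pos, user_heading_deg):
    """Announce only the object the user is currently facing."""
    for name, pos in objects:  # objects: list of (name, (x, y)) pairs
        azimuth, pitch, _ = spatial_voice_params(user_pos, user_heading_deg, pos)
        if abs(azimuth) <= SPOTLIGHT_CONE_DEG:
            speak(name, azimuth_deg=azimuth, pitch=pitch)  # hypothetical TTS call

def scan_mode(objects, user_pos, user_heading_deg):
    """Sweep the scene from left to right, announcing each object in turn."""
    def azimuth_of(obj):
        return spatial_voice_params(user_pos, user_heading_deg, obj[1])[0]
    for name, pos in sorted(objects, key=azimuth_of):  # most negative azimuth = leftmost
        azimuth, pitch, _ = spatial_voice_params(user_pos, user_heading_deg, pos)
        speak(name, azimuth_deg=azimuth, pitch=pitch)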
To test CARA's practical applications, the researchers designed a test route for blind volunteers through the Beckman Behavioral Biology building at Caltech. To prepare the task, the researchers first walked the route while wearing the HoloLens, which scanned the environment and stored it in memory. Then, the blind volunteers were asked to navigate the route using CARA as a guide. As each volunteer began, a voice, seemingly emanating from a location ahead on the route, called out "Follow me," while also telling the user about stairs, handrails, corners to turn, and the like. Led by CARA, all seven of the volunteers completed the task successfully on the first try.
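The guide voice can be thought of as a virtual beacon that stays a short distance ahead of the user along the recorded route. The sketch below, again hypothetical rather than taken from the paper, keeps the voice anchored at the next waypoint roughly two meters ahead.

```python
FOLLOW_DISTANCE_M = 2.0  # hypothetical lead distance for the virtual guide

def next_guide_position(route, user_pos):
    """Return the point the "Follow me" voice should appear to come from:
    the first upcoming waypoint at least FOLLOW_DISTANCE_M away from the
    user, searching forward from the waypoint nearest the user."""
    nearest = min(range(len(route)), key=lambda i: math.dist(route[i], user_pos))
    for waypoint in route[nearest:]:
        if math.dist(waypoint, user_pos) >= FOLLOW_DISTANCE_M:
            return waypoint
    return route[-1]  # near the end of the route, anchor the voice at the goal
```

Each time the headset updates the user's position, the guide voice would be re-anchored at this point and spatialized toward it, so "Follow me" always seems to come from farther along the path.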
CARA is still in its early stages, but it will benefit from the rapid development of computer vision algorithms. The researchers are already implementing new schemes for real-time identification of objects and pedestrians. Eventually, they hope that places like banks, hotels, and shopping malls will offer CARA devices for use by their blind customers.
In addition to their work with CARA, the team developed a new "standardized test" to evaluate the performance of various assistive devices for the blind. The method provides a virtual reality environment that researchers anywhere in the world can use to benchmark their devices, without having to reconstruct real physical spaces.
The paper is titled "Augmented Reality Powers a Cognitive Assistant for the Blind." In addition to Liu and Meister, USC postdoctoral scholar Noelle Stiles (PhD '16) is a co-author. The work was funded by the Shurl and Kay Curci Foundation.