MIT researchers at the Computer Science and Artificial Intelligence Lab (CSAIL) have created a predictive AI that allows robots to link multiple senses in much the same way humans do.
“While our sense of touch gives us a channel to feel the physical world, our eyes help us immediately understand the full picture of these tactile signals,” writes Rachel Gordon, of MIT CSAIL. In robots, this connection doesn’t exist. In an effort to bridge the gap, researchers developed a predictive AI capable of learning to “see by touching” and “feel by seeing,” a means of linking senses of sight and touch in future robots.
Using a KUKA robot arm with a tactile sensor called GelSight (another MIT creation), the team recorded nearly 200 objects with a webcam. These included tools, fabrics, household products, and other everyday materials that humans regularly come into contact with.
The team used the robotic arm to touch the items more than 12,000 times, then broke the resulting video clips into static frames for further analysis. All told, the researchers ended up with more than three million visual/tactile paired images in their dataset.
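As a rough illustration of how a paired dataset like this might be assembled, the Python sketch below matches each webcam frame with the tactile frame captured at the same position in a touch clip. It is a sketch under assumptions, not the team's actual pipeline: the `Touch` record, the `paired_frames` helper, and the frame identifiers are all hypothetical. The final helper only does the arithmetic implied by the figures above: three million pairs spread over 12,000 touches averages out to roughly 250 pairs per touch.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Touch:
    """One recorded touch with its two frame streams (hypothetical layout)."""
    object_name: str
    visual_frames: List[str]   # webcam frame identifiers
    tactile_frames: List[str]  # tactile-sensor frame identifiers


def paired_frames(touches: List[Touch]) -> Iterator[Tuple[str, str, str]]:
    """Yield (object, visual frame, tactile frame) for frames at the same index."""
    for touch in touches:
        # zip stops at the shorter stream, so unmatched trailing frames are dropped.
        for visual, tactile in zip(touch.visual_frames, touch.tactile_frames):
            yield touch.object_name, visual, tactile


def average_pairs_per_touch(total_pairs: int, total_touches: int) -> float:
    """Average number of paired frames contributed by each touch."""
    return total_pairs / total_touches


# Illustrative data only: one touch with three webcam frames and two tactile frames.
example = [Touch("mug", ["v0", "v1", "v2"], ["t0", "t1"])]
pairs = list(paired_frames(example))
```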
“By looking at the scene, our model can imagine the feeling of touching a flat surface or a sharp edge,” said Yunzhu Li, CSAIL PhD student and lead author on a new paper about the system. “By blindly touching around, our model can predict the interaction with the environment purely from tactile feelings.”
For humans, this comes naturally. We can touch an item once, even years earlier, and still have a sense of how it will feel when we come into contact with it again. In robots, the same ability could reduce the human input needed for mechanical tasks, such as flipping a switch or deciding where it is safest to pick up a package.
According to Li:
“Bringing these two senses together could empower the robot and reduce the data we might need for tasks involving manipulating and grasping objects.”
By referencing images from a dataset, future robotic arms, like those used to assemble automobiles or mobile phones, could make on-the-fly predictions by comparing the object in front of them to those in the dataset. Once operational, an arm could easily identify the best place to lift, bend, or otherwise manipulate the object.
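One simple way to picture "comparing the object in front of it to those in the dataset" is a nearest-neighbor lookup over feature vectors. The Python sketch below is only an illustration under assumptions, not the CSAIL system: the function `nearest_entry`, the two-number feature vectors, and the labels are all hypothetical, and the article does not say how the comparison is actually carried out.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_entry(
    query: Sequence[float], dataset: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the label and Euclidean distance of the entry closest to query."""
    if not dataset:
        raise ValueError("dataset is empty")
    # Pick the label whose stored vector lies closest to the query vector.
    best_label = min(dataset, key=lambda label: math.dist(query, dataset[label]))
    return best_label, math.dist(query, dataset[best_label])


# Hypothetical feature vectors standing in for stored dataset entries.
known = {
    "flat surface": [0.1, 0.9],
    "sharp edge": [0.9, 0.2],
}

label, distance = nearest_entry([0.8, 0.3], known)
print(label)  # sharp edge
```

A lookup like this scales linearly with the number of stored entries, so a dataset of millions of images would in practice need an index or a learned model rather than a brute-force scan.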
MIT’s current dataset was built from interactions in a controlled environment. The team hopes to extend it by collecting data in less structured settings.