Computer algorithms have gotten much better at recognizing patterns, like specific animals or people’s faces, allowing software to automatically categorize large image collections. But we’ve also come to rely on a few things that computers can’t do well. Algorithms can’t connect their image recognition to semantic meaning, so today you can verify that a human is present by asking them to pick out images of street signs. And algorithms don’t do especially well at recognizing familiar images that have been distorted or buried in noise, either, which has kept us relying on text-based CAPTCHAs, the warped strings of characters used to verify that a human is interacting with Web services.
Or at least we had relied on them until now. In today’s issue of Science, a Bay Area startup called Vicarious AI describes an algorithm that, with minimal training, easily handles CAPTCHAs; it also manages general text recognition. Vicarious’ secret? They modeled the structure of their AI on what we’ve learned from studying how the mammalian visual cortex processes images.
Thinking visually
In the visual cortex, different groups of neurons recognize features like edges and surfaces (others identify motion, which isn’t really relevant here). But rather than treating a scene or object as a loose collection of these parts, the neurons communicate with one another, figuring out by proximity which features belong to a single object. As objects are built up and recognized, the scene is then represented hierarchically in terms of objects rather than individual features.
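To make that feature-to-object hierarchy concrete, here is a minimal toy sketch, not Vicarious’ actual model and not how cortical neurons literally compute. It treats active pixels in a tiny binary image as low-level “features,” then greedily groups nearby features into object hypotheses, so the scene ends up described by objects rather than by raw features. The function names, the distance threshold, and the grouping rule are all illustrative assumptions.

```python
import numpy as np

def detect_features(image):
    """Return (row, col) coordinates of active pixels, standing in for
    low-level features such as edge fragments."""
    return [tuple(p) for p in np.argwhere(image > 0)]

def group_by_proximity(features, max_dist=1.5):
    """Greedily merge features that lie within max_dist of an existing group,
    a crude stand-in for the lateral 'same object?' communication described above."""
    groups = []
    for f in features:
        placed = False
        for group in groups:
            if any(np.hypot(f[0] - g[0], f[1] - g[1]) <= max_dist for g in group):
                group.append(f)
                placed = True
                break
        if not placed:
            groups.append([f])
    return groups

# A toy "scene" containing two separate blobs of features.
scene = np.zeros((6, 8), dtype=int)
scene[1:3, 1:3] = 1   # object A
scene[3:5, 5:7] = 1   # object B

features = detect_features(scene)
objects = group_by_proximity(features)
print(f"{len(features)} features grouped into {len(objects)} objects")
# -> 8 features grouped into 2 objects
```

The point of the sketch is only the ordering of the steps: features are detected first, grouped into objects second, and the scene is summarized at the object level last, which is the hierarchy the paragraph above describes.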