Updated: Feb 10, 2019
One of the hottest topics in Silicon Valley is “Machine Learning.” It is everywhere. As we wrote in a previous blog post, having a machine learning component in your software is no longer a nice-to-have but a must-have.
But the idea of machine learning has been around for much longer than most people realize. In 1952 Arthur Samuel, a researcher at IBM, created the first machine learning program using an IBM 701. The program allowed the 701 to learn how to play the game Checkers. The idea that a machine could learn was so revolutionary that IBM’s stock jumped 15% after the program was publicly demonstrated.
Since that simple beginning, machine learning has grown to encompass many types including (but not limited to):
Decision tree learning
Association rule learning
Artificial neural networks
Inductive logic programming
Support vector machines
Similarity and metric learning
Sparse dictionary learning
Rule-based machine learning
Learning classifier systems
Feature selection approaches
Each has specific strengths (and weaknesses) and works better for some applications than others.
One of the common machine learning approaches is called “Convolutional Neural Networks” (a type of artificial neural network). Convolutional networks are used in image recognition, video analysis, natural language processing and drug discovery.
CNNs are especially useful in image recognition. But they have a significant problem: while CNNs are good at identifying the different elements in an image, they cannot model the spatial relationships between those elements.
Look at this image. A CNN would be able to identify the mouth, the eyes and the nose, but not the relationship between the elements. Both images are “the same” to a CNN.
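The pooling stages that make CNNs robust to small shifts are also what throw position away. Here is a toy sketch in plain NumPy (the feature-detector outputs are hypothetical, standing in for what convolutional filters might produce) showing how global max pooling maps two very different spatial arrangements of the same features to an identical representation:

```python
import numpy as np

def global_max_pool(feature_maps):
    # Collapse each feature map to a single activation: "is this feature
    # present anywhere in the image?" All positional information is lost.
    return feature_maps.max(axis=(1, 2))

# Toy detector outputs for two 4x4 images: channel 0 = "eye" detector,
# channel 1 = "mouth" detector. Both features fire in both images,
# but in very different places.
face = np.zeros((2, 4, 4))
face[0, 0, 1] = 1.0       # eye near the top
face[1, 3, 1] = 1.0       # mouth near the bottom

scrambled = np.zeros((2, 4, 4))
scrambled[0, 3, 1] = 1.0  # eye near the bottom
scrambled[1, 0, 1] = 1.0  # mouth near the top

print(global_max_pool(face))       # [1. 1.]
print(global_max_pool(scrambled))  # [1. 1.] -- identical after pooling
```

After pooling, the network only knows that an eye and a mouth are present somewhere, which is exactly why the face and its scrambled counterpart look “the same.”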
The problem is more significant when you want the software to be able to recognize items in images from multiple viewpoints, such as recognizing a face from the left side, the right side, the front, from an elevated viewpoint or from below. The human brain can look at the images below and understand that all of them are the Statue of Liberty. But traditional CNNs would fail at this task.
The typical solution to the problem is to increase the size of the training set – add images that show the same thing from many different angles.
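In practice this is done through data augmentation. A minimal sketch of the idea in NumPy (the `augment` helper is illustrative, not from any particular library) that turns one training image into eight rotated and mirrored variants:

```python
import numpy as np

def augment(image):
    # Generate rotated and mirrored copies of one training image, so the
    # network sees the same thing from several different orientations.
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degrees
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirror each rotation
    return variants

image = np.arange(9).reshape(3, 3)
print(len(augment(image)))  # 8 variants from a single image
```

Real pipelines add many more transformations (crops, color jitter, perspective warps), but the cost is the same: the dataset must grow to cover viewpoints the architecture itself cannot generalize across.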
In 2017, Geoffrey Hinton, known as the “Godfather of Deep Learning” and a researcher at Google Brain, published two papers that introduced the concept of a Capsule Network. Without going into great detail, a Capsule Network is a specific type of CNN that adds another layer to the architecture of a CNN.
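The key difference is that a capsule outputs a vector rather than a single number: the vector's length encodes how likely an entity is to be present, and its direction encodes pose information such as orientation. Hinton's routing paper introduces a “squashing” nonlinearity to make vector lengths usable as probabilities; a NumPy sketch of that function (a simplified illustration, not a full capsule layer):

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule squashing nonlinearity: shrinks short vectors toward zero
    # and long vectors toward unit length, so a capsule's length can be
    # read as the probability that its entity is present, while the
    # vector's direction is preserved to encode pose.
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

short = squash(np.array([0.1, 0.0]))    # weak evidence -> suppressed
long_ = squash(np.array([100.0, 0.0]))  # strong evidence -> near unit length
print(np.linalg.norm(short))
print(np.linalg.norm(long_))
```

Because pose survives in the vector's direction, higher-level capsules can check whether lower-level parts agree on a consistent spatial arrangement, which is exactly what plain pooling discards.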
So why is everyone in the Deep Learning community talking about this? Early prototypes of capsule networks have reduced the error rate by as much as 45% compared to the previous state of the art. Hinton’s papers have triggered a vast amount of additional work and experimentation. There is still much work to be done before Capsule Networks become mainstream, but they are one of the hot topics in Silicon Valley right now.