At Cubica, we use Deep Learning approaches extensively for a range of applications and have an active horizon scanning team who monitor the latest developments from the academic and industrial domains. We also conduct low-level research into Deep Learning algorithms and techniques, examining things like robustness to adversarial inputs, privacy, and speed and efficiency.
In this blog we will explore the historical context of Deep Learning and share one of the most challenging applications of Deep Learning we’ve worked on, Automated Lip Reading. Here, we developed a computer vision pipeline which can analyse and predict speech from silent video alone!
What is Deep Learning?
Deep Learning continues to attract a great deal of hype, and this is well deserved, but it is important to have some perspective about what it is and what it can and cannot do. Deep Learning is an umbrella term for a group of machine learning algorithms that exploit deep Neural Networks (NNs), i.e. NNs with lots of “layers” – more on that in a moment.
As a branch of machine learning, Deep Learning can be considered to be a form of Artificial Intelligence, and there are some (basic) links to human neural anatomy. However, we don’t tend to use the term “Artificial Intelligence” at Cubica because it’s a massively over-hyped term that often doesn’t really mean anything, and the parallels with the human brain are somewhat loose.
At the moment, Deep Learning techniques are almost always applied in a task-specific manner; whilst they may form part of a future generalised Artificial Intelligence capability, there is no need to run to the hills from fear of machine consciousness just yet.
So, let’s focus on what Deep Learning really is – a set of amazingly powerful learning algorithms that can learn very complicated patterns.
Deep Learning algorithms can automatically learn multiple representations of your data at increasing levels of complexity and abstraction, to a degree that was not previously achievable. In doing so they help address one of the most significant challenges in Machine Learning (if not the most significant) – how to decide on “features” for your data.
This feature extraction process is often the biggest hurdle to building a successful approach; with the right features, it’s often the case that any reasonably high-performance classification technique will work well. Deep Learning algorithms have unlocked or massively advanced a huge number of previously extremely challenging, if not practically impossible, machine learning applications involving the processing of images, video, audio and text. There are, of course, lots of different types and flavours of Deep Learning algorithms and we will mention a few of these in this blog.
It has been fascinating to see this area of machine learning research explode; it has opened up a wealth of new applications for machine learning to target, and it has armed an entire generation of practitioners with extremely powerful algorithms and tools. Many staff at Cubica have been working in the machine learning domain since before the advent of Deep Learning and remember the challenges of building object detection and classification algorithms with more conventional techniques – notably using manual or basic automated feature extraction methods. We still use traditional techniques where appropriate, as Deep Learning is not the solution to every problem!
The History of Deep Learning to the Present
Neural Network theory has been around since the 1800s. The famous “single layer perceptron”, arguably the first and simplest network, was published by Rosenblatt in 1958. The “multi-layer perceptron (MLP)” came to prominence in the mid-1980s; this sought to encode more complex representations within additional hidden layers in the network, where each layer is parameterised by a set of values or “weights”. A group of researchers including Geoff Hinton published a paper showing that multi-layer NNs could be trained effectively via a simple process called back-propagation, in which each layer is trained using loss derivatives flowing back from the layer above it. This made NNs capable of learning nonlinear functions and, in fact, it was shown that, given enough hidden units, these networks can approximate any continuous function to arbitrary accuracy (the “Universal Approximation Theorem”).
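To make the idea of loss derivatives flowing back through the layers concrete, here is a minimal sketch of back-propagation for an MLP with one hidden layer, written in plain Python. The layer sizes, the tanh activation, the squared-error loss, the learning rate and all function names are our own illustrative assumptions rather than details of any particular historical paper.

```python
import math
import random


def forward(params, x):
    """Forward pass: tanh hidden layer followed by a single linear output."""
    W1, b1, W2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    y_hat = sum(w * h for w, h in zip(W2, hidden)) + b2
    return hidden, y_hat


def loss(params, x, y):
    """Squared-error loss for a single example."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backward(params, x, y):
    """Back-propagation: gradient of the loss for every weight and bias."""
    W1, b1, W2, b2 = params
    hidden, y_hat = forward(params, x)
    d_out = y_hat - y  # derivative of the loss with respect to the output
    grad_W2 = [d_out * h for h in hidden]
    grad_b2 = d_out
    # Derivative flowing down from the output layer into each hidden unit;
    # (1 - h^2) is the derivative of tanh.
    d_hidden = [d_out * w * (1.0 - h * h) for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in d_hidden]
    grad_b1 = d_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an assumed initialisation)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return [W1, [0.0] * n_hidden, W2, 0.0]


def sgd_step(params, x, y, lr=0.1):
    """One gradient-descent update on a single example; returns new params."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = backward(params, x, y)
    new_W1 = [[w - lr * g for w, g in zip(row, g_row)]
              for row, g_row in zip(W1, gW1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return [new_W1, new_b1, new_W2, b2 - lr * gb2]


# XOR is a classic function that a single-layer perceptron cannot represent.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
       ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

params = init_params(n_in=2, n_hidden=4)
for _ in range(5000):
    for x, y in XOR:
        params = sgd_step(params, x, y)

total_loss = sum(loss(params, x, y) for x, y in XOR)
print("total XOR loss after training:", total_loss)
```

Note how `backward` computes the hidden-layer derivatives from `d_out`, the derivative at the layer above, which is the essence of the technique. Whether the network fits XOR well depends on the assumed seed and learning rate.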
However, despite these advances, NNs were soon rejected by much of the academic community because they proved very hard to train when scaled up. This was due to the “vanishing gradient problem” within the back-propagation algorithm. As a result, NNs went through a dark period in the 90s and other methods such as the Support Vector Machine (SVM) became more popular.
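A rough numerical illustration of the vanishing gradient problem follows. It assumes a deliberately simplified network with one sigmoid unit per layer and a weight of 1 on every connection, so the gradient is scaled by the sigmoid's derivative once per layer; the function names are ours.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop_scale(depth, pre_activation=0.0, weight=1.0):
    """Factor by which a gradient is scaled after passing back through
    `depth` sigmoid layers, assuming one unit per layer."""
    return (weight * sigmoid_derivative(pre_activation)) ** depth


for depth in (1, 5, 10, 20):
    print(depth, backprop_scale(depth))
```

The sigmoid's derivative never exceeds 0.25 (its value at zero), so under these assumptions ten layers shrink the gradient by a factor of about a million, and the earliest layers barely learn.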
Deep Learning arrived properly in the 2000s/2010s following Geoff Hinton’s breakthrough paper in 2006. Hinton proposed a method that allowed multi-layered NNs to be trained effectively, using an intelligent method for initialising the NN’s parameters or “weights”. This, along with modern parallel processing hardware (i.e. GPUs) and the availability of large datasets, fuelled research that led to practitioners finally being able to train very deep neural networks with many layers (some sources say an NN has to have at least 6 layers to qualify as “Deep”). In 2012, an image recognition competition saw the publication of another Hinton co-authored work, in which the ideas of “drop-out” (a technique for reducing the “over-fitting” problem) and the Rectified Linear Unit (ReLU), which mitigates the vanishing gradient problem, were presented. This approach set a new benchmark for the challenge.
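Both ideas are simple enough to sketch in a few lines of plain Python. The version of drop-out below is the commonly used “inverted” variant, which rescales the surviving activations during training; that choice, along with the function names and the drop rate, is our assumption and is not taken from the 2012 paper.

```python
import random


def relu(values):
    """Rectified Linear Unit: negative inputs become zero."""
    return [max(0.0, v) for v in values]


def relu_derivative(values):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise, so active
    units pass gradients back without shrinking them."""
    return [1.0 if v > 0 else 0.0 for v in values]


def dropout(values, rate, rng, training=True):
    """Inverted drop-out: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate); identity when not training."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = relu([-1.5, 0.0, 0.3, 2.0])
print(activations)
print(dropout(activations, rate=0.5, rng=rng))
```

Because a different random subset of units is silenced on each training pass, the network cannot rely too heavily on any single unit, which is what discourages over-fitting.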
These developments, as well as more recent advances, heralded a new dawn in NNs, and they have dominated much of the machine learning domain ever since, with groundbreaking results and super-human performance in various applications such as face recognition and object detection, as well as self-driving vehicles and autonomous drones.
Types of Deep Learning Algorithms
We use a range of different types of Deep Learning algorithms at Cubica. A very common example is the Convolutional Neural Network (CNN); this is arguably the most famous Deep Neural Network. CNNs are designed for processing images, and combine layers of multiple learnable convolutional kernels with pooling operations and non-linearities such as ReLU. They provide the backbone of state-of-the-art image classifiers, object detectors and segmentation networks.
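The three building blocks mentioned above can be sketched in plain Python as follows. In a real CNN the kernel values are learned during training; here the kernel is a hand-picked edge detector, and the tiny image, the kernel and the function names are all illustrative assumptions.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1), implemented as
    cross-correlation, which is how CNN layers are usually defined."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu2d(feature_map):
    """Element-wise ReLU non-linearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a
    complete window are dropped."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in cols]
            for i in rows]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright horizontal step.
edge_kernel = [[-1.0, 1.0]]

features = max_pool2d(relu2d(conv2d(image, edge_kernel)))
print(features)
```

The convolution responds only where the brightness steps up, ReLU discards negative responses, and pooling keeps the strongest response in each neighbourhood while shrinking the feature map.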
Other Deep Learning algorithms include (but are in no way limited to):
- Auto-Encoders (unsupervised Deep NNs for encoding or compressing data)
- Recurrent Neural Networks (designed for processing temporal data)
- Geometric Deep Learning networks (designed for processing data with a non-grid structure, such as point clouds and graphs)
- Generative Adversarial Networks (GANs)
GANs involve training two networks, one acting as a generator (which learns to produce its own data – e.g. images) and one acting as a discriminator (which learns to discriminate real examples from generated examples). GANs are the basis for many recent “Deep Fake” techniques and are receiving a great deal of attention at the moment. Yann LeCun (2016) described GANs as “the most interesting idea in the last ten years in Machine Learning”. Look out for a future blog post from us on GANs.
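The tug-of-war between the two networks shows up in their loss functions, sketched below in plain Python. The inputs are the discriminator's estimated probabilities that each example is real. The code assumes the standard binary cross-entropy formulation with the widely used “non-saturating” generator loss; the function names and sample values are ours.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator: it wants outputs near 1
    for real examples and near 0 for generated ones."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants the discriminator
    to output values near 1 for generated examples."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Discriminator outputs (probability of "real") on a small batch.
d_on_real = [0.9, 0.8, 0.95]
d_on_fake = [0.1, 0.3, 0.2]

print("discriminator loss:", discriminator_loss(d_on_real, d_on_fake))
print("generator loss:", generator_loss(d_on_fake))
```

In training, the two losses are minimised in alternation: an improvement for the discriminator makes the generator's loss worse, and vice versa, which is what pushes the generated examples towards realism.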
Deep Learning in Application: Automated Lip Reading from Cubica
A particularly interesting application of Deep Learning is Automated Lip Reading (ALR). This probably ranks as one of the hardest computer vision problems – trying to estimate what someone is saying purely from video imagery (i.e. no sound).
Here at Cubica, we’ve spent over two years researching and experimenting with this problem and have developed a full ALR pipeline that can analyse and predict speech from video alone. Our ALR pipeline starts by detecting faces in images, locating the mouth region and tracking it through the video. We then extract the mouth movements and run them through a Deep Learning architecture which we have trained to recognise speech patterns from the movement of the mouth.
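The overall shape of such a pipeline can be sketched as below. This is a purely hypothetical skeleton and not Cubica's implementation: every name is invented for illustration, the three stages are plugged in as placeholder functions, and “tracking” is reduced to reusing the last known mouth position when no face is found.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]        # one greyscale video frame (rows of pixels)
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by `box` out of a frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


@dataclass
class LipReadingPipeline:
    """Hypothetical skeleton; each stage is supplied by the caller."""
    detect_face: Callable[[Frame], Optional[Box]]
    locate_mouth: Callable[[Frame, Box], Box]
    recognise_speech: Callable[[List[Frame]], str]

    def run(self, frames: List[Frame]) -> str:
        mouth_crops: List[Frame] = []
        mouth_box: Optional[Box] = None
        for frame in frames:
            face = self.detect_face(frame)
            if face is not None:
                mouth_box = self.locate_mouth(frame, face)
            # Crude stand-in for tracking: if no face is found in this
            # frame, keep using the last known mouth position.
            if mouth_box is not None:
                mouth_crops.append(crop(frame, mouth_box))
        # The sequence of mouth crops is what the trained model would see.
        return self.recognise_speech(mouth_crops)


def lower_half(frame: Frame, face: Box) -> Box:
    """Placeholder mouth locator: the lower half of the face box."""
    x, y, w, h = face
    return (x, y + h // 2, w, h // 2)
```

A real system would replace the placeholders with a face detector, a facial landmark model, a proper tracker and a trained sequence model; the sketch only shows how the stages hand data to one another.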
“Working on ALR has been really exciting – we have been indirectly competing with some very prominent university research groups and ‘big-tech’ companies on this really challenging problem.
It’s been fascinating to see ALR technology evolve so fast in such a short period of time, and to be part of the community working on it. Cubica is also an environment where everyone can add value – everyone’s views are taken into account and it’s great to learn from everyone’s different experience and perspectives.”
Senior Engineer, Cubica
Society is already riding the wave of Deep Learning technology and we will probably see many more years of interesting applications and developments. The jury is out on what the next “big step” might be, but with active horizon scanning activities, and a team enthused to experiment with new developments, we’re confident we’ll be in the vanguard.
Cubica is a government-recognised expert in machine learning research and development. We build state-of-the-art algorithms, real-time software, and highly specialised turn-key systems for analysing and exploiting digital data and media content. Get in touch with us today to learn more about our expertise, or explore our team’s open vacancies.