On 8 July 2016, the British Machine Vision Association held a workshop titled “Deep Learning In Computer Vision”. The event was chaired by Kai Arulkumaran and Dr Anil A. Bharath from BICV. Kai and Anil invited an impressive line-up of speakers from the University of Cambridge, the University of Oxford, the University of Warwick, NVIDIA, Google DeepMind, Twitter Cortex, Facebook AI Research and Samim.io. The event was heavily oversubscribed, with only a hundred places available and tickets selling out well before the event.
The event included talks and poster presentations. Antonia Creswell, also from BICV, presented a poster titled “Adversarial Training For Sketch Retrieval”, which applied deep generative models to a sketch retrieval problem. A report of the event is given below.
Deep learning is an exciting driving force behind state-of-the-art image recognition systems, and is now permeating the wider field of computer vision. The aim of this workshop was to look at which areas of computer vision could be advanced by deep learning, and at how computer vision has in turn advanced deep learning.
The event kicked off with a talk from Alison Lowndes – Deep Learning Solutions Architect at NVIDIA – about the hardware that makes deep learning possible, the GPU, as well as NVIDIA’s SDK for supporting deep learning frameworks.
Deep learning requires not only processing power but also large amounts of labeled data. Sander Dieleman – from Google DeepMind – explained how he won several Kaggle competitions, classifying galaxies and plankton, by using a series of innovative approaches to augment the available datasets. He also described the large ensemble of networks used to achieve the winning result, and the interesting approach of further using test data as a regulariser.
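The augmentation idea can be illustrated with a minimal sketch (not Dieleman’s actual pipeline): for roughly rotation-invariant subjects such as galaxies or plankton, lossless 90-degree rotations and flips multiply the effective dataset size at no labelling cost. The function name and parameters here are illustrative.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly rotated/flipped copy of a square image.

    Rotations by multiples of 90 degrees and flips are lossless, so for
    subjects with no preferred orientation each one yields another valid
    training example with the same label.
    """
    image = np.rot90(image, k=rng.integers(4))  # random 90-degree rotation
    if rng.integers(2):                         # random horizontal flip
        image = np.fliplr(image)
    return image

rng = np.random.default_rng(0)
img = np.arange(16, dtype=float).reshape(4, 4)
batch = [augment(img, rng) for _ in range(8)]   # eight augmented views
```

Real competition pipelines combine many more transformations (continuous rotations, crops, colour jitter), but the principle is the same.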
Unlabeled data is often easier and cheaper to acquire than labeled data. Soumith Chintala – a researcher at Facebook AI Research – introduced generative adversarial networks, which are able to learn concepts primarily from unlabeled data and require very few labels to achieve good classification performance. This could allow deep learning to be applied to tasks that would otherwise be impossible due to a lack of labeled data.
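At the heart of a GAN is a two-player game: a discriminator is trained to tell real data from generated samples, while a generator is trained to fool it. A minimal sketch of the two objectives (in their common binary cross-entropy form, with toy probabilities rather than a real network):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Discriminator objective: push D(x) -> 1 on real data and
    D(G(z)) -> 0 on generated samples."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator objective: push D(G(z)) -> 1,
    i.e. make generated samples look real to the discriminator."""
    return -np.mean(np.log(d_fake))

# Probabilities the discriminator assigns to "real":
d_real = np.array([0.9, 0.8])   # confident on real images
d_fake = np.array([0.2, 0.1])   # confident the fakes are fake
d_loss = discriminator_loss(d_real, d_fake)  # low: discriminator winning
g_loss = generator_loss(d_fake)              # high: generator must improve
```

Training alternates gradient steps on the two losses; no label is needed for the game itself, which is why so much of the learning can happen on unlabeled data.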
It can take 20 years to design a new text font, but deep generative models can learn a design space that anyone can easily explore, making design accessible to a wider user base. Samim Winiger, founder of Samim.io, showed us how generative models, specifically generative adversarial networks, can learn to interpolate between images in a high-dimensional space, creating a smooth, explorable design space in which new designs can be found. The design spaces described were not limited to fonts, but extended to music and patterns. This still left open the question of how to visualise these high-dimensional spaces in only two or three dimensions.
One of the missions of Twitter Cortex is to make sense of content, be it a photo, a tweet or a video. Content on Twitter is ever evolving, new concepts and ideas are born, and as a consequence the taxonomy required to make sense of content cannot be static. Nicolas Koumchatzky described the challenges and approaches involved in developing a taxonomy for understanding photographic content on Twitter and some of the software contributions Twitter has made to mitigate these challenges.
Deep learning can seem like a “black box”; visualising what goes on inside it can be highly insightful. Andrea Vedaldi, from the University of Oxford, demonstrated visualisations of inputs that maximally activate neurons in deep networks, identifying the concepts those networks learn. Such visualisations reveal the key features captured by a network, as well as the transformations it has learned to be invariant to. These visualisation techniques can be a powerful tool for understanding what a network is learning, and potentially how to improve that learning.
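The core of these visualisations is gradient ascent on the *input* rather than the weights: start from a blank image and repeatedly nudge it to increase one neuron’s activation, with a regulariser keeping the result natural-looking. A toy sketch with a single linear neuron, where the gradient can be written by hand (a real network would obtain it by backpropagation):

```python
import numpy as np

def activation_maximisation(weight, steps=100, lr=0.1, decay=0.01):
    """Gradient ascent on the input to maximise one neuron's activation.

    For a single linear neuron a(x) = w.x the input gradient is just w;
    the L2 decay term on the input is a stand-in for the image priors
    used in practice to keep the optimised input natural-looking.
    """
    x = np.zeros_like(weight)            # start from a blank input
    for _ in range(steps):
        grad = weight - decay * x        # d/dx (w.x - decay/2 * |x|^2)
        x = x + lr * grad
    return x

w = np.array([1.0, -2.0, 0.5])           # toy "neuron" weights
x_star = activation_maximisation(w)
# The optimised input aligns with the weight vector: it is the pattern
# this neuron responds to most strongly.
```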
Some visualisations show that trained CNNs learn a set of hierarchical features. Nick Kingsbury, from the University of Cambridge, drew parallels between trained CNNs and his work on multi-layer filter banks, which use banks of pre-defined, lossless wavelet transforms at different scales throughout the network to describe images.
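To see what a fixed, lossless filter bank looks like, here is the simplest possible example – one level of a 1-D Haar analysis bank. This is only an illustration of the idea of pre-defined multi-scale filters; Kingsbury’s own work uses more sophisticated complex wavelet transforms.

```python
import numpy as np

def haar_level(signal):
    """One level of a Haar analysis filter bank: split a 1-D signal into
    a low-pass (average) and a high-pass (detail) band at half the
    resolution. Recursing on the low-pass band gives the multi-scale
    decomposition; no filter weights are learned."""
    pairs = signal.reshape(-1, 2)
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return low, high

x = np.array([4.0, 2.0, 5.0, 5.0])
low, high = haar_level(x)
# The transform is lossless: the signal reconstructs exactly.
rec = np.stack([low + high, low - high], axis=1).ravel() / np.sqrt(2)
```

Stacking such fixed transforms in layers gives a hierarchy of scales, which is the structural parallel Kingsbury drew with the learned feature hierarchies in CNNs.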
Ben Graham, from the University of Warwick, turned the curse of dimensionality on its head by taking advantage of datasets containing spatially sparse examples – the more dimensions, the more likely a sample is to be sparse. He explained how a sparse image can be processed more efficiently by considering only the non-zero parts of the data, with applications in chemistry, biochemistry and robotics.
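A minimal sketch of the idea (not Graham’s actual implementation): store only the active sites of an image in a dictionary, and let a convolution-like operation visit just those sites and their neighbourhoods, so the cost scales with the number of active pixels rather than the grid size.

```python
import numpy as np

def sparse_conv(active, kernel):
    """Scatter a kernel over a spatially sparse image stored as
    {(row, col): value}. Only the non-zero sites (and their
    neighbourhoods) are visited, so an almost-empty 1000x1000 image
    costs about as much as a handful of pixels."""
    kh, kw = kernel.shape
    out = {}
    for (r, c), v in active.items():
        for dr in range(kh):
            for dc in range(kw):
                key = (r + dr, c + dc)
                out[key] = out.get(key, 0.0) + v * kernel[dr, dc]
    return out

# A 1000x1000 image with just two active pixels:
image = {(10, 10): 1.0, (500, 700): 2.0}
kernel = np.ones((3, 3))
result = sparse_conv(image, kernel)   # touches only 18 output sites
```

A dense implementation would visit all 10^6 positions regardless of content; here the work is proportional to the two active pixels, which is what makes the approach attractive for sparse data such as molecular structures or handwriting.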
In the poster competition, sponsored by NVIDIA and Cortexica, Wenzhe Shi, from Twitter’s recently acquired deep learning startup Magic Pony Technology, won first prize for his poster “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network”. Second prize went to Adeline Paiement, from the University of Bristol, for her poster “Skeleton-free Body Pose Estimation from Depth Images for Movement Analysis”.