Self-Supervised MultiModal Versatile Networks
We learn representations for vision, audio and language in a self-supervised manner using millions of videos from YouTube.
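The summary doesn't spell out the training objective, but a common recipe for this kind of multimodal self-supervision is cross-modal contrastive learning. Below is a minimal sketch of a symmetric InfoNCE loss between video and audio clip embeddings; the function name, shapes and temperature are illustrative assumptions, not necessarily the paper's exact method.

    import torch
    import torch.nn.functional as F

    def cross_modal_nce(video_emb, audio_emb, temperature=0.07):
        # video_emb, audio_emb: (batch, dim) embeddings of temporally aligned
        # clips; matching pairs share a row index.
        video_emb = F.normalize(video_emb, dim=1)
        audio_emb = F.normalize(audio_emb, dim=1)
        logits = video_emb @ audio_emb.t() / temperature  # (batch, batch) similarities
        targets = torch.arange(video_emb.size(0))         # positives on the diagonal
        # Symmetric InfoNCE: each modality must pick out its aligned counterpart.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))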
We look at predicting the progression of an eye to the sight-threatening form of AMD within 6 months. We demonstrate a deep learning system that outperforms five out of six experts on this task and, importantly, also overcomes the high inter-observer variability among the experts.
We demonstrate an AI system that detects breast cancer better than human experts, evaluated in the context of breast cancer screening programmes on a large, representative UK dataset and a dataset from the US.
Beating the previous state of the art in self-supervised learning on ImageNet by almost 3% absolute (71.5% vs 68.6% top-1 accuracy) with fewer parameters, and outperforming purely supervised approaches in all data regimes.
We hierarchically stack discrete autoencoders, enabling likelihood models to capture long-range structure in images. This lets us generate realistic images at resolutions of 128x128 and 256x256 with autoregressive models, which had not been shown before!
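The key ingredient when stacking discrete autoencoders is the discrete bottleneck itself. Here is a minimal sketch of a standard vector-quantisation layer with a straight-through gradient; this is generic VQ-style machinery with made-up codebook sizes, and the paper's exact configuration may differ.

    import torch

    def quantize(z, codebook):
        # z: (batch, h, w, dim) continuous encoder output;
        # codebook: (num_codes, dim) learned embedding table.
        flat = z.reshape(-1, z.shape[-1])
        idx = torch.cdist(flat, codebook).argmin(dim=1)  # nearest code per position
        quantized = codebook[idx].reshape(z.shape)
        # Straight-through estimator: gradients flow from quantized back to z.
        return z + (quantized - z).detach(), idx.reshape(z.shape[:-1])

    codebook = torch.randn(512, 64, requires_grad=True)
    z = torch.randn(2, 32, 32, 64)
    zq, codes = quantize(z, codebook)  # `codes` is what an autoregressive prior models

Modelling the short grid of codes autoregressively, rather than raw pixels, is what makes long-range structure tractable at these resolutions.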
We propose a model for segmenting the organs at risk in 3D Computed Tomography (CT) scans for radiotherapy treatment planning. Improvements in this segmentation could lead to better and faster treatment.
We propose a two-stage architecture that first maps the original (noisy) 3D Optical Coherence Tomography (OCT) scan to multiple tissue-segmentation hypotheses, and subsequently uses a classification network on these tissue maps to infer diagnosis and referral probabilities. On these tasks we achieve expert-level results or better. One benefit of the two-stage architecture is that it allows for much quicker transfer to different device types, as demonstrated in the paper.
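Schematically, the pipeline looks like the sketch below, with stand-in modules and illustrative class counts (the real system uses a 3D segmentation network and samples multiple segmentation hypotheses). The point is that the classifier only ever sees device-independent tissue maps, which is why retargeting to a new scanner mainly requires retraining the first stage.

    import torch

    seg_net = torch.nn.Conv3d(1, 15, kernel_size=3, padding=1)  # stand-in for the segmentation network
    clf_net = torch.nn.Sequential(                              # stand-in for the classification network
        torch.nn.AdaptiveAvgPool3d(8),
        torch.nn.Flatten(),
        torch.nn.LazyLinear(4),
    )

    scan = torch.randn(1, 1, 64, 128, 128)                # one (noisy) OCT volume
    tissue_logits = seg_net(scan)                         # per-voxel tissue class scores
    tissue_map = tissue_logits.argmax(1, keepdim=True).float()
    # The paper samples multiple segmentation hypotheses; one is taken here.
    referral_probs = clf_net(tissue_map).softmax(dim=-1)  # diagnosis / referral probabilities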
We propose some simple, plug-and-play operations for convolutional neural networks that allow them to be partially equivariant or invariant to rotations.
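As one illustration of the flavour of such an operation (this is the generic rotate-apply-rotate-back trick; the paper's own operations may be defined differently), a shared CNN can be applied to all four 90-degree rotations of the input and the re-aligned feature maps pooled:

    import torch

    def cyclic_pool(features_fn, x):
        # x: (batch, channels, h, w); features_fn: any CNN feature extractor.
        # Apply the shared network to each 90-degree rotation of the input,
        # rotate every feature map back to a common orientation, then average.
        outs = [torch.rot90(features_fn(torch.rot90(x, k, dims=(2, 3))), -k, dims=(2, 3))
                for k in range(4)]
        return torch.stack(outs).mean(dim=0)

    cnn = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
    y = cyclic_pool(cnn, torch.randn(1, 3, 32, 32))
    # Rotating the input by 90 degrees rotates `y` by the same amount
    # (equivariance), so a global spatial pool on top of `y` is invariant
    # to such rotations.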
For almost four months I competed in a Kaggle competition on diabetic retinopathy grading from high-resolution eye images. In this post I try to reconstruct my progression through the competition: the challenges I faced, the things I tried, what worked and what didn’t. It is not meant as complete documentation, but some more concrete examples can be found at the end and certainly in the code. In the end I finished fifth out of the almost 700 competing teams.