Big data can pilot self-driving cars, suggest Netflix shows, and recommend Amazon products. But it still can’t make you healthier. At least not yet. While there is a drive to accelerate healthcare through model-driven medicine, medical data has several distinctive features that set it apart from data in other disciplines. The free release of the DeepLesion dataset, however, could soon open the doors to a new era of deep-learning-based radiology. Here’s why:
Medical Deep Learning Activated
When you have a CT (computed tomography) scan, radiologists measure and mark clinically meaningful findings with an electronic bookmarking tool. In the case of lesions, these bookmarks help them keep tabs on growth or development. The bookmarks take numerous forms, including lines, text, segmentations, arrows and measurements.
DeepLesion is essentially a huge dataset of harvested and categorized bookmarks. At the time of writing, it contains:
- 32,735 annotated lesions from more than 10,000 CT studies
- 32,120 axial CT slices of 4,427 patients
This makes it the largest publicly available dataset of CT lesion images in the world, with practical uses ranging from multi-class lesion detection to lesion retrieval and segmentation, amongst others.
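To make the idea of a "harvested bookmark" more concrete, here is a minimal Python sketch of how the endpoints of a radiologist's line measurements might be turned into a padded bounding box that a lesion detector could train on. The function name `bookmark_to_bbox`, the 5-pixel padding, the 512 x 512 slice size and the sample coordinates are all illustrative assumptions, not the DeepLesion team's actual code or data.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y) pixel coordinates on a CT slice
BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bookmark_to_bbox(
    endpoints: List[Point],
    padding: float = 5.0,                      # assumed margin around the lesion
    image_size: Tuple[int, int] = (512, 512),  # assumed (width, height) of the slice
) -> BBox:
    """Enclose all measurement endpoints in a padded box clipped to the image."""
    if not endpoints:
        raise ValueError("a bookmark needs at least one endpoint")

    xs = [x for x, _ in endpoints]
    ys = [y for _, y in endpoints]
    width, height = image_size

    # Grow the tight box by the padding, then clip it to the slice borders.
    x_min = max(0.0, min(xs) - padding)
    y_min = max(0.0, min(ys) - padding)
    x_max = min(float(width - 1), max(xs) + padding)
    y_max = min(float(height - 1), max(ys) + padding)
    return (x_min, y_min, x_max, y_max)


# Hypothetical bookmark: a long-axis and a short-axis line across one lesion.
long_axis = [(200.0, 240.0), (260.0, 250.0)]
short_axis = [(228.0, 225.0), (232.0, 265.0)]

bbox = bookmark_to_bbox(long_axis + short_axis)
print(bbox)  # (195.0, 220.0, 265.0, 270.0)
```

A box like this is the kind of label a detection model learns to predict. The padding simply leaves a small margin around the measured lesion so that its edges are not cut off.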
Announced in a paper titled “DeepLesion: Automated mining of large-scale lesion annotations and universal lesion detection with deep learning”, the dataset is, of course, intended for deep learning. In fact, the team behind DeepLesion has already developed a detector system based on their data. They hope it will become a powerful screening tool for radiologists in the future.
Through the power of deep learning, it is not impossible to imagine a time in which cancer detection, disease prevention or other medical diagnoses are performed with a higher success rate by AI-driven machines than by human experts.
Challenges of Gathering and Leveraging Medical Data
The release of DeepLesion highlights a real challenge in the healthcare industry: the problem of data silos. While datasets already existed for individual lesion types (kidney lesions, bone lesions, lung nodules or enlarged lymph nodes), this is the first time that a multi-category dataset has combined them all, which could help automate radiological diagnosis.
Still, medical data is notoriously hard to acquire, and even harder to use expertly for a number of reasons:
- Fear of data misuse
- Problems with patient anonymity
- Lack of data sharing incentives
- Absence of a universal protocol to acquire data
- Variety of sources (administrative, clinical, medical imaging, etc.)
- Variety of scales (seconds to years)
- Private data hidden behind paywalls
There is a lot to battle here. Even if healthcare practitioners manage to get their hands on the data, there is no guarantee that it will be good, and thus helpful. It is only by processing large, worldwide amounts of data, as in the DeepLesion case, that training sets can be optimised, scaled and leveraged for more efficient healthcare neural networks.
Open Source Datasets - Good for the Scientific Community and for You
Ronald Summers, senior author of the DeepLesion paper, says he hopes “the dataset will benefit the medical imaging area just as ImageNet benefited the computer vision area.”
ImageNet, the visual object recognition database, is one of the largest open source, crowdsourced datasets available in the world, but it is by no means the only one. It contains more than 20,000 categories today, and there is no denying that its popularity helped kickstart the deep learning revolution, not to mention train countless algorithms that are currently employed in a wide range of private companies.
While our medical data is increasingly sold and resold as part of a multibillion-dollar industry, it is easy to see how everyone could benefit from sharing it openly. Who knows, a deep-learning system trained on this free data could one day save your life in an emergency.