
J Stomatol Oral Maxillofac Surg 120 (2019) 279–288


Original Article

Deep learning in medical image analysis: A third eye for doctors


A. Fourcade a,*, R.H. Khonsari b

a Service de Chirurgie Plastique, Maxillo-faciale et Stomatologie, Centre Hospitalier de Gonesse, Gonesse, France
b Service de Chirurgie Maxillo-Faciale et Chirurgie Plastique, Hôpital Necker–Enfants Malades, Assistance Publique–Hôpitaux de Paris, Centre de Référence Maladies Rares MAFACE, Filière Maladies Rares TêteCou, Université Paris Descartes, Université de Paris, Paris, France

* Corresponding author. Service de Chirurgie Plastique, Maxillo-faciale et Stomatologie, Centre Hospitalier de Gonesse, 2, boulevard du 19 Mars 1962, 95500 Gonesse, France. E-mail address: [email protected] (A. Fourcade).

https://doi.org/10.1016/j.jormas.2019.06.002
2468-7855/© 2019 Published by Elsevier Masson SAS.

A R T I C L E  I N F O

Article history:
Received 24 May 2019
Accepted 18 June 2019
Available online 26 June 2019

Keywords:
Deep learning
Artificial intelligence
Neural network
Image analysis
Systematic review
Computer vision

A B S T R A C T

Aim and scope: Artificial intelligence (AI) in medicine is a fast-growing field. The rise of deep learning algorithms, such as convolutional neural networks (CNNs), offers fascinating perspectives for the automation of medical image analysis. In this systematic review, we screened the current literature and investigated the following question: "Can deep learning algorithms for image recognition improve visual diagnosis in medicine?"

Materials and methods: We provide a systematic review of the articles using CNNs for medical image analysis, published in the medical literature before May 2019. Articles were screened based on the following items: type of image analysis approach (detection or classification), algorithm architecture, dataset used, training phase, test, comparison method (with specialists or other), results (accuracy, sensitivity and specificity) and conclusion.

Results: We identified 352 articles in the PubMed database and excluded 327 items for which performance was not assessed (review articles) or for which tasks other than detection or classification, such as segmentation, were assessed. The 25 included papers were published from 2013 to 2019 and were related to a vast array of medical specialties. Authors were mostly from North America and Asia. Large amounts of high-quality medical images were necessary to train the CNNs, often resulting from international collaboration. The most common CNNs, such as AlexNet and GoogleNet, designed for the analysis of natural images, proved their applicability to medical images.

Conclusion: CNNs are not replacement solutions for medical doctors, but will contribute to optimizing routine tasks and thus have a potential positive impact on our practice. Specialties with a strong visual component, such as radiology and pathology, will be deeply transformed. Medical practitioners, including surgeons, have a key role to play in the development and implementation of such devices.

© 2019 Published by Elsevier Masson SAS.

1. Introduction

1.1. Artificial intelligence and image analysis

Artificial intelligence (AI) in medicine is a fast-growing field, generating hopes and raising perplexing issues. AI can be defined as the ability of a computer to mimic the cognitive abilities of a human being. AI corresponds to a large array of techniques. Among them, machine learning is one of the most relevant approaches in the medical field. Three converging technical advances have facilitated the medical applications of machine learning:

- the global rise of "big data", that is, the constitution and analysis of very large databases;
- the spectacular increase in the computing power of processors; and
- the design of new deep learning algorithms.

These technological advances have allowed algorithms to process images and sounds, and to be integrated into the digital tools of everyday life and into automated professional workflows.

The medical field currently faces massive growth in the volume, complexity and heterogeneity of raw data. Effective and medically focused big data analysis is a major issue in public health. AI offers three promising perspectives in this field:

- risk prediction via correlation analyses;
- genomic analysis and phenotype-genotype association studies; and
- automation of medical image analysis.

In this report, we focused on the third topic and investigated the following question: "Can image recognition deep learning algorithms improve medical visual diagnosis?" [1].

1.2. Historical background

The fundamentals of AI were formalized in the 1950s [2]. Initially, two contrasting conceptions of AI were developed:

- cognitivism, that is, the development of rule-based programs referred to as expert systems [3,4]; and
- connectionism, corresponding to the development of naive programs subsequently educated by data. Connectionism currently dominates the world of AI.

Algorithms known as artificial neurons, organized into artificial networks, have led to the design of programs that can be "educated" [5]. The first neural networks were designed in the 1960s. They consisted of sets of artificial neurons organized into superimposed layers. After going through an input layer for presentation purposes, data were analyzed via intermediate layers and ended in an output layer that produced a result. The learning abilities of neural networks were only exploited in the 1990s, with the initiation of "deep learning", which corresponded to the development of very large multi-layered neural networks [6]. In addition to the development of big data analysis and the increase in computation power, deep learning was boosted in the 2010s by the development of a specific type of neural network known as the Convolutional Neural Network (CNN). CNNs achieved particularly high performance in the field of pattern recognition. Most of the theoretical background of CNNs derives from the achievements of a French computer scientist in the 1980s, Yann LeCun, who created the LeNet network [7]. LeNet was an automated handwriting recognition algorithm, intended to be used by banks for reading checks. In 2010, major international AI teams took part in the "ImageNet Challenge", a competition in which they had the task of classifying millions of natural images into thousands of different categories [8]. The current navigation devices embedded into autonomous cars are derived from the networks designed by LeCun nearly forty years ago (Fig. 1).

CNNs are inspired by the structure of the primary visual cortex [9]; image recognition proceeds from the automatic extraction of the visual features of images. Automatic extraction is a breakthrough when compared with traditional manual extraction techniques, which were time consuming and less precise [10]. CNNs are based on a specific multi-layered architecture that can be trained by data (Fig. 2).

Fig. 2. Structure of a typical artificial neural network: the multi-layered Perceptron.

Each individual piece of input data corresponds to an image associated with a label (for instance, a medical diagnosis for a chest X-ray). The process of training a CNN with labelled images is known as supervised learning. During training, the network analyzes the image, considered as a matrix of pixels of varying intensities, by passing it through the different layers. Some layers modify the input data using feature extraction and are thus able to produce abstract images determined by the level of extracted features; other layers reduce the data into vectors that are finally interpreted as probabilities by the last layers: "Is the image on the chest X-ray a lung cancer?", "Is this stain on the skin a melanoma?" (Fig. 3).

Initially, before proper training, the naive network labels the input pictures with a certain amount of error. Errors are then back-propagated [11] within the network, which corrects the characteristics of its neurons in order to match the proper label of the input image. The CNN thus extracts the features of the images uploaded during the training process and keeps the memory of these features in its neural layers. After an efficient training process involving a large input dataset, the network is able to classify new unlabelled data and provide results as label categories. A minimal sketch of this architecture and training loop follows.
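To make the architecture described above concrete, the following minimal sketch builds such a network in Python with TensorFlow/Keras (one of the libraries discussed later in this review). The input size, layer counts and the two hypothetical labels are illustrative assumptions, not taken from any study reviewed here.

```python
# Minimal sketch of the CNN pipeline described above (assumed TensorFlow/Keras;
# input size, layer sizes and labels are illustrative, not from a reviewed study).
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Feature extraction: convolutions turn the pixel matrix into abstract
    # feature maps; pooling progressively reduces their spatial size.
    layers.Conv2D(32, 3, activation="relu", input_shape=(256, 256, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    # Reduction of the data into a vector, then fully connected layers.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # Softmax output: one probability per label, e.g. "melanoma" vs. "benign".
    layers.Dense(2, activation="softmax"),
])

# Supervised learning: labelled images go in, the error is back-propagated [11]
# and the layer weights are corrected to match the proper labels.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)  # labelled training set
```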

Fig. 1. Concepts and theories in deep learning.



Fig. 3. Structure of a Convolutional Neural Network (CNN).

2. Material and methods

Here we provide a systematic review of the publications using CNN technology for medical image analysis, available in the National Library of Medicine database (PubMed). The search equation was the following: (convolutional OR deep learning) AND (classification OR detection) AND (image OR photography), filtered for "Human studies" and "Title/Abstract" as search fields. A scripted version of this query is sketched at the end of this section.

The selected articles were screened according to a standard grid containing the following items:

- aim of the study: detection or classification;
- methods: network architecture, dataset, training, validation, test, comparison method (with specialists or other);
- results: accuracy, sensitivity and specificity; and
- conclusion.

More precisely, within the methods section, the following data regarding network architecture were collected:

- pre-training (yes/no);
- network parameters;
- layers (convolution and pooling, fully connected and softmax); and
- software.

Datasets were characterized using the following parameters:

- variety/veracity: origin and certification;
- volume: number, size, pre-training, augmentation, set distribution;
- speed (Graphic Processor Unit [GPU] type) (Fig. 4).

Fig. 4. Canvas for data extraction from the included articles.
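For illustration, the search equation above can be reproduced programmatically against PubMed through the public NCBI E-utilities API. The sketch below is an assumed reconstruction of such a query, not the script used for this review.

```python
# Hedged sketch: running the review's search equation against PubMed via the
# public NCBI E-utilities API (esearch). Not the authors' actual workflow.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search equation from the Methods section, restricted to Title/Abstract
# fields ([tiab]) and to human studies (humans[mh]).
term = ('(convolutional[tiab] OR "deep learning"[tiab]) '
        'AND (classification[tiab] OR detection[tiab]) '
        'AND (image[tiab] OR photography[tiab]) AND humans[mh]')

response = requests.get(ESEARCH, params={
    "db": "pubmed",
    "term": term,
    "datetype": "pdat",        # filter on publication date
    "mindate": "2013/01/01",
    "maxdate": "2019/05/20",
    "retmax": 500,
    "retmode": "json",
})
pmids = response.json()["esearchresult"]["idlist"]
print(f"{len(pmids)} articles retrieved")
```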
3. Results

We identified 352 articles in the PubMed database based on our search equation, between 01/01/2013 and 05/20/2019. We excluded 327 articles in which CNN performance was not assessed (review articles) or in which tasks other than detection or classification, such as segmentation, were assessed (Fig. 5). The 25 included papers were ordered according to medical specialty: dermatology [12,13], ophthalmology [14–18], cardiothoracic imaging [19–24], senology [25–27], oral and maxillofacial surgery [28–32] and hepato-gastro-enterology [33–36]. Demographic data about the included papers and information on the journals in which the works were published are provided in Tables 1 and 2. These results are summarized in Figs. 6 and 7, which show a steady increase in the volume of publications on CNN performance over the last few years: 1 paper in 2013 [27], 8 papers in 2016 [13,15,16,19,22,25,29,33], 6 papers in 2017 [12,14,20,21,26,28], 5 papers in 2018 [17,23,24,30,34] and already 5 papers in 2019 [18,31,32,35,36].

Our results furthermore indicate that most of the papers were produced by North American and Asian teams: 12 papers were from the USA and Canada [12–14,16,20,21,23,24,28,30,31,34] and 13 papers were written by Asian teams [13,16–18,22,26,29–33,35,36]. Interestingly, international collaboration between Asian/South American teams and European/North American teams allowed the compilation of very large databases in 4 cases [13,16,26,31]. The assessment of the characteristics of the networks (Table 3) showed that most teams used previously developed architectures such as GoogleNet [12,16,17,20,21,32,36] and AlexNet [19,21,29,33,36]. Most of the 25 networks were pre-trained [12,16–19,21,23–25,30–36], and most teams included proofs of veracity for the labeling of the images, based on standard methods depending on the dataset used in the study, such as histological confirmation [12,25,27] or assessment by 1–4 specialists (Table 4) [14–16,18–22,24,28–30,32,34]. The quantity of included images (Table 5) was variable and not very informative given the variety of the datasets: from 170 pictures of skin lesions used for melanoma detection [13] to 139,886 fundoscopy images used for diabetic retinopathy diagnosis [16]. Only 5/25 papers compared the performance of the networks to that of medical professionals [12,15,16,32,34]; nevertheless, 13/25 studies compared the performance of the networks to traditional detection or classification techniques [13,14,17,22,25–29,32–35]. The results of the quantitative assessment of network performance were satisfactory in all studies; 18 numerical values were reported by the authors: the lowest precision score was 0.75 [28] and the lowest sensitivity was 0.70 [27]. All other values of precision, sensitivity, specificity and AUC (area under the curve) were between 0.8 and 0.9 [13,19,20,22,25,26,29] or over 0.9 [12,14–18,21,23,24,30–36]. Only 9/25 studies provided visualization methods allowing a better understanding of the inner mechanisms of the network (Table 6) [12,14,15,17,18,21,23,27,28].

Fig. 5. Inclusion process for the systematic review.
4. Discussion

Most of the studies included in this review were published within the last three years, and their authors are from North America and Asia. These demographics underline the need for European countries to launch specific plans to promote medical AI. Several French initiatives are promising and may contribute to building an EU medical AI community, such as the PaRis Artificial Intelligence Research InstitutE (PRAIRIE) and the development of specific training programs at the University of Paris (the largest medical school in Europe), with a dedicated international Master's Degree, or at CentraleSupelec in association with Inria Saclay (two high-level engineering schools and research centers).

The choice of the network architecture is conditioned by the specific task. Nevertheless, most authors have used previously developed networks already efficient on natural images, such as "AlexNet" [37] and "GoogleNet" [38], especially in their pre-trained versions [39]. These networks easily run on software such as "Caffe" [40], "Theano" [41] or "TensorFlow" [42]. All of these packages are freely available online and are based on the widespread programming language "Python".
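As a concrete illustration of this reuse strategy, the sketch below fine-tunes a pre-trained Inception v3 in TensorFlow/Keras on a hypothetical two-class medical dataset; it is an assumed example, not the code of any included study.

```python
# Hedged sketch of transfer learning: reusing Inception v3 (GoogleNet) weights
# learned on ImageNet, then fine-tuning; the two-class task is hypothetical.
import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained feature extractor, without the original 1000-class ImageNet head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the pre-trained layers first

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),  # new medical classification head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Fine tuning: once the new head converges, unfreeze the top of `base`
# and keep training with a much lower learning rate.
```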
The data used in the 25 studies included macroscopic and microscopic images of different types: clinical photographs [12–18,30–36], pathology slides [27], X-rays [20,21,23–26,28] and CT images [19,22,29]. Four criteria are often cited in the literature as crucial elements for the design of reliable networks and are referred to as the four "V"s:

- volume and variety: the training phase requires a significant number of images. Beyond quantity, heterogeneous databases increase robustness. Augmentation techniques (rotating, cutting and resizing the images) are often used to increase volume and variety without decreasing the third V, veracity (see the augmentation sketch after this list);
- veracity, conditioned by two quality criteria:
  - image quality: resolution and limited variability due to angle of view, zoom or brightness, often improved by pre-treatment [43];
  - label quality, depending on the expertise of the practitioners who created the database. Database review by a scientific committee is thus recommended before training;
- velocity, related to the power of the processor. GPUs, made of several parallel cores, are well suited to CNNs, specifically during the training phase, when many images are processed simultaneously.
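The sketch below illustrates the augmentation techniques mentioned above (rotation, cutting, resizing) with Keras preprocessing layers; the rotation factor and image sizes are illustrative assumptions.

```python
# Hedged sketch: image augmentation (rotating, cutting, resizing) to increase
# volume and variety without new acquisitions. Parameters are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),   # rotate by up to ±5% of a full turn
    layers.RandomCrop(224, 224),   # random cutting of a sub-window
    layers.Resizing(256, 256),     # resize back to the network input size
])

# Applied on the fly, so each training epoch sees new image variants:
# augmented_batch = augment(image_batch, training=True)
```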
CNN training is a key step, for which specific technical skills are required in order to avoid over-fitting [44] on limited data, which leads to problems when the network is later used to analyze wider datasets. Training thus requires evaluation and monitoring.
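One standard way to perform this monitoring, given here as an assumed illustration rather than a method reported by the included studies, is to track the error on a held-out validation set and stop training when it stops improving.

```python
# Hedged sketch: monitoring a held-out validation set and stopping early,
# a common guard against over-fitting [44].
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # watch the validation error, not training error
    patience=5,                 # tolerate 5 epochs without improvement
    restore_best_weights=True,  # roll back to the best checkpoint
)

# history = model.fit(train_images, train_labels,
#                     validation_split=0.2,   # hold out 20% for monitoring
#                     epochs=100, callbacks=[early_stop])
```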
Transparency is a fundamental parameter in medical AI. CNNs are opaque structures often referred to as "black boxes". Some systems offer partial visualization techniques (heat-maps, probability maps) in order to provide some view of the inner functioning of the CNN. Understanding how these networks "work" is a major challenge in medical AI. Nevertheless, using computer vision devices that have "non-human" image analysis abilities is in itself potentially fruitful, as hidden correlations between images can be uncovered based on parameters that are not perceived by the human brain, even the brain of a trained expert.
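As an illustration of such partial visualization, the sketch below computes a simple gradient-based saliency map, one basic way to produce a heat-map; it is an assumed example, not the visualization technique of any included study.

```python
# Hedged sketch: gradient-based saliency map, a simple heat-map technique
# for peeking inside the CNN "black box".
import tensorflow as tf

def saliency_map(model, image, class_index):
    """Gradient of the class score with respect to the input pixels:
    pixels with large gradients weighed most on the decision."""
    x = tf.convert_to_tensor(image[None, ...])  # add a batch dimension
    with tf.GradientTape() as tape:
        tape.watch(x)
        class_score = model(x, training=False)[0, class_index]
    grads = tape.gradient(class_score, x)[0]
    # Collapse the color channels: one importance value per pixel.
    return tf.reduce_max(tf.abs(grads), axis=-1)
```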

Table 1
Systematic review of deep learning applications in medical image analysis. Articles assessing detection and classification abilities of neural networks. Year of publication –
Journal – Impact factor.

Authors Year Journal Impact factor

Dermatology
Esteva et al. [12] 2017 Nature Medicine 41.577
Nasr-Esfahani et al. [13] 2016 IEEE Engineering in Medicine and Biology Society 3.05
Ophthalmology
Gargeya et al. [14] 2017 Journal of the American Academy of Ophthalmology 8.2
Grinsven et al. [15] 2016 IEEE Transactions on Medical Imaging 3.942
Gulshan et al. [16] 2016 Journal of the American Medical Association 44.4
Ahn et al. [17] 2018 PLOS Medicine 11.675
Phan et al. [18] 2019 Japanese Journal of Ophthalmology 1.775
Thoracic
Anthimopoulos et al. [19] 2016 IEEE Transactions on Medical Imaging 3.942
Cicero et al. [20] 2017 Investigative Radiology 5.195
Lakhani et al. [21] 2017 Radiology 7.296
Li et al. [22] 2016 Computational and Mathematical Methods in Medicine 0.937
Zech et al. [23] 2018 PLOS Medicine 11.675
Taylor et al. [24] 2018 PLOS Medicine 11.675
Senology
Arevalo et al. [25] 2016 Computer Methods and Programs in Biomedicine 1.862
Jadoon et al. [26] 2017 Biomed Research International 2.476
Ciresan et al. [27] 2013 Medical Image Computing and Computer-Assisted Intervention –
O-M-F surgery
Arik et al. [28] 2017 Journal of Medical Imaging 1.109
Miki et al. [29] 2016 Computers in Biology and Medicine 1.836
Uthoff et al. [30] 2018 PLOS One 2.766
Gurovich et al. [31] 2019 Nature Medicine 41.577
Jeyaraj et al. [32] 2019 Journal of Cancer Research and Clinical Oncology 3.081
Gastro-enterology
Jia et al. [33] 2016 Engineering in Medicine and Biology Society 0.76
Urban et al. [34] 2018 Gastroenterology 20.877
Horie et al. [35] 2019 Gastrointestinal Endoscopy 5.369
Alaskar et al. [36] 2019 Sensors 2.475

Table 2
Systematic review of deep learning applications in medical image analysis. Articles assessing detection and classification abilities of neural networks. Origin – Team –
Objective.

Authors Origin Team Objective

Dermatology
Esteva et al. [12] California, USA Mixeda Skin lesion classification
Nasr-Esfahani et al. [13] Isfahan, Iran/Michigan, USA Mixed Melanoma detection
Ophthalmology
Gargeya et al. [14] California, USA Mixed Diabetic retinopathy detection
Grinsven et al. [15] Nijmegen, Netherlands Engineers Retinal hemorrhages detection
Gulshan et al. [16] California, USA/Texas, USA/Madurai, India Mixed Diabetic retinopathy detection
Ahn et al. [17] Seoul, Korea Mixed Glaucoma detection
Phan et al. [18] Tokyo, Japan Mixed Glaucoma detection
Thoracic
Anthimopoulos et al. [19] Bern, Switzerland Engineers Interstitial lung disease classification
Cicero et al. [20] Toronto, Canada Mixed Thoracic disease classification
Lakhani et al. [21] Pennsylvania, USA Mixed Tuberculosis detection
Li et al. [22] Shenyang, China Mixed Pulmonary node detection
Zech et al. [23] California, USA Mixed Pneumonia detection
Taylor et al. [24] California, USA Mixed Pneumothorax detection
Senology
Arevalo et al. [25] Bogota, Colombia/Aveiro, Portugal Mixed Lesion classification on mammograms
Jadoon et al. [26] London, UK/Islamabad, Pakistan Engineers Lesion classification on mammograms
Ciresan et al. [27] Lugano, Switzerland Engineers Mitosis detection on histological slices of breast tumors
O-M-F Surgery
Arik et al. [28] California, USA Mixed Anatomical point-of-interest detection and classification
Miki et al. [29] Gifu, Japan Mixed Teeth classification
Uthoff et al. [30] Arizona, USA/Bangalore, India Mixed Oral cancer detection
Gurovich et al. [31] Israel/Germany/USA Mixed Facial phenotyping of genetic disorders
Jeyaraj et al. [32] Sivakasi, India Engineers Oral cancer detection
Gastro-enterology
Jia et al. [33] Hong Kong Engineers Gastrointestinal bleeding detection
Urban et al. [34] California, USA Mixed Polyps detection
Horie et al. [35] Japan Mixed Esophageal cancer detection
Alaskar et al. [36] Alkharj, Saudi Arabia/Liverpool, UK Engineers Esophageal and gastric ulcer detection
a Team of engineers and medical doctors.
A. Fourcade, R.H. Khonsari / J Stomatol Oral Maxillofac Surg 120 (2019) 279–288 285

Fig. 6. Geographical origin of the included publications.

Fig. 7. Number of scientific publications per year related to medical imaging interpretation and deep learning.

biological, molecular). In brief, networks provide a certain decision  engineers: specific training programs for medical AI are
with a given probability of accurateness, which is then to integrate required, based on collaborations between academic hospitals,
into a wider array of diagnostic arguments. medical schools and engineering schools;
A key point of medical application of CNNs is that the process  entrepreneurs and investors: by partnering with medical
directly involves the patients themselves. Melanoma screening doctors, investors create the financial conditions required to
methods can be used as self-medicine devices, and more generally, hire competent engineers and benefit from powerful datacen-
CNNs can be seen as solutions improving management and ters and computation power;
prognosis by promoting earlier diagnosis and earlier treatment.  lawmakers: data protection institutions, such as the Commis-
This point raises the confidence patients will place into these sion Nationale de l’Informatique et des Libertés (CNIL) and the
devices: excessive or minimal confidence in CNNs may interfere Institut National des Données de Santé (INDS) in France
within the relationship between patients and doctors. supervise the use of clinical data and the application of these
The medical use of CNNs will only be successful via collabora- medical devices. Collaborations with the national medical
tion between several actors, each having a specific and fundamen- councils (Conseil National de l’Ordre des Médecins in France)
tal role in the process: will result in the formulation of ethical recommendation and
legal innovations.
 patients: they are the source of the data and their consent is With the help of the digitization of the medical data, all
required for its use; the actors involved in the design and practical application of
 practitioners: they define relevant medical questions for which CNNs will interact in a fruitful manner, thanks to the optimization
CNNs could be of use and collect the data, which they interpret of data collection, interpretation, storage, sharing and use. Multi-
and label. They are then crucial for the validation of the CNNs in a centric platforms will contribute to augment data volume and
clinical environment; veracity.
286 A. Fourcade, R.H. Khonsari / J Stomatol Oral Maxillofac Surg 120 (2019) 279–288

Table 3
Systematic review of deep learning applications in medical image analysis. Articles assessing detection and classification abilities of neural networks. Network – Training
step.

Authors Networks Pre-training Transfer-learning Software Hardware

Dermatology
Esteva et al. [12] GoogleNet Inception v3 Yes (Imagenet) Yes (fine tuning) Google TensorFlow –
Nasr-esfahani et al. [13] Personal No No – 2 GPU
Ophthalmology
Gargeya et al. [14] 5 No No – 2 CPU or iPhone
Grinsven et al. [15] OxfordNet No No – 1 GPU
Gulshan et al. [16] GoogleNet Inception v3 Yes (Imagenet) Yes (fine tuning) Google TensorFlow –
Ahn et al. [17] Personal and GoogleNet Inception v3 Yes Yes Google TensorFlow –
Phan et al. [18] VGG19, ResNet152 and DenseNet201 Yes Yes – –
Thoracic
Anthimopoulos et al. [19] Personal, AlexNet and VGG-Net AlexNet-PT AlexNet-PT Theano and Caffe 1 GPU
Cicero et al. [20] GoogleNet Inception v3 No No Caffe 3 GPU
Lakhani et al. [21] GoogleNet Inception v3 and AlexNet Yes (Imagenet) and no Yes (fine tuning) and no Caffe 1 GPU
Li et al. [22] Personal No No – 2 CPU
Zech et al. [23] ResNet and DenseNet Yes Yes Pytorch –
Taylor et al. [24] VGG16/19, Xception, Inception, and ResNet Yes Yes Google TensorFlow 8 GPU
Senology
Arevalo et al. [25] Personal Both No Theano 1 GPU
Jadoon et al. [26] CNN-CT and CNN-WT No No – –
Ciresan et al. [27] Personal No No – 1 GPU
O-M-F Surgery
Arik et al. [28] Personal No No MATLAB –
Miki et al. [29] AlexNet No No Caffe 1 GPU
Uthoff et al. [30] VGG Yes Yes – –
Gurovich et al. [31] – Yes Yes – –
Jeyaraj et al. [32] GoogleNet Inception v3 Yes Yes – 1 GPU
Gastro-enterology
Jia et al. [33] AlexNet No No Caffe 1 GPU
Urban et al. [34] VGG and ResNET Yes Yes – 1 GPU
Horie et al. [35] Personal Yes Yes – –
Alaskar et al. [36] AlexNet and GoogleNet Yes Yes – –

Table 4
Systematic review of deep learning applications in medical image analysis. Articles assessing detection and classification abilities of neural networks. Dataset quality.

Authors Data type Dataset Veracity Size Pre-processing

Dermatology
Esteva et al. [12] Skin lesion photographs 21 Histological proof 299 × 299 No
Nasr-Esfahani et al. [13] Skin lesion photographs 1 – 188 × 188 Yes
Ophthalmology
Gargeya et al. [14] Retinal fundus photographs 3 Labelling 512 × 512 × 3 Yes
Grinsven et al. [15] Retinal fundus photographs 2 Proofreading by 3 ophthalmologists 512 × 512 Yes
Gulshan et al. [16] Retinal fundus photographs 6 Proofreading by 54 ophthalmologists 299 × 299 No
Ahn et al. [17] Retinal fundus photographs 1 – 224 × 224 Yes
Phan et al. [18] Retinal fundus photographs 2 Proofreading by glaucoma expert 256 × 256 and 512 × 512 Yes
Thoracic
Anthimopoulos et al. [19] Computed tomography slices 2 Proofreading by radiologists 32 × 32 ROI
Cicero et al. [20] Chest X-ray 1 Proofreading by 2 radiologists 256 × 256 No
Lakhani et al. [21] Chest X-ray 4 Proofreading by 1 radiologist 256 × 256 No
Li et al. [22] Computed tomography slices 1 Proofreading by 4 radiologists 32 × 22 ROI
Zech et al. [23] Chest X-ray 3 – 224 × 224 Yes
Taylor et al. [24] Chest X-ray 1 Proofreading by 6 radiologists 512 × 512 Yes
Senology
Arevalo et al. [25] Mammogram images 1 Histological proof 150 × 150 Yes
Jadoon et al. [26] Mammogram images 4 – 128 × 128 Yes
Ciresan et al. [27] Histological slice photographs 1 Proofreading by pathologists 2084 × 2084 Mitosis-centered images
O-M-F surgery
Arik et al. [28] Lateral cephalogram 3 POI placed by 2 specialists 81 × 81 POI-centered images
Miki et al. [29] Axial slice CBCT 2 Manually defined ROI 227 × 227 Yes
Uthoff et al. [30] Oral lesion photographs 1 Proofreading by specialist – Yes
Gurovich et al. [31] Facial photographs 1 – – Yes
Jeyaraj et al. [32] Oral lesion photographs 1 Proofreading by specialist 250 × 250 Yes
Gastro-enterology
Jia et al. [33] Wireless capsule endoscopy images 1 – 240 × 240 × 3 No
Urban et al. [34] Colonoscopy images 1 Proofreading by 3 expert colonoscopists 224 × 224 Yes
Horie et al. [35] Endoscopic images 1 – – Yes
Alaskar et al. [36] Wireless capsule endoscopy images – – 224 × 224 and 227 × 227 No
A. Fourcade, R.H. Khonsari / J Stomatol Oral Maxillofac Surg 120 (2019) 279–288 287

Table 5
Systematic review of deep learning applications in medical image analysis. Articles assessing detection and classification abilities of neural networks. Dataset size.

Authors Initial dataset Augmentation techniques Training and validation dataset Test dataset

Dermatology
Esteva et al. [12] 129,405 Yes 127,463 1942
Nasr-Esfahani et al. [13] 170 Yes (×36) 4896 1224
Ophthalmology
Gargeya et al. [14] 77,348 Yes 75,137 1748 + 463
Grinsven et al. [15] 7879 Yes 5287 2592
Gulshan et al. [16] 139,886 No 128,175 9963 + 1748
Ahn et al. [17] 1542 Yes 1078 464
Phan et al. [18] 3312 Yes 75% 25%
Thoracic
Anthimopoulos et al. [19] 14,696 images from 120 CT scans Yes 13,646 1050
Cicero et al. [20] 35,038 Yes 32,586 2443
Lakhani et al. [21] 1007 Both 857 150
Li et al. [22] 62,492 images from 1010 CT scans No 54,680 7811
Zech et al. [23] 158,323 – 80% 20%
Taylor et al. [24] 13,292 (3107 pneumothorax) Yes 85% 15%
Senology
Arevalo et al. [25] 736 Yes (×8) 442 294
Jadoon et al. [26] 2796 Yes (×7) – –
Ciresan et al. [27] 300 images of mitosis from 50 histological slices Yes 35 (66,000 mitosis and 151 million non-mitosis) 15
O-M-F surgery
Arik et al. [28] 19 POI images from 400 X-rays Yes (×25) 150 250
Miki et al. [29] 35,259 slices from 52 CBCT Yes 40 CBCT 12 CBCT
Uthoff et al. [30] 170 Yes (×8) – –
Gurovich et al. [31] 17,000 No – 502
Jeyaraj et al. [32] 500 – – –
Gastro-enterology
Jia et al. [33] 10,000 Yes 8200 1800
Urban et al. [34] 2000 colonoscopies (8641 images) – 2000 20
Horie et al. [35] 9546 – 8428 1118
Alaskar et al. [36] 1875 – 421 105

Table 6
Systematic review of deep learning applications in medical image analysis. Articles assessing detection and classification abilities of neural networks. Results and
comparisons.

Authors Comparison to specialists Comparison to traditional techniques Associated techniques Results Visualisation techniques
Dermatology
Esteva et al. [12] 21 dermatologists No No AUC 0.96 and 0.94 Yes
Nasr-esfahani et al. [13] No Yes No Precision 0.81 No
Ophthalmology
Gargeya et al. [14] No Yes No AUC 0.95 Yes (heat-map)
Grinsven et al. [15] 2 ophthalmologists No No AUC 0.97 Yes
Gulshan et al. [16] 15 ophthalmologists No No AUC 0.99 No
Ahn et al. [17] No Yes No AUC 0.93 Yes
Phan et al. [18] No No No AUC > 0.9 Yes (heat-map)
Thoracic
Anthimopoulos et al. [19] No No No Precision 0.856 No
Cicero et al. [20] No No No AUC between 0.85 and 0.96 No
Lakhani et al. [21] No No 2 CNN and radiologists + CNN AUC 0.98 Yes (heat-map)
Li et al. [22] No Yes No Precision 0.864 No
Zech et al. [23] No No Yes AUC 0.931 Yes (heat-map)
Taylor et al. [24] No No No AUC 0.94 No
Senology
Arevalo et al. [25] No Yes Yes AUC 0.860 No
Jadoon et al. [26] No Yes No AUC 0.855 No
Ciresan et al. [27] No Yes No Sn 0.70–Sp 0.88 Yes
O-M-F surgery
Arik et al. [28] No Yes No Precision 0.75 Yes
Miki et al. [29] No Yes No Precision 0.88 No
Uthoff et al. [30] No No No AUC 0.908 No
Gurovich et al. [31] No No No Accuracy 91% No
Jeyaraj et al. [32] Yes Yes Yes Accuracy 94.5% No
Gastro-enterology
Jia et al. [33] No Yes (2) Yes Precision 0.999 No
Urban et al. [34] Yes Yes No Accuracy 96.4%, AUC 0.991 No
Horie et al. [35] – No No Sn 98% No
Alaskar et al. [36] No Yes No Accuracy 100% No

AUC: area under curve; Sp: specificity; Sn: sensitivity.



5. Conclusion

CNNs are not replacement solutions for medical doctors, but will contribute to optimizing routine tasks and thus have a potential positive impact on our practice. Specialties with a strong visual component, such as radiology and pathology, will be deeply transformed by CNNs, but all the fields of medical and surgical practice will be affected by this technology. The role of practitioners is key for the development and implementation of such devices. Medical doctors currently have a historical chance to take part in a scientific revolution by understanding deep learning, taking part in the conception and evaluation of new devices, but also by contributing to the conception of a framework for the regulation of this new type of medical activity.

Disclosure of interest

The authors declare that they have no competing interest.

References

[1] Fourcade A, Khonsari RH. Apprentissage profond : un troisième œil pour les médecins. Université Paris-Est Créteil UPEC; 2017.
[2] McCarthy J, Minsky M, Rochester N, Shannon C. A proposal for the Dartmouth Summer Research Project on artificial intelligence; 1955.
[3] Miller R, Pople HJ, Myers J. Internist-1, an experimental computer-based diagnostic consultant for general internal medicine. N Engl J Med 1982;307(8):468–76.
[4] Shortliffe E, Davis R, Axline S, Buchanan B, Green C, Cohen S. Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Comput Biomed Res 1975;8(4):303–20.
[5] Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 1958;65(6):386–408.
[6] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
[7] LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput 1989;1(4):541–51.
[8] Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: Conference on Computer Vision and Pattern Recognition; 2009.
[9] Hubel D, Wiesel T. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J Physiol 1962;160:106–54.
[10] Shen D, Wu G, Suk H-I. Deep learning in medical image analysis. Annu Rev Biomed Eng 2017;19(1):221–48.
[11] Rumelhart D, Hinton G, Williams R. Learning representations by back-propagating errors. Nature 1986;323:533–6.
[12] Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542:115–8.
[13] Nasr-Esfahani E, Samavi S, Karimi N, Soroushmehr SMR, Jafari MH, Ward K, et al. Melanoma detection by analysis of clinical images using convolutional neural network. IEEE Eng Med Biol Soc 2016;2016:1373–6.
[14] Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology 2017;124(7):962–9.
[15] Van Grinsven MJJP, Van Ginneken B, Hoyng CB, Theelen T, Sánchez CI. Fast convolutional neural network training using selective data sampling: application to hemorrhage detection in color fundus images. IEEE Trans Med Imaging 2016;35(5):1273–84.
[16] Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316(22):2402–10.
[17] Ahn JM, Kim S, Ahn KS, Cho SH, Lee KB, Kim US. A deep learning model for the detection of both advanced and early glaucoma using fundus photography. PLoS One 2018;13(11):e0207982.
[18] Phan S, Satoh S, Yoda Y, Kashiwagi K, Oshika T. Evaluation of deep convolutional neural networks for glaucoma detection. Jpn J Ophthalmol 2019;63:276.
[19] Anthimopoulos M, Christodoulidis S, Ebner L, Christe A, Mougiakakou S. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans Med Imaging 2016;35(5):1207–16.
[20] Cicero M, Bilbily A, Colak E, Dowdell T, Gray B, Perampaladas K, et al. Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Invest Radiol 2017;52(5):281–7.
[21] Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017;284(2):574–82.
[22] Li W, Cao P, Zhao D, Wang J. Pulmonary nodule classification with deep convolutional neural networks on computed tomography images. Comput Math Methods Med 2016;2016. Article ID 6215085, 7 pages.
[23] Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 2018;15(11):e1002683.
[24] Taylor AG, Mielke C, Mongan J. Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: a retrospective study. PLoS Med 2018;15(11):e1002697.
[25] Arevalo J, Gonzalez F, Ramos-Pollan R, Oliveira J, Angel M, Guevara Lopez M. Representation learning for mammography mass lesion classification with convolutional neural networks. Comput Methods Programs Biomed 2015;127:248–57.
[26] Jadoon MM, Zhang Q, Haq IU, Butt S, Jadoon A. Three-class mammogram classification based on descriptive CNN features. Biomed Res Int 2017;2017. Article ID 3640901, 11 pages.
[27] Ciresan C, Giusti A, Gambardella L, Schmidhuber J. Mitosis detection in breast cancer histology images with deep neural networks. Med Image Comput Comput Assist Interv 2013;16(2):411–8.
[28] Arik SÖ, Ibragimov B, Xing L. Fully automated quantitative cephalometry using convolutional neural networks. J Med Imaging 2017;4(1):1.
[29] Miki Y, Muramatsu C, Hayashi T, Zhou X, Hara T, Katsumata A, et al. Classification of teeth in cone-beam CT using deep convolutional neural network. Comput Biol Med 2017;1(80):24–9.
[30] Uthoff RD, Song B, Sunny S, Patrick S, Suresh A, et al. Point-of-care, smartphone-based, dual-modality, dual-view, oral cancer screening device with neural network classification for low-resource communities. PLoS One 2018;13(12):e0207493.
[31] Gurovich Y, Hanani Y, Bar O, Nadav G, Fleischer N, Gelbman D, et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med 2019;25(1):60–4.
[32] Jeyaraj PR, Samuel Nadar ER. Computer-assisted medical image classification for early diagnosis of oral cancer employing deep learning algorithm. J Cancer Res Clin Oncol 2019;145:829.
[33] Jia X, Meng Q. A deep convolutional neural network for bleeding detection in wireless capsule endoscopy images. Conf Proc IEEE Eng Med Biol Soc 2016;2016:639–42.
[34] Urban G, Tripathi T, Alkayali T, Mittal M, Jalali F, Karnes W, et al. Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology 2018;155:1069–78.
[35] Horie Y, Yoshio T, Aoyama K, Yoshimizu S, Horiuchi Y, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc 2019;89(1):25–32.
[36] Alaskar H, Hussain A, Al-Aseem N, Liatsis P, Al-Jumeily D. Application of convolutional neural networks for automated ulcer detection in wireless capsule endoscopy images. Sensors 2019;19(6):1265.
[37] Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems; 2012. p. 1097–105.
[38] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Computer Vision and Pattern Recognition; 2015. p. 2818–26.
[39] Shin H, Roth HR, Gao M, Lu L, Xu Z, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 2016;35(5):1285–98.
[40] Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, et al. Caffe: convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia; 2014. p. 675–8.
[41] Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I, Bergeron A, et al. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS Workshop; 2012. arXiv:1211.5590.
[42] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467; 2016.
[43] Kohli M, Prevedello LM, Filice RW, Geis JR. Implementing machine learning in radiology practice and research. AJR Am J Roentgenol 2017;208(4):754–60.
[44] Caruana R, Lawrence S, Giles L. Overfitting in neural nets: backpropagation, conjugate gradient and early stopping. In: International Conference on Neural Information Processing Systems; 2001. p. 402–8.
