J Amjmed 2017 10 035
PII: S0002-9343(17)31117-8
DOI: https://doi.org/10.1016/j.amjmed.2017.10.035
Reference: AJM 14363
Please cite this article as: D. Douglas Miller, Eric W. Brown, Artificial Intelligence in Medical
Practice: the Question to the Answer?, The American Journal of Medicine (2017),
https://doi.org/10.1016/j.amjmed.2017.10.035.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will
undergo copyediting, typesetting, and review of the resulting proof before it is published in its
final form. Please note that during the production process errors may be discovered which could
affect the content, and all legal disclaimers that apply to the journal pertain.
Review Article
a
Corresponding Author: D. Douglas Miller, MD, Professor of Medicine, New York Medical College, 40
Sunshine Cottage Road, Valhalla, NY 10595; phone: 706-755-5365; email: [email protected]
b
Co-author: Eric W. Brown, PhD, Director, Foundational Innovations, IBM Watson Health, Yorktown
Heights, NY 10598
Conflict of Interest Statement: D. Douglas Miller, MD: None; Eric W. Brown, PhD: Employment
Verification Statement: All authors had access to the data and a role in writing the manuscript
Funding: None
Clinical Significance
Artificial intelligence (AI) medical image analysis achieves diagnostic speed exceeding and
accuracy paralleling experts.
AI will impact medical practice by applying natural language processing to ‘read’ the
expanding scientific literature and collate diverse electronic medical records.
Machines learning directly from medical data could avert clinical errors due to human
cognitive biases, positively impacting patient care.
Because AI is neither astute nor intuitive, physicians will remain essential to cognitive
medical practice.
Abstract
Computer science advances and ultra-fast computing speeds find artificial intelligence (AI) broadly
benefitting modern society – forecasting weather, recognizing faces, detecting fraud, and deciphering
genomics. AI’s future role in medical practice remains an unanswered question. Machines (computers)
learn to detect patterns not decipherable using biostatistics by processing massive datasets (big data)
through layered mathematical models (algorithms). Correcting algorithm mistakes (training) adds to AI
predictive model confidence. AI is being successfully applied for image analysis in radiology, pathology,
and dermatology, with diagnostic speed exceeding and accuracy paralleling medical experts. While
diagnostic confidence never reaches 100%, combining machines plus physicians reliably enhances
system performance. Cognitive programs are impacting medical practice by applying natural language
processing to read the rapidly expanding scientific literature and collate years of diverse electronic
medical records. In this and other ways, AI may optimize the care trajectory of chronic disease patients,
suggest precision therapies for complex illnesses, reduce medical errors, and improve subject
enrollment into clinical trials.
Key Words: Artificial intelligence, neural networks, machine learning, deep learning, big data, analytics,
natural language processing, electronic medical record, chronic disease, precision medicine, medical
imaging
Numbers Games
In 1936, mathematician Alan Turing published On Computable Numbers, With an Application to the
Entscheidungsproblem, a paper later dubbed “the founding document of the computer age.”1 Turing’s
life was dramatized in the 2014 film, The Imitation Game. Attempting to solve the Entscheidungsproblem,
Turing (with abstract computing machines) and his Princeton colleague, Alonzo Church (with the lambda calculus), defined the concept of “effective
calculability.” Such intelligent human problem-solving became the basis of computational models called
algorithms.
In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts modeled brain neuronal
interactions using a simple neural network made of electrical circuits. The first computer research with
artificial neural networks was done in the 1950’s by Nathaniel Rochester at International Business
Machines (IBM), and Bernard Widrow and Marcian Hoff at Stanford. Today’s computer scientists apply
multi-layered algorithms using a variety of artificial neural network configurations to solve complex
problems. Modern artificial neural networks represent one of the most active areas of artificial
intelligence (AI) research.
In 1964, television guru Merv Griffin invented Jeopardy!, America’s third longest running game show. In
2011, a supercomputer named for IBM’s first chief executive, Thomas J. Watson, used AI to defeat two
very intelligent humans in an exhibition match culminating with the correct response to this question:
“Which author's most famous novel was inspired by William Wilkinson's ‘An Account of the Principalities
of Wallachia and Moldavia’?” (Answer: Bram Stoker’s Dracula).
Answerable Questions
Some questions about AI’s role in modern society have been answered:
Why has AI emerged as useful in several diverse sectors (business, science, government)?
How do AI applications differ from smart technologies (medical devices, digital diagnostics, data
management systems) already used in medical practice?
While AI encompasses a wide range of symbolic and statistical approaches to learning and reasoning
(Figure 1), recent advances in algorithms, computational power, and access to large datasets have
enabled artificial neural networks to emerge as the leading AI method. Artificial neural networks are
flexible mathematical models that use multiple algorithms to identify complex nonlinear relationships
within large datasets (analytics). Machines learn when errors encountered in response to minor
algorithm modifications are corrected (training), progressively improving predictive model accuracy
(confidence)2.
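The error-correction loop described above can be illustrated with a minimal sketch. It assumes a single artificial neuron with a sigmoid output, a toy dataset (logical OR), and plain gradient descent; the function names, learning rate, and epoch count are illustrative choices and are not taken from the cited work.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs) -> float:
    """Return the neuron's output (a value between 0 and 1) for two inputs."""
    return sigmoid(weights[0] * inputs[0] + weights[1] * inputs[1] + bias)


def train_neuron(samples, epochs: int = 2000, learning_rate: float = 0.5):
    """Fit one artificial neuron by repeatedly correcting its prediction errors."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Error = what the neuron predicted minus the labeled answer.
            error = predict(weights, bias, inputs) - target
            # Nudge each parameter a small step in the direction that shrinks the error.
            weights[0] -= learning_rate * error * inputs[0]
            weights[1] -= learning_rate * error * inputs[1]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: the logical OR of two binary inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_neuron(OR_SAMPLES)
    for inputs, target in OR_SAMPLES:
        print(inputs, target, round(predict(w, b, inputs), 3))
```

The outputs approach 0 or 1 as training proceeds but never equal them exactly, which mirrors the point that predictive confidence does not reach 100%.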
Deep learning uses ultra-fast computing to rapidly optimize large multi-layered neural networks organized in a
variety of configurations, including filter layers as convolutional neural networks and recursive layers as
recurrent neural networks. Deep learning has been applied commercially since the 1990’s3, and while
modern math is similar to that employed in the 1980’s, supercomputer speeds and Cloud networking
permit deconvolution of massive datasets. In 2006, Geoffrey Hinton introduced a novel method to train
very deep neural networks by pre-training one hidden algorithm layer at a time using an unsupervised
machine learning procedure4, and Yoshua Bengio validated Hinton’s work with test data and used it with other
unsupervised techniques such as auto-encoders5.
Ten years later, deep learning modeling of big datasets exerts major influences on modern society –
from web searching to social media networking, and from fintech banking to facial recognition3.
Advanced algorithms achieve acceptable performance with ~5,000 data points per category, and exceed
human performance with datasets of >10 million labeled examples2. The bigger the dataset, the easier it
is for machines to learn (gain confidence) because the burden of standard bio-statistical estimation is
reduced2. Despite this, like human thinking, predictive model confidence never reaches 100%.
Works in Progress
Questions remain about the applicability, practicality and value of AI in medical practice:
How is AI use in medical practice distinguished from big data analytics applications for
healthcare delivery and population health?
Can AI address medical practice “pain points”, providing more efficient and efficacious care
while de-escalating physician burnout?
Can internet-of-things healthcare facilities and medical homes become a platform for safer,
higher quality, more connected patient care?
Simple neural networks have been used in medicine since the early 1990’s to interpret
electrocardiograms8, diagnose myocardial infarction9 and predict intensive care unit length-of-stay
following cardiac surgery10. AI’s scientific applications have proliferated, including image analysis
(radiographic, histologic), text recognition with natural language processing, drug activity design, and
prediction of gene mutation expression6,7. Recent AI applications provide proof-of-concept for AI use in
specialty medical practice, while projecting future utility in general medical practice.
A. Cognitive Diagnostics
Gene-chips are widely used to detect cancer cell gene expression. However, despite chips holding
diagnostic probes for 20,000 – 50,000 genetic features, noisy data and experimental limitations reduce
their clinical utility. Deep learning addresses this by reducing data diversity (dimensionality), and
applying layered auto-encoding analyses to train artificial neural networks to achieve more accurate
cancer detection and classification11.
Deep learning visual pattern analysis of 1,417 skin histopathology images, used to
detect basal cell carcinoma and differentiate malignant from benign lesions, outperformed prior
automated analyses, with diagnostic accuracy of >90% compared to experts12. Deep learning
histopathology identifies metastatic breast cancer in sentinel lymph node biopsies, with diagnostic
accuracy for tumor detection and localization similar to experts13. These systems train by comparing the
features of millions of tumor positive and negative histological patches, post-processing these data using
heat-maps to predict tumor probability. Combining pathologists and deep learning optimized
performance, reducing the human error rate by 85%.
In a classification of 129,450 images of 2,032 malignant and benign skin diseases, convolutional
neural networks (multiple layered algorithms trained to identify common and deadly skin cancers)
outperformed 21 dermatologists at keratinocyte carcinoma and melanoma detection14.
AI analytics support the practice of precision medicine, especially in the difficult setting of chronic
diseases characterized by multi-organ involvement, erratic acute events, and long illness progression
latencies.
For >29 million Americans with diabetes, retinopathy is among the most debilitating complications.
Using 128,175 retinal photographs from 5,871 adults, two deep learning systems trained to detect and
grade diabetic retinopathy and macular edema achieved high specificities (98%) and sensitivities (87-
90%) for detecting moderately severe retinopathy and macular edema, compared to 54
ophthalmologists and senior residents15. The feasibility of this approach in medical practice and its
capacity to improve diabetes care and outcomes require validation.
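For readers less familiar with the two metrics quoted above, the following minimal sketch shows how sensitivity and specificity are computed from a screening test's confusion counts. The class name, function names, and counts are hypothetical; the counts were chosen only to land in the reported range and are not data from the cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a classifier's calls against a reference standard."""
    true_pos: int   # disease present, classifier says present
    false_neg: int  # disease present, classifier says absent
    true_neg: int   # disease absent, classifier says absent
    false_pos: int  # disease absent, classifier says present


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of diseased cases the classifier detects."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of disease-free cases the classifier correctly clears."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts for illustration only (not from the cited study).
EXAMPLE = ConfusionCounts(true_pos=90, false_neg=10, true_neg=980, false_pos=20)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(EXAMPLE):.2f}")
    print(f"specificity = {specificity(EXAMPLE):.2f}")
```

With these made-up counts, 90 of 100 diseased eyes are flagged and 980 of 1,000 healthy eyes are cleared, so a high specificity can coexist with a lower sensitivity, as in the reported results.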
Depression affects 6.8-8.7% of the adult U.S. population, resulting in 8 million annual ambulatory care
visits16. Primary care practices are not equipped to manage chronic depressive illnesses. Phenotypic
dimensionality and a paucity of objective depression activity markers may be addressable by applying
deep learning to MRI mapping of white matter neuronal water content17. Image heat-map pattern
recognition was 74% accurate for predicting major depressive disorder, with certain brain regions
contributing more to model confidence.
Congestive heart failure is a clinically and biologically diverse condition affecting 5.8 million Americans,
and 23 million worldwide18. Heart failure with preserved ejection fraction (HFpEF) is a phenotypically
heterogeneous condition influenced by numerous weak genetic factors, without proven therapies.
When supervised machine learning was applied to forty-six clinical variables from 397 HFpEF patients,
phenotypic heat-map clusters predicted patient survival more accurately than commonly employed risk
assessments19. AI approaches could identify HFpEF subsets or individuals that could benefit from
therapies that failed to show survival benefits in clinical trial cohorts.
C. Electronic Medical Record (EMR) Applications
EMRs are purported tools for documenting and sharing medical care information. EMR challenges
include lack of interoperability across technology platforms over time, and massive expansion of
structured and unstructured data elements. Natural language processing is an AI tool that ‘reads’ and
contextualizes different medical words and expressions in EMRs. Available products can accurately
compile and connect decades of accumulated diverse EMR data – history, physical, laboratory, imaging,
medications – in a user-friendly manner. IBM Watson generates accurate universal problem lists from
diverse EMRs in seconds, while also compiling relevant medical literature in response to clinical
queries20. Deep learning modeling of EMR data memory can predict future illness trajectories and
medical outcomes, confidently predicting interventions and readmissions in two patient cohorts that
exert heavy economic and societal burdens – diabetes and mental health21.
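To make the idea of reconciling different medical expressions concrete, here is a deliberately simple sketch that maps free-text note phrases onto canonical problem names. It is a toy dictionary lookup, not the method used by IBM Watson or any product mentioned above; the synonym table, function name, and sample notes are all hypothetical.

```python
import re

# Hypothetical synonym table: surface phrase -> canonical problem name.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "hypertension": "hypertension",
    "t2dm": "type 2 diabetes mellitus",
    "type 2 diabetes": "type 2 diabetes mellitus",
}


def build_problem_list(notes):
    """Collect the canonical problems mentioned anywhere in a list of notes."""
    found = set()
    for note in notes:
        text = note.lower()
        for phrase, concept in SYNONYMS.items():
            # Word boundaries stop "mi" from matching inside longer words.
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(concept)
    return sorted(found)


# Hypothetical note fragments written in different styles.
NOTES = [
    "Pt with hx of MI in 2009, now on aspirin.",
    "High blood pressure noted at last visit; HTN meds adjusted.",
    "Follow-up for type 2 diabetes.",
]

if __name__ == "__main__":
    print(build_problem_list(NOTES))
```

A lookup like this ignores negation ("no history of MI"), timing, and context, which are the harder problems that full natural language processing systems are built to handle.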
Potential Jeopardies
Concerns about cognitive medical practice are largely the result of existing information deficits:
What non-medical barriers exist to the use of AI in direct patient care (reimbursement,
regulatory, etc.)?
Will AI put some physicians out of work (obsolescence) and/or reduce physician compensation
(relative value)?
Are physicians using AI at risk for skill erosion in diagnostic expertise, clinical acumen and/or
critical thinking?
Will younger tech-savvy learners and clinicians become early technology adopters, driving the
development of AI-infused cognitive practice?
Technology Insertion
Tracy Kidder’s 1982 Pulitzer Prize-winning book, The Soul of a New Machine, underscored how
imperfect humans remained critical to intelligent computer design. The current AI medical literature
reproducibly supports a widely held view – that collaborative human-machine tasking improves
performance over either alone. While AI’s technology displacement curve is paralleled by an
opportunity curve, concerns abide that AI will dislocate highly skilled health professionals from their
jobs.
A tool is a device or implement used by humans for a particular function; tools are combined into
machines for industrial production. At the turn of the 20th century, combustion engines combined tools
to autonomously power vehicles over land. In 2016, global automobile sales increased to 88 million
units, with China leading all nations. Public health evidence indicates that fossil-fueled vehicles emit
multiple air pollutants, contributing to 2.1 million excess deaths in Asia alone between 1990 and 2010 (reference 22).
At the turn of the 21st century, mobile devices placed the data capture and analytics power of
computers into human hands. By 2018, average daily mobile device use for Internet access alone will
increase to 113 minutes per human. Research associates mobile device use with higher risks of cancer,
accidents, and medical device interference23. Just as machines have created unanticipated risks for
humans, there may be risks to AI use in medical practice.
The defense and aerospace industries often insert new or improved technology into an existing product
or system24. Associated process management challenges include, “platform modernization and achieving
the rapid fielding of the new technology”. But the primary impediment to successful technology
insertion is a lack of common understanding of the technology among key users.
Industrial technology insertion differs from new medical device or software regulation, under the aegis
of the Food and Drug Administration (FDA). Although the FDA is establishing a digital health unit, U.S. and
European regulatory platforms are not yet equipped to oversee AI’s insertion into medical practice. It is
unclear whether the cost of using AI technologies in medical practice will be reimbursed by value-
conscious insurers.
Final Jeopardy! answer: An 1816 medical instrument invented by Dr. René Laennec to avoid patient
contact. Correct question: “What is the stethoscope?”
There are two reasons why medical schools still teach students to use a centuries-old tool. The first is
that the stethoscope reveals diagnostic information helpful to patient care. The second is that the hand-held
device requires learners to physically contact the precordium, a connection that is both humanizing
for doctors and reassuring to patients. While experienced auscultators glean 75-80% of the information
generated by a Doppler-echocardiogram, best medical practices and third party reimbursement require
that humans use this simple tool before employing more modern machines.
Today’s cognitive machines have sophisticated sensors that capture big and little data, and generate
corrective computer models simulating a rudimentary human nervous system. In response to driver-
reported battery fires from road debris impact, Tesla Motors downloaded chassis height adjustments to
all of its smart vehicles to mitigate further risk.
The daily practice of medicine is a game, of sorts, requiring repeated situational assessments, pattern
recognition based on case experience, and evidence-based risk-benefit adjustments. Mounting
performance pressures can prompt reliance on information processing shortcuts – heuristic thinking or
gaming cheats – to improve decision-making efficiency and workflow. Unfortunately, resulting cognitive
biases may foster clinical errors. Machines that learn directly from medical data could avert such human
cognitive biases, thereby contributing positively to patient care.
AI was not specifically developed as a tool for healthcare. And while AI is poised to address indurate
medical practice “pain points,” it is neither astute nor intuitive. So it is that humans will remain essential
to the intelligent use of AI in medical practice.
References
Detection. In: Mori, K, Sakuma, I, Sato, Y, Barillot, C, Navab, N (Eds.) Medical Image Computing and
Computer-Assisted Intervention (MICCAI) 2013. Lecture Notes in Computer Science, Vol. 8150,
Springer, Berlin, Heidelberg.
13. Wang, D, Khosla, A, Gargeya, R, Irshad, H, Beck, AH. Deep Learning for Identifying Breast Cancer.
Proceedings of the International Symposium on Biomedical Imaging (ISBI), 2016.
14. Esteva, A, Kuprel, B, Novoa, RA, et al. Dermatologist-level Classification of Skin Cancer with Deep
Neural Networks. Nature (Letter) 2017; 542: 115-119.
15. Gulshan, V, Peng, L, Coram, M, et al. Development and Validation of a Deep Learning Algorithm for
Detection of Diabetic Retinopathy in Retinal Fundus Photographs. Journal of the American Medical
Association 2016; 316 (22): 2402-2410.
16. Bishop, TF, Ramsay, PP, Casalino, LP, Bao, Y, Pincus, HA, Shortell, SM. Care Management Processes
Used Less Often for Depression Than for Other Chronic Conditions in US Primary Care Practices.
Health Affairs 2016; 35(3):394-400.
17. Schnyer, DM, Clasen, PC, Gonzalez, C, Beevers, CG. Evaluating the Diagnostic Utility of Applying a
Machine Learning Algorithm to Diffusion Tensor MRI Measures in Individuals with Major Depressive
Disorder. Psychiatry Research: Neuroimaging 2017
http://dx.doi.org/10.1016/j.pscychresns.2017.03.003
18. Roger, VL. Epidemiology of Heart Failure. Circulation Research 2013; 113:646-659.
19. Deo, RC. Machine Learning in Medicine. Circulation 2015; 132: 1920-1930.
20. Tsou, C-H, Devarakonda, M, Liang, JJ. Toward Generating Domain-specific / Personalized Problem
Lists from Electronic Medical Records. AAAI Fall Symposium, November 2015
https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewFile/11733/11479
Accessed June 25, 2017
21. Pham, T, Tran, T, Phung, D, Venkatesh, S. Predicting Healthcare Trajectories from Medical Records:
A Deep Learning Approach. J. Biomedical Informatics 2017 http://doi.org/10.1016/j.jbi.2017.04.001
22. Lim, SS, Vos, T, Flaxman, AD, et al. A Comparative Risk Assessment of Burden of Disease and Injury
Attributable to 67 Risk Factors and Risk Factor Clusters in 21 Regions, 1990-2010: a Systematic
Analysis for the Global Burden of Disease Study 2010. The Lancet 2012
http://dx.doi.org/10.1016/S0140-6736(12)61766-8
23. Cell Phones and Cancer Risk. NIH National Cancer Institute. May, 2016 www.cancer.gov Accessed
May 10, 2017
24. Kerr, CIV, Phaal, R, Probert, DR. Technology Insertion in the Defense Industry: A Primer. Proceedings
of the Institution of Mechanical Engineers, Part B. Journal of Engineering Manufacture. September
2008 http://journals.sagepub.com Accessed May 15, 2017
Figure 1. In the computation science universe, artificial intelligence (AI) is distinguished from standard
statistics and databases, but overlaps with knowledge discovery and data mining (KDD) methodologies
that extract useful insights from large datasets. The mathematics of pattern recognition (kernel
machines, cluster analysis) overlaps significantly with machine learning edge detection algorithms and
with neurocomputing based on artificial neural networks. The area of machine learning outside AI and
within statistics/pattern recognition is linear regression analysis.