Application of Computer-Aided Diagnosis on Breast Ultrasonography:
Evaluation of Diagnostic Performances and Agreement of Radiologists According to Different Levels of Experience
Eun Cho, MD, Eun-Kyung Kim, MD, PhD, Mi Kyung Song, MS, Jung Hyun Yoon, MD, PhD
Abbreviations: AUC, area under the receiver operating characteristic curve; BI-RADS, Breast Imaging Reporting and Data System; CAD, computer-aided diagnosis; NPV, negative predictive value; PPV, positive predictive value; US, ultrasonography
doi:10.1002/jum.14332

Over the recent decades, breast ultrasonography (US) has

trend, the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) was released and recently updated.1 The ultrasonographic BI-RADS provides a sonographic lexicon and final assessment categories. These show excellent diagnostic performance when applied to breast US for the differential diagnosis of breast masses, and they suggest standardized management for patients.2,3

In spite of the excellent performance reported with the ultrasonographic BI-RADS, final assessments of breast masses are known to vary significantly among performers.4,5 This is mostly because of the multiple BI-RADS ultrasonographic descriptors used to describe breast lesions and the subjective nature of US. To increase the diagnostic accuracy of breast US, several additional ultrasonographic techniques have been developed and applied in clinical practice, such as elastography,6 automated breast US,7,8 and computer-aided diagnosis (CAD) systems.9–13 Among these additional techniques, CAD systems have been reported to enable efficient interpretation with consistently improved accuracy.14,15

S-Detect (Samsung Medison, Co, Ltd, Seoul, Korea) is a recently developed, commercially available image analysis program for breast US. It analyzes the morphologic features of a target breast mass according to the BI-RADS ultrasonographic descriptors and provides a final assessment based on that analysis.16 This program may be useful in clinical practice because it can provide a second opinion on tumor characterization and guidance in deciding on patient treatment. In a recent study comparing the diagnostic performances of S-Detect and a dedicated breast radiologist,16 S-Detect had equivalent diagnostic performance and moderate agreement with the radiologist. A recent study reported that the usefulness of CAD for breast US can differ according to the radiologist's level of experience.17 However, that study used in-house software that is not commercially available, which is a limitation because analytic results can differ according to the algorithm or model used for analysis. Therefore, the purpose of this study was to investigate the clinical feasibility of S-Detect for breast US by comparing the diagnostic performances of, and agreement between, S-Detect and radiologists with various degrees of experience in breast imaging.

Materials and Methods

This prospective study was approved by the Institutional Review Board of Severance Hospital. Informed consent for study inclusion was obtained from all patients.

Patients
From December 2015 to March 2016, 126 breast masses in 123 consecutive women were included in this study. All of these women were scheduled for breast US-guided core needle biopsy, surgical excision, or a combination thereof. Among them, 7 women were excluded for one of two reasons. Four had non-mass lesions on breast US, to which BI-RADS descriptors are difficult to apply. Three lacked grayscale images of the target breast mass stored in the US machine for S-Detect analysis; only elastograms and Doppler US images had been stored for these 3 cases. Finally, 119 breast masses in 116 women were included in this study.

Ultrasonographic Examinations
Ultrasonographic examinations were performed with a 3–12-MHz linear transducer (RS80A with Prestige; Samsung Medison, Co, Ltd). Two staff radiologists (J.H.Y. and E.-K.K.), with 7 and 19 years of experience in breast imaging, respectively, performed image acquisition. Bilateral whole-breast ultrasonographic examinations were routinely performed according to scanning protocols. These protocols included representative transverse and longitudinal images of each breast mass, with and without calipers for size measurement. The radiologist who performed breast US also performed the US-guided core needle biopsy. Both radiologists had access to the mammographic images obtained before US, images from previous ultrasonographic examinations, and medical records containing the patients' clinical information.

Image Review and Application of S-Detect
Ultrasonographic images of the 119 breast masses were retrospectively reviewed and documented for data analysis by 2 breast radiologists. Radiologist 1 (J.H.Y.) had 7 years and radiologist 2 (E.C.) had 1 year of breast imaging experience. All observers were blinded to the clinical information and pathologic results of each mass during image review. Breast masses were analyzed according to the descriptors and final assessment categories of the fifth edition of the BI-RADS (Table 1).1 Calcifications were not analyzed because S-Detect has limited ability to analyze calcifications.16 The radiologists individually chose the most appropriate term for each descriptor of each lesion and made a final assessment accordingly. The ultrasonographic BI-RADS final assessment categories were defined as follows: category 2 (benign), 3 (probably benign), 4a (low suspicion of malignancy), 4b (moderate suspicion of malignancy), 4c (high suspicion of malignancy), and 5 (highly suggestive of malignancy).
After image review by the radiologists, S-Detect was applied to the same image the radiologists had used for grayscale ultrasonographic feature analysis. A region of interest was drawn along the border of the mass in S-Detect, either automatically or manually. The analytic results of S-Detect, including the BI-RADS lesion descriptors and final assessment, were immediately displayed and recorded for data analysis (Figure 1). After being informed of the final assessment made by S-Detect, each radiologist gave a final assessment for each breast mass, integrating the analytic results of S-Detect.

Table 1. Ultrasonographic Descriptors Used for Image Analysis
Characteristic | Descriptors
Shape | Oval, round, irregular
Orientation | Parallel, nonparallel
Margin | Circumscribed, not circumscribed (indistinct, angular, microlobulated, spiculated)
Echo pattern | Anechoic, hyperechoic, complex cystic and solid, hypoechoic, isoechoic, heterogeneous
Posterior acoustic features | No posterior features, enhancement, shadowing, combined pattern

Figure 1. Representative image of setting the region of interest for S-Detect analysis in a 52-year-old woman with a diagnosis of cancer in her left breast. The region of interest was set automatically along the margin of the breast mass for analysis (green line). After the region of interest was set, S-Detect automatically analyzed the ultrasonographic features and displayed a final assessment.

Data and Statistical Analyses
Histopathologic results from US-guided core needle biopsy, vacuum-assisted excision, or surgery were regarded as the reference standards. Patients diagnosed with high-risk lesions were recommended to undergo further surgical treatment, and the final surgical pathologic diagnosis was used. Examples of such lesions include atypical ductal hyperplasia, intraductal papilloma, radial scar, and lobular carcinoma in situ. Lesions whose final histopathologic diagnosis was a high-risk lesion were considered benign for the statistical analysis.

S-Detect gives its final assessment in dichotomized form ("possibly benign" or "possibly malignant"). Therefore, the radiologists' BI-RADS final assessments were also divided into 2 groups for statistical analysis. Positive assessments consisted of categories 4a to 5, and negative assessments consisted of categories 2 and 3. Diagnostic performances of the individual radiologists, S-Detect, and the integration of S-Detect with each radiologist were analyzed and compared. The measures were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. The generalized estimating equation method was used to compare diagnostic performances between the radiologists and S-Detect. The area under the receiver operating characteristic curve (AUC) was obtained and compared by the DeLong method.

κ statistics were calculated to assess agreement for ultrasonographic descriptors and final assessments among the radiologists, S-Detect, and the integration of each radiologist and S-Detect. The overall κ was interpreted according to Landis and Koch18:
less than 0, poor agreement;
0.00 to 0.20, slight agreement;
0.21 to 0.40, fair agreement;
0.41 to 0.60, moderate agreement;
0.61 to 0.80, substantial agreement;
0.81 to 1.00, almost perfect agreement.
The weighted least squares approach of Barnhart and Williamson19 was used to compare κ coefficients. All statistical analyses in this study were performed with SAS version 9.2 software (SAS Institute Inc, Cary, NC), and P < .05 was considered statistically significant.

Results

Of the 119 breast masses included in this study, 65 (54.6%) were benign and 54 (45.4%) were malignant. The mean age ± SD of the 116 women was 48.5 ± 12.2 years (range, 20–83 years). The mean size of the breast masses was 16.9 ± 10.7 mm (range, 4–60 mm). Among the 119 breast masses, 39 (33.3%) were diagnosed by surgery, 8 (6.7%) by US-guided vacuum-assisted excision, and 103 (86.6%) by US-guided core needle biopsy. Fourteen (11.8%) lesions were diagnosed as benign on the basis of ultrasonographic features, without additional biopsy. Of these, 8 were considered benign because they showed benign ultrasonographic features and remained stable on imaging follow-up for more than 2 years. One was a cyst, and 5 were proven to be galactoceles on aspiration. The malignant masses were significantly larger than the benign masses: 17.0 mm (range, 4.0–45.0 mm) versus 11.0 mm (range, 4.0–60.0 mm), respectively (P < .001).

Diagnostic Performances of the Radiologists Versus S-Detect
Table 2 summarizes the distribution of breast masses with histopathologic diagnoses, as analyzed by the radiologists and S-Detect. For both radiologists, S-Detect results did not differ significantly between benign and malignant masses (P > .05).

Table 3 summarizes the diagnostic performances of the radiologists and S-Detect. Sensitivity and NPV were significantly higher for both radiologists than for S-Detect, whereas specificity, PPV, and accuracy were higher for S-Detect (all P < .05). The AUCs of the radiologists (0.887 and 0.901, respectively) were significantly higher than that of S-Detect (0.815; P = .023 and .004, respectively). When the results of S-Detect were integrated, the specificity, PPV, and accuracy showed significant improvement compared to the per-

Table 2.
Column headings: Final Assessment | S-Detect | Radiologist 1: Benign, Malignant, Total, P^a | Radiologist 2: Benign, Malignant, Total, P^a
Data are presented as number (percent) where applicable. NA indicates not applicable.
^a Comparison between each radiologist and S-Detect in each BI-RADS category analyzed by the radiologist.

Figure 2. Receiver operating characteristic curves for the radiologists, S-Detect, and integration of the radiologists and S-Detect.
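The AUCs compared in the Results are areas under empirical receiver operating characteristic curves like those in Figure 2. The minimal sketch below assumes that a reader's ordinal BI-RADS category serves as the rating; the source does not state the exact rating scale used. Under that assumption, the empirical AUC is the probability that a randomly chosen malignant mass is rated more suspicious than a randomly chosen benign mass, with ties counted as one half. The names `BIRADS_RANK` and `empirical_auc` and the example readings are hypothetical. The DeLong test used to compare AUCs is not reproduced here.

```python
from itertools import product

# Hypothetical ordinal coding of BI-RADS final assessment categories
# (higher = more suspicious); an illustrative assumption, not the study's scale.
BIRADS_RANK = {"2": 0, "3": 1, "4a": 2, "4b": 3, "4c": 4, "5": 5}


def empirical_auc(scores, truth):
    """Mann-Whitney estimate of the AUC.

    scores: numeric ratings, higher meaning more suspicious.
    truth: parallel booleans, True = malignant on the reference standard.
    """
    malignant = [s for s, t in zip(scores, truth) if t]
    benign = [s for s, t in zip(scores, truth) if not t]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for m, b in product(malignant, benign):
        if m > b:
            wins += 1.0  # malignant case ranked above the benign case
        elif m == b:
            wins += 0.5  # ties count as half a win
    return wins / (len(malignant) * len(benign))


# Hypothetical readings for eight masses (not study data).
categories = ["3", "4a", "4a", "2", "4c", "5", "4b", "3"]
truth = [False, False, True, False, True, True, True, False]
reader_auc = empirical_auc([BIRADS_RANK[c] for c in categories], truth)
print(f"reader AUC: {reader_auc:.3f}")
```

A dichotomous output, such as S-Detect's "possibly benign" or "possibly malignant" result, can be passed in as 0/1 scores. In that case the same computation reduces to (sensitivity + specificity) / 2.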
Table 3. Diagnostic Performances of the Radiologists With and Without Integration of S-Detect
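Table 3 reports sensitivity, specificity, PPV, NPV, and accuracy after the BI-RADS final assessments were dichotomized (categories 2 and 3 negative; 4a to 5 positive). A minimal sketch of that dichotomization and of the underlying 2 × 2 table arithmetic follows. The function names and the eight example masses are hypothetical. The generalized estimating equation comparisons used in the study are not reproduced.

```python
from dataclasses import dataclass

# Dichotomization applied to the radiologists' BI-RADS final assessments.
POSITIVE = {"4a", "4b", "4c", "5"}
NEGATIVE = {"2", "3"}


def dichotomize(category):
    """Return True for a positive assessment (4a-5), False for a negative one (2-3)."""
    if category in POSITIVE:
        return True
    if category in NEGATIVE:
        return False
    raise ValueError(f"unexpected BI-RADS category: {category!r}")


@dataclass
class Performance:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    accuracy: float


def performance(calls, truth):
    """Compute 2 x 2 table metrics.

    calls: booleans, True = positive (suspicious) assessment.
    truth: booleans, True = malignant on the reference standard.
    Assumes every denominator is nonzero.
    """
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)
    fp = sum(1 for c, t in pairs if c and not t)
    fn = sum(1 for c, t in pairs if not c and t)
    tn = sum(1 for c, t in pairs if not c and not t)
    return Performance(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        accuracy=(tp + tn) / len(pairs),
    )


# Hypothetical example (not study data): one reader's BI-RADS categories and
# a binary CAD-style call for the same eight masses.
truth = [False, False, True, False, True, True, True, False]
reader_categories = ["3", "4a", "4a", "2", "4c", "5", "4b", "3"]
reader = performance([dichotomize(c) for c in reader_categories], truth)
cad = performance([False, False, True, False, True, True, False, False], truth)
print(reader)
print(cad)
```

In this made-up example, the CAD-style call gains specificity and PPV at the cost of sensitivity and NPV. The numbers illustrate the arithmetic only.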
S-Detect ranged from fair to substantial (κ = 0.31–0.68). The BI-RADS final assessments of the radiologists were then dichotomized into benign (categories 2 and 3) and malignant (categories 4a–5). On that basis, agreement between the radiologists and S-Detect was fair and moderate (κ = 0.40 and 0.45, respectively). For shape, orientation, and posterior features, interobserver agreement between the radiologists was significantly higher than that between the radiologists and S-Detect (P < .05). For the final assessment categories, agreement between the radiologists did not differ significantly from agreement between each radiologist and S-Detect (P > .05).

Discussion

The ultrasonographic BI-RADS lexicon is widely used to describe breast lesions. Because US is inherently subjective, however, observer variability is inevitable and can lead to inconsistent diagnoses among performers.2,5 Computer-aided diagnosis systems have recently been applied to overcome the observer variability of breast US,20 as well as to improve its diagnostic performance.14,15 S-Detect applies a novel feature extraction technique and a support vector machine classifier. The classifier categorizes breast masses as benign or malignant according to feature combinations derived from the BI-RADS ultrasonographic descriptors.21 In a recent study, Kim et al16 compared S-Detect with a dedicated breast radiologist. S-Detect had significantly higher specificity, PPV, accuracy, and AUC, with fair to substantial agreement in ultrasonographic feature analysis of breast masses. In clinical practice, however, radiologists with different levels of experience perform breast US, and the usefulness of S-Detect may vary with that experience. For example, less experienced radiologists may benefit more from S-Detect.

In our study, S-Detect showed significantly higher specificity, PPV, and accuracy than the radiologists (all P < .001). Integrating the results of S-Detect significantly improved specificity, PPV, and accuracy for both radiologists, similar to the results of previous studies of CAD systems. In addition, S-Detect categorized as probably benign most of the benign breast masses that the radiologists had initially assessed as category 4a: 24 of 26 (92.3%) and 21 of 23 (91.3%), respectively (Table 2). Based on our results, S-Detect could be used as an additional tool with breast US regardless of the radiologist's level of experience. It may also reduce the number of unnecessary biopsies of benign breast masses. Although specificity, PPV, and accuracy improved, the AUCs of the radiologists integrated with S-Detect did not differ significantly from those of the radiologists alone (all P > .05). Both radiologists already had very high AUC values, leaving little room for improvement. This may explain why the AUCs did not change after integration of S-Detect.

Several reports have been published on applying different types of CAD to breast US.16,17,22 These studies commonly reported that CAD systems improve the diagnostic performance of breast US, especially specificity and accuracy. Shen et al22 suggested that computer-aided classification systems could be helpful in assessing indeterminate category 4 cases. Wang et al17 concluded that CAD was more helpful for junior radiologists than for senior radiologists, with greater improvement in the diagnostic performances in
Table 4. Interobserver Agreement for BI-RADS Sonographic Descriptors and Final Assessment Categories Among the Radiologists and S-Detect
Characteristic | R1 vs R2, κ (95% CI) | R1 vs S-Detect, κ (95% CI) | R2 vs S-Detect, κ (95% CI) | P, R1–R2 vs R1–S-Detect | P, R1–R2 vs R2–S-Detect | P, R1–S-Detect vs R2–S-Detect
Shape | 0.72 (0.62–0.83) | 0.44 (0.30–0.58) | 0.44 (0.30–0.58) | <.001 | <.001 | .996
Orientation | 0.72 (0.58–0.85) | 0.55 (0.39–0.71) | 0.68 (0.54–0.83) | .024 | .694 | .053
Margin | 0.35 (0.24–0.46) | 0.28 (0.15–0.39) | 0.31 (0.19–0.43) | .316 | .582 | .656
Echo pattern | 0.33 (0.18–0.48) | 0.23 (0.03–0.44) | 0.35 (0.19–0.51) | .355 | .883 | .311
Posterior acoustic features | 0.76 (0.66–0.86) | 0.35 (0.22–0.49) | 0.46 (0.32–0.61) | <.001 | .001 | .037
Final assessment | 0.57 (0.41–0.73) | 0.40 (0.28–0.53) | 0.45 (0.33–0.58) | .050 | .223 | .392
R1, radiologist 1; R2, radiologist 2.
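Table 4 reports κ coefficients, which are interpreted on the Landis and Koch scale described under Data and Statistical Analyses. A minimal sketch of unweighted Cohen's κ for two raters, together with that scale, is shown below. The sketch assumes unweighted κ because the source does not state whether weighting was applied. The function names and example labels are hypothetical. The 95% confidence intervals and the Barnhart and Williamson comparison of κ coefficients are not reproduced.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels.

    Assumption: unweighted kappa; the source does not specify weighting.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal counts per label.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical shape descriptors for six masses (not study data).
radiologist = ["oval", "irregular", "irregular", "round", "oval", "irregular"]
cad = ["oval", "irregular", "oval", "oval", "oval", "irregular"]
k = cohen_kappa(radiologist, cad)
print(f"kappa = {k:.2f} ({landis_koch(k)})")  # prints "kappa = 0.45 (moderate)"
```

Here observed agreement is 4/6 and chance agreement is 14/36, giving κ = 10/22. Applied to Table 4, the same scale labels 0.44 as moderate and 0.72 as substantial.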
