
ORIGINAL RESEARCH • BREAST IMAGING

AI Risk Score on Screening Mammograms Preceding Breast Cancer Diagnosis
Marthe Larsen, MSc • Camilla F. Olstad, MS • Henrik W. Koch, MD • Marit A. Martiniussen, MD •
Solveig R. Hoff, MD, PhD • Håkon Lund-Hanssen, MD • Helene S. Solli, MD • Karl Øyvind Mikalsen, PhD •
Steinar Auensen, MSc • Jan Nygård, PhD • Kristina Lång, MD, PhD • Yan Chen, PhD • Solveig Hofvind, PhD
From the Section for Breast Cancer Screening (M.L., C.F.O., S.H.) and Department of Register Informatics (S.A., J.N.), Cancer Registry of Norway, P.O. Box 5313, 0304 Oslo, Norway; Department of Radiology, Stavanger University Hospital, Stavanger, Norway (H.W.K.); Faculty of Health Sciences, University of Stavanger, Stavanger, Norway (H.W.K.); Department of Radiology, Østfold Hospital Trust, Kalnes, Norway (M.A.M.); Institute of Clinical Medicine, University of Oslo, Oslo, Norway (M.A.M.); Department of Radiology, Ålesund Hospital, Møre og Romsdal Hospital Trust, Ålesund, Norway (S.R.H.); Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway (S.R.H.); Department of Radiology and Nuclear Medicine, St Olavs University Hospital, Trondheim, Norway (H.L.H.); Department of Radiology, Hospital of Southern Norway, Kristiansand, Norway (H.S.S.); SPKI–The Norwegian Centre for Clinical Artificial Intelligence, University Hospital of North Norway, Tromsø, Norway (K.Ø.M.); Department of Clinical Medicine (K.Ø.M.) and Health and Care Sciences (S.H.), Faculty of Health Sciences, UiT–The Arctic University of Norway, Tromsø, Norway; Department of Translational Medicine, Diagnostic Radiology, Lund University, Lund, Sweden (K.L.); Unilabs Mammography Unit, Skåne University Hospital, Malmö, Sweden (K.L.); School of Medicine, University of Nottingham, Clinical Science Building, Nottingham City Hospital, Nottingham, United Kingdom (Y.C.). Received April 21, 2023; revision requested June 15; revision received August 17; accepted August 29.
Address correspondence to S.H. (email: [email protected]).
Supported by the Norwegian Cancer Society and the Pink Ribbon Campaign (grant number: 214931).
Conflicts of interest are listed at the end of this article.
See also the editorial by Mehta in this issue.
Radiology 2023; 309(1):e230989 • https://doi.org/10.1148/radiol.230989

Background: Few studies have evaluated the role of artificial intelligence (AI) in prior screening mammography.

Purpose: To examine AI risk scores assigned to screening mammography in women who were later diagnosed with breast cancer.

Materials and Methods: Image data and screening information of examinations performed from January 2004 to December 2019 as part
of BreastScreen Norway were used in this retrospective study. Prior screening examinations from women who were later diagnosed with
cancer were assigned an AI risk score by a commercially available AI system (scores of 1–7, low risk of malignancy; 8–9, intermediate
risk; and 10, high risk of malignancy). Mammographic features of the cancers based on the AI score were also assessed. The association
between AI score and mammographic features was tested with a bivariate test.

Results: A total of 2787 prior screening examinations from 1602 women (mean age, 59 years ± 5.1 [SD]) with screen-detected
(n = 1016) or interval (n = 586) cancers were analyzed. On the mammograms from the screening round prior to diagnosis, an AI risk
score of 10 was assigned to 389 (38.3%) screen-detected and 231 (39.4%) interval cancers. Among the screen-detected cancers with AI
scores available two screening rounds (4 years) before diagnosis, 23.0% (122 of 531) had a score of 10. Mammographic features were
associated with AI score for invasive screen-detected cancers (P < .001). Density with calcifications was registered for 13.6% (43 of 317)
of screen-detected cases with a score of 10 and 4.7% (15 of 322) of those with a score of 1–7.

Conclusion: More than one in three cases of screen-detected and interval cancers had the highest AI risk score at prior screening,
suggesting that the use of AI in mammography screening may lead to earlier detection of breast cancers.
© RSNA, 2023

Supplemental material is available for this article.

Breast cancer is the most common cancer type in women worldwide, accounting for 2.3 million new cancers in 2020 and a predicted 3 million new cancers in 2040 (1). Despite reduced disease-specific mortality due to the implementation of screening and improved treatment, breast cancer is the second most common cause of cancer death among women in developed countries (2). Standardized mammographic screening is recommended by several international health authorities (3).
More than 99% of screening examinations are determined to have a negative result (4,5). Because of the low prevalence of the disease, the interpretation of screening mammograms requires trained and experienced breast radiologists to keep sensitivity and specificity at acceptable levels. Retrospective review studies (6,7) have reported that 20%–30% of screen-detected and interval cancers were classified as false-negative findings at prior screening. Furthermore, Martiniussen et al (8) reported that approximately 10% of screen-detected and interval cancers were discussed at a consensus meeting in the prior screening round but were dismissed, and the women were not recalled for further assessment.
Artificial intelligence (AI) has shown potential in the field of radiology to reduce workload and increase diagnostic accuracy (9). In mammographic screening, AI can be used as a stand-alone technique to triage examinations and/or as support for radiologists in their interpretations. The performance of different AI algorithms has been evaluated in both retrospective and prospective studies (10–15). Several retrospective studies have reported AI risk scores for screen-detected cancers (16–19) or for interval cancers with different AI systems (17,19–24). For example, a retrospective study (20) reviewing interval cancers showed that 19% (n = 83) of 429 interval cancers had the highest AI score and that AI was able to flag the correct suspicious location. However, few studies have assessed the AI risk scores on the prior mammograms of screen-detected cancers (17,24).

AI Risk Score at Screening Mammography Preceding Breast Cancer Diagnosis

Abbreviations
AI = artificial intelligence, HER2 = human epidermal growth factor receptor 2

Summary
More than 38% of both screen-detected and interval cancers were assigned the highest artificial intelligence risk score on screening mammograms that preceded breast cancer diagnosis.

Key Results
■ In this retrospective study of 1602 patients with breast cancer, 38.3% (389 of 1016) of screen-detected cancers and 39.4% (231 of 586) of interval cancers had the highest malignancy risk score assigned by artificial intelligence (AI) on the screening mammogram prior to diagnosis.
■ Mammographic features were associated with an AI score of 10 (high risk) versus 1–7 (low risk) on prior mammograms for screen-detected invasive cancers (P < .001).
■ Among invasive screen-detected cancers with an AI score of 10 and 1–7 at prior mammography, density with calcifications was observed for 13.6% (43 of 317) and 4.7% (15 of 322), respectively.
With the aim of exploring the potential for earlier detection of breast cancer, we used imaging data collected in BreastScreen Norway to analyze AI scores on the mammograms from screening examinations preceding breast cancer diagnosis. Furthermore, to determine whether the cases with a high AI score on prior mammograms were of clinical relevance and had the potential to be diagnosed earlier, we analyzed prognostic histopathologic tumor characteristics and mammographic features.

Materials and Methods
This retrospective registry study included imaging data and screening information from BreastScreen Norway, which is administered by the Cancer Registry of Norway (4). The study was approved by the Regional Committees for Medical and Health Research Ethics (number 13294) and had a legal basis in accordance with Articles 6 (1)(e) and 9 (2)(j) of the General Data Protection Regulation. Pursuant to Section 35 of the Health Research Act, the Regional Committees for Medical and Health Research Ethics granted the project an exemption from the requirement of consent (25).

Study Sample
Examinations were identified from the Cancer Registry of Norway, where information from all breast centers and screening units in BreastScreen Norway is stored. Digital Imaging and Communications in Medicine data from 372 580 digital screening examinations performed from January 2004 to December 2019 in five breast centers in BreastScreen Norway were analyzed with an AI system (Fig 1). Results from two of these breast centers have been included in previous studies (23,26). A total of 122 969 examinations were included in those studies, but their aims were to assess overall AI performance and to explore different clinical workflows for AI and radiologists.

Figure 1: Flowchart of exclusions and the final study samples A (all examinations) and B (cancer cases and prior examinations of cancer cases). * = The artificial intelligence (AI) system can process more or fewer images than the standard four images. However, because of the storage format of the mammograms, we had technical issues with some images, mainly from one breast center.

After excluding examinations performed after breast cancer diagnosis and examinations in which fewer than four images were processed with the AI system, the overall study sample (sample A) included 344 337 examinations, comprising 1929 screen-detected and 586 interval cancers (Fig 1). Results for all examinations are presented to give an overview of AI performance. In the analysis of AI scores of prior examinations for cancer cases, we excluded 338 958 examinations among women without breast cancer, 756 screen-detected cancers with prior examinations outside the study period or without prior examinations, and 639 examinations that did not follow the biennial screening scheme. Furthermore, we excluded 181 examinations performed 8 years or more before diagnosis (four or more screening rounds prior to diagnosis) because the data file included only a limited number of women with mammograms from that period. Ultimately, study sample B included 2787 prior screening examinations: 1733 prior screening examinations in 1016 women with screen-detected cancer and 1054 prior screening examinations in 586 women with interval cancer.

Imaging and Reading Procedure
BreastScreen Norway biennially invites women aged 50–69 years to two-view (craniocaudal and mediolateral oblique) mammography screening of each breast (4). From 2017 to 2021, the attendance rate was 75%, the recall rate was 3.3%, the screen-detected cancer rate was 0.64%, and the interval cancer rate was

2 radiology.rsna.org ■ Radiology: Volume 309: Number 1—October 2023


Larsen et al

at 0.18% (21). Screening mammograms are independently interpreted by two breast radiologists, and both radiologists assign each breast a score from 1 to 5 (4). A score of 1 indicates normal findings; 2, probably benign; 3, intermediate suspicion; 4, probably malignant; and 5, high suspicion of malignancy. If either or both radiologists assign a score of 2 or higher, the examination is discussed at a consensus meeting by the same or other radiologists. The consensus meeting decides whether to recall the woman. Recall assessment might include clinical examination, additional imaging (mammography, US, and eventually MRI), and needle biopsy.
In this retrospective study, the radiologists did not have AI results available. All data regarding AI assessments were collected retrospectively.

Variables of Interest
A screen-detected cancer was defined as histologic analysis–verified ductal carcinoma in situ or invasive breast cancer diagnosed after a recall for further assessment due to mammographic findings and within 6 months after screening (4). Interval cancer was defined as breast cancer detected within 24 months after screening, either after a negative screening result or more than 6 months after a recall with a negative result. Screen-detected and interval cancers were considered the reference standard.
Prognostic histopathologic tumor characteristics included histologic type (ductal carcinoma in situ or invasive), tumor diameter, histologic grade 1–3, lymph node involvement, and immunohistochemical subtypes for invasive cancers. The subtypes were classified as luminal A–like, luminal B–like human epidermal growth factor receptor 2 (HER2) negative, luminal B–like HER2 positive, HER2 positive, and triple negative, and were determined based on estrogen receptor, progesterone receptor, and HER2 status (27). Information about mammographic features was reported by the radiologists and classified as mass, spiculated mass, architectural distortion, asymmetric density, density with calcifications, and calcifications alone.

Table 1: Examination Characteristics
Parameter Study Sample A Study Sample B
Mean age at screening (y) 59.7 ± 5.7 59.7 ± 5.1
Mean age at diagnosis, screen-detected cancers (y) 60.9 ± 5.8 62.4 ± 5.0
Mean age at diagnosis, interval cancers (y) 61.2 ± 5.9 61.2 ± 5.9
No. of prevalent screening examinations 46 087 (13.4) 281 (7.4)*
Note.—Data are means ± SDs unless otherwise indicated. Data in parentheses are percentages. Study sample A is all examinations (n = 344 337); study sample B includes women with breast cancer (n = 1602) and screening examinations performed prior to cancer diagnosis (n = 2787).
* Prior examinations can be prevalent screening examinations only for interval cancer cases.

Figure 2: Distribution of artificial intelligence (AI) scores for all examinations (n = 344 337), screen-detected cancers (n = 1929), and interval cancers (n = 586) for the full data set. An AI score of 1 indicates low risk of malignancy and an AI score of 10 indicates high risk of malignancy.

AI System
All women included in the study were screened with Mammomat Inspiration (Siemens Healthcare). Examinations were analyzed with Transpara version 1.7.0 (ScreenPoint Medical). For each examination, a malignancy risk score (AI score) from 1 to 10 was determined, where 1–7 indicated a low risk of malignancy, 8–9 indicated intermediate risk, and 10 indicated a high risk of malignancy. The AI score indicates the risk of malignancy at the given screening examination. We showed previously (23) that 4.4% of the screen-detected and 30.2% of the interval cancers had an AI score of 1–7; 8.8% and 24.9%, respectively, had an AI score of 8 or 9; and 86.6% and 44.9%, respectively, had an AI score of 10. The AI system aims to assign approximately 10% of the examinations to each score.
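The scoring and reading rules described for the screening program, together with the cancer definitions under Variables of Interest, can be condensed into a few decision functions. The Python below is a minimal sketch under stated assumptions: the function names and inputs (a per-reader highest breast score, a recall outcome label, and months since screening) are hypothetical illustrations, Transpara's own score computation is proprietary and is not modeled, and situations the definitions do not cover return None.

```python
from typing import Optional


def ai_risk_category(score: int) -> str:
    """Bin an AI score (1-10) into the risk groups used in this study."""
    if not 1 <= score <= 10:
        raise ValueError("AI score must be between 1 and 10")
    if score <= 7:
        return "low"
    if score <= 9:
        return "intermediate"
    return "high"


def needs_consensus(reader1: int, reader2: int) -> bool:
    """Independent double reading: a score of 2 or higher from either reader triggers consensus.

    reader1, reader2: highest breast score (1-5) assigned by each radiologist (hypothetical input).
    """
    for s in (reader1, reader2):
        if not 1 <= s <= 5:
            raise ValueError("radiologist scores range from 1 to 5")
    return max(reader1, reader2) >= 2


def classify_cancer(recall_outcome: str, months_after_screening: float) -> Optional[str]:
    """Simplified classification of a cancer relative to one screening examination.

    recall_outcome (hypothetical labels): "none" = negative screening result,
    "positive" = cancer diagnosed at recall assessment,
    "negative" = recalled, assessment negative.
    """
    if recall_outcome == "positive" and months_after_screening <= 6:
        return "screen-detected"
    if recall_outcome == "none" and months_after_screening <= 24:
        return "interval"
    if recall_outcome == "negative" and 6 < months_after_screening <= 24:
        return "interval"
    return None  # not covered by the definitions in the text


print(ai_risk_category(9), needs_consensus(1, 2), classify_cancer("negative", 12))
```

Note that `needs_consensus(1, 2)` is `True`: a single reader's score of 2 is enough to bring an examination to consensus, which is the situation behind most of the dismissed consensus cases reported in the Results.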


Table 2: Artificial Intelligence Score on Prior Screening Examinations for Women with Screen-detected or Interval Cancer

No. of Screen-detected Cancers No. of Interval Cancers

AI Score P1 P2 P3 P1 P2 P3
1 62 (6.1) 60 (11.3) 19 (10.2) 40 (6.8) 30 (9.7) 17 (10.6)
2 40 (3.9) 25 (4.7) 9 (4.8) 21 (3.6) 10 (3.3) 6 (3.8)
3 43 (4.2) 33 (6.2) 13 (7.0) 24 (4.1) 15 (4.9) 8 (5.0)
4 50 (4.9) 37 (7.0) 16 (8.6) 23 (3.9) 20 (6.5) 14 (8.8)
5 57 (5.6) 29 (5.5) 12 (6.5) 33 (5.6) 31 (10.1) 10 (6.3)
6 62 (6.1) 42 (7.9) 20 (10.8) 27 (4.6) 19 (6.2) 17 (10.6)
7 72 (7.1) 45 (8.5) 19 (10.2) 41 (7.0) 26 (8.4) 13 (8.1)
8 96 (9.5) 52 (9.8) 18 (9.7) 40 (6.8) 30 (9.7) 20 (12.5)
9 145 (14.3) 86 (16.2) 29 (15.6) 106 (18.1) 55 (17.9) 18 (11.3)
10 389 (38.3) 122 (23.0) 31 (16.7) 231 (39.4) 72 (23.4) 37 (23.1)
Total 1016 (100) 531 (100) 186 (100) 586 (100) 308 (100) 160 (100)
Note.—Data in parentheses are percentages. P1 is the screening examination prior to diagnosis (≤2 years prior to diagnosis), P2 is the
screening examination two examinations prior to diagnosis (≤4 years prior to diagnosis), and P3 is the screening examination three
examinations prior to diagnosis (≤6 years prior to diagnosis).
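As a consistency check on Table 2, the counts can be grouped into the three risk categories used throughout the article. The sketch below hard-codes the published counts (the dictionary layout and helper names are illustrative only). It recovers, for example, the 38.3% (389 of 1016) of screen-detected cancers with a score of 10 at P1, and the column totals of 1733 and 1054 prior examinations that make up the 2787 examinations of study sample B.

```python
# Counts for AI scores 1..10 from Table 2 (list index 0 = score 1).
TABLE2 = {
    ("screen-detected", "P1"): [62, 40, 43, 50, 57, 62, 72, 96, 145, 389],
    ("screen-detected", "P2"): [60, 25, 33, 37, 29, 42, 45, 52, 86, 122],
    ("screen-detected", "P3"): [19, 9, 13, 16, 12, 20, 19, 18, 29, 31],
    ("interval", "P1"): [40, 21, 24, 23, 33, 27, 41, 40, 106, 231],
    ("interval", "P2"): [30, 10, 15, 20, 31, 19, 26, 30, 55, 72],
    ("interval", "P3"): [17, 6, 8, 14, 10, 17, 13, 20, 18, 37],
}


def grouped(counts):
    """Collapse per-score counts into low (1-7), intermediate (8-9), and high (10) risk."""
    return {"1-7": sum(counts[0:7]), "8-9": sum(counts[7:9]), "10": counts[9]}


def percentages(counts, decimals=1):
    """Percentage of the column total in each risk group, rounded as in the table."""
    total = sum(counts)
    return {group: round(100 * n / total, decimals) for group, n in grouped(counts).items()}


for (cancer_type, exam), counts in TABLE2.items():
    print(cancer_type, exam, sum(counts), grouped(counts), percentages(counts))
```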

The AI system uses convolutional neural networks to identify calcifications and soft-tissue lesions and was trained, validated, and tested on mammograms from four different vendors (28). Transpara version 1.7.0 does not include prior examinations in the risk score assessment.

Statistical Analysis
Descriptive analyses were performed separately for study samples A and B. Frequencies and percentages were presented to summarize AI findings. Histopathologic tumor characteristics and mammographic features were presented only for invasive cancers with an AI score of 10 (high risk) or 1–7 (low risk) at the screening examination 2 years (screen-detected cancers) or less (interval cancers) prior to diagnosis (hereafter, referred to as P1). Percentages were calculated from nonmissing values. Associations were tested with bivariate tests, with P < .05 indicating statistical significance. All analyses were performed by one author (M.L., with 5 years of experience) using statistical software (Stata; StataCorp).

Results

Characteristics of Study Sample A
After excluding 3740 examinations performed after breast cancer diagnosis and 24 492 examinations in which fewer than four images were processed with the AI system, study sample A included 344 337 examinations (Fig 1). Mean patient age was 59.7 years ± 5.7 (SD); 13.4% (46 087 of 344 337) were prevalent examinations (Table 1).
A total of 21.9% (75 547 of 344 337) of the examinations had an AI score of 1, whereas 9.0% (31 025 of 344 337) had an AI score of 10 (Figs 2, S1). A total of 88.0% (1697 of 1929) of the screen-detected cancers and 39.4% (231 of 586) of the interval cancers had a score of 10 (Fig 2). The probability of screen-detected cancer was 0.01% (nine of 75 547) for examinations with an AI score of 1 and 5.5% (1697 of 31 025) for those with an AI score of 10. The percentage of screen-detected cancers with an AI score of 10 ranged from 86.4% (248 of 287) to 91.7% (341 of 372) among the five breast centers, whereas the percentage of interval cancers with an AI score of 10 ranged from 35.4% (52 of 147) to 53% (32 of 60) (Table S1).

Characteristics of Study Sample B
For women with screen-detected cancer, 1016 examinations were performed at P1, 531 were performed 4 years prior to diagnosis (hereafter, referred to as P2), and 186 were performed 6 years prior to diagnosis (hereafter, referred to as P3). For women with interval cancers, 586 examinations were performed at P1, 308 at P2, and 160 at P3. Mean ages at diagnosis were 62.4 years ± 5.0 for screen-detected cancers and 61.2 years ± 5.9 for interval cancers (Table 1).

AI Scores on Prior Screening Mammograms of Screen-detected Cancers
A total of 38.3% (389 of 1016) of the examinations at P1 had an AI score of 10, and 23.7% (241 of 1016) had a score of 8 or 9 (Table 2). Among the 389 screen-detected cancers with an AI score of 10 at P1, 11 (2.8%) had an AI score of less than 10 at diagnosis. A total of 27.2% (106 of 389) of the examinations with a score of 10 at P1 were discussed at the consensus meeting: 22.6% (88 of 389) were found to be normal at the consensus meeting and were not recalled, and 4.6% (18 of 389) underwent recall assessment with a negative result. In comparison, 11.1% (43 of 386) of the examinations with scores of 1–7 at P1 were discussed at consensus. Among the consensus cases with AI scores of 10, 83.0% (88 of 106) were dismissed at the consensus meeting, and 81% (71 of 88) of the dismissed cases had been selected for consensus (an interpretation score of 2 or higher) by only one of the two radiologists.
For the 531 women with screen-detected cancer and an AI score available at P2, 23.0% (122 of 531) had an AI score of 10,


and 90 women had AI scores of 10 at both P1 and P2 (Table 2, Fig 3). At P3, 16.7% (31 of 186) had an AI score of 10.

Figure 3: (A) Left craniocaudal (L-CC; right) and left mediolateral oblique (L-MLO; left) mammographic views in a 57-year-old woman with a screen-detected invasive cancer. The tumor (arrows) was histologic grade 2, lymph node negative, estrogen receptor positive, and progesterone receptor positive. (B) Left craniocaudal (right) and left mediolateral oblique (left) views from the screening examination prior to diagnosis, 2 years before diagnosis. (C) Left craniocaudal (right) and left mediolateral oblique (left) views from the screening examination two examinations prior to diagnosis, 4 years before diagnosis. The artificial intelligence system gave a score of 10 (high risk of malignancy) at diagnosis (A), at 2 years or less prior to diagnosis (B), and at the screening examination two examinations prior to diagnosis, 4 years or less before diagnosis (C).

AI Scores on Prior Screening Mammograms of Interval Cancers
For interval cancers, 39.4% (231 of 586) had an AI score of 10 and 24.9% (146 of 586) had an AI score of 8 or 9 at P1 (Table 2). A total of 35.5% (82 of 231) of the interval cancers with an AI score of 10 at P1 were discussed at consensus; among these cases, 41% (34 of 82) underwent recall assessment with a negative result and 59% (48 of 82) were dismissed. Among the dismissed cases, 85% (41 of 48) had been selected for consensus by only one of the two radiologists. At P2 and P3, 23.4% (72 of 308) and 23.1% (37 of 160) of the interval cancers, respectively, had an AI score of 10 (Table 2).

Prognostic Histopathologic Tumor Characteristics and Mammographic Features
Among the screen-detected cancers with an AI score of 10 and 1–7 at P1, 85.4% (332 of 389) and 89.9% (347 of 386), respectively, were invasive. Median tumor diameter for screen-detected cancers with an AI score of 10 at P1 was 13 mm (IQR, 9–18 mm) and for cancers with a score of 1–7 was 11 mm (IQR, 7–17 mm) (P = .02) (Table 3). Histologic grade 3 was observed for 17.0% (56 of 329) of cancers with a score of 10 at P1 and for 24.8% (85 of 343) of cancers with a score of 1–7 at P1 (P = .03). The most frequent mammographic feature of the invasive cases was spiculated mass, found in 41.3% (131 of 317) of those with a score of 10 at P1 and 43.5% (140 of 322) of those with a score of 1–7 (Table 4). The association between mammographic feature and AI score 10 versus 1–7 at P1 was statistically significant for screen-detected cancers (P < .001). Density with calcifications was registered for 13.6% (43 of 317) of screen-detected cases with a score of 10 and 4.7% (15 of 322) of those with a score of 1–7.
Among the interval cancers with an AI score of 10 at P1, 94.8% (219 of 231) were invasive, as were 94.7% (198 of 209) of those with an AI score of 1–7. No statistically significant associations were observed for tumor characteristics of invasive interval cancers with an AI score of 10 at P1 versus those with a score of 1–7 (Table 3). However, histologic grade 3 was reported for 30.5% (65 of 213) of the interval cancers with a score of 10 at P1 versus 39.5% (75 of 190) of those with a score of 1–7, and lymph node involvement for 37.3% (79 of 212) versus 30.3% (57 of 188). For invasive interval cancers, the association between mammographic feature and AI score 10 versus 1–7 at P1 was not statistically significant (P = .22). However, calcifications alone were observed for 7% (six of 87) of those with an AI score of 10 and for none of the cases with an AI score of 1–7 (Table 4).
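The consensus-meeting figures reported for screen-detected and interval cancers with an AI score of 10 at P1 use three different denominators: all cancers with a score of 10, the cases discussed at consensus, and the dismissed cases. A minimal sketch, with a hypothetical `ConsensusOutcome` class holding the published counts, makes the nesting explicit. It assumes, as the reported counts imply (88 + 18 = 106 and 48 + 34 = 82), that every consensus case was either dismissed or recalled with a negative result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsensusOutcome:
    """P1 consensus-meeting outcomes for cancers with an AI score of 10 (counts from the Results)."""

    label: str
    total: int          # cancers with an AI score of 10 at P1
    at_consensus: int   # examinations discussed at a consensus meeting
    dismissed: int      # judged normal at consensus and not recalled
    single_reader: int  # dismissed cases selected (score >= 2) by only one radiologist

    @property
    def recalled_negative(self) -> int:
        # Assumption from the reported counts: consensus cases were either
        # dismissed or recalled with a negative assessment.
        return self.at_consensus - self.dismissed

    def summary(self) -> dict:
        def pct(part: int, whole: int) -> float:
            return round(100 * part / whole, 1)

        return {
            "discussed, % of score-10 cancers": pct(self.at_consensus, self.total),
            "dismissed, % of score-10 cancers": pct(self.dismissed, self.total),
            "dismissed, % of consensus cases": pct(self.dismissed, self.at_consensus),
            "recalled negative, % of consensus cases": pct(self.recalled_negative, self.at_consensus),
            "single reader, % of dismissed cases": pct(self.single_reader, self.dismissed),
        }


SCREEN_DETECTED = ConsensusOutcome("screen-detected", total=389, at_consensus=106, dismissed=88, single_reader=71)
INTERVAL = ConsensusOutcome("interval", total=231, at_consensus=82, dismissed=48, single_reader=41)

for outcome in (SCREEN_DETECTED, INTERVAL):
    print(outcome.label, outcome.recalled_negative, outcome.summary())
```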


Table 3: Histopathologic Tumor Characteristics of Invasive Tumors

Screen-detected Cancers Interval Cancers
Parameter AI Score 10 (n = 332) AI Score 1–7 (n = 347) P Value* AI Score 10 (n = 219) AI Score 1–7 (n = 198) P Value*
Median diameter (mm)† 13 (9–18) 11 (7–17) .02 18 (12–27) 16 (11–25) .40
No. with NA 5 4 12 22
Histologic grade .03 .66
1 103 (31.3) 87 (25.4) 38 (17.8) 28 (14.7)
2 170 (51.7) 171 (49.9) 110 (51.6) 87 (45.8)
3 56 (17.0) 85 (24.8) 65 (30.5) 75 (39.5)
No. with NA 3 4 6 8
Lymph node involvement 66 (20.1) 65 (19.0) .70 79 (37.3) 57 (30.3) .79
No. with NA 4 4 7 10
Immunohistochemical subtypes .06 .12
Luminal A–like 154 (51.3) 143 (45.7) 65 (34.4) 42 (25.5)
Luminal B–like, HER2-negative 89 (29.7) 93 (29.7) 53 (28.0) 49 (29.7)
Luminal B–like, HER2-positive 35 (11.7) 45 (14.4) 37 (19.6) 25 (15.2)
HER2-positive 11 (3.7) 6 (1.9) 17 (9.0) 16 (9.7)
Triple negative 11 (3.7) 26 (8.3) 17 (9.0) 33 (20.0)
No. with NA 32 34 30 33
Note.—Unless otherwise indicated, the data are numerators and the data in parentheses are percentages. Data are from screening examinations performed 2 years or less prior to diagnosis. An artificial intelligence (AI) score of 10 indicates high risk of malignancy and 1–7 indicates low risk. HER2 = human epidermal growth factor receptor 2, NA = information not available.
* Overall association between each tumor characteristic variable and AI score (10 vs 1–7) was tested with a bivariate test for continuous or categorical outcomes, as appropriate.
† Data in parentheses are IQRs.
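Because percentages were calculated from nonmissing values, each characteristic in Table 3 has its own denominator (for example, 329 rather than 332 for histologic grade in screen-detected cancers with an AI score of 10, since three cases lack grade information). A minimal sketch of that calculation, using a hypothetical helper `pct_nonmissing` and the published grade counts, reproduces the percentages in the table; the consistency check assumes the categories are exhaustive, which holds for grade and subtype but not for single-count rows such as lymph node involvement.

```python
def pct_nonmissing(counts: dict, n_total: int, n_missing: int, decimals: int = 1) -> dict:
    """Percentage of each category among cases with a known value.

    counts: category -> number of cases; assumed to cover all nonmissing cases.
    n_total: group size (e.g., 332 invasive screen-detected cancers with AI score 10).
    n_missing: cases where the characteristic is not available ("No. with NA").
    """
    known = n_total - n_missing
    if sum(counts.values()) != known:
        raise ValueError(f"counts sum to {sum(counts.values())}, expected {known} nonmissing cases")
    return {category: round(100 * n / known, decimals) for category, n in counts.items()}


# Histologic grade from Table 3: (counts by grade, group size, No. with NA).
GRADE = {
    "screen-detected, AI 10": ({1: 103, 2: 170, 3: 56}, 332, 3),
    "screen-detected, AI 1-7": ({1: 87, 2: 171, 3: 85}, 347, 4),
    "interval, AI 10": ({1: 38, 2: 110, 3: 65}, 219, 6),
    "interval, AI 1-7": ({1: 28, 2: 87, 3: 75}, 198, 8),
}

for group, (counts, n_total, n_missing) in GRADE.items():
    print(group, pct_nonmissing(counts, n_total, n_missing))
```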

Discussion
In our study, which included 344 337 examinations, we found that 88.0% of the screen-detected and 39.4% of the interval cancers had an AI score of 10. When considering prior examinations for cancer cases, we found that the screening examinations prior to diagnosis (referred to as P1) were classified as high risk by the artificial intelligence (AI) system (score of 10) for 38.3% of screen-detected cancers and 39.4% of interval cancers. Two screening rounds prior to diagnosis, 23.0% and 23.4% of the screen-detected and interval cancers, respectively, had an AI score of 10. Mammographic features were associated with AI score (10 vs 1–7) at P1, 2 years prior to diagnosis, for invasive screen-detected cancers (P < .001). Density with calcifications was registered for 13.6% (43 of 317) of screen-detected cases with a score of 10 at P1 and 4.7% (15 of 322) of cancers with an AI score of 1–7.
Prior mammograms for 745 screen-detected cancers in a cancer-enriched sample were analyzed with the same commercially available AI system as in our study (29). Of these cases, 41.9% had an AI score of 10 at P1; in our study, the corresponding percentage was 38.3%. We observed a lower proportion of examinations with an AI score of 10 than the expected proportion of 10%, whereas a higher proportion was observed in the enriched sample (26). This might explain the lower proportion of screen-detected cancers with an AI score of 10 in our study. In a study in which results from another AI system were analyzed, 45% of screen-detected cancers were reportedly among the 10% of examinations with the highest risk score at the prior screening (17).
For interval cancers, we observed a larger proportion of cases with an AI score of 10 than was reported in a study from Sweden (39.4% vs 33.3%, respectively) (20). An updated version of the AI system was used in our study (version 1.7.0 vs 1.5.0). In a study in which version 1.6.0 was used (21), 37.5% of interval cancers were reportedly identified at 90% specificity.
Review studies have reported that screen-detected and interval cancers classified as missed had more favorable tumor characteristics than those classified as true-negative findings (6,7). Missed cases might have findings suspicious for cancer that are visible at prior examinations and thus have the potential to be detected earlier. In our study, the higher proportion of grade 3 tumors (overall, P = .03) among cancers with an AI score of 1–7 versus 10 at P1 indicates less favorable tumor characteristics, so these cancers might be true-negative findings. However, the larger tumor diameter (P = .02) for an AI score of 10 versus 1–7 at P1 indicated the opposite. The clinical relevance of earlier detection for cases with a score of 10 at P1 thus remains unclear when histopathologic tumor characteristics are considered.


Table 4: Mammographic Features for Invasive Cancers

Invasive Screen-detected Cancers Invasive Interval Cancers
Parameter AI Score 10 (n = 332) AI Score 1–7 (n = 347) P Value* AI Score 10 (n = 219) AI Score 1–7 (n = 198) P Value*
Mammographic feature <.001 .27
Mass 23 (7.3) 78 (24.2) … 13 (14.9) 17 (21.8) …
Spiculated mass 131 (41.3) 140 (43.5) … 38 (43.7) 25 (32.1) …
Architectural distortion 12 (3.8) 11 (3.4) … 5 (5.8) 3 (3.9) …
Asymmetric density 58 (18.3) 53 (16.5) … 17 (19.5) 30 (38.5) …
Density with calcifications 43 (13.6) 15 (4.7) … 8 (9.2) 3 (3.9) …
Calcifications alone 50 (15.8) 25 (7.8) … 6 (6.9) 0 (0) …
Information not available 15 25 … 132 120 …
Note.—Data are numerators; data in parentheses are percentages. Data are from screening examinations acquired 2 years or less prior to
diagnosis. An artificial intelligence (AI) score of 10 indicates high risk of malignancy and 1–7 indicates low risk.
* Overall association between AI score (10 vs 1–7) and mammographic feature was tested with a bivariate test.
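The article reports P < .001 for the association between mammographic feature and AI score in invasive screen-detected cancers but names only a "bivariate test." Assuming a Pearson chi-square test of independence (our assumption, not a method stated by the authors), the screen-detected columns of Table 4 can be checked with the Python standard library alone. The helper names are hypothetical; `chi2_sf` uses the closed-form chi-square tail that exists for integer degrees of freedom, so no statistics package is needed.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer dof."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_r x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def pearson_chi2(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


# Table 4, invasive screen-detected cancers at P1: [AI score 10, AI score 1-7],
# with "information not available" excluded (317 and 322 cases).
SCREEN_DETECTED_FEATURES = {
    "mass": [23, 78],
    "spiculated mass": [131, 140],
    "architectural distortion": [12, 11],
    "asymmetric density": [58, 53],
    "density with calcifications": [43, 15],
    "calcifications alone": [50, 25],
}

stat, dof = pearson_chi2(list(SCREEN_DETECTED_FEATURES.values()))
p_value = chi2_sf(stat, dof)
print(f"chi-square = {stat:.1f}, dof = {dof}, P = {p_value:.1e}")
```

On these counts the sketch gives a statistic of about 52 on 5 degrees of freedom, with P well below .001. The interval-cancer columns are a poor fit for this approximation: several categories have expected counts below 5 (only six cases show calcifications alone), so an exact test may be preferable there, and this sketch should not be expected to match the P value reported for those columns.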

The results for mammographic features for invasive screen-detected cancers indicate that calcifications, alone or in combination with density, were more common for screen-detected cancers with an AI score of 10 than for those with a score of 1–7 at P1 (overall, P < .001). Poorer survival rates have been reported in women with small (<15 mm) screen-detected invasive cancers manifesting as calcifications and with large (≥15 mm) tumors manifesting as a density with calcifications (30). It might be important to recall women with calcifications and a score of 10 for assessment because doing so may lead to earlier detection of relevant cancers. However, the proportion of calcifications with an AI score of 10 among disease-free women must also be explored because it might influence the rate of false-positive screening results (recalled with a negative result).
Our study had limitations. First, we used mammograms from a single vendor and included only women from Norway, which limits the generalizability of the findings. Second, we lacked information on the correlation between the location of the AI markings and the location of the cancer. This limitation may have led to an overestimation of our findings in favor of the AI system. A review of the AI hot spot versus the location of the cancer is needed to understand this issue.
In conclusion, we found that more than one in three cases of screen-detected and interval cancers had an artificial intelligence (AI) risk score of 10 at prior screening. This indicates a potential for AI to detect breast cancer earlier, which could lead to less harmful treatment for the affected patients. Review studies and prospective studies comparing the location of AI markings with the location of the cancer are needed to further understand this issue. Furthermore, high AI scores on mammograms of women who are not diagnosed with breast cancer represent a challenge that is important to consider.

authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, M.L., H.W.K., M.A.M., H.L.H., S.H.; clinical studies, S.R.H.; statistical analysis, M.L., S.A., S.H.; and manuscript editing, all authors

Disclosures of conflicts of interest: M.L. No relevant relationships. C.F.O. No relevant relationships. H.W.K. No relevant relationships. M.A.M. No relevant relationships. S.R.H. No relevant relationships. H.L.H. No relevant relationships. H.S.S. No relevant relationships. K.Ø.M. No relevant relationships. S.A. No relevant relationships. J.N. No relevant relationships. K.L. No relevant relationships. Y.C. No relevant relationships. S.H. The Cancer Registry of Norway has a research agreement with ScreenPoint Medical, Lunit, and Vara.

References
1. Arnold M, Morgan E, Rumgay H, et al. Current and future burden of breast cancer: Global statistics for 2020 and 2040. Breast 2022;66:15–23.
2. Lauby-Secretan B, Scoccianti C, Loomis D, et al. Breast-cancer screening--viewpoint of the IARC Working Group. N Engl J Med 2015;372(24):2353–2358.
3. Ren W, Chen M, Qiao Y, Zhao F. Global guidelines for breast cancer screening: A systematic review. Breast 2022;64:85–99.
4. Bjørnson E, Holen ÅS, Sagstad S, et al. BreastScreen Norway: 25 years of organized screening. Report No.: ISBN 978-82-93804-03-1. Cancer Registry of Norway; 2022.
5. Blanks RG, Wallis MG, Alison RJ, Given-Wilson RM. An analysis of screen-detected invasive cancers by grade in the English breast cancer screening programme: are we failing to detect sufficient small grade 3 cancers? Eur Radiol 2021;31(4):2548–2558.
6. Houssami N, Hunter K. The epidemiology, radiology and biological characteristics of interval breast cancers in population mammography screening. NPJ Breast Cancer 2017;3(1):12.
7. Hovda T, Hoff SR, Larsen M, Romundstad L, Sahlberg KK, Hofvind S. True and Missed Interval Cancer in Organized Mammographic Screening: A Retrospective Review Study of Diagnostic and Prior Screening Mammograms. Acad Radiol 2022;29(Suppl 1):S180–S191.
8. Martiniussen MA, Sagstad S, Larsen M, et al. Screen-detected and interval breast cancer after concordant and discordant interpretations in a population based screening program using independent double reading. Eur Radiol 2022;32(9):5974–5985.
9. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artifi-
cial intelligence in radiology. Nat Rev Cancer 2018;18(8):500–510.
Acknowledgments: We thank all personnel involved in the collection of imaging 10. Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for
data at the breast centers included in the study. image analysis in breast cancer screening programmes: systematic review
of test accuracy. BMJ 2021;374:n1872.
Author contributions: Guarantors of integrity of entire study, M.L., S.H.; study 11. Mammography Screening With Artificial Intelligence (MASAI). Clini-
concepts/study design or data acquisition or data analysis/interpretation, all authors; calTrials.gov. https://ClinicalTrials.gov/show/NCT04838756. Published
manuscript drafting or manuscript revision for important intellectual content, all December 13, 2022. Accessed March 28, 2023.

Radiology: Volume 309: Number 1—October 2023 ■ radiology.rsna.org 7


AI Risk Score at Screening Mammography Preceding Breast Cancer Diagnosis

12. Artificial Intelligence in Large-scale Breast Cancer Screening (ScreenTrust-Cad). ClinicalTrials.gov. https://ClinicalTrials.gov/show/NCT04778670. Published March 14, 2023. Accessed March 28, 2023.
13. Lång K, Josefsson V, Larsson AM, et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol 2023;24(8):936–944.
14. Yoon JH, Strand F, Baltzer PAT, et al. Standalone AI for Breast Cancer Detection at Screening Digital Mammography and Digital Breast Tomosynthesis: A Systematic Review and Meta-Analysis. Radiology 2023;307(5):e222639.
15. Hickman SE, Woitek R, Le EPV, et al. Machine Learning for Workflow Applications in Screening Mammography: Systematic Review and Meta-Analysis. Radiology 2022;302(1):88–104.
16. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study. Eur Radiol 2019;29(9):4825–4832.
17. Dembrower K, Wåhlin E, Liu Y, et al. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. Lancet Digit Health 2020;2(9):e468–e474.
18. Lång K, Dustler M, Dahlblom V, Åkesson A, Andersson I, Zackrisson S. Identifying normal mammograms in a large screening population using artificial intelligence. Eur Radiol 2021;31(3):1687–1692.
19. Lauritzen AD, Rodríguez-Ruiz A, von Euler-Chelpin MC, et al. An Artificial Intelligence-based Mammography Screening Protocol for Breast Cancer: Outcome and Radiologist Workload. Radiology 2022;304(1):41–49.
20. Lång K, Hofvind S, Rodríguez-Ruiz A, Andersson I. Can artificial intelligence reduce the interval cancer rate in mammography screening? Eur Radiol 2021;31(8):5940–5947.
21. Wanders AJT, Mees W, Bun PAM, et al. Interval Cancer Detection Using a Neural Network and Breast Density in Women with Negative Screening Mammograms. Radiology 2022;303(2):269–275.
22. Byng D, Strauch B, Gnas L, et al. AI-based prevention of interval cancers in a national mammography screening program. Eur J Radiol 2022;152:110321.
23. Larsen M, Aglen CF, Lee CI, et al. Artificial Intelligence Evaluation of 122 969 Mammography Examinations from a Population-based Screening Program. Radiology 2022;303(3):502–511.
24. Park GE, Kang BJ, Kim SH, Lee J. Retrospective Review of Missed Cancer Detection and Its Mammography Findings with Artificial-Intelligence-Based, Computer-Aided Diagnosis. Diagnostics (Basel) 2022;12(2):387.
25. Lov om helseregistre og behandling av helseopplysninger (helseregisterloven) [Act on health registries and processing of health information (Health Registry Act)]. https://lovdata.no/dokument/NL/lov/2014-06-20-43. Published December 21, 2001. Accessed April 21, 2023.
26. Larsen M, Aglen CF, Hoff SR, Lund-Hanssen H, Hofvind S. Possible strategies for use of artificial intelligence in screen-reading of mammograms, based on retrospective data from 122,969 screening examinations. Eur Radiol 2022;32(12):8238–8246.
27. Goldhirsch A, Winer EP, Coates AS, et al. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann Oncol 2013;24(9):2206–2223.
28. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: Comparison With 101 Radiologists. J Natl Cancer Inst 2019;111(9):916–922.
29. Koch HW, Larsen M, Bartsch H, Kurz KD, Hofvind S. Artificial intelligence in BreastScreen Norway: a retrospective analysis of a cancer-enriched sample including 1254 breast cancer cases. Eur Radiol 2023;33(5):3735–3743.
30. Tabar L, Tony Chen HH, Amy Yen MF, et al. Mammographic tumor features can predict long-term outcomes reliably in women with 1-14-mm invasive breast carcinoma. Cancer 2004;101(8):1745–1759.

