AI Risk Score On Screening (Odete 071123)
Background: Few studies have evaluated the role of artificial intelligence (AI) in prior screening mammography.
Purpose: To examine AI risk scores assigned to screening mammography in women who were later diagnosed with breast cancer.
Materials and Methods: Image data and screening information of examinations performed from January 2004 to December 2019 as part
of BreastScreen Norway were used in this retrospective study. Prior screening examinations from women who were later diagnosed with
cancer were assigned an AI risk score by a commercially available AI system (scores of 1–7, low risk of malignancy; 8–9, intermediate
risk; and 10, high risk of malignancy). Mammographic features of the cancers based on the AI score were also assessed. The association
between AI score and mammographic features was tested with a bivariate test.
Results: A total of 2787 prior screening examinations from 1602 women (mean age, 59 years ± 5.1 [SD]) with screen-detected
(n = 1016) or interval (n = 586) cancers showed an AI risk score of 10 for 389 (38.3%) and 231 (39.4%) cancers, respectively, on the
mammograms in the screening round prior to diagnosis. Among the screen-detected cancers with AI scores available two screening
rounds (4 years) before diagnosis, 23.0% (122 of 531) had a score of 10. Mammographic features were associated with AI score for
invasive screen-detected cancers (P < .001). Density with calcifications was registered for 13.6% (43 of 317) of screen-detected cases with
a score of 10 and 4.6% (15 of 322) for those with a score of 1–7.
Conclusion: More than one in three cases of screen-detected and interval cancers had the highest AI risk score at prior screening,
suggesting that the use of AI in mammography screening may lead to earlier detection of breast cancers.
© RSNA, 2023
Abbreviations
AI = artificial intelligence, HER2 = human epidermal growth factor
receptor 2
Summary
More than 38% of both screen-detected and interval cancers were
assigned the highest artificial intelligence risk score on screening
mammograms that preceded breast cancer diagnosis.
Key Results
■ In this retrospective study of 1602 patients with breast cancer,
38.3% (389 of 1016) of screen-detected cancers and 39.4%
(231 of 586) of interval cancers had the highest malignancy risk
score assigned by artificial intelligence (AI) on the screening
mammogram prior to diagnosis.
■ Mammographic features were associated with an AI score of
10 (high risk) versus 1–7 (low risk) on prior mammograms for
screen-detected invasive cancers (P < .001).
■ Among invasive screen-detected cancers with an AI score of
10 and 1–7 at prior mammography, density with calcifications
was observed for 13.6% (43 of 317) and 4.6% (15 of 322),
respectively.
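As a rough illustration of the contrast in the last key result, the two proportions can be compared with a 2 × 2 Pearson chi-square test. This is a minimal sketch, not the authors' analysis: the text says only that a bivariate test was used, so the choice of chi-square is an assumption, and the reported P < .001 refers to the overall association across all mammographic feature categories rather than to this single 2 × 2 comparison. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the key results: density with calcifications among invasive
# screen-detected cancers with an AI score of 10 (43 of 317) and of 1-7 (15 of 322).
# Assumption: the remaining cases in each group form the "other features" column.
chi2, p_value = chi_square_2x2(43, 317 - 43, 15, 322 - 15)
print(f"chi-square = {chi2:.2f}, P = {p_value:.5f}")
```

Under these assumptions the difference between the two groups is unlikely to be due to chance alone, which is consistent with the direction of the reported result.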
score and that AI was able to flag with the correct suspicious
location. However, few studies have assessed the AI risk scores on the
prior mammograms of screen-detected cancers (17,24).
With the aim of exploring the potential for earlier detection of breast cancer, imaging data collected in BreastScreen Norway were used and AI scores on the mammograms from screening examinations preceding breast cancer diagnosis were analyzed. Furthermore, to determine whether the cases with a high AI score on prior mammograms were of clinical relevance and had the potential to be diagnosed earlier, prognostic histopathologic tumor characteristics and mammographic features were analyzed.

Materials and Methods
This retrospective registry study included imaging data and screening information from BreastScreen Norway, which is administered by the Cancer Registry of Norway (4). The study was approved by the Regional Committees for Medical and Health Research Ethics (number 13294) and had a legal basis in accordance with Articles 6 (1)(e) and 9 (2)(j) of the General Data Protection Regulation. Pursuant to Section 35 of the Health Research Act, the Regional Committees for Medical Research Ethics has granted the project exemption from the requirement of consent (25).

Study Sample
Examinations were identified from the Cancer Registry of Norway, which is where information from all breast centers and screening units in BreastScreen Norway is stored. Digital Imaging and Communications in Medicine data from 372 580 digital screening examinations performed from January 2004 to December 2019 in five breast centers in BreastScreen Norway were analyzed with an AI system (Fig 1). Results from two breast centers have been included in previous studies (23,26). A total of 122 969 examinations were included in these studies, but the aim of these studies was to assess overall AI performance and to explore different clinical workflows for AI and radiologists.
After excluding examinations performed after breast cancer diagnosis and examinations in which fewer than four images were processed with the AI system, the overall study sample (sample A) included 344 337 examinations, including 1929 screen-detected and 586 interval cancers (Fig 1). Results for all examinations were presented to give an overview of the AI performance. In the analysis of AI scores of prior examinations for cancer cases, we excluded 338 958 examinations among women without breast cancer, 756 screen-detected cancers with prior examinations outside the study period or without prior examinations, and 639 examinations that did not follow the biennial screening scheme. Furthermore, we excluded information from 181 examinations that were 8 years or older (four or more screening rounds prior to diagnosis) because the data file included a limited number of women with mammograms that were 8 years or older. Ultimately, study sample B included 2787 prior screening examinations, including 1733 prior screening examinations in 1016 women with screen-detected cancer and 1054 prior screening examinations in 586 women with interval cancer.

Figure 1: Flowchart of exclusions and the final study sample A (all examinations) and B (cancer cases and prior examination of cancer cases). * = The artificial intelligence (AI) system can process more or fewer images than the standard four. However, because of the storage format of the mammograms, we had technical issues with some images, mainly from one breast center.

Imaging and Reading Procedure
BreastScreen Norway biennially invites women aged 50–69 years to two-view (craniocaudal and mediolateral oblique) mammographic screening of each breast (4). From 2017 to 2021, the rate of attendance was 75%, recalls were at 3.3%, screen-detected cancer was at 0.64%, and interval cancer was at 0.18% (21). Screening mammograms are independently interpreted by two breast radiologists, and both radiologists assign each breast a score from 1 to 5 (4). A score of 1 indicates normal findings; 2, probably benign; 3, intermediate suspicion; 4, probably malignant; and 5, high suspicion of malignancy. If either or both radiologists score 2 or higher, the examination is discussed in a consensus meeting by the same or other radiologists. Consensus decides whether to recall the woman. Recall assessment might include clinical examination, additional imaging (mammography, US, and eventually MRI), and needle biopsy.
In this retrospective study, the radiologists did not have AI results available. All data regarding AI assessments were collected retrospectively.

Variables of Interest
A screen-detected cancer was defined as a histologic analysis–verified ductal carcinoma in situ or invasive breast cancer diagnosed after a recall for further assessment due to mammographic findings and within 6 months after screening (4). Interval cancer was defined as breast cancer detected after a negative result at screening or more than 6 months after being recalled with a negative result and within 24 months after screening. Screen-detected and interval cancer were considered the reference standard.
Prognostic histopathologic tumor characteristics included histologic type (ductal carcinoma in situ or invasive), tumor diameter, histologic grade 1–3, lymph node involvement, and immunohistochemical subtypes for invasive cancers. The subtypes were classified as luminal A–like, luminal B–like human epidermal growth factor receptor 2 (HER2) negative, luminal B–like HER2 positive, HER2 positive, and triple negative, and were determined based on estrogen receptor, progesterone receptor, and HER2 status (27). Information about mammographic features was reported by the radiologists and classified as mass, spiculated mass, architectural distortion, asymmetric density, density with calcifications, and calcifications alone.

AI System
All women included in the study were screened with Mammomat Inspiration (Siemens Healthcare). Examinations were analyzed with Transpara version 1.7.0 (ScreenPoint Medical). For each examination, a malignancy risk score (AI score) from 1 to 10 was determined, where 1–7 indicated a low risk of malignancy, 8–9 indicated intermediate risk, and 10 indicated a high risk of malignancy. The AI score indicates risk of malignancy at the given screening examination. We showed previously (23) that 4.4% of the screen-detected and 30.2% of the interval cancers had an AI score of 1–7; 8.8% and 24.9%, respectively, had an AI score of 8 or 9; and 86.6% and 44.9%, respectively, had an AI score of 10. The AI system aims to assign approximately 10% of the examinations to each score.

Table 1: Examination Characteristics
Parameter                                            Study Sample A    Study Sample B
Mean age at screening (y)                            59.7 ± 5.7        59.7 ± 5.1
Mean age at diagnosis, screen-detected cancers (y)   60.9 ± 5.8        62.4 ± 5.0
Mean age at diagnosis, interval cancers (y)          61.2 ± 5.9        61.2 ± 5.9
No. of prevalent screening examinations              46 087 (13.4)     281 (7.4)*
Note.—Data are means ± SDs unless otherwise indicated. Data in parentheses are percentages. Study sample A is all examinations (n = 344 337); study sample B includes women with breast cancer (n = 1602) and screening examinations performed prior to cancer diagnosis (n = 2787).
* Prior examinations can be prevalent screening examinations only for interval cancer cases.

Figure 2: Distribution of artificial intelligence (AI) scores for all examinations (n = 344 337), screen-detected cancers (n = 1929), and interval cancers (n = 586) for the full data set. An AI score of 1 indicates low risk of malignancy and an AI score of 10 indicates high risk of malignancy.

Table 2: Artificial Intelligence Score on Prior Screening Examinations for Women with Screen-detected or Interval Cancer
            Screen-detected Cancer                     Interval Cancer
AI Score    P1            P2            P3             P1            P2            P3
1           62 (6.1)      60 (11.3)     19 (10.2)      40 (6.8)      30 (9.7)      17 (10.6)
2           40 (3.9)      25 (4.7)      9 (4.8)        21 (3.6)      10 (3.3)      6 (3.8)
3           43 (4.2)      33 (6.2)      13 (7.0)       24 (4.1)      15 (4.9)      8 (5.0)
4           50 (4.9)      37 (7.0)      16 (8.6)       23 (3.9)      20 (6.5)      14 (8.8)
5           57 (5.6)      29 (5.5)      12 (6.5)       33 (5.6)      31 (10.1)     10 (6.3)
6           62 (6.1)      42 (7.9)      20 (10.8)      27 (4.6)      19 (6.2)      17 (10.6)
7           72 (7.1)      45 (8.5)      19 (10.2)      41 (7.0)      26 (8.4)      13 (8.1)
8           96 (9.5)      52 (9.8)      18 (9.7)       40 (6.8)      30 (9.7)      20 (12.5)
9           145 (14.3)    86 (16.2)     29 (15.6)      106 (18.1)    55 (17.9)     18 (11.3)
10          389 (38.3)    122 (23.0)    31 (16.7)      231 (39.4)    72 (23.4)     37 (23.1)
Total       1016 (100)    531 (100)     186 (100)      586 (100)     308 (100)     160 (100)
Note.—Data in parentheses are percentages. P1 is the screening examination prior to diagnosis (≤2 years prior to diagnosis), P2 is the
screening examination two examinations prior to diagnosis (≤4 years prior to diagnosis), and P3 is the screening examination three
examinations prior to diagnosis (≤6 years prior to diagnosis).
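
The risk bands used throughout the article (scores of 1–7, low; 8–9, intermediate; 10, high) can be applied to the P1 counts in Table 2 to reproduce the grouped percentages quoted in the text. The following is a minimal sketch for illustration only; the function and variable names are hypothetical and are not part of the AI system or of the authors' analysis code.

```python
def risk_category(score):
    """Map an AI score (1-10) to the risk band described in the article."""
    if not 1 <= score <= 10:
        raise ValueError("AI score must be between 1 and 10")
    if score <= 7:
        return "low"
    if score <= 9:
        return "intermediate"
    return "high"


# P1 counts per AI score, copied from Table 2.
P1_SCREEN_DETECTED = {1: 62, 2: 40, 3: 43, 4: 50, 5: 57,
                      6: 62, 7: 72, 8: 96, 9: 145, 10: 389}
P1_INTERVAL = {1: 40, 2: 21, 3: 24, 4: 23, 5: 33,
               6: 27, 7: 41, 8: 40, 9: 106, 10: 231}


def summarize(counts):
    """Return {band: (count, percentage)} for a dict of score -> count."""
    total = sum(counts.values())
    bands = {"low": 0, "intermediate": 0, "high": 0}
    for score, n in counts.items():
        bands[risk_category(score)] += n
    return {band: (n, round(100 * n / total, 1)) for band, n in bands.items()}


for label, counts in [("screen-detected", P1_SCREEN_DETECTED),
                      ("interval", P1_INTERVAL)]:
    print(label, summarize(counts))
```

Grouping the counts this way makes it easy to check the figures in the running text, such as 241 of 1016 screen-detected cancers with a score of 8 or 9 at P1, against the table.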
The AI system uses convolutional neural networks to identify calcifications and soft tissue lesions, and is trained, validated, and tested on mammograms from four different vendors (28). Transpara version 1.7.0 does not include prior examinations in the risk score assessment.

Statistical Analysis
Descriptive analyses were performed separately for study samples A and B. Frequencies and percentages were presented to summarize AI findings. Histopathologic tumor characteristics and mammographic features were presented only for invasive cancers with an AI score of 10 (high risk) and 1–7 (low risk) at 2 years (screen-detected cancers) or less (interval cancers) prior to diagnosis (hereafter, referred to as P1). Percentages were calculated from nonmissing values. Associations were tested with bivariate tests, with P < .05 indicating statistical significance. All analyses were performed by an author (M.L., with 5 years of experience) by using statistical software (Stata; StataCorp).

Results

Characteristics of Study Sample A
After excluding 3740 examinations performed after breast cancer diagnosis and 24 492 examinations in which fewer than four images were processed with the AI system, study sample A included 344 337 examinations (Fig 1). Mean patient age was 59.7 years ± 5.7 (SD); 13.4% (46 087 of 344 337) were prevalent examinations (Table 1).
A total of 21.9% (75 547 of 344 337) of the examinations had an AI score of 1, whereas 9.0% (31 025 of 344 337) had an AI score of 10 (Figs 2, S1). A total of 88.0% (1697 of 1929) of the screen-detected cancers and 39.4% (231 of 586) of the interval cancers had a score of 10 (Fig 2). The probability of screen-detected cancer was 0.01% (nine of 75 547) for examinations with an AI score of 1 and 5.5% (1697 of 31 025) for examinations with an AI score of 10. The percentage of screen-detected cancers with an AI score of 10 ranged from 86.4% (248 of 287) to 91.7% (341 of 372) among the five breast centers, whereas the percentage of interval cancers with an AI score of 10 ranged from 35.4% (52 of 147) to 53% (32 of 60) (Table S1).

Characteristics of Study Sample B
For women with screen-detected cancer, 1016 examinations were performed at P1, 531 were performed 4 years prior to diagnosis (hereafter, referred to as P2), and 186 were performed 6 years prior to diagnosis (hereafter, referred to as P3). For women with interval cancers, 586 examinations were performed at P1, 308 examinations were performed at P2, and 160 examinations were performed at P3. Mean ages at diagnosis were 62.4 years ± 5.0 for screen-detected cancers and 61.2 years ± 5.9 for interval cancers (Table 1).

AI Scores on Prior Screening Mammograms of Screen-detected Cancers
A total of 38.3% (389 of 1016) of the examinations at P1 had an AI score of 10 and 23.7% (241 of 1016) had a score of 8 or 9 (Table 2). Among the 389 screen-detected cancers with an AI score of 10 at P1, 11 (2.8%) had an AI score of less than 10 at diagnosis. A total of 27.2% (106 of 389) of the examinations with a score of 10 at P1 were discussed at the consensus meeting: 22.6% (88 of 389) were found to be normal after the consensus meeting and not recalled, and 4.6% (18 of 389) underwent recall assessment with a negative result. In comparison, 11.1% (43 of 386) of the examinations with scores of 1–7 at P1 were discussed at consensus. Among the consensus cases with AI scores of 10, 83.0% (88 of 106) were dismissed at the consensus meeting, and 81% (71 of 88) of the dismissed cases had been selected for consensus (an interpretation score of 2 or higher) by only one of the two radiologists.
For the 531 women with screen-detected cancer and an AI score available at P2, 23.0% (122 of 531) had an AI score of 10,
The results for mammographic features for invasive screen-detected cancers indicate calcifications alone or in combination with density to be more common for screen-detected cancers with an AI score of 10 versus 1–7 at P1 (overall, P < .001). Poorer survival rates in women with small (<15 mm) screen-detected invasive cancers manifesting as calcification and large (≥15 mm) tumors manifesting as a density with calcifications have been reported (30). It might be important to recall women with calcifications and a score of 10 for assessment because it may lead to earlier detection of relevant cancers. However, the proportion of calcifications and an AI score of 10 among disease-free women also must be explored because it might influence the rate of false-positive screening results (recalled with a negative result).
Our study had limitations. First, we used mammograms from a single vendor, and we only included women from Norway, which limited the generalizability of the findings. Second, there was a lack of knowledge about the correlation between the location of the AI markings and the location of the cancer. This limitation may have led to an overestimation of our findings in favor of the AI system. A review of the hot spot versus the location of the cancer is needed to understand this issue.
In conclusion, we found that more than one in three cases of screen-detected and interval cancers had an artificial intelligence (AI) risk score of 10 at prior screening. This indicates a potential of AI to detect breast cancer earlier, which could lead to less harmful treatment for the affected female patients. Review studies and prospective studies comparing the location of AI markings with the location of the cancer are needed to further understand this issue. Furthermore, high AI scores on mammograms in women who are not diagnosed with breast cancer represent a challenge that is important to consider.

Acknowledgments: We thank all personnel involved in the collection of imaging data at the breast centers included in the study.

Author contributions: Guarantors of integrity of entire study, M.L., S.H.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, M.L., H.W.K., M.A.M., H.L.H., S.H.; clinical studies, S.R.H.; statistical analysis, M.L., S.A., S.H.; and manuscript editing, all authors

Disclosures of conflicts of interest: M.L. No relevant relationships. C.F.O. No relevant relationships. H.W.K. No relevant relationships. M.A.M. No relevant relationships. S.R.H. No relevant relationships. H.L.H. No relevant relationships. H.S.S. No relevant relationships. K.Ø.M. No relevant relationships. S.A. No relevant relationships. J.N. No relevant relationships. K.L. No relevant relationships. Y.C. No relevant relationships. S.H. The Cancer Registry of Norway has a research agreement with ScreenPoint Medical, Lunit, and Vara.

References
1. Arnold M, Morgan E, Rumgay H, et al. Current and future burden of breast cancer: Global statistics for 2020 and 2040. Breast 2022;66:15–23.
2. Lauby-Secretan B, Scoccianti C, Loomis D, et al. Breast-cancer screening--viewpoint of the IARC Working Group. N Engl J Med 2015;372(24):2353–2358.
3. Ren W, Chen M, Qiao Y, Zhao F. Global guidelines for breast cancer screening: A systematic review. Breast 2022;64:85–99.
4. Bjørnson E, Holen ÅS, Sagstad S, et al. BreastScreen Norway: 25 years of organized screening. Report No.: ISBN 978-82-93804-03-1. Cancer Registry of Norway; 2022.
5. Blanks RG, Wallis MG, Alison RJ, Given-Wilson RM. An analysis of screen-detected invasive cancers by grade in the English breast cancer screening programme: are we failing to detect sufficient small grade 3 cancers? Eur Radiol 2021;31(4):2548–2558.
6. Houssami N, Hunter K. The epidemiology, radiology and biological characteristics of interval breast cancers in population mammography screening. NPJ Breast Cancer 2017;3(1):12.
7. Hovda T, Hoff SR, Larsen M, Romundstad L, Sahlberg KK, Hofvind S. True and Missed Interval Cancer in Organized Mammographic Screening: A Retrospective Review Study of Diagnostic and Prior Screening Mammograms. Acad Radiol 2022;29(Suppl 1):S180–S191.
8. Martiniussen MA, Sagstad S, Larsen M, et al. Screen-detected and interval breast cancer after concordant and discordant interpretations in a population based screening program using independent double reading. Eur Radiol 2022;32(9):5974–5985.
9. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18(8):500–510.
10. Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ 2021;374:n1872.
11. Mammography Screening With Artificial Intelligence (MASAI). ClinicalTrials.gov. https://ClinicalTrials.gov/show/NCT04838756. Published December 13, 2022. Accessed March 28, 2023.
12. Artificial Intelligence in Large-scale Breast Cancer Screening (ScreenTrustCad). ClinicalTrials.gov. https://ClinicalTrials.gov/show/NCT04778670. Published March 14, 2023. Accessed March 28, 2023.
13. Lång K, Josefsson V, Larsson AM, et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol 2023;24(8):936–944.
14. Yoon JH, Strand F, Baltzer PAT, et al. Standalone AI for Breast Cancer Detection at Screening Digital Mammography and Digital Breast Tomosynthesis: A Systematic Review and Meta-Analysis. Radiology 2023;307(5):e222639.
15. Hickman SE, Woitek R, Le EPV, et al. Machine Learning for Workflow Applications in Screening Mammography: Systematic Review and Meta-Analysis. Radiology 2022;302(1):88–104.
16. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study. Eur Radiol 2019;29(9):4825–4832.
17. Dembrower K, Wåhlin E, Liu Y, et al. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. Lancet Digit Health 2020;2(9):e468–e474.
18. Lång K, Dustler M, Dahlblom V, Åkesson A, Andersson I, Zackrisson S. Identifying normal mammograms in a large screening population using artificial intelligence. Eur Radiol 2021;31(3):1687–1692.
19. Lauritzen AD, Rodríguez-Ruiz A, von Euler-Chelpin MC, et al. An Artificial Intelligence-based Mammography Screening Protocol for Breast Cancer: Outcome and Radiologist Workload. Radiology 2022;304(1):41–49.
20. Lång K, Hofvind S, Rodríguez-Ruiz A, Andersson I. Can artificial intelligence reduce the interval cancer rate in mammography screening? Eur Radiol 2021;31(8):5940–5947.
21. Wanders AJT, Mees W, Bun PAM, et al. Interval Cancer Detection Using a Neural Network and Breast Density in Women with Negative Screening Mammograms. Radiology 2022;303(2):269–275.
22. Byng D, Strauch B, Gnas L, et al. AI-based prevention of interval cancers in a national mammography screening program. Eur J Radiol 2022;152:110321.
23. Larsen M, Aglen CF, Lee CI, et al. Artificial Intelligence Evaluation of 122 969 Mammography Examinations from a Population-based Screening Program. Radiology 2022;303(3):502–511.
24. Park GE, Kang BJ, Kim SH, Lee J. Retrospective Review of Missed Cancer Detection and Its Mammography Findings with Artificial-Intelligence-Based, Computer-Aided Diagnosis. Diagnostics (Basel) 2022;12(2):387.
25. Lov om helseregistre og behandling av helseopplysninger (helseregisterloven). https://lovdata.no/dokument/NL/lov/2014-06-20-43. Published December 21, 2001. Accessed April 21, 2023.
26. Larsen M, Aglen CF, Hoff SR, Lund-Hanssen H, Hofvind S. Possible strategies for use of artificial intelligence in screen-reading of mammograms, based on retrospective data from 122,969 screening examinations. Eur Radiol 2022;32(12):8238–8246.
27. Goldhirsch A, Winer EP, Coates AS, et al. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann Oncol 2013;24(9):2206–2223.
28. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: Comparison With 101 Radiologists. J Natl Cancer Inst 2019;111(9):916–922.
29. Koch HW, Larsen M, Bartsch H, Kurz KD, Hofvind S. Artificial intelligence in BreastScreen Norway: a retrospective analysis of a cancer-enriched sample including 1254 breast cancer cases. Eur Radiol 2023;33(5):3735–3743.
30. Tabar L, Tony Chen HH, Amy Yen MF, et al. Mammographic tumor features can predict long-term outcomes reliably in women with 1-14-mm invasive breast carcinoma. Cancer 2004;101(8):1745–1759.