Olier et al. BMC Medical Research Methodology (2023) 23:227
https://doi.org/10.1186/s12874-023-02058-5

EDITORIAL Open Access

Causal inference and observational data


Ivan Olier1*, Yiqiang Zhan2, Xiaoyu Liang3 and Victor Volovici4

Abstract
Observational studies using causal inference frameworks can provide a feasible alternative to randomized
controlled trials. Advances in statistics, machine learning, and access to big data facilitate unraveling complex
causal relationships from observational data across healthcare, social sciences, and other fields. However, challenges
like evaluating models and bias amplification remain.

Main text

Billions of data records are generated every day, facilitating the discovery of knowledge. In particular, medical, epidemiological, and social science research has benefited significantly from the vast amount of data available through sources such as medical records, easily attainable surveys, and social media platforms. This availability has led to a significant increase in the popularity of observational studies and meta-analyses as approaches complementary to randomized controlled trials (RCTs). RCTs are considered the gold-standard study design for decision-making. However, conducting RCTs may not always be feasible due to ethical concerns, significant costs, or time limitations. Traditionally, findings from observational studies have been considered less valuable than those from RCTs, mainly because the former are vulnerable to confounding and bias. Recently, novel developments in statistics and machine learning (ML) have driven the development of causal inference in observational studies to the point where it can serve as a feasible substitute for, or complement to, RCTs in decision-making [1]. Most statistical and ML methods are designed to establish an association map between input (factors) and output (target) variables. However, such association maps are unable to identify potential latent factors that influence both inputs and outputs, which limits their use in determining causal links. For instance, several studies reported a higher prevalence of lung cancer among coffee drinkers compared to non-drinkers. However, since many coffee drinkers also smoke, the observed association between coffee drinking and lung cancer is confounded by smoking, the true cause of the disease [2].

Causal inference from observational data finds application across various fields, with notable impact observed in domains such as healthcare, medicine, political and economic sciences, and social sciences. In healthcare and medical research, causal inference enables the identification of heterogeneous treatment effects and the formulation of personalized treatment strategies. By incorporating individual-level data, genetic information, and ML techniques, the field of personalized medicine benefits from enhanced causal inference methodologies [3]. The critical role of causal inference extends to policy evaluation and intervention assessment, where advancements in causal inference methods facilitate evidence-based decision-making by rigorously evaluating policy effectiveness, estimating causal impacts, and comprehending unintended consequences. Additionally, the utilization of instrumental variables, regression discontinuity designs, and quasi-experimental approaches as methodological advancements further augments the understanding of complex social phenomena, policy impacts, and economic relationships [4, 5].

Broadly speaking, causal inference attempts to build data-driven models that can predict the effect of interventions on outcomes. Using observational data for causal inference is gaining momentum due to the confluence of factors such as the large amount of more complex and richer data and advanced techniques from statistics and ML. In general, two frameworks exist for causal inference in observational studies, which are not necessarily mutually exclusive: the structural causal model (SCM) framework and the potential outcome framework (POF). The SCM framework relies on deterministic, functional equations to construct directed acyclic graphs (DAGs) with variables as nodes and links as causal relationships, and is particularly useful in identifying unknown causal and confounding variables while estimating the actual effect of a given treatment. On the other hand, the POF (also known as the counterfactual framework) examines outcomes that would likely have been observed had the treatment differed, representing the counterfactual or the missing outcome. Other frameworks such as instrumental variables, mediation analysis, and Bayesian networks are also noteworthy in causal inference research [6].

In recent years, there has been growing interest in combining multiple frameworks and approaches to improve causal inference. Integrating ideas from different frameworks can lead to more comprehensive and robust causal analyses. Additionally, the use of machine learning techniques and the exploration of new identification strategies are areas that hold promise for advancing causal inference research [7]. Analysis of observational studies could benefit from the best of both worlds. ML methods can help identify confounding variables, handle high-dimensional data, and improve prediction accuracy, while causal inference provides interpretability and causal understanding. Integrating these fields can lead to more powerful and robust causal inference models [8].

Causal inference research is a dynamic field that continues to evolve. Numerous real-world scenarios entail complex systems comprising multiple interacting variables. Advances in causal inference are instrumental in unraveling causal relationships in such systems. The availability of large-scale datasets presents both opportunities and challenges for causal inference. The development of scalable methods capable of efficiently handling large data sets while addressing biases, confounding, and selection effects constitutes an active area of research. Furthermore, efforts are being made to devise methodologies for extracting causal relationships from unstructured data and integrating them with structured data, thereby enhancing the depth of insights and broadening the applicability of causal inference from observational data.

However, causal inference with observational data is not free of challenges. For instance, causal inference models are hard to evaluate. Even if a causal link is found, there is still no clear mechanism to assess whether the link is real. The performance of associative data-driven models can be assessed and compared easily since large data repositories are publicly available and widely used. However, this is not the case for causal inference models, for which the lack of public benchmark data is one of the biggest problems encountered in their development. There is also a lack of comparisons to non-causal methods in the literature [9]. Untestable assumptions are also inevitable, and they could contribute to bias amplification and harm external validity when compared to non-causal counterparts [10].

As the field continues to advance, interdisciplinary collaborations, methodological innovations, and the integration of emerging technologies will continue to expand the frontiers of causal inference and its applications in various domains. Nevertheless, challenges must be addressed for swift adoption in social and medical research.

Abbreviations
DAG directed acyclic graph
ML machine learning
POF potential outcome framework
RCT randomized controlled trial
SCM structural causal model

Authors’ contributions
IO conceived and drafted the Editorial. YZ, XL, and VV revised the Editorial. All authors read and approved the final manuscript.

Funding
No funding was obtained for this editorial.

Data Availability
Not applicable.

Declarations

Competing interests
The authors of this editorial are Editorial Board Members of BMC Medical Research Methodology and Guest Editors of the Causal Inference and Observational Data collection.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

*Correspondence:
Ivan Olier
[email protected]
1 Data Science Research Centre, Liverpool John Moores University, Liverpool, UK
2 Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
3 Department of Epidemiology and Biostatistics, Michigan State University College of Human Medicine, East Lansing, MI 48824, USA
4 Department of Neurosurgery, Center for Medical Decision Making, Erasmus MC, Rotterdam, The Netherlands

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Received: 21 September 2023 / Accepted: 6 October 2023
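Illustrative note. The coffee, smoking, and lung cancer example from the main text can be made concrete with a small simulation. The sketch below is not part of the editorial's methods: the probabilities are invented for illustration, the function names (`simulate`, `risk`, `naive_difference`, `adjusted_difference`) are hypothetical, and adjustment by stratifying on the confounder (standardization) is only one of several possible techniques. In this toy population coffee has no causal effect on cancer, yet a naive comparison shows an association, which largely disappears once smoking is adjusted for.

```python
import random


def simulate(n=100_000, seed=0):
    """Toy population in which smoking drives both coffee drinking and cancer.

    All probabilities are made-up illustration values, not estimates from data.
    Coffee has NO causal effect on cancer in this model.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Smokers are assumed more likely to drink coffee.
        coffee = rng.random() < (0.8 if smoker else 0.4)
        # Cancer risk depends on smoking only.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        rows.append((smoker, coffee, cancer))
    return rows


def risk(rows, coffee, smoker=None):
    """Proportion with cancer among people with the given coffee (and smoking) status."""
    sel = [r for r in rows if r[1] == coffee and (smoker is None or r[0] == smoker)]
    return sum(r[2] for r in sel) / len(sel)


def naive_difference(rows):
    """Associational contrast: risk in coffee drinkers minus risk in non-drinkers."""
    return risk(rows, True) - risk(rows, False)


def adjusted_difference(rows):
    """Risk difference standardized over smoking status (the confounder)."""
    total = 0.0
    for s in (True, False):
        weight = sum(r[0] == s for r in rows) / len(rows)
        total += weight * (risk(rows, True, s) - risk(rows, False, s))
    return total


rows = simulate()
naive = naive_difference(rows)
adjusted = adjusted_difference(rows)
print(f"naive risk difference:    {naive:.4f}")
print(f"adjusted risk difference: {adjusted:.4f}")
```

The adjustment works here only because the single confounder is known and measured. With real observational data that assumption is untestable, which is the difficulty the main text highlights.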

References
1. Hernán MA. Methods of Public Health Research — Strengthening Causal Inference from Observational Data. New England Journal of Medicine [Internet]. 2021 Oct 7 [cited 2023 May 23];385(15):1345–8. Available from: https://www.nejm.org/doi/full/https://doi.org/10.1056/NEJMp2113319.
2. Hemkens LG, Ewald H, Naudet F, Ladanie A, Shaw JG, Sajeev G, et al. Interpretation of epidemiologic studies very often lacked adequate consideration of confounding. J Clin Epidemiol. 2018;93:94–102.
3. Sanchez P, Voisey JP, Xia T, Watson HI, O’Neil AQ, Tsaftaris SA. Causal machine learning for healthcare and precision medicine. R Soc Open Sci. 2022;9(8).
4. Rohlfing I, Zuber CI. Check Your Truth Conditions! Clarifying the Relationship between Theories of Causation and Social Science Methods for Causal Inference. Sociol Methods Res [Internet]. 2021 Nov 1 [cited 2023 May 23];50(4):1623–59. Available from: https://journals.sagepub.com/doi/https://doi.org/10.1177/0049124119826156.
5. Varian HR. Causal inference in economics and marketing. Proceedings of the National Academy of Sciences [Internet]. 2016 Jul 5 [cited 2023 May 23];113(27):7310–5. Available from: https://www.pnas.org/doi/abs/https://doi.org/10.1073/pnas.1510479113.
6. Shi J, Norgeot B. Learning Causal Effects from Observational Data in Healthcare: A Review and Summary. Front Med (Lausanne). 2022;9:864882.
7. Prosperi M, Guo Y, Sperrin M, Koopman JS, Min JS, He X, et al. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat Mach Intell. 2020;2(7):369–75.
8. Luo Y, Peng J, Ma J. When causal inference meets deep learning. Nature Machine Intelligence 2020 2:8 [Internet]. 2020 Aug 12 [cited 2023 May 23];2(8):426–7. Available from: https://www.nature.com/articles/s42256-020-0218-x.
9. Kaddour J, Lynch A, Liu Q, Kusner MJ, Silva R. Causal Machine Learning: A Survey and Open Problems. arXiv:2206.15475 [Internet]. 2022 Jun 30 [cited 2023 May 23]; Available from: http://arxiv.org/abs/2206.15475.
10. Hammerton G, Munafò MR. Causal inference with observational data: the need for triangulation of evidence. Psychol Med [Internet]. 2021 Mar 1 [cited 2023 May 23];51(4):563–78. Available from: https://www.cambridge.org/core/journals/psychological-medicine/article/causal-inference-with-observational-data-the-need-for-triangulation-of-evidence/AF5F7918753DF50F26B1D49561F0DF83.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.