Ebook, 1,259 pages, 24 hours

Applied Longitudinal Analysis

Name: Applied Longitudinal Analysis
Brand: Wiley
Rating: 3.0 (2 reviews)

By Garrett M. Fitzmaurice, Nan M. Laird and James H. Ware

About this ebook

Praise for the First Edition

". . . [this book] should be on the shelf of everyone interested in . . . longitudinal data analysis."
—Journal of the American Statistical Association

Features newly developed topics and applications of the analysis of longitudinal data

Applied Longitudinal Analysis, Second Edition presents modern methods for analyzing data from longitudinal studies and now features the latest state-of-the-art techniques. The book emphasizes practical, rather than theoretical, aspects of methods for the analysis of diverse types of longitudinal data that can be applied across various fields of study, from the health and medical sciences to the social and behavioral sciences.

The authors incorporate their extensive academic and research experience along with various updates that have been made in response to reader feedback. The Second Edition features six newly added chapters that explore topics currently evolving in the field, including:

Fixed effects and mixed effects models
Marginal models and generalized estimating equations
Approximate methods for generalized linear mixed effects models
Multiple imputation and inverse probability weighted methods
Smoothing methods for longitudinal data
Sample size and power

Each chapter presents methods in the setting of applications to data sets drawn from the health sciences. New problem sets have been added to many chapters, and a related website features sample programs and computer output using SAS, Stata, and R, as well as data sets and supplemental slides to facilitate a complete understanding of the material.

With its strong emphasis on multidisciplinary applications and the interpretation of results, Applied Longitudinal Analysis, Second Edition is an excellent book for courses on statistics in the health and medical sciences at the upper-undergraduate and graduate levels. The book also serves as a valuable reference for researchers and professionals in the medical, public health, and pharmaceutical fields as well as those in social and behavioral sciences who would like to learn more about analyzing longitudinal data.

Language: English
Publisher: Wiley
Release date: Oct 23, 2012
ISBN: 9781118551790

Author

Garrett M. Fitzmaurice

Related to Applied Longitudinal Analysis

Titles in the series (100)

Probability and Conditional Expectation: Fundamentals for the Empirical Sciences, by Rolf Steyer (0 ratings)
Linear Statistical Inference and its Applications, by C. Radhakrishna Rao (0 ratings)
Applications of Statistics to Industrial Experimentation, by Cuthbert Daniel (rating: 3/5)
Measurement Errors in Surveys, by Paul P. Biemer (0 ratings)
Time Series Analysis: Nonstationary and Noninvertible Distribution Theory, by Katsuto Tanaka (0 ratings)
Time Series Analysis with Long Memory in View, by Uwe Hassler (0 ratings)
Robust Correlation: Theory and Applications, by Georgy L. Shevlyakov (0 ratings)
Theory of Ridge Regression Estimation with Applications, by A. K. Md. Ehsanes Saleh (0 ratings)
Measuring Agreement: Models, Methods, and Applications, by Pankaj K. Choudhary (0 ratings)
Modern Experimental Design, by Thomas P. Ryan (0 ratings)
Nonlinear Statistical Models, by A. Ronald Gallant (0 ratings)
Methods for Statistical Data Analysis of Multivariate Observations, by R. Gnanadesikan (0 ratings)
Sequential Stochastic Optimization, by R. Cairoli (0 ratings)
Theory of Probability: A critical introductory treatment, by Bruno de Finetti (0 ratings)
Statistics and Causality: Methods for Applied Empirical Research, by Wolfgang Wiedermann (0 ratings)
Forecasting with Univariate Box - Jenkins Models: Concepts and Cases, by Alan Pankratz (0 ratings)
A Course in Time Series Analysis, by Daniel Peña (rating: 3/5)
Multiple Imputation for Nonresponse in Surveys, by Donald B. Rubin (rating: 2/5)
Fundamentals of Queueing Theory, by John F. Shortle (0 ratings)
Nonparametric Finance, by Jussi Klemelä (0 ratings)
Computation for the Analysis of Designed Experiments, by Richard Heiberger (0 ratings)
Periodically Correlated Random Sequences: Spectral Theory and Practice, by Harry L. Hurd (0 ratings)
Aspects of Multivariate Statistical Theory, by Robb J. Muirhead (0 ratings)
Business Survey Methods, by Brenda G. Cox (0 ratings)
Statistical Models and Methods for Lifetime Data, by Jerald F. Lawless (0 ratings)
Fundamental Statistical Inference: A Computational Approach, by Marc S. Paolella (0 ratings)
Linear Regression Analysis, by George A. F. Seber (rating: 3/5)
The Statistical Analysis of Failure Time Data, by John D. Kalbfleisch (0 ratings)
Statistical Methods for the Analysis of Biomedical Data, by Robert F. Woolson (0 ratings)
Statistical Modeling by Wavelets, by Brani Vidakovic (0 ratings)

Related ebooks

Methods of Multivariate Analysis, by Alvin C. Rencher (rating: 2/5)
Case Studies in Bayesian Statistical Modelling and Analysis, by Clair L. Alston (0 ratings)
Statistics and Causality: Methods for Applied Empirical Research, by Wolfgang Wiedermann (0 ratings)
Simulation for Data Science with R, by Matthias Templ (0 ratings)
Matrix Operations for Engineers and Scientists: An Essential Guide in Linear Algebra, by Alan Jeffrey (0 ratings)
An Elementary Introduction to Statistical Learning Theory, by Sanjeev Kulkarni (0 ratings)
A Course in Statistics with R, by Prabhanjan N. Tattar (0 ratings)
An Introduction to Econometric Theory, by James Davidson (0 ratings)
Nonlinear Parameter Optimization Using R Tools, by John C. Nash (rating: 4/5)
Latent Variable Models and Factor Analysis: A Unified Approach, by David J. Bartholomew (0 ratings)
R in Action, Third Edition: Data analysis and graphics with R and Tidyverse, by Robert I. Kabacoff (0 ratings)
Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data, by John A. Cornell (rating: 5/5)
Spatio-temporal Design: Advances in Efficient Data Acquisition, by Jorge Mateu (0 ratings)
Methods and Applications of Statistics in Clinical Trials, Volume 1: Concepts, Principles, Trials, and Designs, by N. Balakrishnan (0 ratings)
Biostatistics Using JMP: A Practical Guide, by Trevor Bihl (0 ratings)
Random Data: Analysis and Measurement Procedures, by Julius S. Bendat (rating: 4/5)
Practical Statistics Simply Explained, by Dr. Russell A. Langley (rating: 4/5)
Practical Data Analysis - Second Edition, by Hector Cuesta (0 ratings)
Mastering Text Mining with R, by Avinash Paul (0 ratings)
Practical Data Science with R, Second Edition, by John Mount (rating: 4/5)
Learning Predictive Analytics with R, by Mayor Eric (0 ratings)
Causality: Statistical Perspectives and Applications, by Carlo Berzuini (0 ratings)
Statistical Modeling by Wavelets, by Brani Vidakovic (0 ratings)
R: Data Analysis and Visualization, by Brett Lantz (rating: 5/5)
Cluster Analysis, by Brian S. Everitt (rating: 4/5)
Beginning Statistics with Data Analysis, by Frederick Mosteller (rating: 4/5)
Beginning R: The Statistical Programming Language, by Mark Gardener (rating: 5/5)
Applied Data Mining for Forecasting Using SAS, by Tim Rey (0 ratings)
The Real Work of Data Science: Turning data into information, better decisions, and stronger organizations, by Ron S. Kenett (0 ratings)
Building a Recommendation System with R, by Gorakala Suresh K. (0 ratings)

Feb 5, 2024
8 min read
How Quantum Computing Can Fight Climate Change
PC Pro Magazine
UNLIMITED
How Quantum Computing Can Fight Climate Change
Oct 8, 2022
8 min read
New Quantum Algorithms Finally Crack Nonlinear Equations
Quanta
UNLIMITED
New Quantum Algorithms Finally Crack Nonlinear Equations
Jan 5, 2021
4 min read
Data Centers Aren’t The Energy Hogs We Thought
Futurity
UNLIMITED
Data Centers Aren’t The Energy Hogs We Thought
Feb 28, 2020
2 min read
Ceramic Design with Artificial Intelligence
Ceramics: Art and Perception
UNLIMITED
Ceramic Design with Artificial Intelligence
Sep 29, 2023
Technology determines design in different phases of time, and must adapt to corresponding methods and media. With the continuous development of science and technology, traditional ceramic technology and culture faces on-going transformation and upgra
8 min read
Business applications For Quantum computing
Rotman Management
UNLIMITED
Business applications For Quantum computing
May 1, 2022
COMPUTERS DO ARITHMETIC. Underlying every amazing application of computers today is math, calculated using binary digits or ‘bits.’ The original computers of the early 1950s could perform about 465 multiplications per second — much faster than the ‘h
11 min read
Under The Hood
GP Racing UK
UNLIMITED
Under The Hood
Oct 17, 2024
Elsewhere in this issue we discuss the difficulties of relating experimental aerodynamics to what is happening to the physical car on track, and how our tools for aerodynamic research all have their limitations (p52). During this we refer to computat
4 min read
Comparing Time Series Data Like A Pro
Linux Format
UNLIMITED
Comparing Time Series Data Like A Pro
Jun 1, 2021
8 min read
System Shaves 75% Off Electric Vehicle Battery Test Time
Futurity
UNLIMITED
System Shaves 75% Off Electric Vehicle Battery Test Time
Jun 29, 2022
3 min read
Different Training Could Cut AI Power Use By 30%
Futurity
UNLIMITED
Different Training Could Cut AI Power Use By 30%
Nov 19, 2024
A less wasteful way to train large language models, such as the GPT series, finishes in the same amount of time for up to 30% less energy, according to a new study. The approach could save enough energy to power 1.1 million US homes in 2026, based on
2 min read
Test Gets Quantum Computers To Check Their Own Work
Futurity
UNLIMITED
Test Gets Quantum Computers To Check Their Own Work
Nov 18, 2019
3 min read
Clever CAD Coding For Clients And Cigars
Linux Format
UNLIMITED
Clever CAD Coding For Clients And Cigars
Apr 2, 2024
Credit: http://openscad.org Tam Hanna’s minimal creative capability makes him ideally suited to teaching all kinds of workarounds for problems that require the use of creativity. Catch up by ordering back issues on page 58! The experiments performed
7 min read
Why The Future Needs Optical Data Centres
PC Pro Magazine
UNLIMITED
Why The Future Needs Optical Data Centres
Sep 10, 2020
9 min read
Tiny Device Forces Us To Rethink ‘What Is A Computer’?
Futurity
UNLIMITED
Tiny Device Forces Us To Rethink ‘What Is A Computer’?
Jun 22, 2018
Researchers have developed a computer device that measures just 0.3 mm to a side—dwarfed by a grain of rice. IBM’s announcement that they had produced the world’s smallest computer back in March raised a few eyebrows at the University of Michigan, ho
3 min read
AI Could Mine The Past For Faster, Better Weather Forecasts
Futurity
UNLIMITED
AI Could Mine The Past For Faster, Better Weather Forecasts
Dec 17, 2020
2 min read
Is The Future sustainable?
PC Pro Magazine
UNLIMITED
Is The Future sustainable?
Jun 8, 2023
8 min read
This Material Makes Beautiful, Potentially Useful Rainbows
Futurity
UNLIMITED
This Material Makes Beautiful, Potentially Useful Rainbows
Sep 8, 2021
2 min read
Loop The Loop
Racecar Engineering
UNLIMITED
Loop The Loop
Oct 1, 2021
5 min read

Related categories

Skip carousel

Reviews for Applied Longitudinal Analysis

Rating: 3 out of 5 stars

3/5

2 ratings0 reviews

Book preview

Applied Longitudinal Analysis - Garrett M. Fitzmaurice

Part I

Introduction to Longitudinal and Clustered Data

Chapter 1

Longitudinal and Clustered Data

1.1 INTRODUCTION

Research on statistical methods for the design and analysis of human investigations expanded explosively in the second half of the twentieth century. Beginning in the early 1950s, the U.S. government shifted a substantial part of its research support from military to biomedical research. The legislative foundation for the modern National Institutes of Health (NIH), the Public Health Service Act, was passed in 1944 and NIH grew rapidly throughout the 1950s and 1960s. During these golden years of NIH expansion, the entire NIH budget grew from $8 million in 1947 to more than $1 billion in 1966. The NIH sponsored many of the important epidemiologic studies and clinical trials of that period, including the influential Framingham Heart Study (Dawber et al., 1951; Dawber, 1980).

The typical focus of these early studies was morbidity and, especially, mortality. Investigators sought to identify the causes of early death and to evaluate the effectiveness of treatments for delaying death and morbidity. In the Framingham Heart Study, participants were seen at two-year intervals. Survival outcomes during successive two-year periods were treated as independent events and modeled using multiple logistic regression. The successful use of multiple logistic regression in this setting, and the recognition that it could be applied to case-control data, led to widespread use of this methodology beginning in the 1960s. The analysis of time-to-event data was revolutionized by the seminal 1972 paper of D. R. Cox, describing the proportional hazards model (Cox, 1972). This paper was followed by a rich and important body of work that established the conceptual basis and the computational tools for modern survival analysis.

Although the design of the Framingham Heart Study and other cohort studies called for periodic measurement of the patient characteristics thought to be determinants of chronic disease, interest in the levels and patterns of change of those characteristics over time was initially limited. As the research advanced, however, investigators began to ask questions about the behavior of these risk factors. In the Framingham Heart Study, for example, investigators began to ask whether blood pressure levels in childhood were predictive of hypertension in adult life. In the Coronary Artery Risk Development in Young Adults (CARDIA) Study, investigators sought to identify the determinants of the transition from normotensive or normocholesterolemic status in early adult life to hypertension and hypercholesterolemia in middle age (Friedman et al., 1988). In the treatment of arthritis, asthma, and other diseases that are not typically life-threatening, investigators began to study the effects of treatments on the level and change over time in measures of severity of disease. Similar questions were being posed in every disease setting. Investigators began to follow populations of all ages over time, both in observational studies and clinical trials, to understand the development and persistence of disease and to identify factors that alter the course of disease development.

This interest in the temporal patterns of change in human characteristics came at a period when advances in computing power made new and more computationally intensive approaches to statistical analysis available at the desktop. Thus, in the early 1980s, Laird and Ware proposed the use of the EM algorithm to fit a class of linear mixed effects models appropriate for the analysis of repeated measurements (Laird and Ware, 1982); Jennrich and Schluchter (1986) proposed a variety of alternative algorithms, including Fisher-scoring and Newton–Raphson algorithms. Later in the decade, Liang and Zeger introduced the generalized estimating equations in the biostatistical literature and proposed a family of generalized linear models for fitting repeated observations of binary and counted data (Liang and Zeger, 1986; Zeger and Liang, 1986). Many other investigators writing in the biomedical, educational, and psychometric literature contributed to the rapid development of methodology for the analysis of these longitudinal data. The past 30 years have seen considerable progress in the development of statistical methods for the analysis of longitudinal data. Despite these important advances, methods for the analysis of longitudinal data have been somewhat slow to move into the mainstream. This book bridges the gap between theory and application by presenting a comprehensive description of methods for the analysis of longitudinal data accessible to a broad range of readers.

1.2 LONGITUDINAL AND CLUSTERED DATA

The defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time, thereby allowing the direct study of change over time. The primary goal of a longitudinal study is to characterize the change in response over time and the factors that influence change. With repeated measures on individuals, one can capture within-individual change. Indeed, the assessment of within-subject changes in the response over time can only be achieved within a longitudinal study design. For example, in a cross-sectional study, where the response is measured at a single occasion, one can only obtain estimates of between-individual differences in the response. That is, a cross-sectional study may allow comparisons among sub-populations that happen to differ in age, but it does not provide any information about how individuals change during the corresponding period.

To highlight this important distinction between cross-sectional and longitudinal study designs, consider the following simple example. Body fatness in girls is thought to increase just before or around menarche, leveling off approximately 4 years after menarche. Suppose that investigators are interested in determining the increase in body fatness in girls after menarche. In a cross-sectional study design, investigators might obtain measurements of percent body fat on two separate groups of girls: a group of 10-year-old girls (a pre-menarcheal cohort) and a group of 15-year-old girls (a post-menarcheal cohort). In this cross-sectional study design, direct comparison of the average percent body fat in the two groups of girls can be made using a two-sample (unpaired) t-test. This comparison does not provide an estimate of the change in body fatness as girls age from 10 to 15 years. The effect of growth or aging, an inherently within-individual effect, simply cannot be estimated from a cross-sectional study that does not obtain measures of how individuals change with time. In a cross-sectional study the effect of aging is potentially confounded with possible cohort effects. Put in a slightly different way, there are many characteristics that differentiate girls in these two different age groups that could distort the relationship between age and body fatness. On the other hand, a longitudinal study that measures a single cohort of girls at both ages 10 and 15 can provide a valid estimate of the change in body fatness as girls age. In the longitudinal study the analysis is based on a paired t-test, using the difference or change in percent body fat within each girl as the outcome variable. This within-individual comparison provides a valid estimate of the change in body fatness as girls age from 10 to 15 years. 
Moreover, since each girl acts as her own control, changes in percent body fat throughout the duration of the study are estimated free of any between-individual variation in body fatness.
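
The contrast between the two-sample and paired analyses can be sketched numerically. The following is a minimal illustration, not an analysis from the book: the data are simulated, and the sample size, means, and standard deviations are assumptions chosen only for demonstration. Both statistics are applied to the same simulated measurements to show how removing between-girl variation changes the standard error of the estimated change.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data (not from the book): percent body fat for 30 girls
# measured at age 10 and again at age 15. Each girl has her own stable
# level, which induces positive correlation between her two measurements.
n = 30
girl_level = [random.gauss(0.0, 4.0) for _ in range(n)]  # between-girl variation
fat_age10 = [20.0 + g + random.gauss(0.0, 1.0) for g in girl_level]
fat_age15 = [23.0 + g + random.gauss(0.0, 1.0) for g in girl_level]


def unpaired_t(x, y):
    """Two-sample t statistic with pooled variance (treats groups as independent)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return (statistics.mean(y) - statistics.mean(x)) / se


def paired_t(x, y):
    """Paired t statistic based on within-individual differences."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


t_unpaired = unpaired_t(fat_age10, fat_age15)
t_paired = paired_t(fat_age10, fat_age15)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

With equal group sizes the two statistics share the same numerator, the difference in mean percent body fat. They differ only in the standard error: the paired version subtracts the covariance between a girl's two measurements, so it is smaller whenever that covariance is positive.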

A distinctive feature of longitudinal data is that they are clustered. In longitudinal studies the clusters are composed of the repeated measurements obtained from a single individual at different occasions. Observations within a cluster will typically exhibit positive correlation, and this correlation must be accounted for in the analysis. Longitudinal data also have a temporal order; the first measurement within a cluster necessarily comes before the second measurement, and so on. The ordering of the repeated measures has important implications for analysis. There are, however, many studies in the health sciences that are not longitudinal in this sense but which give rise to data that are clustered or cluster-correlated. For example, clustered data commonly arise when intact groups are randomized to health interventions or when naturally occurring groups in the population are randomly sampled. An example of the former is group-randomized trials. In a group-randomized trial, also known as a cluster-randomized trial, groups of individuals, rather than each individual alone, are randomized to different treatments or health interventions. Data on the health outcomes of interest are obtained on all individuals within a group. Alternatively, clustered data can arise from random sampling of naturally occurring groups in the population. Families, households, hospital wards, medical practices, neighborhoods, and schools are all instances of naturally occurring clusters in the population that might be the primary sampling units in a study. Finally, clustered data can arise when data on the health outcome of interest are simultaneously obtained either from multiple raters or from different measurement instruments.

In all these examples of clustered data, we might reasonably expect that measurements on units within a cluster are more similar than the measurements on units in different clusters. The degree of clustering can be expressed in terms of correlation among the measurements on units within the same cluster. This correlation invalidates the crucial assumption of independence that is the cornerstone of so many standard statistical techniques. Instead, statistical models for clustered data must explicitly describe and account for this correlation. Because longitudinal data are a special case of clustered data, albeit with a natural ordering of the measurements within a cluster, this book includes a description of modern methods of analysis for clustered data, more broadly defined. Indeed, one of the goals of this book is to demonstrate that methods for the analysis of longitudinal data are, more or less, special cases of more general regression methods for clustered data. As a result a comprehensive understanding of methods for the analysis of longitudinal data provides the basis for a broader understanding of methods for analyzing the wide range of clustered data that commonly arises in studies in the biomedical and health sciences.
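
To see why ignoring within-cluster correlation matters, consider the variance of the mean of the measurements in a single cluster. The sketch below assumes, purely for illustration, that all measurements have a common variance and that every pair within the cluster has the same correlation; the function name and the numerical values are hypothetical.

```python
def variance_of_cluster_mean(sigma2: float, m: int, rho: float) -> float:
    """Variance of the mean of m equally correlated measurements.

    Assumes every measurement has variance sigma2 and every pair within
    the cluster has the same correlation rho (an illustrative simplification).
    """
    return sigma2 / m * (1.0 + (m - 1) * rho)


sigma2, m = 25.0, 4  # hypothetical variance and cluster size
naive = variance_of_cluster_mean(sigma2, m, rho=0.0)      # independence assumed
clustered = variance_of_cluster_mean(sigma2, m, rho=0.5)  # positive correlation
print(naive, clustered, clustered / naive)
```

Under these assumptions the variance that accounts for correlation is 2.5 times the variance computed as if the four measurements were independent, so a standard error based on independence would be too small for this quantity.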

The examples described above consider only a single level of clustering, for example, repeated measurements on individuals. More recently investigators have developed methodology for the analysis of multilevel data, in which observations may be clustered at more than one level. For example, the data may consist of repeated measurements on patients clustered by clinic. Alternatively, the data may consist of observations on children nested within classrooms, nested within schools. Although the analysis of multilevel data is not the primary focus of this book, multilevel data are discussed in Chapter 22.

Interest in the analysis of longitudinal and multilevel data continues to grow. New and more flexible models have been developed and advances in computation, such as Markov chain Monte Carlo (MCMC) methods, have allowed greater flexibility in model specification. Moreover, improvements in statistical software packages, especially SAS, Stata, SPSS, R, and S-Plus, have made these models much more accessible for use in routine data analysis. Despite these advances, however, methods for the analysis of longitudinal data are still not widely used and are often seen as accessible only to statisticians with specialized expertise.

We believe that the methodology for the analysis of longitudinal data can be much more widely understood and applied. It is our hope that this book will help make that possible. It provides a comprehensive introduction to methods for the analysis of longitudinal data, written for a reader with a basic knowledge of statistics and a strong background in regression analysis. The book does not require a high level of mathematical preparation but does assume a willingness to read and consider mathematical ideas.

1.3 EXAMPLES

To highlight some of the distinctive features of longitudinal and clustered data, we introduce four examples drawn from studies in the biomedical sciences. These four examples will be used later in the book to illustrate different analytic approaches. Additional examples, also drawn from studies in the biomedical and health sciences, will be introduced in later chapters of the book.

1.3.1 Treatment of Lead-Exposed Children (TLC) Trial

Exposure to lead can produce cognitive impairment, especially among young children and infants. A young child exposed to high levels of lead may experience various adverse health effects, including hyperactivity, hearing or memory loss, learning disabilities, and damage to the brain and nervous system. Although the use of lead as an additive in gasoline has been discontinued, at least in the United States, resulting in a dramatic reduction in airborne lead levels, a small percentage of children continue to be exposed to lead at levels that can produce impairment. Much of this exposure is due to deteriorating lead-based paint (e.g., chipping and peeling paint) in older homes. Lead was used as a pigment and drying agent in alkyd oil-based paint. While the United States government banned the use of lead-based paint in housing in 1978, many homes built before 1978 contain lead-based paint. When lead-based paint deteriorates, it becomes lead paint chips, which can be eaten by young children, and lead-contaminated paint dust, which can be ingested by young children during normal teething and hand-to-mouth behavior. The U.S. Centers for Disease Control and Prevention (CDC) has concluded that children with blood lead levels above 10 micrograms per deciliter (μg/dL) of whole blood are at risk of adverse health effects.

Lead poisoning in children is treatable in the sense that there are medical interventions, known as chelation treatments, that can help a child to excrete the lead that has been ingested. Until recently chelation treatment of children with high levels of blood lead was administered by injection and required hospitalization. A new chelating agent, succimer, enhances urinary excretion of lead and has the distinct advantage that it can be given orally, rather than by injection. In the 1990s the Treatment of Lead-Exposed Children (TLC) Trial Group conducted a placebo-controlled, randomized trial of succimer in children with confirmed blood lead levels of 20 to 44 μg/dL, levels well above the CDC’s threshold for concern about the adverse health effects of exposure to lead (Treatment of Lead-Exposed Children (TLC) Trial Group, 2000; Rogan et al., 2001). The children were aged 12 to 33 months at enrollment and lived in deteriorating inner city housing. The mean age of the children at randomization was 2 years and the mean blood lead level was 26 μg/dL. Children received up to three 26-day courses of succimer or placebo and were followed for 3 years.

Table 1.1 presents data on blood lead levels at baseline, week 1, week 4, and week 6 for 10 randomly selected children from the study. The mean blood lead levels at each measurement occasion for a random subset of 100 children, broken down by treatment group, are presented in Table 1.2. As expected, due to randomization, the mean response at baseline is similar in the two treatment groups. However, there are discernible differences in the patterns of change in the mean response over time. A graphical presentation of the mean blood lead levels at each occasion is displayed in Figure 1.1. Note that at week 1 there appears to be a dramatic drop in initial blood lead levels among the children treated with succimer. However, this is followed by a rebound in blood lead levels, as lead stored in the bones and tissues is mobilized and a new equilibrium is achieved. In contrast, for the children treated with placebo, the trend in the mean response over time is relatively flat.

Fig. 1.1 Plot of mean blood lead levels at baseline, week 1, week 4, and week 6 in the succimer and placebo groups.

Table 1.1 Blood lead levels (μg/dL) at baseline, week 1, week 4, and week 6 for 10 randomly selected children from the TLC trial.

Table 1.2 Mean blood lead levels (and standard deviation) at baseline, week 1, week 4, and week 6 for children from the TLC trial.

1.3.2 Muscatine Coronary Risk Factor Study

In 1998 the American Heart Association (AHA) announced that obesity had been added to the AHA’s list of major preventable risk factors for coronary heart disease. These major preventable risk factors include smoking, high blood cholesterol, high blood pressure, and sedentary lifestyle. Unlike risk factors that cannot be altered, such as heredity, increasing age, and being male, obesity is a risk factor that many individuals can alter and control. The medical definition of obesity is quite simple: an excess of body fat. Obesity is primarily caused by consuming too many calories and not getting enough physical exercise. Obesity can lead to higher blood cholesterol and triglyceride levels, lower HDL cholesterol (HDL cholesterol, the good cholesterol, has been linked to lower risk of coronary heart disease), and higher blood pressure. Thus obesity can contribute to higher coronary risk in a variety of different ways.

Public health scientists now accept that obesity is a chronic disease, just like high blood pressure or high blood cholesterol. Its causes are a complex, individualized combination of genetics, behavior, and lifestyle. There is also increased awareness that obese children are at increased risk for obesity as adults.

In 1970 researchers from the University of Iowa began to examine the links between child and adult coronary health. Of particular interest were the associations between coronary risk factors in youth and coronary disease in adults. The Muscatine Coronary Risk Factor (MCRF) study, a longitudinal survey of school-age children in Muscatine, Iowa, had the goal of examining the development and persistence of risk factors for coronary disease in children (Woolson and Clarke, 1984; Lauer et al., 1997). In the MCRF study, weight and height measurements of five cohorts of children, initially aged 5–7, 7–9, 9–11, 11–13, and 13–15 years, were obtained biennially from 1977 to 1981. Data were collected on 4856 boys and girls. On the basis of a comparison of their weight to age-gender specific norms, children were classified as obese or not obese. One objective was to determine whether the prevalence of obesity increases with age and whether patterns of change in obesity are the same for boys and girls.

A summary of the obesity data for children in one of the five cohorts, who were 7–9 years old in 1977, is presented in Table 1.3. Because all the variables are discrete, the data can be summarized as counts in a contingency table. For example, the first 8 rows of Table 1.3 provide a count of the number of children with each of the 8 (or 2³) possible sequences of binary responses over the three measurement occasions. A similar table could be constructed for each of the remaining four cohorts of children. Note that although each child was eligible to participate in all three surveys, the data are incomplete for many children. Less than 40% of the children provided complete data at all three measurement occasions. For convenience, in Table 1.3 the missingness of obesity is treated as a third category of the obesity status variable.
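
The count of possible response sequences can be checked by enumeration. This is a small sketch; the symbols used to code the categories (0, 1, and "." for missing) are assumptions made for illustration and are not taken from Table 1.3.

```python
from itertools import product

occasions = 3

# Complete-data patterns: obese (1) or not obese (0) at each occasion.
complete_patterns = list(product([0, 1], repeat=occasions))

# Treating missingness as a third category ("." is a hypothetical code).
patterns_with_missing = list(product([0, 1, "."], repeat=occasions))

print(len(complete_patterns))      # 2**3 sequences of binary responses
print(len(patterns_with_missing))  # 3**3 sequences with a missing category
```

The second enumeration includes the sequence that is missing at all three occasions, which would not normally appear in a table of study participants.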

Table 1.3 Obesity status of cohort of children, aged 7–9 at entry, from the Muscatine study.

1.3.3 Clinical Trial of an Anti-epileptic Drug

Epilepsy is a chronic neurologic disorder that may result from brain injury, developmental malformation, or a genetic abnormality. It is characterized by recurrent seizures caused by sudden, excessive electrical activity in the brain. Seizures are classified as either generalized, in which the electrical discharge occurs throughout the brain, or partial onset, in which the electrical activity is localized.

Data for the third example come from a placebo-controlled clinical trial of 59 epileptics conducted by Leppik et al. (1987). Patients with partial seizures were enrolled in a randomized clinical trial of the anti-epileptic drug, progabide. Participants in the study were randomized to either progabide or a placebo, as an adjuvant to the standard anti-epileptic chemotherapy. Progabide is an anti-epileptic drug whose primary mechanism of action is to enhance gamma-aminobutyric acid (GABA) content; GABA is the primary inhibitory neurotransmitter in the brain.

Prior to receiving treatment, baseline data on the number of epileptic seizures during the preceding 8-week interval were recorded. Counts of epileptic seizures during 2-week intervals before each of four successive post-randomization clinic visits were recorded. The average rates of seizures (per week) at baseline and in the four post-randomization visits are presented in Table 1.4. A graphical presentation of the average rates of seizures at each occasion in the progabide and placebo groups is displayed in Figure 1.2. The main goal of the study was to compare the changes in the average rates of seizures in the two groups.
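
Because the baseline interval (8 weeks) and the post-randomization intervals (2 weeks each) differ in length, counts must be converted to a common time scale before they can be compared. A minimal sketch of that conversion follows; the counts for this single patient are hypothetical and are not taken from the trial data.

```python
# Hypothetical seizure counts for one patient (not taken from the trial data).
baseline_count = 24          # seizures in the 8-week baseline interval
visit_counts = [5, 3, 4, 2]  # seizures in each 2-week interval before a visit

BASELINE_WEEKS = 8
VISIT_WEEKS = 2

# Divide each count by the length of its observation interval in weeks.
baseline_rate = baseline_count / BASELINE_WEEKS
visit_rates = [count / VISIT_WEEKS for count in visit_counts]
print(baseline_rate, visit_rates)
```

Averaging such per-week rates over patients within each treatment group would give summaries on the scale described for Table 1.4.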

Fig. 1.2 Mean rate of seizures (per week) at baseline, week 2, week 4, week 6, and week 8 in the progabide and placebo groups.

Table 1.4 Mean rate of seizures per week (and standard deviation) at baseline, week 2, week 4, week 6, and week 8 in the clinical trial of progabide.

1.3.4 Connecticut Child Surveys

There is now accumulating evidence that the rates of psychiatric disorders in children are substantial, with reported population prevalence rates of childhood psychopathology ranging from 12% to 22%. However, children are considered to be unreliable in reporting on their own psychopathology. As a result many contemporary surveys of childhood psychopathology use proxy informants, usually a child’s parent (or primary caregiver) and teacher, to report on the child’s psychiatric status. In numerous studies the agreement among multiple informant reports on the child’s psychopathology has been found to be poor. It is thought that much of this disagreement is less a result of the unreliability of the informant reports than of true differences in children’s behaviors and emotions across different situations and settings, notably in the home and school. A central issue in studies of risk factors for childhood psychopathology is utilization of the information obtained about the child’s mental health status from multiple sources or informants.

Data for our example come from two parallel epidemiological surveys that assessed the mental health and service needs of children, aged 6 to 11, in rural and urban communities in Connecticut (Zahner et al., 1992, 1993). The first survey, the New Haven Child Survey (NHCS), was conducted in 1986 and 1987 in New Haven, Connecticut, a predominantly minority metropolitan center. The second survey, the Eastern Connecticut Child Survey (ECCS) was conducted in 1988 and 1989 and replicated the NHCS in a non-metropolitan planning region covering the eastern third of Connecticut. The two studies used comparable survey procedures. In particular, they used parallel questionnaires designed to be self-administered by the children’s parents and teachers. Children’s emotional and behavioral problems were assessed with the Child Behavior Checklist (CBCL) and the Teacher’s Report Form (TRF), 118-item symptom inventories covering problems commonly seen in child guidance clinics. The CBCL and TRF scales do not provide diagnoses of psychiatric disorders; instead, they provide broad-band measures of emotional (or internalizing) and behavioral (or externalizing) disturbance. The CBCL and TRF scale scores can be dichotomized at published clinical cut-points.

Thus the New Haven Child Survey and the Eastern Connecticut Child Survey provided both a parent’s and a teacher’s report of psychiatric disturbance in the child as assessed by parallel forms of a standardized psychiatric symptom checklist. These data provide multiple source (here, from two sources: the parent and teacher) information on the psychiatric outcome variable of interest. Of note, these data are cross-sectional but the two sources of information about each child’s psychopathology are likely to be positively correlated. Thus data from the Connecticut Child Surveys are an example of clustered, but not longitudinal, data. In this setting, unlike a typical longitudinal study, the major interest of the analysis is not in changes in the response over time. Instead, the major focus of the analysis is on the effects of subject-specific covariates on the outcome.

Table 1.5 displays social and demographic characteristics of the children and the overall rates of externalizing disturbance as determined by CBCL and TRF scale scores in the clinical range.

Table 1.5 Frequency distribution for variables from the Connecticut Child Surveys.

The four examples considered in this section differ in terms of outcome variable, study design, and goals or objectives of the analysis. In the first example from the TLC trial, the outcome variable, blood lead level, is continuous. In the second example from the MCRF study, the outcome variable, obesity status, is binary. In the third example from the clinical trial of progabide, the outcome variable is a count. These three examples illustrate the diverse types of longitudinal data that arise in the health and medical sciences. A notable feature of the second example is the amount of missing data. Missing data are a common problem in longitudinal studies in the health sciences. As we will discuss in later chapters, one will need to examine the reasons for any missingness to determine the validity of inferences about changes in the response over time. Next, consider the design of these studies. The first and third examples are experiments, where the treatments have been chosen by the investigators and randomly assigned to the study participants. The second example is an observational study where the study participants are followed forward in time to observe the outcome variable at future time points; however, unlike the randomized clinical trial, the investigators cannot directly control the comparability of groups (here, males and females). While the first three examples involve longitudinal study designs, the fourth example is a cross-sectional observational study. In the Connecticut Child Surveys, variables are measured at a single time point on a sample of children. Because information on the outcome variable of interest is obtained from two sources (the parent and teacher), these data are also clustered. Finally, we note that the goals of the analysis are similar for the first three examples: characterize the change in the outcome variable over time and the factors that influence change. 
In the fourth example, however, the objective of the analysis is not to characterize change in the outcome variable over time. Instead, the goal is to examine the effects of subject-specific covariates on the outcome. In later chapters we describe modern methods for analyzing diverse types of longitudinal data arising from both experiments and observational studies. Because longitudinal data are a special case of clustered data, we also describe methods of analysis for clustered data, more broadly defined.

1.4 REGRESSION MODELS FOR CORRELATED RESPONSES

In the last 30 years we have seen remarkable advances in methods for analyzing longitudinal and clustered data. In particular, we now have a broad and flexible class of models for correlated data based on a regression paradigm. Indeed, all the methods that are described in later chapters can be thought of as regression models for correlated responses. In this section we provide motivation for the regression paradigm for correlated responses.

Regression models are widely used and provide a very general and versatile approach for analyzing data. Our use of the term regression model here is not strictly limited to the standard linear regression model for a continuous response variable. Instead, we use this term more broadly to refer to any model that describes the dependence of the mean of a response variable on a set of covariates in terms of some form of regression equation. While the simplest case is the familiar linear regression model for a continuous response variable, there are many possible generalizations. For example, regression models have been developed for other response variables, such as binary responses or counts. For the binary response variable, linear logistic regression has been widely used for many applications. For counts, Poisson or log-linear regression is often appropriate. Another important generalization is to observations that cannot be assumed to be statistically independent of one another, that is, regression models for correlated responses. In later chapters we consider both kinds of generalizations of the standard linear regression model.

Note that the term linear has appeared in all three of the examples of regression models considered so far. Linearity in this setting has a very precise meaning and refers to the fact that all of these models for the mean (or some transformation of the mean) are linear in the regression parameters. For example, letting Y denote the response variable and X a covariate, the following three models for the mean response

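$$E(Y|X) = \beta_1 + \beta_2 X,$$

$$E(Y|X) = \beta_1 + \beta_2 \log(X),$$

and

$$E(Y|X) = \beta_1 + \beta_2 X + \beta_3 X^2$$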

are all cases where the mean is linear in the regression parameters (where E(Y|X) denotes the conditional mean or expectation of Y given X). All three models are linear in the regression parameters, even if the latter two are non-linear in the covariate. In this book we only consider models where the mean response, or some suitable transformation of the mean response (e.g., log transformation in Poisson regression), is linear in the regression parameters. We do not consider models that are fundamentally non-linear in the regression parameters. For example, the following two models

$$E(Y|X) = \beta_1 + \beta_2 e^{\beta_3 X}$$

and

$$E(Y|X) = \frac{\beta_1}{1 + \beta_2 e^{-\beta_3 X}}$$

are cases where the mean is non-linear in the regression parameters. However, we remind the reader that our focus on models that are linear in the regression parameters does not preclude relationships between the mean response and covariates that are curvilinear or non-linear. This type of non-linearity can be accommodated by taking appropriate transformations of the mean response (e.g., log transformation in Poisson regression) and the covariates (e.g., log(dose)), and/or by including polynomials. For example, a quadratic trend in the mean response over time can be incorporated by including both time and time² in the regression model. The inclusion of transformed covariates in no way violates the linearity of the regression model; that is, the model is still linear in the regression parameters.
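The distinction between linearity in the parameters and linearity in the covariate can be checked numerically. The following is a minimal sketch, not a method from the book: the function names, parameter values, and the exponential mean function are hypothetical choices made for illustration. It tests additivity in the parameters, which is a necessary condition for a mean function to be linear in them.

```python
import math

def quadratic_mean(beta, x):
    """Mean response beta1 + beta2*x + beta3*x^2: curved in x, linear in beta."""
    b1, b2, b3 = beta
    return b1 + b2 * x + b3 * x ** 2

def exponential_mean(beta, x):
    """Mean response beta1 + beta2*exp(beta3*x): beta3 enters non-linearly."""
    b1, b2, b3 = beta
    return b1 + b2 * math.exp(b3 * x)

def is_additive_in_parameters(mean_fn, beta_a, beta_b, xs, tol=1e-9):
    """Check f(beta_a + beta_b, x) == f(beta_a, x) + f(beta_b, x) at every x."""
    beta_sum = tuple(a + b for a, b in zip(beta_a, beta_b))
    return all(
        abs(mean_fn(beta_sum, x) - (mean_fn(beta_a, x) + mean_fn(beta_b, x))) < tol
        for x in xs
    )

# Hypothetical covariate values and parameter vectors.
xs = [0.0, 1.0, 2.0, 3.0]
beta_a = (1.0, 0.5, -0.2)
beta_b = (2.0, -1.0, 0.3)

quadratic_is_linear = is_additive_in_parameters(quadratic_mean, beta_a, beta_b, xs)
exponential_is_linear = is_additive_in_parameters(exponential_mean, beta_a, beta_b, xs)
print(quadratic_is_linear, exponential_is_linear)
```

The quadratic model passes the check even though its mean is curved in the covariate, whereas the exponential model fails it.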

As noted earlier, we use the term regression model to refer to any model that describes the dependence of the response variable on a set of covariates in some form of regression equation. In particular, the regression parameters express how the mean of the response variable depends on the covariates. For example, in the case of the linear regression model for a continuous response, the regression coefficients express the dependence of the mean of the outcome in terms of a linear combination of the covariates. In the linear logistic model for a binary response, the regression coefficients express the dependence of the log odds of a positive response in terms of a linear combination of the covariates. Note, however, that the log odds is simply a non-linear transformation of the mean or probability of a positive response. Thus in both cases the mean of the response variable, or some appropriate transformation of the mean, is related to a linear combination of the covariates.
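To make the role of the log odds transformation concrete, the sketch below uses hypothetical coefficient values (beta1 and beta2 are assumptions, not estimates from any data set in the book). It shows that the log odds change by a constant amount per unit of the covariate, while the probability itself does not.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Probability that corresponds to a log odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept and slope on the log odds scale.
beta1, beta2 = -1.0, 0.5
covariate_values = [0.0, 1.0, 2.0, 3.0, 4.0]

log_odds = [beta1 + beta2 * x for x in covariate_values]
probabilities = [inverse_logit(eta) for eta in log_odds]

# Successive differences per unit increase in the covariate.
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]
probability_steps = [b - a for a, b in zip(probabilities, probabilities[1:])]

for x, eta, p in zip(covariate_values, log_odds, probabilities):
    print(f"x={x:.0f}  log odds={eta:+.2f}  probability={p:.3f}")
```

The steps on the log odds scale all equal beta2, whereas the steps on the probability scale vary with the starting value of the covariate.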

One appealing aspect of the regression paradigm concerns the nature of the explanatory variables. A feature of the regression modeling approach is that it can incorporate mixtures of discrete and continuous covariates in a relatively seamless fashion. That is, the covariates can be continuous (and often referred to as quantitative), such as body weight, age, time, and dose. Furthermore the mean response, or any suitable transformation of the mean, can be related to a continuous covariate in a curvilinear or non-linear fashion by simply taking an appropriate transformation of the covariate or by the inclusion of polynomials (e.g., time and time²). Alternatively, the covariates can be discrete (or qualitative), such as gender and treatment group. Finally, regression models can include mixtures of discrete and continuous covariates, and products among them. As a result, within a regression paradigm, it is no more difficult to analyze longitudinal data arising from a carefully designed experiment with a single qualitative covariate or factor (e.g., a randomized placebo-controlled longitudinal clinical trial) than from an observational study where there are many covariates, some of which are discrete, the others continuous. Of note, in the latter case, regression models can often be used to distinguish within- and between-subject trends in the response (e.g., longitudinal versus cross-sectional effects of age); this topic will be discussed in greater depth in later chapters.

Regression models can usually be formulated in such a way that certain regression parameters have interpretations that bear directly on the scientific question of main interest. For example, in a regression model for data from a longitudinal clinical trial, a particular regression coefficient can be given an interpretation in terms of the constant rate of change in the mean response over time in one of the treatment groups. Alternatively, the absence (or setting to zero) of a particular regression coefficient can be given an interpretation in terms of two treatment groups having the same underlying rate of change in the response variable over time.
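One way to write such a model is sketched below. The notation is assumed here for illustration only: $Y_{ij}$ denotes the response of individual $i$ at occasion $j$, $\text{Time}_{ij}$ the time of measurement, and $\text{Group}_i$ an indicator equal to 1 for the active treatment and 0 for placebo.

```latex
E(Y_{ij}) = \beta_1 + \beta_2\,\text{Time}_{ij} + \beta_3\,\text{Group}_i
            + \beta_4\,\text{Time}_{ij} \times \text{Group}_i

% Group_i = 0:  E(Y_{ij}) = \beta_1 + \beta_2\,\text{Time}_{ij}
% Group_i = 1:  E(Y_{ij}) = (\beta_1 + \beta_3) + (\beta_2 + \beta_4)\,\text{Time}_{ij}
```

Under this parameterization, $\beta_2$ is the constant rate of change in the mean response in the placebo group, and setting $\beta_4 = 0$ corresponds to the two groups having the same rate of change over time.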

So far we have emphasized that it is not necessary to distinguish whether the covariates are continuous or discrete (or a mixture of the two) within a regression paradigm. However, from a purely historical perspective, linear models for a continuous response with only discrete covariates have often been referred to as analysis of variance (ANOVA) models. In contrast, linear models for a continuous response with only continuous covariates have often been referred to as linear regression models. Indeed, some textbooks and courses in statistics present linear regression and analysis of variance as almost distinct analytic procedures. A large part of the reason for this arbitrary distinction is historical. Analysis of variance had its earliest roots in agricultural applications, especially carefully designed experiments where the responses (e.g., crop yield) could be indexed by one or more classifying factors (e.g., plot, crop variety) or qualitative experimental factors (e.g., different types of fertilizers). In contrast, linear regression was initially developed for the analysis of observational data. Some of the earliest applications of linear regression can be traced back to astronomy. By their very nature the data arising from studies in astronomy were purely observational (e.g., the positions and magnitudes of the heavenly bodies) and not the product of experimental manipulations. As a result of their somewhat different historical roots, ANOVA and linear regression have often been presented as almost distinct procedures, intended for the analysis of data arising from studies that differ in design (experimental versus observational) and the nature of the covariates (discrete versus continuous). Later it was recognized that linear regression is a very general model that incorporates analysis of variance as a special case.
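The sense in which analysis of variance is a special case of linear regression can be seen in a small sketch. The crop yields below are hypothetical, and `ols_simple` is an illustrative helper rather than a library function. Regressing the response on a 0/1 indicator for group membership returns the reference group mean as the intercept and the difference in group means as the slope.

```python
from statistics import mean

def ols_simple(x, y):
    """Least squares intercept and slope for a single covariate."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical crop yields under two fertilizers (a discrete covariate).
yield_a = [20.0, 22.0, 21.0, 23.0]
yield_b = [25.0, 27.0, 26.0, 24.0]

# Code the discrete covariate as an indicator: 0 for fertilizer A, 1 for B.
indicator = [0.0] * len(yield_a) + [1.0] * len(yield_b)
response = yield_a + yield_b

intercept, slope = ols_simple(indicator, response)
print(intercept, mean(yield_a))                 # intercept equals the mean for A
print(slope, mean(yield_b) - mean(yield_a))     # slope equals the difference in means
```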

Thus, although many of the commonly used statistical models for correlated data were originally developed for data arising from studies that differed in design, aims, and the nature of the covariates, almost all of these developments fall within the regression paradigm for correlated data. So from a purely pedagogical perspective, it is not necessary to distinguish methods for analyzing longitudinal or correlated data arising from observational studies and from studies with experimental designs. From this point of view, we have purposely chosen not to focus on many of the early developments in methodology for analyzing correlated data, for example, the repeated measures ANOVA and multivariate analysis of variance (MANOVA). Instead, we focus on a more general and versatile regression paradigm that encompasses most, if not all, of the earlier developments as special cases but can also handle all of the complexities that arise in applications. When viewed as special cases within the regression paradigm, the underlying (and often unrealistic) assumptions made by many of the earliest methods for analyzing correlated data are more readily understood.

In summary, we view the regression paradigm as a very flexible and versatile approach for analyzing longitudinal and correlated data arising from many different types of studies. Regression models can provide a parsimonious description or explanation of how the mean response in a longitudinal study changes with time, and how these changes are related to covariates of interest. Thus our use of regression models is primarily intended for descriptive purposes, that is, for determining the most salient aspects of patterns of change in the mean response. While this does not necessarily preclude their use as a possible explanation of the underlying probabilistic data generating mechanism that might have produced the repeated responses, the latter is not considered to be the main focus of the analysis. Instead, our primary goal is to provide a simple description of the discernible patterns of change in the response over time, and their relation to covariates, via regression coefficients that bear directly on the scientific questions of main interest.

1.5 ORGANIZATION OF THE BOOK

The book is organized into five main parts. The first part, consisting of Chapters 1 and 2, provides the reader with an overview of the most salient aspects of longitudinal data. In Chapter 2, we introduce some notation and many of the analytic issues that arise with longitudinal data. We discuss the main features that distinguish longitudinal data from cross-sectional data. We highlight the major goals and objectives of longitudinal analysis. We consider the aspect of longitudinal data that complicates their analysis, namely the correlation among repeated measures on the same individuals. We provide some intuition for how and why the correlation arises in longitudinal data and the potential consequences of ignoring it in the analysis.

The second part, consisting of Chapters 3 through 10, focuses on methods for analyzing longitudinal data when the response variable is continuous and assumed to have an approximate multivariate normal (or Gaussian) distribution. In Chapter 3, we introduce a general linear regression model for longitudinal data. We present a broad overview of different approaches for modeling the mean response over time and for accounting for the correlation among repeated measures on the same individual. These topics are discussed in much greater depth in subsequent chapters. In Chapter 4, we discuss estimation, via the method of maximum likelihood (ML), and inference concerning the regression coefficients and the covariance among the repeated measures. Longitudinal data present us with two aspects of the data that require modeling: the mean response over time and the covariance among repeated measures on the same individuals. In Chapters 5 and 6, the emphasis is on modeling the mean response. Two main approaches are distinguished: the analysis of response profiles (Chapter 5) and parametric or semiparametric curves (Chapter 6). In Chapter 7, we discuss models for the covariance in longitudinal data and develop an overall modeling strategy that takes account of the interdependence between the models for the mean and covariance. Chapter 8 introduces a very flexible class of models for analyzing longitudinal data known as linear mixed effects models. These models assume that some subset of the regression parameters vary randomly from one individual to another, thereby accounting for sources of natural heterogeneity in the population. Specifically, the mean response is modeled as a combination of fixed effects that are assumed to be shared by all individuals, and random effects that are unique or specific to a particular individual. In Chapter 9, we discuss an alternative, but closely related, class of regression models for longitudinal data known as linear fixed effects models. 
These models treat the subject-specific effects as fixed rather than random. We review the main features of linear fixed effects models for longitudinal data and discuss their potential advantages and disadvantages relative to linear mixed effects models. In Chapter 10, we discuss residual diagnostics for assessing the adequacy of models for longitudinal data and for detecting outlying observations and/or outlying individuals.

The chapters in the second part of the book cover many of the well-established methods for the analysis of longitudinal data and provide the foundation for future chapters that focus on discrete response variables (e.g., repeated binary responses and repeated count data). The third part, consisting of Chapters 11 through 16, focuses on methods for analyzing longitudinal data with outcomes that are not continuous. When the response is discrete, linear models are no longer appropriate for relating the mean to covariates. Instead, we consider extensions of generalized linear models for longitudinal data. In Chapter 11, we review the most salient features of generalized linear models for a single, univariate response; in later chapters, we discuss how generalized linear models can be extended to handle longitudinal responses. In generalized linear models a suitable non-linear transformation of the mean response is related to the covariates. However, this non-linearity raises some additional issues concerning the interpretation of the regression coefficients. In Chapters 12 through 15, we present two classes of models for analyzing discrete longitudinal data that account for the correlation among repeated measures in fundamentally different ways. In Chapter 16, we compare and contrast these two classes of models. One of the underlying themes emphasized in Chapters 12 through 16 concerns how different models for discrete longitudinal data have somewhat different targets of inference. Thus, to ensure that the regression parameters bear directly on the question of scientific interest, greater care is needed in the choice of model for discrete longitudinal data.

The fourth part of the book, consisting of Chapters 17 and 18, addresses the issue of missing data in longitudinal studies. In Chapter 17, we review the assumptions about missing data required to ensure that the methods discussed in earlier chapters provide valid inferences. Two methods for handling missing data, multiple imputation and inverse probability weighted methods, are discussed in detail in Chapter 18.

The final part of the book, consisting of Chapters 19 through 22, focuses on a number of advanced topics. In Chapter 19, we discuss smoothing methods for longitudinal analysis that allow greater flexibility for the form of the relationship between the mean response and the covariates. This chapter focuses on the connection between penalized splines and linear mixed effects models. Chapter 20 considers the design of a longitudinal study, focusing on the determination of sample size and power. In Chapter 21, we discuss regression models for repeated measures and related designs and emphasize how the methods discussed in earlier chapters can be applied in these settings. In Chapter 22, we present an overview of methods for analyzing multilevel data. Chapters 21 and 22 demonstrate how regression models for longitudinal data are special cases of general regression models for correlated data, more broadly defined.

1.6 FURTHER READING

The presentation of methodology for the analysis of longitudinal data in subsequent chapters assumes that the reader has a basic knowledge of statistics and a strong background in regression analysis. A useful review of introductory statistical principles and methods, targeted at applied researchers, can be found in the books by Pagano and Gauvreau (2000) and Altman (1990). A comprehensive overview of regression concepts can be found in Kleinbaum et al. (1998) and Gelman and Hill (2007); a more advanced presentation of similar topics can be found in Neter et al. (1996).

Chapter 2

Longitudinal Data: Basic Concepts

2.1 INTRODUCTION

In this chapter we present a broad overview of the main objectives of longitudinal analysis and some of the defining features of longitudinal data. Our primary goal is to emphasize that the major focus of the analysis of longitudinal data is on the assessment of within-individual changes in the response variable over time. That is, longitudinal analysis is concerned with estimating how individuals change throughout the duration of the study and examining the factors that influence heterogeneity among individuals in how they change over time. We also review the most salient features of longitudinal study designs, introduce some notation for longitudinal data, and highlight the main aspects of longitudinal data that complicate their analysis. Many of the concepts and issues introduced here will be discussed in much greater depth in later chapters of the book.

2.2 OBJECTIVES OF LONGITUDINAL ANALYSIS

In the health sciences, longitudinal studies play an important role in enhancing our understanding of the development and persistence of disease. There is much natural heterogeneity among individuals in terms of how diseases develop and progress. This heterogeneity is due to genetic, environmental, social, and behavioral factors. A longitudinal study design permits the discovery of individual characteristics that can explain these inter-individual differences in changes in health outcomes over time.

The distinguishing feature of longitudinal studies is that the study participants are measured repeatedly throughout the duration of the study, thereby permitting the direct assessment of changes in the response variable over time. In cross-sectional studies, where measurements are obtained at only a single point in time, it is not possible to assess individual changes on the basis of a single snapshot of the individual’s response taken at a given time. Thus the defining feature of a longitudinal study is that two or more observations of the response variable, taken at different times, are made on at least some of the study participants. Typically, although not always, longitudinal study designs call for a fixed number of repeated measurements to be made on all study participants at a set of common time points. The occasions of measurement are not necessarily distributed evenly throughout the duration of the study.

By obtaining measurements of the same individuals repeatedly through time, longitudinal studies can address fundamental questions concerning the assessment of within-individual changes in the response variable. The main goal, indeed the raison d’être, of a longitudinal study is to characterize the change in the response over time. While the measurement of within-individual changes is a fundamental objective of a longitudinal study, it is also of interest to determine whether these within-individual changes in the response are related to selected covariates. For example, in the Treatment of Lead-Exposed Children Trial, introduced in Chapter 1, repeated measures of blood lead levels were obtained at baseline (or week 0), week 1, week 4, and week 6, thereby allowing assessment of within-individual changes in blood lead levels over a six-week period. In this study it was not simply of interest to describe the overall pattern of within-individual changes in blood lead levels over time but also to relate these changes to the assigned treatment (placebo versus succimer).

In its most elementary form, a measure of the observed within-individual change in the response can be conceptualized in terms of simple change scores or difference scores, for example, the differences between post-treatment and pre-treatment measurements of the response. The main objective of a longitudinal analysis is to describe trends in these within-individual changes in the response and to relate these changes to selected covariates (e.g., treatment group). This simple notion of within-individual change extends naturally from difference scores to more general response trajectories over time. For example, a difference score happens to be proportional to the slope (or constant rate of change) of a linear response trajectory. However, other kinds of response trajectories, for example, piecewise linear or curvilinear, can be used to parsimoniously smooth and summarize within-individual changes in the response throughout the duration of the study. In either case the fundamental ideas remain the same: we want to assess and describe within-individual changes in the response over time via comparison of measurements on the same individual taken later in time with those taken earlier.
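The proportionality between a difference score and a slope can be written out for the simplest case of two occasions. The notation is assumed here for illustration: $Y_{i1}$ and $Y_{i2}$ denote the responses of individual $i$ at times $t_1 < t_2$.

```latex
% Slope of the straight line through (t_1, Y_{i1}) and (t_2, Y_{i2}):
\text{slope}_i = \frac{Y_{i2} - Y_{i1}}{t_2 - t_1}

% Hence the difference score is the slope scaled by the elapsed time:
Y_{i2} - Y_{i1} = (t_2 - t_1) \times \text{slope}_i
```

When all individuals are measured at the same two occasions, the constant of proportionality $t_2 - t_1$ is common to everyone, so comparing difference scores is equivalent to comparing slopes.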

A longitudinal analysis of within-individual changes proceeds in two conceptually distinct stages. First, within-individual change in the response is characterized in terms of some appropriate summary of the changes in the repeated measurements on each individual during the period of observation (e.g., using difference scores or some form of response trajectory). Second, these estimates of within-individual changes are then related to inter-individual differences in selected covariates. Although these two stages of the analysis are conceptually distinct, they can be combined in a statistical model for longitudinal data. That is, a single statistical model for longitudinal data can be used both to capture how individuals change over time and to relate within-individual changes in the response to selected covariates.
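The two conceptual stages can be sketched naively as follows. The measurement occasions (weeks 0, 1, 4, and 6) follow the TLC trial as described in the text, but the blood lead values are hypothetical, and a straight-line slope is only a crude summary of each trajectory. This is an illustration of the two-stage idea, not the combined statistical model referred to above.

```python
from statistics import mean

def ols_slope(times, responses):
    """Least squares slope of one individual's responses against time."""
    t_bar, y_bar = mean(times), mean(responses)
    sty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, responses))
    stt = sum((t - t_bar) ** 2 for t in times)
    return sty / stt

weeks = [0.0, 1.0, 4.0, 6.0]

# Hypothetical blood lead levels: one list per child, grouped by treatment.
data = {
    "placebo": [[25.0, 24.5, 24.0, 23.5], [28.0, 28.0, 27.0, 27.0]],
    "succimer": [[26.0, 15.0, 16.0, 20.0], [30.0, 18.0, 19.0, 23.0]],
}

# Stage 1: summarize within-individual change by a slope for each child.
slopes = {
    group: [ols_slope(weeks, child) for child in children]
    for group, children in data.items()
}

# Stage 2: relate the individual summaries to a covariate (treatment group).
mean_slopes = {group: mean(values) for group, values in slopes.items()}
slope_difference = mean_slopes["succimer"] - mean_slopes["placebo"]
print(mean_slopes, slope_difference)
```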

For example, in the Treatment of Lead-Exposed Children Trial the investigators were interested in assessing changes in blood lead levels over time. In particular, they wanted to determine whether chelation treatment with succimer reduced blood lead levels over time relative to any changes in the placebo group. This study question can be addressed in an analysis that compares the two treatment groups in terms of the differences between post-treatment and pre-treatment measurements of blood lead levels. Although the major objective of the analysis is quite clear, there are many ways to construct and test hypotheses concerning treatment effects on changes in blood lead levels over time. For instance, the two treatment groups can be compared in terms of all post-treatment changes in the mean blood lead levels from baseline (or pre-treatment). Alternatively, the two treatment groups can be compared in terms of the rate of decline of blood lead levels over time, where the rate of decline is expressed in terms of a slope. Thus, although the scientific question of interest has a seemingly simple formulation in terms of whether changes in blood lead levels are affected by treatment, there are many different ways to proceed with a longitudinal analysis of these data. The choice of one analytic approach over another will usually depend on statistical considerations (e.g., issues of precision), the design of the study, and the specific scientific question of interest. These are topics that will be discussed in more detail in later chapters of the book.

Finally, it is an inescapable fact that the assessment of within-subject changes in the response over time can be achieved only within a longitudinal study design. A cross-sectional study simply cannot estimate how individuals change over time since the response is measured at only a single occasion. A longitudinal study can estimate how individuals change and also do so with great precision because each individual acts as his or her own control. By comparing each individual’s responses at two or more occasions, a longitudinal analysis can remove extraneous, but unavoidable, sources of variability among individuals. The key point here is that there is natural heterogeneity among individuals in many extraneous variables. Although these extraneous variables are not of any substantive interest, they can potentially have an impact on the response variable. The beauty of a longitudinal study design is that any extraneous factors (regardless of whether they have been measured) that influence the response, and whose influence persists but remains relatively stable throughout the duration of the study (e.g., gender, socioeconomic status, and many genetic, environmental, social, and behavioral factors), are eliminated or blocked out when an individual’s responses are compared at two or more occasions. By eliminating these major sources of variability or noise from the estimation of within-individual change, a very precise estimate of change can often be obtained.
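The way each individual acts as his or her own control can be illustrated with a small, noise-free sketch. All values are hypothetical: each response is the sum of a stable individual level, standing in for the persistent extraneous factors, and a within-individual change.

```python
from statistics import pvariance

# Hypothetical stable individual levels (e.g., due to unmeasured factors).
subject_levels = [10.0, 20.0, 30.0, 40.0]
# Hypothetical within-individual changes between two occasions.
true_changes = [-2.0, -1.0, -3.0, -2.0]

baseline = list(subject_levels)
follow_up = [level + change for level, change in zip(subject_levels, true_changes)]

# Comparing each individual with themselves removes the stable levels.
differences = [f - b for f, b in zip(follow_up, baseline)]

variance_follow_up = pvariance(follow_up)      # dominated by between-individual spread
variance_differences = pvariance(differences)  # reflects only variation in change
print(differences, variance_follow_up, variance_differences)
```

The variability among the differences is far smaller than the variability among responses at a single occasion, because the stable individual levels cancel in each difference.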

In summary, the fundamental objective of a longitudinal analysis is the assessment of within-individual changes in the response and the explanation of systematic differences among individuals in their changes. Given that certain individuals change more (or less) than others, the goal of a longitudinal analysis is to determine whether these individuals have larger or smaller values on selected covariates. Finally, in some longitudinal studies, it may also be of interest to make predictions about how specific individuals change over time. In the latter case, longitudinal studies permit more reliable prediction by borrowing information from all individuals to better predict within-individual change over time for a specific individual.

2.3 DEFINING FEATURES OF LONGITUDINAL DATA

At this point we need to introduce some terminology that will be used throughout the remainder of the book. We also introduce some notation for longitudinal data and highlight the main aspects of longitudinal data that complicate their analysis, namely the correlation among repeated observations obtained on the same individual.

2.3.1 Terminology

In a longitudinal study the participants, or, more generally, the units being studied, are referred to as individuals or subjects. In many, but certainly not all, longitudinal studies, the individuals are human subjects. In other longitudinal studies, the individuals may be animals (e.g., laboratory mice or rats). Depending on the specific context, we use the terms individuals and subjects interchangeably to refer to the participants in a longitudinal study. As mentioned earlier, in a longitudinal study individuals are measured repeatedly at different occasions or times. Later we will introduce some notation that can distinguish the responses from different individuals in a longitudinal study as well as the repeated measurements on any particular individual. Thus, adopting the terminology introduced so far, the defining feature of a longitudinal study design is that measurements of the response variable are taken on the same individuals at several occasions.

The number of repeated observations, and their timing, can vary widely from one longitudinal study to another. For example, a clinical trial designed to examine the efficacy of a new analgesic agent may take repeated measures of a self-reported pain scale at baseline and at the end of six 15-minute intervals. This would result in seven repeated measures that are equally separated in time. On the other hand, an observational study of human growth may take measurements of height and weight at 3-month intervals from birth to age 2 years, followed by yearly observations from infancy through young adulthood. By design, the latter study would result in a sequence of repeated measures of height and weight that are unequally separated in time. In both of these examples, the number and the timing of the repeated measurements are the same for all individuals, regardless of whether the occasions of measurement are equally or unequally distributed throughout the duration of the study. Loosely borrowing statistical terminology from the field of experimental design, we refer to such studies as being balanced over time; that is, all individuals have the same number of repeated measurements obtained at a common set of occasions.

It is an almost inescapable feature of longitudinal studies in the health sciences, especially those where the repeated measurements extend over a relatively long duration, that some individuals will miss their scheduled visit or date of observation. In some studies this may necessitate that observations be made some time before or after the scheduled time. Consequently the sequence of observation times is no longer common to all individuals in the study due to mistimed measurements. In that case we refer to the data as being unbalanced over time; that is, the repeated measurements are not obtained at a common set of occasions. Unbalanced longitudinal designs are commonplace when the longitudinal study involves retrospectively collected data (e.g., longitudinal data obtained from medical record databases). Alternatively, highly unbalanced longitudinal data can arise when it is of interest to define the timings of the measurements relative to some benchmark event that occurs during the follow-up period. For example, in a study examining changes in body fat in girls before and after menarche (to be discussed in Section 8.8), the study was designed to begin annual follow-up measurements of body fat prior to menarche and continue for four years after menarche. Although this study design is balanced if the timing of measurements is defined as the time since the baseline measurement, the data are inherently unbalanced if the timing of measurements is defined as the time since an individual experienced menarche. Thus longitudinal studies that are balanced over time when the timing of measurements is defined according to one origin can become highly unbalanced when time is defined in terms of a different origin.
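The dependence of balance on the choice of time origin can be sketched with a simplified, hypothetical version of the menarche example. The visit schedule, the menarche times, and the helper `is_balanced` are all assumptions made for illustration.

```python
def is_balanced(times_by_subject):
    """True when every subject is measured at the same set of times."""
    schedules = {tuple(times) for times in times_by_subject.values()}
    return len(schedules) == 1

# Hypothetical annual visits, in years since the baseline measurement.
visit_years = [0, 1, 2, 3, 4, 5]

# Hypothetical time of menarche for each girl, in years since baseline.
menarche_year = {"girl_a": 1.3, "girl_b": 2.6, "girl_c": 1.3}

# Origin 1: time since baseline, identical for every girl.
since_baseline = {girl: list(visit_years) for girl in menarche_year}

# Origin 2: time since menarche, shifted by each girl's own event time.
since_menarche = {
    girl: [round(visit - event, 1) for visit in visit_years]
    for girl, event in menarche_year.items()
}

balanced_by_baseline = is_balanced(since_baseline)
balanced_by_menarche = is_balanced(since_menarche)
print(balanced_by_baseline, balanced_by_menarche)
```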

Although longitudinal designs that are unbalanced over time often arise due to happenstance, they are sometimes planned by the investigators. In a rotating panel study design, which is commonly used in health surveys to reduce response burden, individuals rotate in and out of the study after providing a pre-determined number of repeated measures. For example, two or more panels of individuals are measured repeatedly for a restricted number of occasions, with the first measurement for each panel of individuals being staggered. Thus some individuals rotate out (either temporarily or permanently) of the sample, whereas other individuals rotate into the sample. The primary motivation for this type of study design is to reduce costs and the overall burden of participating in the study for any individual, while providing observations at every occasion for some pre-determined proportion of the sample. An important characteristic of the rotating panel design is that the number and timing of the measurements are pre-determined by design. Furthermore, the decision about whether to obtain a measurement on an individual at any specific occasion is made a priori by the investigators and is not related to the response variable.

Missing data are a common and challenging problem in longitudinal studies. Indeed, missing data are the rule, not the exception, in longitudinal studies in the health sciences. For example, study participants do not always appear for a scheduled observation, or they may simply leave the study before its completion. When some observations are missing, the data are necessarily unbalanced over time, since not all individuals have the same number of repeated measurements obtained at a common set of occasions. However, to distinguish missing data in a longitudinal study from other kinds of unbalanced data, such data sets are often referred to as being incomplete. This distinction is important and emphasizes the fact that an intended measurement on an individual could not be obtained.

One of the consequences of lack of balance and/or missing data is that it requires some care to recover within-individual change. For example, consider a setting where each individual is measured on each of n occasions. Then consider plotting the mean response at each occasion. Differences in the mean response over time measure the within-individual change. This is because the difference in the means is also the mean of the differences when each subject is measured at every occasion. When data are missing, and especially when there is attrition of subjects whose responses are different from those who remain in the study, then a plot of the mean response over time can be misleading; changes over time may reflect the pattern of missingness or the attrition, and not within-individual change. As we will discuss in later chapters, one will need to examine assumptions and the appropriateness of the analysis carefully to determine the validity of the inferences with unbalanced designs and/or missing data. Although the methods discussed in this book are designed to handle unbalanced designs and missing data, it is worth keeping in mind that it is always preferable to have balanced designs, because with these designs changes in the mean response over time directly reflect within-individual change.
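The identity behind this argument, and the way attrition breaks it, can be illustrated with a small numerical sketch. The four individuals and their responses below are made up: every individual increases by exactly one unit between the two occasions, and in the incomplete version the two individuals with the highest responses are assumed to drop out before the second occasion. The helper names are hypothetical.

```python
from statistics import mean

# Hypothetical responses (occasion 1, occasion 2); each individual rises by 1.
complete = {
    "A": (10.0, 11.0),
    "B": (12.0, 13.0),
    "C": (20.0, 21.0),
    "D": (22.0, 23.0),
}

# Assumed attrition: the individuals with high responses miss occasion 2.
incomplete = dict(complete, C=(20.0, None), D=(22.0, None))


def difference_in_means(data):
    """Mean at occasion 2 minus mean at occasion 1, using all available values."""
    first = [y1 for y1, _ in data.values() if y1 is not None]
    second = [y2 for _, y2 in data.values() if y2 is not None]
    return mean(second) - mean(first)


def mean_of_differences(data):
    """Mean within-individual change among those observed at both occasions."""
    changes = [
        y2 - y1
        for y1, y2 in data.values()
        if y1 is not None and y2 is not None
    ]
    return mean(changes)


print(difference_in_means(complete), mean_of_differences(complete))
print(difference_in_means(incomplete), mean_of_differences(incomplete))
```

With complete data the two quantities coincide. With the assumed attrition, the difference in the occasion means is negative even though every individual's response increased, because the occasion 2 mean is computed from a different, lower-responding subset of individuals.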

When longitudinal data are incomplete, there are ramifications for their analysis that go beyond whether a particular statistical method can handle unbalanced longitudinal data. First, when there are missing data, it should be intuitively clear that there must necessarily be some loss of information. Thus there is a price to be paid in terms of efficiency or the precision with which changes over time can be estimated. However, besides causing inefficiency, in some circumstances missing data can introduce bias in the estimates of change. As a result, when longitudinal data are incomplete, the reasons for any missingness must be carefully considered. In Chapters 17 and 18 we discuss some of the consequences of incomplete data in longitudinal studies. In all subsequent chapters we allow for missing data but implicitly make assumptions about the reasons for any missingness. These assumptions are discussed in Section 4.3 and spelled out in greater detail in Chapter 17.

In summary, longitudinal data can be balanced and complete when all individuals are measured at a common set of occasions and there are no missing data. In our experience, longitudinal data in the health sciences are rarely balanced and complete unless the subjects lack human volition (e.g., laboratory rats) or the length of the study is relatively short (e.g., a longitudinal study of the efficacy of an analgesic where the repeated measurements can be obtained in a single study visit). It is far more common to have longitudinal
