Methods in Epidemiologic Research is a comprehensive text covering the key principles and methods used in epidemiologic research. It is written primarily for researchers and graduate students in epidemiology, but the material is equally applicable to those in related disciplines (e.g. public health).
Chapters 1 through 6 focus on basic epidemiologic principles. Chapters 7–11 focus on study design issues for observational studies and controlled trials. Over the past decade there has been much discussion about the need for epidemiologists to report their research findings thoroughly, which should also help ensure high-quality study designs in the future; we have cited the summary reporting recommendations in these chapters.
Chapters 12 and 13 address validity in observational studies and the detection and control of confounding. Chapters 14–19 cover a range of multivariable models. Chapter 19 (Modelling Survival Data) attempts to provide comprehensive coverage of the most commonly used methods in the analysis of time-to-event data.
Chapters 20–23 deal with clustered data, including a thorough description of methods for analysing repeated measures data. Chapters 24–30 cover a range of specialised topics, including Bayesian methods (Chapter 24, contributed by Henrik Stryhn in collaboration with William Browne), two chapters on presenting and analysing spatial data (Chapters 25 and 26, contributed by Javier Sanchez and Dirk Pfeiffer), an introduction to infectious disease epidemiology (Chapter 27, contributed by Graham Medley in collaboration with Ian Dohoo), and meta-analysis (Chapter 28).
Supplementary materials for this text will be made available at upei.ca/mer. These will include datasets and computer programs for all examples presented (initially Stata “do-files”, with the expectation that programs for other statistical packages will be added later).
All of the datasets used in these examples are described in the text (Chapter 31) and are available through upei.ca/mer. Virtually all of the examples have been analysed using the statistical program Stata™, which provides a unique combination of statistical and epidemiological tools and which we use extensively in our teaching.
TABLE OF CONTENTS
1. INTRODUCTION AND CAUSAL CONCEPTS
1.1 Introduction
1.2 A brief history of multiple causation concepts
1.3 A brief history of scientific inference
1.4 Key components of epidemiologic research
1.5 Seeking causes
1.6 Models of causation
1.7 Counterfactual concepts of causation for a single exposure
1.8 Experimental versus observational evidence of causation
1.9 Constructing a causal diagram
1.10 Causal criteria
2. SAMPLING
2.1 Introduction
2.2 Non-probability sampling
2.3 Probability sampling
2.4 Simple random sample
2.5 Systematic random sample
2.6 Stratified random sample
2.7 Cluster sampling
2.8 Multistage sampling
2.9 Targeted (risk-based) sampling
2.10 Analysis of survey data
2.11 Sample-size determination
2.12 Sampling to detect disease
3. QUESTIONNAIRE DESIGN
3.1 Introduction
3.2 Designing the question
3.3 Open question
3.4 Closed question
3.5 Wording the question
3.6 Structure of questionnaires
3.7 Pre-testing questionnaires
3.8 Validation
3.9 Response rate
3.10 Data-coding and editing
4. MEASURES OF DISEASE FREQUENCY
4.1 Introduction
4.2 Count, proportion, odds and rate
4.3 Incidence
4.4 Calculating risk
4.5 Calculating incidence rates
4.6 Relationship between risk and rate
4.7 Prevalence
4.8 Mortality statistics
4.9 Other measures of disease frequency
4.10 Standard errors and confidence intervals
4.11 Standardisation of risks and rates
5. SCREENING AND DIAGNOSTIC TESTS
5.1 Introduction
5.2 Attributes of the test per se
5.3 The ability of a test to detect disease or health
5.4 Predictive values
5.5 Interpreting test results that are measured on a continuous scale
5.6 Using multiple tests
5.7 Evaluation of diagnostic tests
5.8 Evaluation when there is no gold standard
5.9 Other considerations in test evaluation
5.10 Sample size requirements
5.11 Herd-level testing
5.12 Use of pooled samples
6. MEASURES OF ASSOCIATION
6.1 Introduction
6.2 Measures of association
6.3 Measures of effect
6.4 Study design and measures of association
6.5 Hypothesis testing and confidence intervals
6.6 Multivariable estimation of measures of association
7. INTRODUCTION TO OBSERVATIONAL STUDIES
7.1 Introduction
7.2 A unified approach to study design
7.3 Descriptive studies
7.4 Observational studies
7.5 Cross-sectional studies
7.6 Estimating incidence from one or more cross-sectional studies
7.7 Inferential limitations of cross-sectional studies
7.8 Repeated cross-sectional versus cohort studies
7.9 Reporting of observational studies
8. COHORT STUDIES
8.1 Introduction
8.2 Selecting the study group
8.3 The exposure
8.4 Disease as exposure
8.5 Ensuring exposed and non-exposed groups are comparable
8.6 Follow-up period
8.7 Measuring the outcome
8.8 Analysis
8.9 Reporting of cohort studies
9. CASE-CONTROL STUDIES
9.1 Introduction
9.2 The study base
9.3 The case series
9.4 Principles of control selection
9.5 Selecting controls in risk-based designs
9.6 Selecting controls in rate-based designs
9.7 Other sources of controls
9.8 The number of controls per case
9.9 The number of control groups
9.10 Exposure and covariate assessment
9.11 Keeping the cases and controls comparable
9.12 Analysis of case-control data
9.13 Reporting guidelines for case-control studies
10. HYBRID STUDY DESIGNS
10.1 Introduction
10.2 Case-crossover studies
10.3 Case-case studies
10.4 Case-case-control studies
10.5 Case-series studies
10.6 Case-cohort studies
10.7 Case-only studies
10.8 Two-stage sampling designs
11. CONTROLLED STUDIES
11.1 Introduction
11.2 Background, objectives, and summary trial design
11.3 Participants: the study group
11.4 Specifying the intervention
11.5 Measuring the outcome
11.6 Sample size
11.7 Allocation of study subjects
11.8 Follow-up/compliance
11.9 Statistical methods and analysis
11.10 Conclusions
11.11 Clinical trial designs for prophylaxis of communicable organisms
11.12 Reporting of clinical trials
12. VALIDITY IN OBSERVATIONAL STUDIES
12.1 Introduction
12.2 Selection bias
12.3 Examples of selection bias
12.4 Reducing selection bias
12.5 Information bias
12.6 Bias from misclassification
12.7 Validation studies to correct misclassification
12.8 Measurement error
12.9 Errors in surrogate measures of exposure
12.10 The impact of information bias on sample size
13. CONFOUNDING: DETECTION AND CONTROL
13.1 Introduction
13.2 Control of confounding prior to data analysis
13.3 Matching on confounders
13.4 Detection of confounding
13.5 Analytic control of confounding
13.6 Multivariable modelling to control confounding
13.7 Other approaches to control confounding and estimate causal effects
13.8 Propensity scores for controlling confounding
13.9 External adjustment and sensitivity analysis for unmeasured confounders
13.10 Understanding causal relationships
13.11 Summary of effects of extraneous variables
14. LINEAR REGRESSION
14.1 Introduction
14.2 Regression analysis
14.3 Hypothesis testing and effect estimation
14.4 Nature of the X-variables
14.5 Detecting highly correlated (collinear) variables
14.6 Detecting and modelling interaction
14.7 Causal interpretation of a multivariable linear model
14.8 Evaluating the least squares model
14.9 Evaluating the major assumptions
14.10 Assessment of individual observations
14.11 Time-series data
15. MODEL-BUILDING STRATEGIES
15.1 Introduction
15.2 Steps in building a model
15.3 Building a causal model
15.4 Reducing the number of predictors
15.5 The problem of missing values
15.6 Effects of continuous predictors
15.7 Identifying interaction terms of interest
15.8 Building the model
15.9 Evaluate the reliability of the model
15.10 Presenting the results
16. LOGISTIC REGRESSION
16.1 Introduction
16.2 The logistic model
16.3 Odds and odds ratios
16.4 Fitting a logistic regression model
16.5 Assumptions in logistic regression
16.6 Likelihood ratio statistics
16.7 Wald tests
16.8 Interpretation of coefficients
16.9 Assessing interaction and confounding
16.10 Model-building
16.11 Generalised linear models
16.12 Evaluating logistic regression models
16.13 Sample size considerations
16.14 Exact logistic regression
16.15 Conditional logistic regression for matched studies
17. MODELLING ORDINAL AND MULTINOMIAL DATA
17.1 Introduction
17.2 Overview of models
17.3 Multinomial logistic regression
17.4 Modelling ordinal data
17.5 Proportional odds model (constrained cumulative logit model)
17.6 Adjacent-category model
17.7 Continuation-ratio model
18. MODELLING COUNT AND RATE DATA
18.1 Introduction
18.2 The Poisson distribution
18.3 Poisson regression model
18.4 Interpretation of coefficients
18.5 Evaluating Poisson regression models
18.6 Negative binomial regression
18.7 Problems with zero counts
19. MODELLING SURVIVAL DATA
19.1 Introduction
19.2 Non-parametric analyses
19.3 Actuarial life tables
19.4 Kaplan-Meier estimate of survivor function
19.5 Nelson-Aalen estimate of cumulative hazard
19.6 Statistical inference in non-parametric analyses
19.7 Survivor, failure and hazard functions
19.8 Semi-parametric analyses
19.9 Parametric models
19.10 Accelerated failure time models
19.11 Frailty models and clustering
19.12 Multiple outcome event data
19.13 Discrete-time survival analysis
19.14 Sample sizes for survival analyses
20. INTRODUCTION TO CLUSTERED DATA
20.1 Introduction
20.2 Clustering arising from the data structure
20.3 Effects of clustering
20.4 Simulation studies on the impact of clustering
20.5 Introduction to methods for dealing with clustering
21. MIXED MODELS FOR CONTINUOUS DATA
21.1 Introduction
21.2 Linear mixed model
21.3 Random slopes
21.4 Contextual effects
21.5 Statistical analysis of linear mixed models
22. MIXED MODELS FOR DISCRETE DATA
22.1 Introduction
22.2 Logistic regression with random effects
22.3 Poisson regression with random effects
22.4 Generalised linear mixed model
22.5 Statistical analysis of GLMMs
22.6 Summary remarks on analysis of discrete clustered data
23. REPEATED MEASURES DATA
23.1 Introduction
23.2 Univariate and multivariate approaches to repeated measures data
23.3 Linear mixed models with correlation structure
23.4 Mixed models for discrete repeated measures data
23.5 Generalised estimating equations
24. INTRODUCTION TO BAYESIAN ANALYSIS
24.1 Introduction
24.2 Bayesian analysis
24.3 Markov chain Monte Carlo (MCMC) estimation
24.4 Statistical analysis based on MCMC estimation
24.5 Extensions of Bayesian and MCMC modelling
25. ANALYSIS OF SPATIAL DATA: INTRODUCTION AND VISUALISATION
25.1 Introduction
25.2 Spatial data
25.3 Spatial data analysis
25.4 Additional topics
26. ANALYSIS OF SPATIAL DATA
26.1 Introduction
26.2 Issues specific to statistical analysis of spatial data
26.3 Exploratory spatial analysis
26.4 Global spatial clustering
26.5 Localised spatial cluster detection
26.6 Space-time association
26.7 Modelling
27. CONCEPTS OF INFECTIOUS DISEASE EPIDEMIOLOGY
27.1 Introduction
27.2 Infection vs disease
27.3 Transmission
27.4 Mathematical modelling of infectious disease transmission
27.5 Methods of control of infectious disease
27.6 Estimating R0 and other parameters
27.7 Developing more complex models
27.8 Using models
27.9 Summary
28. SYSTEMATIC REVIEWS AND META-ANALYSIS
28.1 Introduction
28.2 Narrative reviews
28.3 Systematic reviews
28.4 Meta-analysis – Introduction
28.5 Fixed- and random-effects models
28.6 Presentation of results
28.7 Heterogeneity
28.8 Publication bias
28.9 Influential studies
28.10 Outcome scales and data issues
28.11 Meta-analysis of observational studies
28.12 Meta-analysis of diagnostic tests
28.13 Use of meta-analysis
29. ECOLOGICAL AND GROUP-LEVEL STUDIES
29.1 Introduction
29.2 Rationale for group-level studies
29.3 Types of ecologic variable
29.4 Issues related to modelling approaches in ecologic studies
29.5 The linear model in the context of ecologic studies
29.6 Issues related to inferences
29.7 Sources of ecologic bias
29.8 Analysis of ecologic data
29.9 Non-ecologic group-level studies
30. A STRUCTURED APPROACH TO DATA ANALYSIS
30.1 Introduction
30.2 Data-collection sheets
30.3 Data coding
30.4 Data entry
30.5 Keeping track of files
30.6 Keeping track of variables
30.7 Program mode versus interactive processing
30.8 Data-editing
30.9 Data verification
30.10 Data processing – outcome variable(s)
30.11 Data processing – predictor variables
30.12 Data processing – multilevel data
30.13 Unconditional associations
30.14 Keeping track of your analyses