Table of Contents

Methods in Epidemiologic Research is a comprehensive text covering the key principles and methods used in epidemiologic research. It is written primarily for researchers and graduate students in epidemiology, but the material is equally applicable to those in related disciplines (e.g. public health).

Chapters 1 through 6 focus on basic epidemiologic principles. Chapters 7–11 focus on study design issues for observational studies and controlled trials. Over the past decade there has been much discussion about the need for epidemiologists to report their research findings thoroughly (which should in turn help ensure high-quality study designs in the future), and we have cited the summary recommendations in these chapters.

Chapters 14–19 cover a range of multivariable models. Chapter 19 (Modelling Survival Data) attempts to provide comprehensive coverage of the most commonly used methods for analysing time-to-event data.

Chapters 20–23 deal with the issue of clustered data, including a thorough description of methods for analysing repeated measures data. Chapters 24–30 cover a range of specialised topics, including Bayesian methods (Chapter 24, contributed by Henrik Stryhn in collaboration with William Browne), two chapters on presenting and analysing spatial data (Chapters 25 and 26, contributed by Javier Sanchez and Dirk Pfeiffer), an introduction to infectious disease epidemiology (Chapter 27, contributed by Graham Medley in collaboration with Ian Dohoo), and meta-analysis (Chapter 28).

Supplementary materials for this text will be made available at upei.ca/mer. These will include datasets and computer programs for all examples presented (initially Stata “do files”, with the expectation that programs for other statistical packages will be added later).

All of the datasets used in these examples are described in the text (Chapter 31) and are available through upei.ca/mer. Virtually all of the examples have been analysed using the statistical program Stata™, a program which provides a unique combination of statistical and epidemiological tools and which we use extensively in our teaching.

TABLE OF CONTENTS

1. INTRODUCTION AND CAUSAL CONCEPTS
1.1 Introduction 2
1.2 A brief history of multiple causation concepts 2
1.3 A brief history of scientific inference 6
1.4 Key components of epidemiologic research 9
1.5 Seeking causes 10
1.6 Models of causation 11
1.7 Counterfactual concepts of causation for a single exposure 18
1.8 Experimental versus observational evidence of causation 22
1.9 Constructing a causal diagram 23
1.10 Causal criteria 25
2. SAMPLING
2.1 Introduction 35
2.2 Non-probability sampling 36
2.3 Probability sampling 39
2.4 Simple random sample 39
2.5 Systematic random sample 40
2.6 Stratified random sample 40
2.7 Cluster sampling 41
2.8 Multistage sampling 42
2.9 Targeted (risk-based) sampling 43
2.10 Analysis of survey data 44
2.11 Sample-size determination 48
2.12 Sampling to detect disease 55
3. QUESTIONNAIRE DESIGN
3.1 Introduction 62
3.2 Designing the question 64
3.3 Open question 65
3.4 Closed question 65
3.5 Wording the question 69
3.6 Structure of questionnaires 69
3.7 Pre-testing questionnaires 70
3.8 Validation 71
3.9 Response rate 71
3.10 Data-coding and editing 72
4. MEASURES OF DISEASE FREQUENCY
4.1 Introduction 78
4.2 Count, proportion, odds and rate 78
4.3 Incidence 79
4.4 Calculating risk 80
4.5 Calculating incidence rates 81
4.6 Relationship between risk and rate 83
4.7 Prevalence 84
4.8 Mortality statistics 85
4.9 Other measures of disease frequency 85
4.10 Standard errors and confidence intervals 87
4.11 Standardisation of risks and rates 89
5. SCREENING AND DIAGNOSTIC TESTS
5.1 Introduction 96
5.2 Attributes of the test per se 96
5.3 The ability of a test to detect disease or health 104
5.4 Predictive values 107
5.5 Interpreting test results that are measured on a continuous scale 109
5.6 Using multiple tests 115
5.7 Evaluation of diagnostic tests 117
5.8 Evaluation when there is no gold standard 121
5.9 Other considerations in test evaluation 125
5.10 Sample size requirements 127
5.11 Herd-level testing 128
5.12 Use of pooled samples 130
6. MEASURES OF ASSOCIATION
6.1 Introduction 140
6.2 Measures of association 141
6.3 Measures of effect 144
6.4 Study design and measures of association 147
6.5 Hypothesis testing and confidence intervals 147
6.6 Multivariable estimation of measures of association 152
7. INTRODUCTION TO OBSERVATIONAL STUDIES
7.1 Introduction 156
7.2 A unified approach to study design 159
7.3 Descriptive studies 161
7.4 Observational studies 162
7.5 Cross-sectional studies 164
7.6 Estimating incidence from one or more cross-sectional studies 168
7.7 Inferential limitations of cross-sectional studies 169
7.8 Repeated cross-sectional versus cohort studies 170
7.9 Reporting of observational studies 171
8. COHORT STUDIES
8.1 Introduction 180
8.2 Selecting the study group 182
8.3 The exposure 186
8.4 Disease as exposure 190
8.5 Ensuring exposed and non-exposed groups are comparable 190
8.6 Follow-up period 191
8.7 Measuring the outcome 191
8.8 Analysis 192
8.9 Reporting of cohort studies 194
9. CASE-CONTROL STUDIES
9.1 Introduction 202
9.2 The study base 202
9.3 The case series 205
9.4 Principles of control selection 207
9.5 Selecting controls in risk-based designs 207
9.6 Selecting controls in rate-based designs 209
9.7 Other sources of controls 214
9.8 The number of controls per case 215
9.9 The number of control groups 215
9.10 Exposure and covariate assessment 216
9.11 Keeping the cases and controls comparable 216
9.12 Analysis of case-control data 217
9.13 Reporting guidelines for case-control studies 218
10. HYBRID STUDY DESIGNS
10.1 Introduction 224
10.2 Case-crossover studies 224
10.3 Case-case studies 228
10.4 Case-case-control studies 229
10.5 Case-series studies 231
10.6 Case-cohort studies 233
10.7 Case-only studies 235
10.8 Two-stage sampling designs 237
11. CONTROLLED STUDIES
11.1 Introduction 244
11.2 Background, objectives, and summary trial design 246
11.3 Participants: the study group 247
11.4 Specifying the intervention 250
11.5 Measuring the outcome 251
11.6 Sample size 252
11.7 Allocation of study subjects 254
11.8 Follow-up/compliance 258
11.9 Statistical methods and analysis 259
11.10 Conclusions 262
11.11 Clinical trial designs for prophylaxis of communicable organisms 262
11.12 Reporting of clinical trials 265
12. VALIDITY IN OBSERVATIONAL STUDIES
12.1 Introduction 276
12.2 Selection bias 277
12.3 Examples of selection bias 281
12.4 Reducing selection bias 287
12.5 Information bias 288
12.6 Bias from misclassification 290
12.7 Validation studies to correct misclassification 297
12.8 Measurement error 297
12.9 Errors in surrogate measures of exposure 299
12.10 The impact of information bias on sample size 299
13. CONFOUNDING: DETECTION AND CONTROL
13.1 Introduction 308
13.2 Control of confounding prior to data analysis 311
13.3 Matching on confounders 311
13.4 Detection of confounding 316
13.5 Analytic control of confounding 322
13.6 Multivariable modelling to control confounding 328
13.7 Other approaches to control confounding and estimate causal effects 328
13.8 Propensity scores for controlling confounding 335
13.9 External adjustment and sensitivity analysis for unmeasured confounders 340
13.10 Understanding causal relationships 342
13.11 Summary of effects of extraneous variables 351
14. LINEAR REGRESSION
14.1 Introduction 360
14.2 Regression analysis 360
14.3 Hypothesis testing and effect estimation 362
14.4 Nature of the X-variables 368
14.5 Detecting highly correlated (collinear) variables 374
14.6 Detecting and modelling interaction 376
14.7 Causal interpretation of a multivariable linear model 377
14.8 Evaluating the least squares model 379
14.9 Evaluating the major assumptions 385
14.10 Assessment of individual observations 390
14.11 Time-series data 396
15. MODEL-BUILDING STRATEGIES
15.1 Introduction 402
15.2 Steps in building a model 403
15.3 Building a causal model 403
15.4 Reducing the number of predictors 404
15.5 The problem of missing values 408
15.6 Effects of continuous predictors 411
15.7 Identifying interaction terms of interest 418
15.8 Building the model 418
15.9 Evaluate the reliability of the model 423
15.10 Presenting the results 424
16. LOGISTIC REGRESSION
16.1 Introduction 430
16.2 The logistic model 430
16.3 Odds and odds ratios 431
16.4 Fitting a logistic regression model 432
16.5 Assumptions in logistic regression 433
16.6 Likelihood ratio statistics 434
16.7 Wald tests 436
16.8 Interpretation of coefficients 436
16.9 Assessing interaction and confounding 439
16.10 Model-building 441
16.11 Generalised linear models 444
16.12 Evaluating logistic regression models 445
16.13 Sample size considerations 455
16.14 Exact logistic regression 456
16.15 Conditional logistic regression for matched studies 456
17. MODELLING ORDINAL AND MULTINOMIAL DATA
17.1 Introduction 462
17.2 Overview of models 462
17.3 Multinomial logistic regression 466
17.4 Modelling ordinal data 470
17.5 Proportional odds model (constrained cumulative logit model) 471
17.6 Adjacent-category model 475
17.7 Continuation-ratio model 476
18. MODELLING COUNT AND RATE DATA
18.1 Introduction 480
18.2 The Poisson distribution 481
18.3 Poisson regression model 482
18.4 Interpretation of coefficients 483
18.5 Evaluating Poisson regression models 485
18.6 Negative binomial regression 488
18.7 Problems with zero counts 496
19. MODELLING SURVIVAL DATA
19.1 Introduction 502
19.2 Non-parametric analyses 507
19.3 Actuarial life tables 507
19.4 Kaplan-Meier estimate of survivor function 510
19.5 Nelson-Aalen estimate of cumulative hazard 512
19.6 Statistical inference in non-parametric analyses 512
19.7 Survivor, failure and hazard functions 514
19.8 Semi-parametric analyses 519
19.9 Parametric models 536
19.10 Accelerated failure time models 541
19.11 Frailty models and clustering 545
19.12 Multiple outcome event data 551
19.13 Discrete-time survival analysis 552
19.14 Sample sizes for survival analyses 557
20. INTRODUCTION TO CLUSTERED DATA
20.1 Introduction 564
20.2 Clustering arising from the data structure 564
20.3 Effects of clustering 570
20.4 Simulation studies on the impact of clustering 574
20.5 Introduction to methods for dealing with clustering 576
21. MIXED MODELS FOR CONTINUOUS DATA
21.1 Introduction 588
21.2 Linear mixed model 588
21.3 Random slopes 594
21.4 Contextual effects 598
21.5 Statistical analysis of linear mixed models 601
22. MIXED MODELS FOR DISCRETE DATA
22.1 Introduction 616
22.2 Logistic regression with random effects 617
22.3 Poisson regression with random effects 621
22.4 Generalised linear mixed model 623
22.5 Statistical analysis of GLMMs 630
22.6 Summary remarks on analysis of discrete clustered data 639
23. REPEATED MEASURES DATA
23.1 Introduction 646
23.2 Univariate and multivariate approaches to repeated measures data 648
23.3 Linear mixed models with correlation structure 654
23.4 Mixed models for discrete repeated measures data 662
23.5 Generalised estimating equations 665
24. INTRODUCTION TO BAYESIAN ANALYSIS
24.1 Introduction 676
24.2 Bayesian analysis 676
24.3 Markov chain Monte Carlo (MCMC) estimation 680
24.4 Statistical analysis based on MCMC estimation 685
24.5 Extensions of Bayesian and MCMC modelling 689
25. ANALYSIS OF SPATIAL DATA: INTRODUCTION AND VISUALISATION
25.1 Introduction 702
25.2 Spatial data 702
25.3 Spatial data analysis 705
25.4 Additional topics 711
26. ANALYSIS OF SPATIAL DATA
26.1 Introduction 718
26.2 Issues specific to statistical analysis of spatial data 718
26.3 Exploratory spatial analysis 720
26.4 Global spatial clustering 728
26.5 Localised spatial cluster detection 735
26.6 Space-time association 738
26.7 Modelling 742
27. CONCEPTS OF INFECTIOUS DISEASE EPIDEMIOLOGY
27.1 Introduction 754
27.2 Infection vs disease 756
27.3 Transmission 758
27.4 Mathematical modelling of infectious disease transmission 760
27.5 Methods of control of infectious disease 763
27.6 Estimating R0 and other parameters 766
27.7 Developing more complex models 771
27.8 Using models 773
27.9 Summary 775
28. SYSTEMATIC REVIEWS AND META-ANALYSIS
28.1 Introduction 780
28.2 Narrative reviews 780
28.3 Systematic reviews 781
28.4 Meta-analysis – Introduction 785
28.5 Fixed- and random-effects models 786
28.6 Presentation of results 789
28.7 Heterogeneity 791
28.8 Publication bias 798
28.9 Influential studies 801
28.10 Outcome scales and data issues 801
28.11 Meta-analysis of observational studies 804
28.12 Meta-analysis of diagnostic tests 806
28.13 Use of meta-analysis 807
29. ECOLOGICAL AND GROUP-LEVEL STUDIES
29.1 Introduction 814
29.2 Rationale for group level studies 815
29.3 Types of ecologic variable 816
29.4 Issues related to modelling approaches in ecologic studies 817
29.5 The linear model in the context of ecologic studies 818
29.6 Issues related to inferences 819
29.7 Sources of ecologic bias 820
29.8 Analysis of ecologic data 825
29.9 Non-ecologic group-level studies 826
30. A STRUCTURED APPROACH TO DATA ANALYSIS
30.1 Introduction 834
30.2 Data-collection sheets 834
30.3 Data coding 835
30.4 Data entry 835
30.5 Keeping track of files 836
30.6 Keeping track of variables 836
30.7 Program mode versus interactive processing 837
30.8 Data-editing 838
30.9 Data verification 839
30.10 Data processing—outcome variable(s) 839
30.11 Data processing—predictor variables 840
30.12 Data processing—multilevel data 840
30.13 Unconditional associations 841
30.14 Keeping track of your analyses 841
31. DESCRIPTION OF DATASETS 843
GLOSSARY AND TERMINOLOGY 861