Faculty of Health Sciences
Confounding and stratified analysis
Susanne Rosthøj
Section of Biostatistics, Department of Public Health, University of Copenhagen
sr@biostat.ku.dk
Course format
Six Fridays, from 9.15 to 16. The course consists of lectures, computer exercises and reading.
Homepage: https://ifsv.sund.ku.dk/biostat/autobiostat/index.php/course:mph2013
Laust Hvas Mortensen will teach epidemiology.
The (group) exam is held on 14 June.
The aim is that you become able to carry out your own multiple regression analyses of epidemiological studies and to critically assess analyses in the epidemiological literature.
Program for day 1
Independence between two variables
Confounding
Odds ratio
Mantel-Haenszel stratified analysis: test for association, test for effect modification
Introduction to SAS
Material for day 1
Kirkwood & Sterne: Essential Medical Statistics. Chapters 16.6-16.7, 17.1-17.2, 17.4 and 18.
Svend Juul: Chapter 8.2.
Optional: Silva: Cancer Epidemiology. http://www.iarc.fr/en/publications/pdfs-online/epi/cancerepi/index.php, Chapter 14: Dealing with confounding in the analysis.
Epidemiology
The study of the distribution and determinants of disease frequency in human populations.
We need measures of disease frequency. Typically the disease outcome is binary, and we may use risk, rate, odds or prevalence.
We want to compare these between exposed and non-exposed persons and, more generally, to relate them to exposure variables (determinants) and other explanatory variables.
Association between two categorical variables
The Framingham study: a cohort study of residents of the town of Framingham, Massachusetts, aged 30-59 in 1948, with 20 years of follow-up.
Is there an association between sex and the risk of coronary heart disease (CHD)?

             CHD = 0         CHD = 1        Total
Females      616 (85.6%)     104 (14.4%)      720
Males        479 (74.5%)     164 (25.5%)      643
Total       1095 (80.3%)     268 (19.7%)     1363
Independence
Hypothesis: no difference in the risk of CHD between males and females.
Using the table above, under this hypothesis we would expect the same proportion with CHD (268/1363 = 19.7%) in both sexes:
Expected number of females with CHD: 268/1363 × 720 = 141.6
Expected number of males with CHD: 268/1363 × 643 = 126.4
The expected numbers without CHD are found in the same way.
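The chi-square test
The chi-square ($\chi^2$) test is used to evaluate whether two categorical variables are associated. Consider a table with k rows and l columns, row totals $m_r$, column totals $n_s$ and grand total $N$:

$$
\begin{array}{cccc|c}
x_{11} & x_{12} & \cdots & x_{1l} & m_1 \\
x_{21} & x_{22} & \cdots & x_{2l} & m_2 \\
\vdots & \vdots &        & \vdots & \vdots \\
x_{k1} & x_{k2} & \cdots & x_{kl} & m_k \\ \hline
n_1    & n_2    & \cdots & n_l    & N
\end{array}
$$

The expected number in each cell is (row total × column total)/total:
$$E_{rs} = \frac{m_r \, n_s}{N}$$
The chi-square test measures the distance between the observed and the expected numbers.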
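A minimal Python sketch of the expected-count formula $E_{rs} = m_r n_s / N$, assuming the table is stored as a list of rows (the function name `expected_counts` is our own, not from the course material). Applied to the Framingham table it reproduces the expected numbers used on the next slides.

```python
def expected_counts(table):
    """Expected cell counts E_rs = m_r * n_s / N under independence."""
    row_totals = [sum(row) for row in table]          # m_r
    col_totals = [sum(col) for col in zip(*table)]    # n_s
    total = sum(row_totals)                           # N
    return [[m * n / total for n in col_totals] for m in row_totals]


# Framingham: rows = females, males; columns = no CHD, CHD
framingham = [[616, 104],
              [479, 164]]
for row in expected_counts(framingham):
    print([round(e, 1) for e in row])
# [578.4, 141.6]
# [516.6, 126.4]
```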
The chi-square test
Test statistic:
$$\chi^2 = \sum_{r,s} \frac{(x_{rs} - E_{rs})^2}{E_{rs}}$$
Evaluation: $H_0$ is rejected for large values of $\chi^2$.
When all expected numbers are greater than 5, the p-value can be found from a $\chi^2$ distribution with $df = (k-1)(l-1)$. The test is called the (Pearson) chi-square test.
If some of the expected numbers are less than 5, Fisher's exact test is used instead.
Is there an association between sex and CHD?
Observed numbers (expected numbers in parentheses):

             CHD = 0        CHD = 1        Total
Females      616 (578.4)    104 (141.6)      720
Males        479 (516.6)    164 (126.4)      643
Total       1095            268             1363

$$\chi^2 = \frac{(616-578.4)^2}{578.4} + \frac{(104-141.6)^2}{141.6} + \frac{(479-516.6)^2}{516.6} + \frac{(164-126.4)^2}{126.4} = 26.31$$
P-value (df = (2-1)(2-1) = 1): P < 0.0001.
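The Pearson chi-square computation can be sketched in Python as follows. This is an illustration only (the course uses SAS); the helper names are ours. For df = 1 we use the fact that a $\chi^2_1$ variable is the square of a standard normal, so the upper-tail probability is `math.erfc(sqrt(x/2))`; other df would need a chi-square distribution function, which we do not implement here.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a k x l table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for s, observed in enumerate(row):
            expected = row_totals[r] * col_totals[s] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper tail P(X > chi2) for a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


chi2, df = pearson_chi2([[616, 104], [479, 164]])
print(round(chi2, 2), df, p_value_df1(chi2) < 0.0001)
# 26.31 1 True
```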
Quantification of the difference in risk
Risk of CHD for males: $p_1 = 164/643 = 0.26$
Risk of CHD for females: $p_2 = 104/720 = 0.14$
Odds of CHD for males: $p_1/(1-p_1) = 164/479 = 0.34$ (about 1:3)
Odds of CHD for females: $p_2/(1-p_2) = 104/616 = 0.17$ (about 1:6)
Quantification of the effect of sex on CHD:
Absolute risk difference (ARR): $p_1 - p_2 = 0.11$
Relative risk (RR): $p_1/p_2 = 1.77$
Odds ratio (OR): $\dfrac{p_1/(1-p_1)}{p_2/(1-p_2)} = 2.03$
When $p_1$ and $p_2$ are small (< 0.1), RR ≈ OR.
We have seen a difference in the risk of CHD between males and females: $p_1 \neq p_2$, i.e. ARR ≠ 0, RR ≠ 1, OR ≠ 1.
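A small Python sketch of the three effect measures, assuming counts are given as (cases, group size) for each group; the function name `effect_measures` is hypothetical. Computing from the unrounded risks gives a risk difference of 0.11.

```python
def effect_measures(cases_1, n_1, cases_2, n_2):
    """Risk difference, relative risk and odds ratio for group 1 vs group 2."""
    p1 = cases_1 / n_1
    p2 = cases_2 / n_2
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return {"risk_difference": p1 - p2,
            "relative_risk": p1 / p2,
            "odds_ratio": odds1 / odds2}


# Males (group 1) vs females (group 2) in the Framingham table
m = effect_measures(164, 643, 104, 720)
print({k: round(v, 2) for k, v in m.items()})
# {'risk_difference': 0.11, 'relative_risk': 1.77, 'odds_ratio': 2.03}
```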
Calculation of the odds ratio in 2 × 2 tables

Sex          No CHD    CHD    Total
Females         616    104      720
Males           479    164      643
Total          1095    268     1363

$$OR = \frac{(164/643)\,/\,(479/643)}{(104/720)\,/\,(616/720)} = \frac{164 \cdot 616}{104 \cdot 479} = 2.03$$
The odds of CHD for males is about twice the odds of CHD for females.
What is the odds ratio of CHD for females vs. males?
Confidence interval for the odds ratio
First compute $\ln(OR) = \ln\left(\dfrac{616 \cdot 164}{479 \cdot 104}\right) = \ln(2.028) = 0.707$.
A 95% confidence interval for ln(OR) runs from $L_1$ to $L_2$, where
$$L_1 = 0.707 - 1.96\sqrt{\frac{1}{616} + \frac{1}{164} + \frac{1}{479} + \frac{1}{104}} = 0.707 - 1.96 \cdot 0.139 = 0.707 - 0.273 = 0.434$$
$$L_2 = 0.707 + 0.273 = 0.980$$
The 95% confidence interval for OR is then $\exp(L_1) = 1.54$ to $\exp(L_2) = 2.66$.
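The interval above is the usual log-scale (Woolf) interval. A Python sketch, assuming the 2 × 2 table is passed as the four cells a, b (exposed with/without the outcome) and c, d (unexposed with/without); the function name is ours.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio a*d/(b*c) with a Woolf (log-scale) confidence interval.

    a, b: exposed with / without the outcome; c, d: unexposed with / without.
    """
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Males: 164 CHD, 479 no CHD; females: 104 CHD, 616 no CHD
or_, lower, upper = odds_ratio_ci(164, 479, 104, 616)
print(round(or_, 2), round(lower, 2), round(upper, 2))
# 2.03 1.54 2.66
```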
Confounding
Do we always get a fair comparison between the groups?
[Figure: males and females, each group split into young and old persons]
Not necessarily: a randomly selected exposed person tends to be older than a randomly selected non-exposed person. This is a problem if age is a risk factor for the outcome.
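Confounding
A variable C is a potential confounder for the relation E → O (exposure → outcome) if it is
1) related to the exposure (C is associated with E),
2) an independent risk factor for the outcome: C → O,
3) not a consequence of the exposure, i.e. C does not lie on the causal path E → C → O.
That is: C is associated with E and affects O, without being an intermediate step between E and O.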
Adjustment for confounding using stratification
Example: relationship of age and systolic blood pressure (SBP) to the prevalence of myocardial infarction (MI) in a sample of individuals in the Israeli Ischemic Heart Disease Study (Kahn & Sempos (1989)).

              Myocardial infarction
              Present    Absent    Total
SBP ≥ 140        29        711       740
SBP < 140        27       1244      1271
Total            56       1955      2011

OR = (29 · 1244)/(711 · 27) = 1.88, CI = (1.10; 3.20), P = 0.02.
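The P-value quoted for the crude OR can be reproduced with a Wald test on ln(OR), assuming (we are not told which test the slide used) that it is based on $z = \ln(OR)/SE$ with the same standard error as the confidence interval. A Python sketch with a hypothetical function name:

```python
import math


def wald_test_log_or(a, b, c, d):
    """Wald z-test of H0: OR = 1 based on ln(OR) and its Woolf standard error.

    Returns (OR, lower 95% limit, upper 95% limit, two-sided P).
    """
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se),
            p)


# SBP >= 140: 29 MI, 711 no MI; SBP < 140: 27 MI, 1244 no MI
print([round(x, 2) for x in wald_test_log_or(29, 711, 27, 1244)])
# [1.88, 1.1, 3.2, 0.02]
```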
Combined analysis over strata
Stratified analysis / the Mantel-Haenszel method.
We have a series (here two) of 2 × 2 tables, one from each stratum i = 1, ..., k:

              Cases    Non-cases
Exposed        a_i        b_i
Unexposed      c_i        d_i
                                   total n_i

In each stratum we can estimate the odds ratio by
$$OR_i = \frac{a_i d_i}{b_i c_i} = \frac{a_i d_i / n_i}{b_i c_i / n_i}$$
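Combined analysis over strata
A common odds ratio for all strata may be estimated by a weighted average
$$OR_{common} = \frac{\sum_i w_i \, OR_i}{\sum_i w_i},$$
e.g. all $w_i = 1/k$, corresponding to the mean of the individual ORs.
With weights $w_i = b_i c_i / n_i$, the weighted average of the separate ORs,
$$OR_{MH} = \frac{\sum_i \frac{b_i c_i}{n_i} \cdot \frac{a_i d_i}{b_i c_i}}{\sum_i \frac{b_i c_i}{n_i}} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i},$$
is the Mantel-Haenszel estimator. In the example (stratified by age, ≥ 60 and < 60):
$$OR_{MH} = \frac{\frac{9 \cdot 73}{203} + \frac{20 \cdot 1171}{1808}}{\frac{115 \cdot 6}{203} + \frac{596 \cdot 21}{1808}} = 1.57$$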
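A Python sketch of the Mantel-Haenszel estimator, assuming each stratum is stored as a tuple (a, b, c, d) in the layout above (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases). The age-stratified counts are those of the Israeli example; the function name is ours.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio sum(a*d/n) / sum(b*c/n).

    Each stratum is (a, b, c, d): exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Israeli study, SBP >= 140 vs < 140, stratified by age (>= 60, < 60)
israeli = [(9, 115, 6, 73), (20, 596, 21, 1171)]
stratum_ors = [a * d / (b * c) for a, b, c, d in israeli]
print([round(x, 2) for x in stratum_ors], round(mantel_haenszel_or(israeli), 2))
# [0.95, 1.87] 1.57
```

Note that the MH estimate (1.57) lies between the two stratum ORs and below the crude OR of 1.88.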
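The Mantel-Haenszel test
Hypothesis: the common odds ratio equals 1 ($OR_{MH} = 1$).
In each stratum we compare the observed number of events among the exposed with the number expected under independence:
Observed: $a$
Expected: $E(a) = \dfrac{(a+b)(a+c)}{n}$
Variance: $V(a) = \dfrac{(a+b)(c+d)(a+c)(b+d)}{n^2 (n-1)}$
The combined Mantel-Haenszel test statistic is
$$X^2_{MH} = \frac{\left(\sum a - \sum E(a)\right)^2}{\sum V(a)} \sim \chi^2_1 \text{ under } H_0$$
In the example:
$$X^2_{MH} = \frac{\left((9+20) - \left(\frac{15 \cdot 124}{203} + \frac{41 \cdot 616}{1808}\right)\right)^2}{\frac{15 \cdot 124 \cdot 79 \cdot 188}{203^2 \cdot 202} + \frac{41 \cdot 616 \cdot 1192 \cdot 1767}{1808^2 \cdot 1807}} = \frac{(29 - 23.13)^2}{12.32} = 2.80, \quad P = 0.09$$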
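A Python sketch of the Mantel-Haenszel test statistic as defined above (no continuity correction), with strata stored as (a, b, c, d) tuples. With unrounded intermediate values the statistic is about 2.79; the 2.80 above comes from the rounded numbers 23.13 and 12.32.

```python
import math


def mantel_haenszel_test(strata):
    """Mantel-Haenszel chi-square (1 df) for H0: common OR = 1.

    Each stratum is (a, b, c, d) with exposure in rows, outcome in columns.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return observed, expected, variance, chi2, p


israeli = [(9, 115, 6, 73), (20, 596, 21, 1171)]
obs, exp_, var, chi2, p = mantel_haenszel_test(israeli)
print(obs, round(exp_, 2), round(var, 2), round(chi2, 2), round(p, 2))
# 29.0 23.13 12.32 2.79 0.09
```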
Interpretation
$OR_{MH}$ is an estimate of the association between the exposure (SBP) and the outcome (prevalence of MI), adjusted for the confounder (age).
$X^2_{MH}$ is a test statistic for no association between exposure and outcome, adjusted for the confounder.
Confidence limits for the common odds ratio
1) Calculate $\ln(OR_{MH}) = \ln(1.57) = 0.451$.
2) Calculate $L_1 = \ln(OR_{MH}) - 1.96 \cdot SD$ and $L_2 = \ln(OR_{MH}) + 1.96 \cdot SD$, where
$$SD = \frac{\ln(OR_{MH})}{\sqrt{X^2_{MH}}} = \frac{0.451}{\sqrt{2.80}} = 0.270,$$
that is
$L_1 = 0.451 - 1.96 \cdot 0.270 = -0.078$
$L_2 = 0.451 + 1.96 \cdot 0.270 = 0.980$
3) The 95% confidence limits for $OR_{MH}$ are $\exp(L_1) = 0.93$ to $\exp(L_2) = 2.66$.
Confounder adjustment for a binary outcome
1. Stratified analysis: Kirkwood & Sterne, Chapter 18.
2. Logistic regression analysis: Kirkwood & Sterne, Chapters 19-20.
Logistic regression can also be used to estimate and test the effect of an exposure on an outcome adjusted for other variables.
When is the stratified analysis sensible?
In the stratified analysis we average the individual ORs from the separate strata. This makes sense if the individual ORs are approximately the same in all strata, i.e.
= if there is no interaction between the exposure and the stratification variable on the outcome
= if the stratification variable does not modify the relation between exposure and outcome (no effect modification).
Tests for no interaction

Age ≥ 60      MI cases   MI negative   Total
SBP ≥ 140         9          115         124
SBP < 140         6           73          79
Total            15          188         203
OR = 0.95

Age < 60      MI cases   MI negative   Total
SBP ≥ 140        20          596         616
SBP < 140        21         1171        1192
Total            41         1767        1808
OR = 1.87

Interaction (= effect modification)? Are the separate ORs, 0.95 and 1.87, different?
This can be tested using logistic regression or the Breslow-Day test for homogeneity.
Breslow-Day's test of no interaction
Hypothesis: constant OR over all strata, i.e. $OR_1 = OR_2 = \dots = OR_k$.
The Breslow-Day test compares observed and expected numbers in a chi-square statistic with df = k - 1 (why?).
The formulas are more involved. We need SAS!
We find $\chi^2 = 1.16$, df = 1, P = 0.28, i.e. no evidence of effect modification.
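For the curious, a Python sketch of the Breslow-Day statistic under common assumptions: the expected count of exposed cases in each stratum is fitted, with the table margins fixed, so that the fitted OR equals the Mantel-Haenszel estimate; no Tarone correction is applied; and the fitted count is found by bisection instead of the closed-form quadratic. The helper names are ours. On the Israeli data this gives a statistic close to the 1.16 reported above.

```python
import math


def _fitted_exposed_cases(n1, m1, n, common_or):
    """Count A (fixed margins) such that A*D / (B*C) equals common_or.

    n1 = exposed total (a + b), m1 = cases total (a + c), n = stratum total.
    The fitted odds ratio increases with A, so bisection finds the root.
    """
    lo, hi = max(0, n1 + m1 - n), min(n1, m1)
    for _ in range(200):
        mid = (lo + hi) / 2
        fitted = mid * (n - n1 - m1 + mid) / ((n1 - mid) * (m1 - mid))
        if fitted < common_or:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def breslow_day(strata, common_or):
    """Breslow-Day chi-square (no Tarone correction) and df = k - 1."""
    chi2 = 0.0
    for a, b, c, d in strata:
        n1, m1, n = a + b, a + c, a + b + c + d
        A = _fitted_exposed_cases(n1, m1, n, common_or)
        B, C, D = n1 - A, m1 - A, n - n1 - m1 + A
        variance = 1 / (1 / A + 1 / B + 1 / C + 1 / D)
        chi2 += (a - A) ** 2 / variance
    return chi2, len(strata) - 1


israeli = [(9, 115, 6, 73), (20, 596, 21, 1171)]
or_mh = (sum(a * d / (a + b + c + d) for a, b, c, d in israeli)
         / sum(b * c / (a + b + c + d) for a, b, c, d in israeli))
chi2, df = breslow_day(israeli, or_mh)
p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, valid for df = 1
print(round(chi2, 2), df, round(p, 2))
# 1.16 1 0.28
```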
SAS exercises
The first 9 questions all concern the Vietnam study (cf. the article by Laust, Batty et al. (2008)). The SAS program vietnam1.sas reads the data file vietnam1.txt from the homepage.
1) Download the SAS file and run the program. Go through the SAS program and the output file and try to understand the program. Which variables are available? How many persons are there data on? Do you understand all the variables?
2) Count how many died of internal (disease or suicide) and of external causes, respectively (use proc freq).
3) Define a new variable doed which is 1 if the person has died (i.e. we ignore the cause of death).
4) Define a new variable that divides the IQ variable into two groups: 1) low to normal IQ (≤ 110) and 2) high IQ (> 110).
5) Is there an association between IQ (in two groups) and death?
6) Look more closely at the age variable using proc univariate. What do you think the unit is? Define a new age variable that is age (at examination) in years. Then define a new age variable with age divided into three groups: < 36, 36-40 and > 40 years. How many individuals are there in each of these three groups?
7) Compute the mean age in each age group using proc means.
8) Using proc freq, compare the risk of death for low/normal and high IQ, adjusted for age (in groups). Interpret the Breslow-Day test for no interaction.
9) In the same way, compare the risk of death for the two IQ groups adjusted for ethnicity (1 = white, 2 = black, 3, 4, 5 = other), where the last three groups are combined into one group. Does this analysis work well (what is the problem)?
10) The SAS program israeli.sas corresponds to the example from the lecture on blood pressure and prevalence of MI. Run the program and reproduce the results presented in the lecture, i.e.
a. OR, CI and test of the blood pressure-MI relation (unadjusted)
b. separate ORs in the strata with 95% confidence intervals
c. the Mantel-Haenszel estimator of the adjusted OR with confidence interval and test
d. Interpret the Breslow-Day test for no interaction.