Confounding and stratified analysis

Faculty of Health Sciences
Confounding and stratified analysis
Susanne Rosthøj, Biostatistisk Afdeling, Institut for Folkesundhedsvidenskab, Københavns Universitet
sr@biostat.ku.dk

Course format
Six Fridays from 9.15 to 16. The course consists of lectures, computer exercises and reading of the literature.
Website: https://ifsv.sund.ku.dk/biostat/autobiostat/index.php/course:mph2013
Laust Hvas Mortensen will teach epidemiology.
The exam (in groups) is held on 14 June.
The aim is for you to become able to carry out your own multiple regression analyses of epidemiological studies and to critically assess analyses in the epidemiological literature.

Programme for day 1
- Independence between two variables
- Confounding
- Odds ratio
- Mantel-Haenszel stratified analysis: test for association, test for effect modification
- Introduction to SAS

Material for day 1
Kirkwood & Sterne: Essential Medical Statistics. Chapters 16.6-16.7, 17.1-17.2, 17.4, 18.
Svend Juul: Chapter 8.2.
Optional: Silva: Cancer Epidemiology. http://www.iarc.fr/en/publications/pdfs-online/epi/cancerepi/index.php Chapter 14: Dealing with confounding in the analysis.

Epidemiology
The study of the distribution and determinants of disease frequency in human populations.
We need measures of disease frequency. Typically the disease outcome is binary, and we may use risk, rate, odds or prevalence.
We want to compare these between exposed and non-exposed persons and, more generally, to relate them to exposure variables (determinants) and other explanatory variables.

Association between two categorical variables
The Framingham study: cohort study of citizens aged 30-59 in the town of Framingham, Massachusetts, in 1948. 20 years of follow-up.
Is there an association between sex and the risk of Coronary Heart Disease (CHD)?

              CHD = 0        CHD = 1       Total
Females    616 (85.6%)    104 (14.4%)      720
Males      479 (74.5%)    164 (25.5%)      643
Total     1095 (80.3%)    268 (19.7%)     1363

Independence
Hypothesis: no difference in risk of CHD between males and females.
Under this hypothesis we would expect the same proportion with CHD, 268/1363 = 0.197, in both groups of the sex-by-CHD table:
Expected number of females with CHD: $0.197 \times 720 = \frac{268}{1363} \times 720 = 141.6$
Expected number of males with CHD: $0.197 \times 643 = \frac{268}{1363} \times 643 = 126.4$
Similarly for the numbers without CHD.

The chi-square test
The chi-square ($\chi^2$) test is used to evaluate whether two categorical variables are associated. For a table with k rows and l columns:

x_11   x_12   ...   x_1l  |  m_1
x_21   x_22   ...   x_2l  |  m_2
 ...    ...   ...    ...  |  ...
x_k1   x_k2   ...   x_kl  |  m_k
------------------------------
n_1    n_2    ...   n_l   |  N

The expected number in each cell is (row total × column total / total):
$E_{rs} = \frac{m_r \, n_s}{N}$
The chi-square test measures the distance between observed and expected values.

The chi-square test
Test statistic:
$\chi^2 = \sum_{r,s} \frac{(x_{rs} - E_{rs})^2}{E_{rs}}$
Evaluation: $H_0$ is rejected for large values of $\chi^2$. When all expected values are greater than 5, the p-value can be determined from a $\chi^2$ distribution with $df = (k-1)(l-1)$, where k is the number of rows and l the number of columns.
The test is called the (Pearson) chi-square test. If some of the expected values are less than 5, Fisher's exact test is performed instead.

Is there an association between sex and CHD?
Table with observed and (expected) numbers:

              CHD = 0        CHD = 1      Total
Females    616 (578.4)    104 (141.6)      720
Males      479 (516.6)    164 (126.4)      643
Total     1095            268             1363

$\chi^2 = \frac{(616 - 578.4)^2}{578.4} + \frac{(104 - 141.6)^2}{141.6} + \frac{(479 - 516.6)^2}{516.6} + \frac{(164 - 126.4)^2}{126.4} = 26.31$
P-value (df = (2-1)(2-1) = 1): P < 0.0001.
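The calculation above can be reproduced in a few lines. The following is a minimal Python sketch (the course itself uses SAS); the variable names are illustrative, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom.

```python
import math

# Observed counts from the Framingham table: rows = sex, columns = (no CHD, CHD).
observed = [[616, 104],   # females
            [479, 164]]   # males

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell under independence: row total * column total / total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# For df = 1 only, the upper-tail chi-square probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 2), p_value)
```

For tables with more than one degree of freedom the p-value needs a general chi-square distribution function, which the standard library does not provide.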

Quantification of the difference in risk
Risk of CHD for males: $p_1 = 164/643 = 0.26$
Risk of CHD for females: $p_2 = 104/720 = 0.14$
Odds of CHD for males: $p_1/(1-p_1) = 164/479 = 0.34$ (about 1:3)
Odds of CHD for females: $p_2/(1-p_2) = 104/616 = 0.17$ (about 1:6)
Quantification of the effect of sex on CHD:
Absolute risk reduction (ARR): $p_1 - p_2 = 0.11$
Relative risk (RR): $p_1/p_2 = 1.77$
Odds ratio (OR): $\frac{p_1/(1-p_1)}{p_2/(1-p_2)} = 2.03$
When $p_1$ and $p_2$ are small (< 0.1): RR ≈ OR.
We have seen a difference in risk of CHD between males and females: $p_1 > p_2$, i.e. ARR > 0, RR > 1, OR > 1.
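As a check on these measures, here is a small Python sketch working from the raw Framingham counts rather than the rounded risks. The variable names are illustrative only. Computed from unrounded risks, the risk difference is about 0.11.

```python
# Framingham counts: CHD cases and group sizes.
cases_m, n_m = 164, 643   # males
cases_f, n_f = 104, 720   # females

p1 = cases_m / n_m        # risk for males
p2 = cases_f / n_f        # risk for females

odds1 = p1 / (1 - p1)     # odds for males, equal to 164/479
odds2 = p2 / (1 - p2)     # odds for females, equal to 104/616

risk_difference = p1 - p2
relative_risk = p1 / p2
odds_ratio = odds1 / odds2

print(round(p1, 2), round(p2, 2))
print(round(risk_difference, 2), round(relative_risk, 2), round(odds_ratio, 2))
```

Because neither risk is small here, the odds ratio (about 2.0) is clearly larger than the relative risk (about 1.8).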

Calculation of odds ratio in 2 × 2 tables

Sex        no CHD    CHD    Total
Females      616     104      720
Males        479     164      643
Total       1095     268     1363

$OR = \frac{(164/643)/(479/643)}{(104/720)/(616/720)} = \frac{164 \times 616}{104 \times 479} = 2.03$
The odds of CHD for males are about twice the odds of CHD for females.
What is the odds ratio of CHD for females versus males?

Confidence interval for odds ratio
First compute $\ln(OR) = \ln\left(\frac{616 \times 164}{479 \times 104}\right) = \ln(2.03) = 0.708$.
The 95% confidence interval for $\ln(OR)$ runs from $L_1$ to $L_2$, where
$L_1 = 0.708 - 1.96 \sqrt{\frac{1}{616} + \frac{1}{164} + \frac{1}{479} + \frac{1}{104}} = 0.708 - 0.273 = 0.435$
$L_2 = 0.708 + 0.273 = 0.981$
95% confidence interval for OR: $\exp(L_1) = 1.54$ to $\exp(L_2) = 2.66$.
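The interval can be verified numerically. The sketch below, in Python with illustrative variable names, follows the same steps: log odds ratio, standard error from the four cell counts, and back-transformation of the limits.

```python
import math

# Framingham 2x2 table.
a, b = 164, 479   # males: CHD, no CHD
c, d = 104, 616   # females: CHD, no CHD

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)

# Standard error of ln(OR): square root of the sum of reciprocal cell counts.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# 95% limits on the log scale, then transformed back to the OR scale.
l1 = log_or - 1.96 * se_log_or
l2 = log_or + 1.96 * se_log_or
lower, upper = math.exp(l1), math.exp(l2)

print(round(log_or, 3), round(1.96 * se_log_or, 3))
print(round(lower, 2), round(upper, 2))
```

Note that the half-width on the log scale is 1.96 times the standard error (about 0.273), not the standard error itself (about 0.139).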

Confounding
Do we always get a fair comparison between the groups?
[Figure: proportions of young and old persons among males and among females]
Not necessarily: a randomly selected exposed person may tend to be older than a randomly chosen non-exposed person. This is a problem if age is a risk factor for the outcome.

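Confounding
A variable C is a potential confounder for the relation E → O (exposure → outcome) if it is
1) related to the exposure (E and C are associated),
2) an independent risk factor for the outcome (C → O),
3) not a consequence of the exposure (C is not an intermediate step on the pathway E → C → O).
[Diagram: C is linked to both E and O, alongside the relation E → O under study]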

Adjustment for confounding using stratification
Example: relationship of age and systolic blood pressure (SBP) to prevalence of myocardial infarction (MI) in a sample of individuals in the Israeli Ischemic Heart Disease Study (Kahn & Sempos (1989)).

                Myocardial infarction
              Present    Absent    Total
SBP ≥ 140        29        711       740
SBP < 140        27       1244      1271
Total            56       1955      2011

$OR = \frac{29 \times 1244}{711 \times 27} = 1.88$, CI = (1.10; 3.20), P = 0.02.

Combined analysis over strata
Stratified analysis / the Mantel-Haenszel method.
We have a series (here two) of 2 × 2 tables, one from each stratum:

stratum 1    ...    stratum k
  a   b               a   b
  c   d               c   d
    n                   n

In each stratum we can estimate the odds ratio by
$\frac{a \, d}{b \, c} = \frac{a \, d / n}{b \, c / n}$

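Combined analysis over strata
A common odds ratio for all strata may be estimated by a weighted average of the stratum-specific ORs,
$OR_{common} = \frac{\sum w \cdot OR}{\sum w}$
e.g. all $w = 1/k$, corresponding to the mean of the individual ORs.
For weights $w = \frac{b \, c}{n}$, the weighted average of the separate ORs,
$OR_{MH} = \frac{\sum \frac{b c}{n} \cdot \frac{a d}{b c}}{\sum \frac{b c}{n}} = \frac{\sum a d / n}{\sum b c / n}$
is the Mantel-Haenszel estimator.
In the example:
$OR_{MH} = \frac{\frac{9 \times 73}{203} + \frac{20 \times 1171}{1808}}{\frac{115 \times 6}{203} + \frac{596 \times 21}{1808}} = 1.57$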
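The Mantel-Haenszel estimator is simple to compute directly from the stratum tables. Below is a minimal Python sketch using the two age strata of the Israeli example, with each table written as (a, b, c, d). The function name `mantel_haenszel_or` is an illustrative choice, not a library function.

```python
# Each stratum as (a, b, c, d):
#   a = exposed cases, b = exposed non-cases,
#   c = unexposed cases, d = unexposed non-cases.
strata = [
    (9, 115, 6, 73),       # age >= 60
    (20, 596, 21, 1171),   # age < 60
]

def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio: sum(a*d/n) divided by sum(b*c/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Stratum-specific odds ratios, for comparison with the pooled estimate.
stratum_ors = [a * d / (b * c) for a, b, c, d in strata]
or_mh = mantel_haenszel_or(strata)

print([round(x, 2) for x in stratum_ors])
print(round(or_mh, 2))
```

The pooled estimate lies between the two stratum-specific odds ratios and is pulled towards the larger stratum, which carries most of the weight.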

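The Mantel-Haenszel test
Hypothesis: $OR_{MH} = 1$.
In each stratum we calculate the observed and expected number of events among the exposed:
Observed: $a$
Expected (under independence): $E(a) = \frac{(a+b)(a+c)}{n}$
Standard deviation: $SD(a) = \sqrt{\frac{(a+b)(c+d)(a+c)(b+d)}{n^2 (n-1)}}$
The combined Mantel-Haenszel test statistic is
$X^2_{MH} = \frac{\left(\sum a - \sum E(a)\right)^2}{\sum SD(a)^2} \sim \chi^2_1$ under $H_0$.
In the example:
$X^2_{MH} = \frac{\left((9 + 20) - \left(\frac{15 \times 124}{203} + \frac{41 \times 616}{1808}\right)\right)^2}{\frac{15 \times 124 \times 79 \times 188}{203^2 \times 202} + \frac{41 \times 616 \times 1192 \times 1767}{1808^2 \times 1807}} = \frac{(29 - 23.13)^2}{12.32} = 2.80$, P = 0.09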
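The test statistic can be checked with a short Python sketch. It accumulates the observed count, the expected count and the variance of cell a over the strata, using the same (a, b, c, d) layout as the tables. Variable names are illustrative, and the `math.erfc` shortcut for the p-value applies only because the statistic has 1 degree of freedom.

```python
import math

# Each stratum as (a, b, c, d); a = exposed cases.
strata = [
    (9, 115, 6, 73),       # age >= 60
    (20, 596, 21, 1171),   # age < 60
]

sum_observed = 0.0
sum_expected = 0.0
sum_variance = 0.0
for a, b, c, d in strata:
    n = a + b + c + d
    sum_observed += a
    # Expected value of a under independence within the stratum.
    sum_expected += (a + b) * (a + c) / n
    # Variance of a under the null hypothesis, i.e. SD(a) squared.
    sum_variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))

x2_mh = (sum_observed - sum_expected) ** 2 / sum_variance

# Upper-tail probability of a chi-square distribution with df = 1.
p_value = math.erfc(math.sqrt(x2_mh / 2))

print(round(sum_expected, 2), round(sum_variance, 2))
print(round(x2_mh, 2), round(p_value, 2))
```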

Interpretation
$OR_{MH}$ is an estimate of the association between exposure (SBP) and outcome (prevalence of MI), adjusted for the confounder (age).
$X^2_{MH}$ is a test statistic for no association between exposure and outcome, adjusted for the confounder.

Confidence limits for common odds ratio
1) Calculate $\ln(OR_{MH}) = \ln(1.57) = 0.451$.
2) Calculate $L_1 = \ln(OR_{MH}) - 1.96 \cdot SD$ and $L_2 = \ln(OR_{MH}) + 1.96 \cdot SD$, where
$SD = \frac{\ln(OR_{MH})}{\sqrt{X^2_{MH}}} = \frac{0.451}{\sqrt{2.80}} = 0.270$
that is:
$L_1 = 0.451 - 1.96 \times 0.270 = -0.077$
$L_2 = 0.451 + 1.96 \times 0.270 = 0.979$
3) The 95% confidence limits for $OR_{MH}$ are from $\exp(L_1) = 0.93$ to $\exp(L_2) = 2.66$.
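These test-based limits can be reproduced as follows. This Python sketch starts from the rounded values reported on the slides (OR_MH = 1.57 and X²_MH = 2.80), so it checks only the three steps above, not the underlying estimates. Variable names are illustrative.

```python
import math

# Values taken from the slides (rounded).
or_mh = 1.57
x2_mh = 2.80

log_or = math.log(or_mh)

# Test-based standard deviation: ln(OR_MH) divided by the square root of X2_MH.
sd = log_or / math.sqrt(x2_mh)

l1 = log_or - 1.96 * sd
l2 = log_or + 1.96 * sd
lower, upper = math.exp(l1), math.exp(l2)

print(round(sd, 3), round(l1, 3), round(l2, 3))
print(round(lower, 2), round(upper, 2))
```

The lower limit on the log scale is negative, so the interval for the odds ratio includes 1, in agreement with P = 0.09 from the Mantel-Haenszel test.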

Confounder adjustment for a binary outcome
1. Stratified analysis: Kirkwood & Sterne, Chapter 18.
2. Logistic regression analysis: Kirkwood & Sterne, Chapters 19-20.
Using logistic regression it is also possible to estimate and test the effect of an exposure on an outcome adjusted for other variables.

When is the stratified analysis sensible?
In the stratified analysis, we average the individual ORs from the separate strata. This makes sense if the individual ORs point in the same direction in all strata, i.e.
- if there is no interaction between exposure and stratification variable on the outcome, or equivalently
- if the stratification variable does not modify the effect of the exposure on the outcome (no effect modification).

Tests for no interaction

Age ≥ 60      MI cases    MI negative    Total
SBP ≥ 140         9           115          124
SBP < 140         6            73           79
Total            15           188          203
OR = 0.95

Age < 60      MI cases    MI negative    Total
SBP ≥ 140        20           596          616
SBP < 140        21          1171         1192
Total            41          1767         1808
OR = 1.87

Interaction (= effect modification)? Are the separate ORs, 0.95 and 1.87, different?
This can be tested using logistic regression or the Breslow-Day test for homogeneity.

Breslow-Day's test of no interaction
Hypothesis: constant OR over all strata, i.e. $OR_1 = OR_2 = \dots = OR_k$.
The Breslow-Day test compares observed and expected numbers in a chi-square statistic, with df = k-1 (why?).
The formulas are more involved. We need SAS!
We find $\chi^2 = 1.16$, df = 1, P = 0.28, i.e. no evidence of effect modification.
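The slides leave the formulas to SAS. For readers who want to see what is involved, the following Python sketch assumes the usual textbook form of the Breslow-Day statistic, without Tarone's correction. For each stratum it finds the count A in cell a that would give exactly the common odds ratio OR_MH with the margins held fixed, then compares observed a with A. The formula and all function names are assumptions and are not taken from the slides.

```python
import math

# Each stratum as (a, b, c, d); a = exposed cases.
strata = [(9, 115, 6, 73), (20, 596, 21, 1171)]

def mantel_haenszel_or(tables):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def expected_a(table, psi):
    """Count in cell a that yields odds ratio psi with the table margins fixed."""
    a, b, c, d = table
    r1, c1, n = a + b, a + c, a + b + c + d
    if psi == 1:
        return r1 * c1 / n
    # Solve A * (n - r1 - c1 + A) = psi * (r1 - A) * (c1 - A) for A.
    qa = 1 - psi
    qb = (n - r1 - c1) + psi * (r1 + c1)
    qc = -psi * r1 * c1
    root = math.sqrt(qb * qb - 4 * qa * qc)
    candidates = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    low, high = max(0, r1 + c1 - n), min(r1, c1)
    # Exactly one solution gives non-negative counts in all four cells.
    return next(x for x in candidates if low <= x <= high)

psi = mantel_haenszel_or(strata)
bd = 0.0
for table in strata:
    a, b, c, d = table
    r1, c1, n = a + b, a + c, a + b + c + d
    A = expected_a(table, psi)
    # Variance of a under the common odds ratio, from the four fitted cells.
    var = 1 / (1 / A + 1 / (r1 - A) + 1 / (c1 - A) + 1 / (n - r1 - c1 + A))
    bd += (a - A) ** 2 / var

df = len(strata) - 1
p_value = math.erfc(math.sqrt(bd / 2))   # valid here because df = 1

print(round(bd, 2), df, round(p_value, 2))
```

Under these assumptions the sketch gives a statistic close to the 1.16 reported from SAS. Each stratum contributes one term, and one degree of freedom is used up by estimating the common odds ratio.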

SAS exercises
The first 9 questions all concern the Vietnam study (cf. the article by Laust, Batty et al. (2008)). The SAS program vietnam1.sas reads the data file vietnam1.txt from the website.

1) Download the SAS file and run the program. Go through the SAS program and the output file and try to understand the program. Which variables are available? How many persons are there data on? Do you understand all the variables?
2) Count how many died of internal causes (disease or suicide) and of external causes, respectively (use proc freq).
3) Define a new variable doed which is 1 if the person has died (i.e. we disregard cause of death).
4) Define a new variable that divides the IQ variable into two groups: 1) low to normal IQ (≤ 110) and 2) high IQ (> 110).
5) Is there an association between IQ (divided into two groups) and death?
6) Take a closer look at the age variable using proc univariate. What do you think the unit is? Define a new age variable which is age (at examination) in years. Then define a new age variable in which age is divided into three groups: under 36, 36-40, and over 40 years. How many individuals are there in each of these three groups?
7) Calculate the mean age in each age group using proc means.
8) Using proc freq, compare the risk of death for low/normal and high IQ adjusted for age (in groups). Interpret the Breslow-Day test for no interaction.
9) Similarly, compare the risk of death for the two IQ groups adjusted for ethnicity (1=white, 2=black, 3,4,5=other), where the last three groups are combined into one group. Does this analysis work well (what is the problem)?
10) The SAS program israeli.sas corresponds to the example from the lecture on blood pressure and prevalence of MI. Run the program and reconstruct the results that were presented, i.e.
   a. OR, CI and test of the blood pressure-MI relation (unadjusted)
   b. separate ORs in the strata with 95% confidence intervals
   c. the Mantel-Haenszel estimator of the adjusted OR with confidence interval and test
   d. Interpret the Breslow-Day test for no interaction.