Statistics for MPH: 7
29 October 2015
www.biostat.ku.dk/~pka/mph15
Attributable risk, sample size determination (Silva: 333-365, 381-383)
Per Kragh Andersen

From the 6th week of statistics teaching, I should:
1. understand that the parameters estimated by logistic regression can be interpreted as odds ratios (or, respectively, ln(odds ratio)),
2. understand that for a categorical explanatory variable these ORs are computed relative to a chosen reference category,
3. understand that for a quantitative explanatory variable these ORs describe how much the odds increase when the explanatory variable increases by 1 unit,
4. understand that ORs from models with several explanatory variables are mutually adjusted,
5. understand that when several explanatory variables are in play, there are many possible ways of choosing the model,
6. understand that log(rate ratios) estimated in either Cox or Poisson regression have corresponding properties,
7. understand that differences between means estimated in linear regression also have corresponding properties.

From the 6th week of statistics teaching, however, I do not necessarily need:
1. to have understood how the regression program obtains the estimated parameters and their SD/confidence interval from the data set and the model,
2. to have understood what the precise assumptions of the models are and how they are checked,
3. to have understood how interaction/effect modification is handled by means of regression analysis.

Attributable risks, AR (excess fractions)

Example: Lung cancer
Exposure A (cigarette smoking): $RR_A = 10$
Exposure B (uranium mining): $RR_B = 20$

Which exposure has the greatest public health impact? Suppose that
$Q_A = 40\%$ of the population smokes
$Q_B = 0.04\%$ of the population mines uranium

Attributable risks are measures which combine relative risk and exposure prevalence.

Two types of AR (or excess fractions) (Silva, pp. 97-99, 356-62, 381-83):
1. AR among exposed (Silva: excess fraction %)
2. AR in the total population (Silva: population excess fraction %)

[Figure: the population divided into an exposed part (cells A and B) and a non-exposed part (cells C and D).]

Notation

$T = A + B + C + D$ (total population size)
$Q = \frac{A+B}{T}$ (proportion exposed = exposure prevalence)
$P_0 = \frac{C}{C+D}$ (risk among non-exposed)
$P_e = \frac{A}{A+B}$ (risk among exposed)
$RR = \frac{P_e}{P_0}$ (relative risk)
$P_T = \frac{A+C}{T} = Q P_e + (1-Q) P_0$ (risk in total population)

(because $Q P_e + (1-Q) P_0 = \frac{A+B}{T} \cdot \frac{A}{A+B} + \frac{C+D}{T} \cdot \frac{C}{C+D} = \frac{A+C}{T}$)

AR among exposed

For some of those in the exposed group, disease occurrence will not be due to exposure:

$$P_e = \text{Risk due to exposure} + P_0$$

(if $P_e \geq P_0$, i.e., if $RR \geq 1$).

AR (among exposed) = proportion of risk among exposed which is due to exposure

$$AR = \frac{P_e - P_0}{P_e} = \frac{RR - 1}{RR}.$$

AR in total population (PAR) (the more important concept of the two)

PAR = proportion of risk in total population which is due to exposure

$$PAR = \frac{P_T - P_0}{P_T} = \frac{Q P_e + (1-Q) P_0 - P_0}{Q P_e + (1-Q) P_0} = \frac{Q (P_e - P_0)}{P_0 + Q (P_e - P_0)}$$

Dividing numerator and denominator by $P_0$:

$$PAR = \frac{Q (RR - 1)}{1 + Q (RR - 1)}.$$

Estimation: estimate $Q$ by $q = \frac{a+b}{n}$, estimate $RR$ (or use OR).
Confidence limits: exist (but will be skipped here).
See Table 16.2 in Silva, p. 361: PAR for combinations of $Q$ and $RR$.
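As a rough illustration, the two formulas can be evaluated for the two exposures in the lung cancer example. This is only a sketch; the function names `ar_exposed` and `par` are chosen here for illustration and do not come from the slides or from Silva.

```python
def ar_exposed(rr):
    """Attributable risk among the exposed: (RR - 1) / RR."""
    return (rr - 1) / rr


def par(q, rr):
    """Population attributable risk: Q(RR - 1) / (1 + Q(RR - 1))."""
    excess = q * (rr - 1)
    return excess / (1 + excess)


# Exposure A: cigarette smoking, Q = 40%, RR = 10
par_smoking = par(0.40, 10)
# Exposure B: uranium mining, Q = 0.04%, RR = 20
par_mining = par(0.0004, 20)

print(f"AR among exposed: smoking {ar_exposed(10):.2f}, mining {ar_exposed(20):.2f}")
print(f"PAR: smoking {par_smoking:.3f}, mining {par_mining:.4f}")
```

With these inputs the PAR for smoking is about 78%, while the PAR for uranium mining is below 1%, even though mining has the larger relative risk.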


Exercise

Calculate ARs from the following table on current cigarette smoking and lung cancer mortality among US veterans.

Smoking              Events   No events     Total
Current cigarette      1116      700652    701768
All others              426     1015573   1015999
Total                  1542     1716225   1717767

Solution:

Prevalence of smoking:

$$q = \frac{701768}{1717767} = 0.41$$

$$AR = \frac{p_e - p_0}{p_e} = \frac{\frac{1116}{701768} - \frac{426}{1015999}}{\frac{1116}{701768}} = 0.736$$

$$RR = \frac{p_e}{p_0} = 3.79$$

$$PAR = \frac{0.41 \, (3.79 - 1)}{1 + 0.41 \, (3.79 - 1)} = 0.53$$
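The solution can be checked numerically from the four cell counts of the table. The sketch below assumes the cells are labelled as in the notation slide (a, b: exposed with and without event; c, d: non-exposed with and without event); the function name `attributable_risks` is illustrative only.

```python
def attributable_risks(a, b, c, d):
    """Return q, RR, AR and PAR from a 2x2 table.

    a, b: exposed with / without the event
    c, d: non-exposed with / without the event
    """
    n = a + b + c + d
    q = (a + b) / n          # exposure prevalence
    p_e = a / (a + b)        # risk among exposed
    p_0 = c / (c + d)        # risk among non-exposed
    rr = p_e / p_0           # relative risk
    ar = (p_e - p_0) / p_e   # AR among exposed
    par = q * (rr - 1) / (1 + q * (rr - 1))
    return {"q": q, "RR": rr, "AR": ar, "PAR": par}


# US veterans: current cigarette smokers versus all others
result = attributable_risks(a=1116, b=700652, c=426, d=1015573)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```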

Sample size determination

When planning investigations: How many persons are needed? For what purpose?

(1) To obtain a given precision of an estimate: Silva, Section 15.3.
(2) To obtain a given power of a test (the most common situation): Silva, Section 15.2.

(1) is rarely used in practice and will be skipped here. For (2) we use a slightly different approach than in Silva's book, but one leading to the same results.

Testing; power

Example: We study pregnant women with pre-eclampsia and wish to compare two treatments with respect to the risk of some pregnancy outcome, e.g. preterm birth. We want to be pretty certain to detect a treatment (exposure) effect of $D$ (a risk difference). What do we mean by pretty certain?

We need the statistical concept of the power of a test. If we test using a given level of significance $\alpha$ (e.g. 5%) and if the true treatment difference is $D$, then we want to have a large probability of rejecting the null hypothesis $D = 0$. This probability, $1 - \beta$, is the power, often set to at least 80%.

Note: $\beta$ is called the Type 2 error risk and $\alpha$ is called the Type 1 error risk.

                 Accept                   Reject
$H_0$ correct                             Type 1 error $\alpha$
$H_0$ wrong      Type 2 error $\beta$     power $1 - \beta$

In general: the larger the power we want and the smaller the $\alpha$ we use, the larger $n$ needs to be. (However, we always have $\alpha = 0.05$!) The smaller $D$, the larger $n$ needs to be.

To find $n$, a good guess of the risk in the control group ($p_1$) is needed. Letting $p_2 = p_1 - D$, it can be shown that

$$n = \frac{p_1 (1 - p_1) + p_2 (1 - p_2)}{D^2} \, f(\alpha, \beta)$$

is the number of women needed in each group.

Here, $f(\alpha, \beta)$ is given by:

              α = 0.01   α = 0.05   α = 0.10
β = 0.05        17.8       13.0       10.8
β = 0.10        14.9       10.5        8.6
β = 0.15        13.0        9.0        7.2
β = 0.20        11.7        7.9        6.2
β = 0.25        10.6        6.9        5.4

Example 1: $p_1 = 0.15$, $D = 0.07$, $\alpha = 0.05$, $\beta = 0.20$. Then, in each group we need:

$$n = \frac{0.15 \cdot 0.85 + 0.08 \cdot 0.92}{0.07^2} \cdot 7.9 = 324.$$

Example 2: $p_1 = 0.1$, $RR = 1.5$, $\alpha = 0.05$, $\beta = 0.20$. Then $p_2 = p_1 \cdot RR = 0.15$, $D = 0.05$ and

$$n = \frac{0.1 \cdot 0.9 + 0.15 \cdot 0.85}{0.05^2} \cdot 7.9 = 687.$$
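The two examples can be reproduced with a few lines of Python. The slides give $f(\alpha, \beta)$ only as a table; the closed form used in `f_factor` below, $(z_{1-\alpha/2} + z_{1-\beta})^2$, is the usual normal-approximation expression and is an assumption here, not something stated in the slides. It agrees with the tabulated values to within rounding (about 7.85 against the tabulated 7.9 for $\alpha = 0.05$, $\beta = 0.20$). The function names are illustrative.

```python
from statistics import NormalDist


def f_factor(alpha, beta):
    """Assumed closed form for f(alpha, beta): (z_{1-alpha/2} + z_{1-beta})^2."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


def n_per_group(p1, p2, f):
    """Number needed in each group for risks p1, p2 and a given f(alpha, beta)."""
    d = p1 - p2
    return (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2 * f


# Example 1: p1 = 0.15, D = 0.07 (so p2 = 0.08), f taken from the table
n_example_1 = n_per_group(0.15, 0.08, 7.9)
# Example 2: p1 = 0.10, RR = 1.5 (so p2 = 0.15), f taken from the table
n_example_2 = n_per_group(0.10, 0.15, 7.9)

print(round(n_example_1), round(n_example_2))
print(round(f_factor(0.05, 0.20), 2))
```

Passing the tabulated 7.9 reproduces the slide values; using the unrounded `f_factor(0.05, 0.20)` instead gives slightly smaller sample sizes.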

Finding the power based on the sample size

Sometimes the maximally obtainable sample size is given, and we wish to assess how large the power is for some given value of the treatment difference $D$. The relationship is still given by:

$$n = \frac{p_1 (1 - p_1) + p_2 (1 - p_2)}{D^2} \, f(\alpha, \beta).$$

E.g. $n = 500$ in each group and $p_1 = 0.05$, $D = 0.05$ (i.e., $p_2 = 0.1$) gives

$$500 = \frac{0.05 \cdot 0.95 + 0.1 \cdot 0.9}{0.05^2} \, f(\alpha, \beta)$$

or $f(\alpha, \beta) = 500 / 55 = 9.09$, or $\beta \approx 0.15$ if $\alpha = 0.05$ (because the number in the $\alpha = 0.05$ column of the table closest to 9.09 is 9.0, corresponding to $\beta = 0.15$). That is, the power is 0.85.
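Instead of looking up the nearest table entry, the power can be sketched directly. The code below assumes that $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$, which is not stated in the slides but matches the table to within rounding; under that assumption the power is $\Phi(\sqrt{f} - z_{1-\alpha/2})$. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n, p1, p2, alpha=0.05):
    """Return (f, power) for n persons per group and risks p1, p2.

    Assumes f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})^2.
    """
    f = n * (p1 - p2) ** 2 / (p1 * (1 - p1) + p2 * (1 - p2))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    power = NormalDist().cdf(sqrt(f) - z_alpha)
    return f, power


f_value, power = power_two_proportions(n=500, p1=0.05, p2=0.10)
print(f"f = {f_value:.2f}, power = {power:.2f}")
```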

Unequal group sizes

If the two groups do not have the same size:
- first compute the total size $N = 2n$ as if the two groups were equally large,
- then compute $k = n_1 / n_2$, the ratio between the group sizes,
- the total number needed is then $N' = N \, \frac{(1+k)^2}{4k}$.

Example. If, in the first example above, group 1 is twice as big as group 2:

$$N = 2 \cdot 324 = 648, \quad k = 2, \quad N' = N \, \frac{(1+k)^2}{4k} = 648 \cdot \frac{9}{8} = 729,$$

i.e. $n_1 = 486$, $n_2 = 243$.
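The adjustment for unequal group sizes can be sketched as a small helper. The function name `unequal_groups` is illustrative; the split of the total into the two groups uses only the definition $k = n_1 / n_2$.

```python
def unequal_groups(n_equal_per_group, k):
    """Inflate an equal-groups design to group-size ratio k = n1 / n2.

    Returns (total, n1, n2).
    """
    total_equal = 2 * n_equal_per_group           # N = 2n
    total = total_equal * (1 + k) ** 2 / (4 * k)  # N' = N (1+k)^2 / (4k)
    n2 = total / (1 + k)
    n1 = k * n2
    return total, n1, n2


# First sample size example: n = 324 per group, group 1 twice as large as group 2
total, n1, n2 = unequal_groups(324, k=2)
print(total, n1, n2)
```

For $k = 1$ the factor $(1+k)^2 / (4k)$ equals 1, so the equal-groups total is recovered unchanged.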

Case-control studies

To do sample size calculations for case-control studies one may use the fact that disease OR = exposure OR.

Example: Unemployment and heart disease. Suppose that
- 20% of cases are exposed (out of job),
- we can afford $n = 300$ cases and $n = 300$ controls,
- we believe that $OR = 2$.

What is the power when $\alpha = 0.05$?

If $p_1 = 0.2$ = prob(exposed among cases), then the odds for exposure among cases is $\frac{p_1}{1 - p_1} = 0.25$. Since $OR = 2$, the odds for exposure among controls must be $0.25 / 2 = 0.125$, and $p_2$ = prob(exposure among controls) is

$$p_2 = \frac{0.125}{1 + 0.125} = 0.11.$$

The standard formula then gives

$$n = 300 = f(0.05, \beta) \, \frac{0.2 \cdot 0.8 + 0.11 \cdot 0.89}{(0.2 - 0.11)^2},$$

leading to $f(0.05, \beta) = 9.4$, and from the table we find that the power is between 80% and 90%.
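The case-control calculation can be sketched in the same way: convert the exposure probability among cases to odds, divide by the OR, convert back to a probability, and plug both probabilities into the standard formula. The power step assumes $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$, which is not given in the slides; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def control_exposure_prob(p_cases, odds_ratio):
    """Exposure probability among controls implied by p_cases and the OR."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = odds_cases / odds_ratio
    return odds_controls / (1 + odds_controls)


def case_control_power(n, p_cases, odds_ratio, alpha=0.05):
    """Return (p2, f, power) for n cases and n controls.

    Assumes f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})^2.
    """
    p1 = p_cases
    p2 = control_exposure_prob(p_cases, odds_ratio)
    f = n * (p1 - p2) ** 2 / (p1 * (1 - p1) + p2 * (1 - p2))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    power = NormalDist().cdf(sqrt(f) - z_alpha)
    return p2, f, power


p2, f_value, power = case_control_power(n=300, p_cases=0.2, odds_ratio=2)
print(f"p2 = {p2:.3f}, f = {f_value:.2f}, power = {power:.2f}")
```

Without rounding $p_2$ to 0.11, $f$ comes out somewhat below the 9.4 obtained on the slide; either way the power lies between 80% and 90%.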

From the 7th week of statistics teaching, I should:
1. understand how the population attributable risk

$$PAR = \frac{Q (RR - 1)}{1 + Q (RR - 1)}$$

combines the frequency, $Q$, of a risk factor and its effect, $RR$, into a measure relevant to public health of how large a proportion of an observed number of disease cases can be attributed to the risk factor,
2. be able to assess how large a sample is needed to detect a given difference between two frequencies with a given power.

From the 7th week of statistics teaching, however, I do not necessarily need:
1. to have understood how the formula for the sample size $n$ was derived.