Statistics for MPH: November. Attributable risk, sample size determination (Silva: , )


Statistics for MPH: 7
3 November 2016
www.biostat.ku.dk/~pka/mph16
Attributable risk, sample size determination (Silva: 333-365, 381-383)
Per Kragh Andersen

From the 6th week of statistics teaching, I should:
1. understand that the parameters estimated by logistic regression can be interpreted as odds ratios (or, respectively, ln(odds ratio)),
2. understand that, for a categorical explanatory variable, these ORs are computed relative to a chosen reference category,
3. understand that, for a quantitative explanatory variable, these ORs describe how much the odds increase when the explanatory variable increases by 1 unit,
4. understand that ORs from models with several explanatory variables are mutually adjusted,
5. understand that, when several explanatory variables are in play, there are many possible ways of choosing the model,
6. understand that log(rate ratios), estimated in either Cox or Poisson regression, have corresponding properties,
7. understand that differences between means estimated in linear regression also have corresponding properties.

From the 6th week of statistics teaching, on the other hand, I do not necessarily need:
1. to have understood how the regression program obtains the estimated parameters and their SD/confidence interval from the data set and the model,
2. to have understood what the precise assumptions of the models are, and how they are checked,
3. to have understood how interaction/effect modification is handled by means of regression analysis.

Attributable risks, AR (excess fractions)

Example: lung cancer
Exposure A (cigarette smoking): $RR_A = 10$
Exposure B (uranium mining): $RR_B = 20$

Which exposure has the greatest public health impact? Suppose that
$Q_A = 40\%$ of the population smokes,
$Q_B = 0.04\%$ of the population mines uranium.

Attributable risks are measures that combine relative risk and exposure prevalence.

Two types of AR (or excess fractions) (Silva, pp. 97-99, 356-62, 381-83):
1. AR among exposed (Silva: excess fraction %)
2. AR in the total population (Silva: population excess fraction %)

[Figure: the population split into Exposed (regions A and B) and Non-Exposed (regions C and D).]

Notation

$T = A + B + C + D$ (total population size)
$Q = \frac{A+B}{T}$ (proportion exposed = exposure prevalence)
$P_0 = \frac{C}{C+D}$ (risk among non-exposed)
$P_e = \frac{A}{A+B}$ (risk among exposed)
$RR = \frac{P_e}{P_0}$ (relative risk)
$P_T = \frac{A+C}{T} = Q P_e + (1-Q) P_0$ (risk in total population)

because

$$Q P_e + (1-Q) P_0 = \frac{A+B}{T} \cdot \frac{A}{A+B} + \frac{C+D}{T} \cdot \frac{C}{C+D} = \frac{A+C}{T}.$$

AR among exposed

For some of those in the exposed group, disease occurrence will not be "due to exposure":

$$P_e = \text{risk due to exposure} + P_0 \quad (\text{if } P_e \ge P_0, \text{ i.e., if } RR \ge 1).$$

AR (among exposed) = proportion of risk among exposed which is due to exposure:

$$AR = \frac{P_e - P_0}{P_e} = \frac{RR - 1}{RR}.$$

AR in total population (PAR) (the more important concept of the two)

PAR = proportion of risk in the total population which is due to exposure:

$$PAR = \frac{P_T - P_0}{P_T} = \frac{Q P_e + (1-Q) P_0 - P_0}{Q P_e + (1-Q) P_0} = \frac{Q (P_e - P_0)}{P_0 + Q (P_e - P_0)} = \frac{Q (RR - 1)}{1 + Q (RR - 1)},$$

where the last step divides numerator and denominator by $P_0$.

Estimation: estimate $Q$ by $q = \frac{a+b}{n}$, and estimate $RR$ (or use OR).
Confidence limits: exist (but will be skipped here).
See Table 16.2 in Silva, p. 361: PAR for combinations of $Q$ and $RR$.


Exercise

Calculate the ARs from the following table on current cigarette smoking and lung cancer mortality among US veterans.

Smoking              Events    No-events      Total
Current cigarette      1116       700652     701768
All others              426      1015573    1015999
Total                  1542      1716225    1717767

Solution:

Prevalence of smoking:

$$q = \frac{701768}{1717767} = 0.41$$

$$AR = \frac{p_e - p_0}{p_e} = \frac{\frac{1116}{701768} - \frac{426}{1015999}}{\frac{1116}{701768}} = 0.736$$

$$RR = \frac{p_e}{p_0} = 3.79$$

$$PAR = \frac{0.41 \cdot (3.79 - 1)}{1 + 0.41 \cdot (3.79 - 1)} = 0.53$$
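The arithmetic of this solution can be checked with a few lines of Python. This is only a sketch: the function name `attributable_risks` and its argument names are illustrative and not part of the course material, and the formulas are the ones given in the notation for AR and PAR.

```python
def attributable_risks(a, b, c, d):
    """Return (q, RR, AR among exposed, PAR) from a 2x2 table.

    a, b: events / no-events among the exposed
    c, d: events / no-events among the non-exposed
    (argument names are illustrative)
    """
    n = a + b + c + d
    q = (a + b) / n                 # exposure prevalence
    p_e = a / (a + b)               # risk among exposed
    p_0 = c / (c + d)               # risk among non-exposed
    rr = p_e / p_0                  # relative risk
    ar_exposed = (rr - 1) / rr      # AR among exposed
    par = q * (rr - 1) / (1 + q * (rr - 1))  # AR in total population
    return q, rr, ar_exposed, par


# US veterans table: current cigarette smokers versus all others
q, rr, ar_exposed, par = attributable_risks(1116, 700652, 426, 1015573)
print(f"q = {q:.2f}, RR = {rr:.2f}, AR = {ar_exposed:.3f}, PAR = {par:.2f}")
```

The values agree with the hand calculation after rounding, even though the code carries the unrounded q and RR through to the PAR.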

Sample size determination

When planning investigations: how many persons are needed? For what purpose?
(1) To obtain a given precision of an estimate: Silva, Section 15.3.
(2) To obtain a given power of a test (the most common situation): Silva, Section 15.2.

(1) is rarely used in practice and will be skipped here. For (2), we take a slightly different approach from the one in Silva's book, but it leads to the same results.

Why sample size/power calculations?

If new data collection is involved, expenses will increase with sample size.
In clinical trials, it is unethical to treat more patients than necessary with a potentially inferior drug.
Funding agencies require it.
Even when using existing databases to address a new question, it is important to assess whether the researcher's time could be better spent.

By "necessary sample size" we mean that it should be sufficiently large to address the scientific question, i.e. the power for a relevant alternative should be sufficiently large not to overlook an important difference.

Testing; power

Example: we study pregnant women with pre-eclampsia and wish to compare two treatments with respect to the risk of some pregnancy outcome, e.g. preterm birth.

We want to be "pretty certain" to detect a treatment (exposure) effect of $D$ (a risk difference). What do we mean by "pretty certain"? We need the statistical concept of the power of a test.

If we test using a given level of significance $\alpha$ (e.g. 5%) and the true treatment difference is $D$, then we want to have a large probability of rejecting the null hypothesis $D = 0$. This probability, $1 - \beta$, is the power, often set to at least 80%.

Note: $\beta$ is called the Type 2 error risk and $\alpha$ is called the Type 1 error risk.

                  Accept             Reject
$H_0$ correct     -                  Type 1 error, α
$H_0$ wrong       Type 2 error, β    power, 1 − β

In general: the larger the power we want and the smaller the $\alpha$ we use, the larger $n$ needs to be. (However, we always have $\alpha = 0.05$!)
The smaller $D$ is, the larger $n$ needs to be.

To find $n$, a good guess of the risk in the control group ($p_1$) is needed. Letting $p_2 = p_1 - D$, it can be shown that

$$n = \frac{p_1 (1 - p_1) + p_2 (1 - p_2)}{D^2} \cdot f(\alpha, \beta)$$

is the number of women needed in each group. Only the size of $D$ matters here, not its sign, since it enters the formula as $D^2$.

Here, $f(\alpha, \beta)$ is given by:

            α = 0.01   α = 0.05   α = 0.10
β = 0.05      17.8       13.0       10.8
β = 0.10      14.9       10.5        8.6
β = 0.15      13.0        9.0        7.2
β = 0.20      11.7        7.9        6.2
β = 0.25      10.6        6.9        5.4

Example 1: $p_1 = 0.15$, $D = 0.07$, $\alpha = 0.05$, $\beta = 0.20$. Then, in each group we need:

$$n = \frac{0.15 \cdot 0.85 + 0.08 \cdot 0.92}{0.07^2} \cdot 7.9 = 324.$$

Example 2: $p_1 = 0.1$, $RR = 1.5$, $\alpha = 0.05$, $\beta = 0.20$. Then $p_2 = p_1 \cdot RR = 0.15$, $D = 0.05$ and

$$n = \frac{0.1 \cdot 0.9 + 0.15 \cdot 0.85}{0.05^2} \cdot 7.9 = 687.$$
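The two examples can be reproduced with a short Python sketch. The function names are illustrative. The helper `f_factor` rests on an assumption that the slides do not state: that the tabulated values come from the usual normal approximation $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$. That expression agrees with the table to within rounding (it gives about 7.85 where the table lists 7.9), so the examples below use the tabulated 7.9.

```python
from statistics import NormalDist


def f_factor(alpha, beta):
    """Assumed form of f(alpha, beta): (z_{1-alpha/2} + z_{1-beta})^2."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


def n_per_group(p1, p2, f):
    """Number needed in each group for risks p1 and p2 and factor f."""
    d = p1 - p2
    return (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2 * f


# Example 1: p1 = 0.15, D = 0.07, tabulated f(0.05, 0.20) = 7.9
n_example1 = n_per_group(0.15, 0.08, 7.9)
# Example 2: p1 = 0.10, RR = 1.5, so p2 = 0.15
n_example2 = n_per_group(0.10, 0.15, 7.9)

print(round(n_example1), round(n_example2))
print(round(f_factor(0.05, 0.20), 2))
```

Using the unrounded factor instead of the tabulated one gives slightly smaller sample sizes, which may explain small differences from software output.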

Finding the power based on the sample size

Sometimes the maximally obtainable sample size is given, and we wish to assess how large the power is for some given value of the treatment difference $D$. The relationship is still given by:

$$n = \frac{p_1 (1 - p_1) + p_2 (1 - p_2)}{D^2} \cdot f(\alpha, \beta).$$

E.g. $n = 500$ in each group and $p_1 = 0.05$, $D = 0.05$ (i.e., $p_2 = 0.1$) gives

$$500 = \frac{0.05 \cdot 0.95 + 0.1 \cdot 0.9}{0.05^2} \cdot f(\alpha, \beta),$$

or $f(\alpha, \beta) = 500/55 = 9.09$, or $\beta \approx 0.15$ if $\alpha = 0.05$ (because the number in the $\alpha = 0.05$ column of the table closest to 9.09 is 9.0, corresponding to $\beta = 0.15$). That is, the power is 0.85.
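Instead of looking up the nearest table entry, the power can be computed directly. The sketch below assumes, as a working hypothesis not stated in the slides, that $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$, so that the power is $\Phi(\sqrt{f} - z_{1-\alpha/2})$. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n, p1, p2, alpha=0.05):
    """Return (f, power) for n persons per group and risks p1, p2.

    Assumes f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})^2.
    """
    # Solve the sample size formula for f
    f = n * (p1 - p2) ** 2 / (p1 * (1 - p1) + p2 * (1 - p2))
    nd = NormalDist()
    # Invert the assumed form of f to get 1 - beta
    power = nd.cdf(sqrt(f) - nd.inv_cdf(1 - alpha / 2))
    return f, power


f, power = power_two_proportions(500, 0.05, 0.10)
print(f"f = {f:.2f}, power = {power:.2f}")
```

Under this assumption the result matches the table lookup: f is about 9.09 and the power is close to 0.85.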

Unequal group sizes

If the two groups do not have the same size:
first compute the total size $N = 2n$ as if the two groups were equally large,
then compute $k = n_1 / n_2$, the ratio between the group sizes;
the total number needed is then $N' = N \cdot \frac{(1+k)^2}{4k}$.

Example. If, in the first example above, group 1 is twice as big as group 2:

$N = 2 \cdot 324 = 648$
$k = 2$
$N' = N \cdot \frac{(1+k)^2}{4k} = 648 \cdot \frac{9}{8} = 729$, i.e. $n_1 = 486$, $n_2 = 243$.
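As a minimal sketch, the adjustment for unequal group sizes can be written as a small Python function. The function and variable names are illustrative; the formula is the one given above.

```python
def unequal_groups(n_equal_per_group, k):
    """Return (total, n1, n2) when group 1 is k times as large as group 2.

    n_equal_per_group: per-group size computed as if the groups were equal.
    """
    total_equal = 2 * n_equal_per_group           # N = 2n
    total = total_equal * (1 + k) ** 2 / (4 * k)  # N' = N (1+k)^2 / (4k)
    n2 = total / (1 + k)
    n1 = k * n2
    return total, n1, n2


# First example: n = 324 per group, group 1 twice as big as group 2
total, n1, n2 = unequal_groups(324, 2)
print(total, n1, n2)
```

For k = 1 the factor $(1+k)^2 / (4k)$ equals 1, so equal groups always give the smallest total.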

Case-control studies

To do sample size calculations for case-control studies, one may use the fact that disease OR = exposure OR.

Example: unemployment and heart disease. Suppose that
20% of cases are exposed (out of a job),
we can afford $n = 300$ cases and $n = 300$ controls,
we believe that $OR = 2$.

What is the power when $\alpha = 0.05$?

If $p_1 = 0.2$ = prob(exposed among cases), then the odds for exposure among cases are

$$\frac{p_1}{1 - p_1} = 0.25.$$

Since $OR = 2$, the odds for exposure among controls must be $0.25 / 2 = 0.125$, and $p_2$ = prob(exposed among controls) is

$$p_2 = \frac{0.125}{1 + 0.125} = 0.11.$$

The standard formula then gives

$$n = 300 = f(0.05, \beta) \cdot \frac{0.2 \cdot 0.8 + 0.11 \cdot 0.89}{(0.2 - 0.11)^2},$$

leading to $f(0.05, \beta) = 9.4$, and from the table we find that the power is between 80% and 90%.
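The steps of this case-control calculation can be sketched in Python. The function name is illustrative. Note that the slide rounds $p_2$ to 0.11 before applying the formula, and the sketch does the same in order to reproduce $f = 9.4$; with the unrounded $p_2 = 0.111\ldots$ the factor comes out somewhat lower.

```python
def control_exposure_prob(p_cases, odds_ratio):
    """Exposure probability among controls, given that among cases and the OR."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = odds_cases / odds_ratio
    return odds_controls / (1 + odds_controls)


p1 = 0.2                                      # exposed among cases
p2 = round(control_exposure_prob(p1, 2), 2)   # rounded as on the slide
n = 300                                       # cases, and equally many controls

# Solve the standard sample size formula for f(0.05, beta)
f = n * (p1 - p2) ** 2 / (p1 * (1 - p1) + p2 * (1 - p2))
print(f"p2 = {p2}, f = {f:.1f}")
```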

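Doing it in SAS

/* Example 1: n per group for risks 0.15 and 0.08, power 80% */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.15 0.08)
    npergroup = .
    power = 0.8;
run;

/* Example 2: n per group for reference risk 0.1 and RR = 1.5 */
proc power;
  twosamplefreq test = pchi
    refproportion = 0.1
    relativerisk = 1.5
    npergroup = .
    power = 0.8;
run;

/* Power for 500 per group and risks 0.1 and 0.05 */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.1 0.05)
    npergroup = 500
    power = .;
run;

/* Unequal group sizes: group 1 twice as big as group 2 */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.15 0.08)
    ntotal = .
    groupweights = (2 1)
    power = 0.8;
run;

/* Case-control example: 20% exposed among cases, OR entered as 0.5 */
proc power;
  twosamplefreq test = pchi
    refproportion = 0.2
    oddsratio = 0.5
    npergroup = 300
    power = .;
run;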

From the 7th week of statistics teaching, I should:
1. understand how the population attributable risk

$$PAR = \frac{Q (RR - 1)}{1 + Q (RR - 1)}$$

combines the frequency, $Q$, of a risk factor and its effect, $RR$, into a measure relevant to public health science of how large a proportion of an observed number of disease cases can be attributed to the risk factor,
2. be able to assess how large a sample is needed to detect a given difference between two frequencies with a given power.

From the 7th week of statistics teaching, on the other hand, I do not necessarily need:
1. to have understood how the formula for the sample size $n$ was derived.

Overview of study types and the corresponding frequency and association measures. Adjustment for confounding.

Design                       Frequency measure         Comparing two exposure groups       Simple confounder adjustment          Regression analysis
Cohort, fixed follow-up      risk p or odds p/(1-p)    risk ratio p_1/p_0 or odds ratio;   Mantel-Haenszel stratified analysis   logistic regression
(Cross-sectional)            (prevalence)              2x2 table, χ²-test
Cohort, varying follow-up    rate r                    rate ratio r_1/r_0; χ²-test         Mantel-Haenszel stratified analysis   Cox (Poisson) regression
Case-control                 -                         odds ratio; 2x2 table, χ²-test      Mantel-Haenszel stratified analysis   logistic regression