Statistics for MPH: 7

3 November 2011
www.biostat.ku.dk/~pka/mph11
Attributable risk, sample size determination (Silva: 333-365, 381-383)
Per Kragh Andersen

From the 6th week of statistics teaching, I should:
1. understand that the parameters estimated by logistic regression can be interpreted as odds ratios (more precisely: ln(odds ratio)),
2. understand that for a categorical explanatory variable these ORs are computed relative to a chosen reference category,
3. understand that for a quantitative explanatory variable these ORs describe how much the odds increase when the explanatory variable increases by 1 unit,
4. understand that ORs from models with several explanatory variables are mutually adjusted,
5. understand that when several explanatory variables are in play, there are many possible ways to choose the model.

From the 6th week of statistics teaching, on the other hand, I do not necessarily need:
1. to have understood how the logistic regression program obtains the estimated ORs and their SD/confidence interval from the data set and the model,
2. to have understood what the precise assumptions of the models are and how they are checked,
3. to have understood how interaction/effect modification is handled using logistic regression.

Attributable risks, AR (excess fractions)

Example: Lung cancer
  Exposure A (cigarette smoking): $RR_A = 10$
  Exposure B (uranium mining): $RR_B = 20$

Which exposure has the greatest public health impact?

Suppose that
  $Q_A = 40\%$ of the population smokes
  $Q_B = 0.04\%$ of the population mines uranium

Attributable risks are measures which combine relative risk and exposure prevalence.

Two types of AR (or excess fractions) (Silva, pp. 97-99, 356-62, 381-83):
1. AR among exposed (Silva: excess fraction %)
2. AR in the total population (Silva: population excess fraction %)

[Figure: the population divided into Exposed (groups A and B) and Non-Exposed (groups C and D).]

Notation

  $T = A + B + C + D$                          (total population size)
  $Q = \frac{A+B}{T}$                          (proportion exposed = "exposure prevalence")
  $P_0 = \frac{C}{C+D}$                        (risk among non-exposed)
  $P_e = \frac{A}{A+B}$                        (risk among exposed)
  $RR = \frac{P_e}{P_0}$                       (relative risk)
  $P_T = \frac{A+C}{T} = Q P_e + (1-Q) P_0$    (risk in total population)

because

$$Q P_e + (1-Q) P_0 = \frac{A+B}{T}\cdot\frac{A}{A+B} + \frac{C+D}{T}\cdot\frac{C}{C+D} = \frac{A+C}{T}.$$

AR among exposed

For some of those in the exposed group, disease occurrence will not be "due to exposure":

$$P_e = \text{Risk due to exposure} + P_0 \quad (\text{if } P_e \ge P_0, \text{ i.e., if } RR \ge 1).$$

AR (among exposed) = proportion of the risk among exposed which is due to exposure:

$$AR = \frac{P_e - P_0}{P_e} = \frac{RR - 1}{RR}.$$

That is, the number of cases among exposed which is due to exposure is

$$\text{Total number of cases among exposed} \times AR = T\,Q\,P_e\,\frac{RR - 1}{RR},$$

where $T\,Q\,P_e$ is the total number of cases among exposed.

AR in total population (PAR) (the more important concept of the two)

$$PAR = \frac{\text{Number of cases due to exposure}}{\text{Total number of cases}} = \frac{T Q P_e \left(\frac{RR-1}{RR}\right)}{T Q P_e + T(1-Q) P_0}$$

Dividing numerator and denominator by $T P_0$:

$$PAR = \frac{Q(RR-1)}{Q \cdot RR + (1-Q)} = \frac{Q(RR-1)}{1 + Q(RR-1)}$$

Alternative formula:

$$PAR = \frac{P_T - P_0}{P_T}.$$

Estimation: estimate $Q$ by $q = \frac{a+b}{n}$, and estimate $RR$ (or use $OR$).

Confidence limits exist.

See Table 16.2 in Silva, p. 361: PAR for combinations of $Q$ and $RR$.
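As an illustration, the PAR formula can be applied to the two exposures from the lung cancer example. This is a minimal sketch; the function name `par` is chosen here for illustration and does not come from the slides.

```python
def par(q: float, rr: float) -> float:
    """Population attributable risk from exposure prevalence q and relative risk rr."""
    return q * (rr - 1) / (1 + q * (rr - 1))


# Figures from the lung cancer example
par_smoking = par(0.40, 10)    # Q_A = 40%,   RR_A = 10
par_uranium = par(0.0004, 20)  # Q_B = 0.04%, RR_B = 20

print(f"PAR, cigarette smoking: {par_smoking:.3f}")
print(f"PAR, uranium mining:    {par_uranium:.4f}")
```

Under these figures, smoking accounts for roughly 78% of the cases and uranium mining for less than 1%, even though uranium mining has the larger relative risk.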

Current cigarette smoking and lung cancer mortality among US veterans.

  Smoking              Events   No-events     Total
  Current cigarette      1116      700652    701768
  All others              426     1015573   1015999
  Total                  1542     1716225   1717767

Solution:

Prevalence of smoking:

$$q = \frac{701768}{1717767} = 0.41$$

$$AR = \frac{p_e - p_0}{p_e} = \frac{\frac{1116}{701768} - \frac{426}{1015999}}{\frac{1116}{701768}} = 0.736$$

$$RR = \frac{p_e}{p_0} = 3.79$$

$$PAR = \frac{0.41\,(3.79 - 1)}{1 + 0.41\,(3.79 - 1)} = 0.53$$
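The same quantities can be recomputed directly from the counts in the US veterans table. The sketch below uses the variable names a, b, c, d in the sense of the Notation slide (an assumption about how the table maps to that notation) and also checks that the alternative formula $(P_T - P_0)/P_T$ gives the same PAR.

```python
# Counts from the US veterans table
a, b = 1116, 700652    # current cigarette smokers: events, no-events
c, d = 426, 1015573    # all others: events, no-events
n = a + b + c + d      # total number of veterans

q = (a + b) / n        # prevalence of smoking
p_e = a / (a + b)      # risk among exposed
p_0 = c / (c + d)      # risk among non-exposed
p_t = (a + c) / n      # risk in the total population

rr = p_e / p_0
ar_exposed = (rr - 1) / rr
par_from_rr = q * (rr - 1) / (1 + q * (rr - 1))
par_from_risks = (p_t - p_0) / p_t   # alternative formula

print(f"q = {q:.2f}, RR = {rr:.2f}, AR = {ar_exposed:.3f}")
print(f"PAR = {par_from_rr:.2f} (alternative formula: {par_from_risks:.2f})")
```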

Sample size determination.

When planning investigations: How many persons are needed? For what purpose?

(1) To obtain a given precision of an estimate: Silva, Section 15.3.
(2) To obtain a given power of a test (the most common situation): Silva, Section 15.2.

(1) is rarely used in practice and will be skipped here. For (2), we take a slightly different approach than Silva's book, but one that leads to the same results.

Example: Testing; power.

We study pregnant women with pre-eclampsia and wish to compare two treatments with respect to the risk of some pregnancy outcome, e.g. preterm birth.

We want to be "pretty certain" to detect a treatment (exposure) effect of $D$ (a risk difference). What do we mean by "pretty certain"?

We need the statistical concept of the power of a test.

If we test at a given level of significance $\alpha$ (e.g. 5%) and the true treatment difference is $D$, then we want a large probability of rejecting the null hypothesis $D = 0$. This probability, $1 - \beta$, is the power, often set to at least 80%.

Note: $\beta$ is called the Type 2 error risk and $\alpha$ is called the Type 1 error risk.

                  Accept                   Reject
  H_0 correct                              Type 1 error (α)
  H_0 wrong       Type 2 error (β)         power (1 - β)

In general: the larger the power we want and the smaller the $\alpha$ we use, the larger $n$ needs to be. The smaller $D$ is, the larger $n$ needs to be.

To find $n$, a good guess of the risk in the control group ($p_1$) is needed. Letting $p_2 = p_1 - D$ (or $p_1 + D$ when the risk increases), then

$$n = \frac{p_1(1 - p_1) + p_2(1 - p_2)}{D^2}\, f(\alpha, \beta)$$

is the number of women needed in each group.

Here, $f(\alpha, \beta)$ is given by:

                    α
    β       0.01    0.05    0.10
    0.05    17.8    13.0    10.8
    0.10    14.9    10.5     8.6
    0.15    13.0     9.0     7.2
    0.20    11.7     7.9     6.2
    0.25    10.6     6.9     5.4

Example: $p_1 = 0.15$, $D = 0.07$, $\alpha = 0.05$, $\beta = 0.20$. Then, in each group we need:

$$n = \frac{0.15 \cdot 0.85 + 0.08 \cdot 0.92}{0.07^2} \cdot 7.9 = 324.$$

Example: $p_1 = 0.1$, $RR = 1.5$, $\alpha = 0.05$, $\beta = 0.20$. Then $p_2 = p_1 \cdot RR = 0.15$, $D = 0.05$ and

$$n = \frac{0.1 \cdot 0.9 + 0.15 \cdot 0.85}{0.05^2} \cdot 7.9 = 687.$$
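The two worked examples can be reproduced with a small lookup of the tabulated $f(\alpha, \beta)$ values. This is only a sketch: the names `F_TABLE` and `n_per_group` are made up for illustration, and the result is rounded to the nearest integer to match the slides.

```python
# f(alpha, beta) copied from the table above; keys are (alpha, beta)
F_TABLE = {
    (0.01, 0.05): 17.8, (0.05, 0.05): 13.0, (0.10, 0.05): 10.8,
    (0.01, 0.10): 14.9, (0.05, 0.10): 10.5, (0.10, 0.10): 8.6,
    (0.01, 0.15): 13.0, (0.05, 0.15): 9.0,  (0.10, 0.15): 7.2,
    (0.01, 0.20): 11.7, (0.05, 0.20): 7.9,  (0.10, 0.20): 6.2,
    (0.01, 0.25): 10.6, (0.05, 0.25): 6.9,  (0.10, 0.25): 5.4,
}


def n_per_group(p1: float, p2: float, alpha: float, beta: float) -> float:
    """Number needed in each group to detect the risk difference p2 - p1."""
    d = p2 - p1
    return (p1 * (1 - p1) + p2 * (1 - p2)) / d**2 * F_TABLE[(alpha, beta)]


# First example: p1 = 0.15, D = 0.07, so p2 = 0.08
n_first = n_per_group(0.15, 0.08, alpha=0.05, beta=0.20)
# Second example: p1 = 0.1, RR = 1.5, so p2 = 0.15
n_second = n_per_group(0.10, 0.15, alpha=0.05, beta=0.20)

print(round(n_first), round(n_second))
```

Because $D$ enters only as $D^2$, the function gives the same answer whether the risk in the second group is higher or lower than $p_1$.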

Finding the power based on the sample size

Sometimes the maximum obtainable sample size is given, and we wish to assess how large the power is for some given value of the treatment difference $D$. The relationship is still given by:

$$n = \frac{p_1(1 - p_1) + p_2(1 - p_2)}{D^2}\, f(\alpha, \beta).$$

E.g. $n = 500$ in each group and $p_1 = 0.05$, $D = 0.05$ (i.e., $p_2 = 0.1$) gives

$$500 = \frac{0.05 \cdot 0.95 + 0.1 \cdot 0.9}{0.05^2}\, f(\alpha, \beta) = 55\, f(\alpha, \beta),$$

or $f(\alpha, \beta) = 500/55 = 9.09$, or $\beta \approx 0.15$ if $\alpha = 0.05$ (because the number in the $\alpha = 0.05$ column of the table closest to 9.09 is 9.0, corresponding to $\beta = 0.15$). That is, the power is 0.85.
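Instead of looking up the closest table entry, the power can be computed directly. The sketch below assumes that $f(\alpha, \beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$, the usual normal-approximation formula; the slides do not state this, but the tabulated values appear consistent with it (e.g. $(1.96 + 0.84)^2 \approx 7.8$). The function name is hypothetical.

```python
from statistics import NormalDist


def power_two_proportions(n: float, p1: float, p2: float, alpha: float = 0.05):
    """Return (f, power) for n persons per group, assuming
    f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})**2 (normal approximation)."""
    std_normal = NormalDist()
    # Solve n = (p1(1-p1) + p2(1-p2)) / D^2 * f for f
    f = n * (p2 - p1) ** 2 / (p1 * (1 - p1) + p2 * (1 - p2))
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # z_{1-beta} = sqrt(f) - z_{1-alpha/2}; the power is its normal probability
    power = std_normal.cdf(f ** 0.5 - z_alpha)
    return f, power


f_value, power = power_two_proportions(500, 0.05, 0.10)
print(f"f = {f_value:.2f}, power = {power:.2f}")
```

For the example with 500 persons per group this gives a power of about 0.85, in line with the table lookup.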

Unequal group sizes.

If the two groups do not have the same size:
- first compute the total size $N = 2n$ as if the two groups were equally large,
- then compute $k = n_1/n_2$, the ratio between the group sizes,
- the total number needed is then

$$N' = N\,\frac{(1 + k)^2}{4k}.$$

Example. If, in the first example above, group 1 is twice as big as group 2:

$$N = 2 \cdot 324 = 648, \quad k = 2, \quad N' = N\,\frac{(1+k)^2}{4k} = 648 \cdot \frac{9}{8} = 729,$$

i.e. $n_1 = 486$, $n_2 = 243$.
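The adjustment for unequal group sizes is easy to script. A minimal sketch, with a function name invented for illustration, that also splits the adjusted total into the two group sizes:

```python
def unequal_group_sizes(n_total_equal: float, k: float):
    """Adjust a total sample size computed for equal groups to a ratio k = n1/n2.

    Returns (adjusted total, n1, n2).
    """
    total = n_total_equal * (1 + k) ** 2 / (4 * k)
    n2 = total / (1 + k)   # n1 + n2 = total and n1 = k * n2
    n1 = k * n2
    return total, n1, n2


# First example: n = 324 per group, group 1 twice as big as group 2
total, n1, n2 = unequal_group_sizes(2 * 324, k=2)
print(total, n1, n2)
```

With $k = 1$ the factor $(1+k)^2/(4k)$ equals 1, so the equal-groups total is returned unchanged.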

From the 7th week of statistics teaching, I should:
1. understand how the population attributable risk

$$PAR = \frac{Q\,(RR - 1)}{1 + Q\,(RR - 1)}$$

combines the frequency, $Q$, of a risk factor and its effect, $RR$, into a measure relevant to public health: the proportion of an observed number of disease cases that can be attributed to the risk factor,
2. be able to assess how large a sample is needed to detect a given difference between two frequencies with a given power.