Basic statistics for experimental medical researchers
Sample size calculations
September 15th 2016
Christian Pipper
Department of Public Health (IFSV)
Faculty of Health and Medical Sciences (SUND)
E-mail: pipper@sund.ku.dk
(IFSV SUND) Basic statistics 1 / 10
Errors of statistical testing revisited
The risk of type 1 error with one test
Assume that H0 is true (this is the assumption we make to calculate the p-value!)
What is the probability of rejecting H0 at significance level α?
[Figure: distribution of the test statistic t_s under H0; the two rejection regions in the tails each have area α/2.]
Conclusion: significance level α = the risk of a type 1 error with one test.
This has nothing to do with the size and design of our data!
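The claim on this slide, that the rejection rate under H0 equals α, can be checked by simulation. The sketch below is in Python with scipy (the slides themselves use R); the per-group size n, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n, reps = 25, 4000

rejections = 0
for _ in range(reps):
    # Both groups are drawn from the SAME distribution, so H0 is true
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.0, scale=1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

type1_rate = rejections / reps
print(type1_rate)  # close to alpha = 0.05
```

Note that the estimated rate hovers around α regardless of n: making the study larger does not change the type 1 error risk, exactly as the slide states.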
Errors of statistical testing revisited
The risk of type 2 error with one test
Assume that H0 is false (in which case the alternative HA is true).
What is the probability of accepting H0 at significance level α?
[Figure: distribution of t_s under HA, centred at η; the area β below the critical value is the type 2 error risk, the area above it is the power.]
This depends crucially on the size and design of your data!
Conclusion: power = 1 − β = the probability of not committing a type 2 error.
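The power can be estimated the same way, by simulating under a specific alternative. A minimal Python sketch; the numbers (mean difference 0.2, sd 0.2, n = 17 per group) anticipate the worked example later in these slides and are assumptions for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n, reps = 17, 4000  # n = 17 per group, mean difference 0.2, sd 0.2

rejections = 0
for _ in range(reps):
    # H0 is false here: the group means differ by 0.2
    a = rng.normal(loc=0.2, scale=0.2, size=n)
    b = rng.normal(loc=0.0, scale=0.2, size=n)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

power = rejections / reps  # estimated power, roughly 0.8
beta = 1 - power           # estimated type 2 error risk
print(power, beta)
```

Rerunning with a smaller n makes the estimated power drop: unlike the type 1 error, the type 2 error risk depends directly on the size of the study.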
Some intuition about the errors of statistical testing
H0 is rejected: a strong statement, because H0 is then very likely to be false.
  We are only at risk of a type 1 error, which is not very likely (probability α).
H0 is accepted: a somewhat weaker statement.
  H0 may well be true, provided we are not committing a type 2 error (and we honestly don't know how large that risk is).
  It could be a mere question of not having enough data.
Consequently, this is something we need to address in the design phase of our study.
Test statistic behaviour under the alternative
What is the distribution of the t test statistic t_s under H0?
  t-distribution (small samples)
  Standard normal N(0,1) (not small samples)
What is the distribution of t_s under HA? (for not small samples)
  Normal distribution with mean η and standard deviation 1: N(η, 1), where
  η = (µ1 − µ2) / √(σ1²/n1 + σ2²/n2)   (1)
Example: from a pilot study we make a qualified guess that
  mean difference µ1 − µ2 = 0.2; standard deviation σ1 = σ2 = 0.2
Mean of t_s (insert into (1) and assume that n1 = n2 = n):
  η = 0.2 / (0.2 · √(2/n)) = √(n/2)
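Formula (1) and its simplification for equal groups can be checked with a few lines of Python (the n values below are arbitrary illustrations):

```python
import math

def eta(mu1, mu2, sd1, sd2, n1, n2):
    """Mean of the t statistic under the alternative, formula (1)."""
    return (mu1 - mu2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# The pilot-study guess from the slide: mean difference 0.2, sd 0.2
for n in (10, 17, 50):
    e = eta(0.2, 0.0, 0.2, 0.2, n, n)
    # With equal group sizes and sd equal to the mean difference,
    # formula (1) reduces to sqrt(n/2)
    print(n, e, math.sqrt(n / 2))
```

For n = 50 per group this gives η = 5, i.e. the t statistic is centred five standard deviations away from 0, which is why the power is high there.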
Sample size calculation
Ingredients:
  Knowing the mean difference and the standard deviations, we can determine η as a function of the sample size.
  For a given sample size and significance level we can thus determine the power, that is, the probability of rejecting H0 under that alternative.
  We can also go the other way and determine the sample size needed to obtain a given power (typically 80%).
Goal: find n so that, under the given alternative, H0 is accepted with probability at most 1 − power when we evaluate the p-value at a given significance level.
Sample size calculation in R
Example continued: use the function power.t.test()
  plug in the alternative in terms of mean difference (delta=0.2) and standard deviation (sd=0.2)
  plug in the power (power=0.8)
  plug in the significance level (sig.level=0.05)
R code:
> power.t.test(delta=0.2, sd=0.2, power=0.8, sig.level=0.05)

     Two-sample t test power calculation

              n = 16.71477   <-- sample size
          delta = 0.2
             sd = 0.2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group
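For readers working in Python rather than R, statsmodels provides an equivalent of power.t.test. A sketch of the same calculation; note that statsmodels parametrises by the standardised effect size delta/sd rather than by delta and sd separately:

```python
from statsmodels.stats.power import TTestIndPower

# Standardised effect size: mean difference 0.2 / sd 0.2 = 1.0
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=1.0, alpha=0.05, power=0.8)
print(n_per_group)  # about 16.71 per group, matching power.t.test
```

As in R, the result is the required number in each group, so the study would need 2 × 17 = 34 participants in total after rounding up.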
Minimum detectable effect size
Goal: find the smallest mean difference, for a given sample size and standard deviation, such that H0 is accepted with probability at most 1 − power when we evaluate the p-value at a given significance level.
A feasibility calculation if you have restrictions on how large your sample size can be.
R code:
> power.t.test(n=10, sd=0.2, power=0.8, sig.level=0.05)

     Two-sample t test power calculation

              n = 10
          delta = 0.2649891   <-- minimum detectable effect size
             sd = 0.2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group
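The same feasibility calculation can be sketched in Python with statsmodels. Since statsmodels works on the standardised scale, the solved effect size has to be multiplied back by the assumed sd to land on the mean-difference scale used by power.t.test:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for the standardised effect size at n = 10 per group
es = analysis.solve_power(nobs1=10, alpha=0.05, power=0.8)
sd = 0.2
mde = es * sd  # back to the scale of the mean difference
print(mde)     # about 0.265, matching power.t.test's delta
```

If the smallest difference you could realistically detect with your feasible n is larger than any clinically plausible effect, the study is underpowered by design.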
The merits of sample size calculations
A priori choices:
  Significance level and power (chosen to control the type 1 and type 2 errors)
  A specific alternative in terms of values of µ1 − µ2 and σ (known from the literature or previous studies)
A note on wishful thinking:
  Often the specific alternative is based on uninformed guessing rather than hard facts.
  In such cases the sample size calculations should be used with caution.
  The best thing you can do, irrespective of any power calculation, is to sample as much data as possible.
R-tutorial
Execute the following R code line by line and try to figure out what each line produces.

# My 1000 t-test values
my.t.test <- rt(1000, 50)
my.t.test[1:10]

# They are t-distributed
hist(my.t.test, prob=TRUE)
density <- dt((-10000:10000)/1000, 50)
points(y=density, x=(-10000:10000)/1000, type="l")

# My corresponding p-values
pvals <- 2*pt(abs(my.t.test), 50, lower.tail=FALSE)
hist(pvals, breaks=seq(0, 1, by=0.05))

# Type 1 error: how many p-values are less than 5%? Approx. 5%
length(pvals[pvals < 0.05])

# Find the sample size at a given relative effect size
# (mean difference / standard deviation), power, and significance level
power.t.test(delta=0.2/0.2, power=0.8, sig.level=0.05)

# Find the power at a given relative effect size, sample size, and significance level
power.t.test(n=50, delta=0.2/0.2, sig.level=0.05)

# Find the smallest detectable mean difference at a given sd, sample size,
# power, and significance level
power.t.test(n=50, sd=0.2, power=0.8, sig.level=0.05)