
Correct answers: 35132 25225 11354 53441 12141 32235

Exercise I

Question I.1 (1)

We use Method 3.8 from Chapter 3 to obtain

$$90.4 \pm t_{0.95} \cdot \frac{10.3}{\sqrt{25}}$$

And since

    qt(0.95, 24)
    ## [1] 1.710882

we get that

$$90.4 \pm 1.711 \cdot \sqrt{\frac{10.3^2}{25}} = [86.88;\ 93.92]$$

is the correct result, which can also be found explicitly as:

    90.4 + c(-1, 1)*qt(0.95, 24)*sqrt(10.3^2/25)
    ## [1] 86.87558 93.92442

So the correct answer is 3.

Check of all answers:

    90.4 + c(-1,1) * 2.064 * sqrt(10.3^2/24)
    ## [1] 86.06048 94.73952
    90.4 + c(-1,1) * 1.711 * sqrt(10.3/24)
    ## [1] 89.27911 91.52089
    90.4 + c(-1,1) * 1.711 * sqrt(10.3^2/25)
    ## [1] 86.87534 93.92466
    90.4 + c(-1,1) * 1.960 * sqrt(10.3/25)
    ## [1] 89.14193 91.65807
    90.4 + c(-1,1) * 2.064 * sqrt(10.3/25)
    ## [1] 89.07518 91.72482

Question I.2 (2)

We use Method 3.18 from Chapter 3 in the standard deviation version:

$$\left[ \sqrt{\frac{24 \cdot 10.3^2}{\chi^2_{0.975}}};\ \sqrt{\frac{24 \cdot 10.3^2}{\chi^2_{0.025}}} \right]$$

And since

    qchisq(c(0.025, 0.975), 24)
    ## [1] 12.40115 39.36408

we get that

$$\left[ \sqrt{\frac{24 \cdot 10.3^2}{39.364}};\ \sqrt{\frac{24 \cdot 10.3^2}{12.401}} \right] = [8.043;\ 14.33]$$

is the correct result, which can also be found explicitly as:

    sqrt(24*10.3^2/qchisq(0.975, 24))
    ## [1] 8.042532
    sqrt(24*10.3^2/qchisq(0.025, 24))
    ## [1] 14.32887

So the correct answer is 5.

Check of all answers:

    sqrt(24)*10.3/sqrt(36.415)
    ## [1] 8.361856
    sqrt(24)*10.3/sqrt(13.848)
    ## [1] 13.55968
    24*10.3^2/36.415
    ## [1] 69.92064
    24*10.3^2/13.848
    ## [1] 183.8648
    25*10.3^2/37.652
    ## [1] 70.44115
    25*10.3^2/14.611
    ## [1] 181.5242
    5*10.3/sqrt(40.646)
    ## [1] 8.077897
    5*10.3/sqrt(13.120)
    ## [1] 14.21806
    sqrt(24)*10.3/sqrt(39.364)
    ## [1] 8.04254
    sqrt(24)*10.3/sqrt(12.401)
    ## [1] 14.32895
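The two intervals in Questions I.1 and I.2 can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming the values n = 25, mean 90.4 and standard deviation 10.3 quoted in the solution; the variable names are illustrative and not part of the exam.

```r
# Summary statistics quoted in the solution text
n <- 25
xbar <- 90.4
s <- 10.3

# 90% confidence interval for the mean (t quantile with n - 1 degrees of freedom)
ci_mean <- xbar + c(-1, 1) * qt(0.95, df = n - 1) * s / sqrt(n)

# 95% confidence interval for the standard deviation (chi-square quantiles);
# the upper quantile gives the lower limit and vice versa
ci_sd <- sqrt((n - 1) * s^2 / qchisq(c(0.975, 0.025), df = n - 1))

print(round(ci_mean, 2))
print(round(ci_sd, 2))
```

Passing both chi-square quantiles in one vector keeps the lower and upper limits in the usual order.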

Question I.3 (3)

We use the one-sample CI sample size formula, Method 3.44:

$$n = \frac{z_{0.995}^2 \cdot 10.3^2}{2^2}$$

(Since the width of the CI is twice the margin of error, we have that ME = 2.)

And since

    qnorm(0.995)
    ## [1] 2.575829

we get that

$$n = \frac{2.576^2 \cdot 10.3^2}{2^2} = 176.0$$

So the correct answer is 1.

Check of all solutions:

    2.576^2*10.3^2 / 2^2
    ## [1] 175.9974
    1.960^2*10.3^2 / 4^2
    ## [1] 25.47221
    1.645^2*10.3^2 / 2^2
    ## [1] 71.77055
    1.960^2*10.3^2 / 2^2
    ## [1] 101.8888
    2.262^2*10.3^2 / 4^2
    ## [1] 33.92655
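The sample size formula in Question I.3 can be wrapped in a small function. This is a sketch only: `sample_size_ci` is a hypothetical helper name, and it uses the unrounded normal quantile, so the raw value differs slightly from the hand calculation with 2.576.

```r
# Hypothetical helper: sample size so that a CI for the mean has margin of error `me`
sample_size_ci <- function(sigma, me, conf = 0.99) {
  z <- qnorm(1 - (1 - conf) / 2)  # two-sided normal quantile
  (z * sigma / me)^2
}

# Values from Question I.3: sigma = 10.3, 99% confidence, ME = 2
n_raw <- sample_size_ci(sigma = 10.3, me = 2)
n_needed <- ceiling(n_raw)  # round up to a whole number of observations

print(n_raw)
print(n_needed)
```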

Exercise II

Question II.1 (4)

This is a standard normal probability question:

$$P(X > 100) = P\left(Z > \frac{100 - 90.4}{10.3}\right) = P(Z > 0.932)$$

Which can be found as:

    1-pnorm(100, 90.4, 10.3)
    ## [1] 0.1756582
    (100-90.4)/10.3
    ## [1] 0.9320388
    1-pnorm((100-90.4)/10.3)
    ## [1] 0.1756582

So the correct answer is 3.

Question II.2 (5)

We use the (Welch) two-sample t-test, Method 3.59 and Method 3.57:

$$t_{obs} = \frac{5.2}{\sqrt{(10.3^2 + 15.2^2)/25}} = \frac{5.2}{\sqrt{10.3^2 + 15.2^2}/5} = 1.42$$

And the degrees of freedom:

$$\nu = \frac{\left(\frac{10.3^2}{25} + \frac{15.2^2}{25}\right)^2}{\frac{(10.3^2/25)^2}{24} + \frac{(15.2^2/25)^2}{24}} = 42.2$$

Hence, the p-value is:

$$2 P(T > 1.42) = 0.163$$

since:

    2*(1-pt(1.42, 42.2))
    ## [1] 0.1629536
    2*(1-pt(1.42, 42))
    ## [1] 0.1629884

So we cannot reject the null hypothesis, as this p-value is larger than 0.05. So the correct answer is 2.
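The Welch calculation in Question II.2 can be carried out from summary statistics without rounding the intermediate values. This is a minimal sketch, assuming the difference in means 5.2, the standard deviations 10.3 and 15.2, and 25 observations per group as used in the solution; the variable names are illustrative.

```r
# Summary values used in the solution to Question II.2
diff_means <- 5.2
s1 <- 10.3; s2 <- 15.2
n1 <- 25;   n2 <- 25

# Estimated variances of the two sample means
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch test statistic and Satterthwaite degrees of freedom
t_obs <- diff_means / sqrt(v1 + v2)
nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t-distribution with nu degrees of freedom
p_value <- 2 * pt(-abs(t_obs), df = nu)

print(c(t_obs = t_obs, nu = nu, p_value = p_value))
```

Because the test statistic is not rounded to 1.42 first, the p-value differs from 0.163 in the third decimal, which does not change the conclusion.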

Question II.3 (6)

This is treated in Section 3.2.4. The first R call finds the power for $n_1 = n_2 = 50$, which is not asked for. The last R call uses $\alpha = 0.10$ and power 0.95, which is not asked for either. The correct R call is the second of the three options:

    power.t.test(power = 0.90, delta = 5, sd = 10, sig.level = 0.05)
    ##
    ##      Two-sample t test power calculation
    ##
    ##               n = 85.03129
    ##           delta = 5
    ##              sd = 10
    ##       sig.level = 0.05
    ##           power = 0.9
    ##     alternative = two.sided
    ##
    ## NOTE: n is number in *each* group

So the correct answer is 2: $n_1 = n_2 \approx 85$.
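The result of `power.t.test` can also be used programmatically. The sketch below extracts the computed group size, rounds it up, and checks the power actually achieved at that size; rounding up is a conservative choice made here for illustration, whereas the answer option quotes the nearest value, 85.

```r
# Solve for the group size, as in the solution
res <- power.t.test(power = 0.90, delta = 5, sd = 10, sig.level = 0.05)

# Round up to a whole number of observations per group
n_per_group <- ceiling(res$n)

# Power achieved with that group size (same delta, sd and significance level)
achieved <- power.t.test(n = n_per_group, delta = 5, sd = 10,
                         sig.level = 0.05)$power

print(n_per_group)
print(achieved)
```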

Exercise III: Poisson

Question III.1 (7)

Assume X represents the number of requests in a 4-hour interval. We then need to scale the arrival rate from a 24-hour interval to a 4-hour interval, thus $X \sim Po(\lambda = 50/6)$, and the probability of getting more than 12 requests in a randomly selected 4-hour interval is

$$P(X > 12) = 1 - P(X \le 12)$$

which can be found by the R command

    1 - ppois(12, lambda=50/6)
    ## [1] 0.08119212

Hence the correct answer is 5: 0.081.

Question III.2 (8)

We have the same arrival rate as above, $\lambda = 50/6$, hence we need to find the 99% percentile ($q_{0.99}$) of the Poisson distribution. This is the level above which the number of requests per 4 hours occurs with probability 1% or less. It can be found by

    qpois(lambda=50/6, p=0.99)
    ## [1] 16

or similarly by running through a sequence of probabilities and selecting the first one below 1%:

    1-ppois(15:18, lambda=50/6)
    ## [1] 0.011708513 0.005494677 0.002448679 0.001038494

Hence the correct answer is 2: Capacity for 16 service requests per 4 hours.
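Both Poisson questions follow from the rescaled rate, and the capacity can be found by an explicit search that mirrors the "first probability below 1%" argument. This is a sketch under the assumption of 50 requests per 24 hours, as in the solution; the search range 0 to 30 is an arbitrary choice for illustration.

```r
# Rescale the daily rate to a 4-hour interval
lambda_day <- 50
lambda_4h <- lambda_day * 4 / 24

# Question III.1: probability of more than 12 requests in 4 hours
p_over_12 <- ppois(12, lambda = lambda_4h, lower.tail = FALSE)

# Question III.2: smallest capacity c such that P(X > c) <= 1%
capacities <- 0:30
tail_prob <- ppois(capacities, lambda = lambda_4h, lower.tail = FALSE)
min_capacity <- min(capacities[tail_prob <= 0.01])

print(p_over_12)
print(min_capacity)
```

The search agrees with `qpois(0.99, lambda_4h)`, since the quantile function returns the smallest value whose cumulative probability reaches 0.99.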

Exercise IV

Question IV.1 (9)

The mean and the variance of the total pump capacity (Y) are found by using the identities in Theorem 2.55:

$$Y = X_1 + X_2 + X_3 + X_4 + X_5$$

$$E(Y) = E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5) = 50$$

$$Var(Y) = Var(X_1) + Var(X_2) + Var(X_3) + Var(X_4) + Var(X_5) = 20$$

Hence the correct R code for calculating $P(Y < 45)$ becomes

    pnorm(45, mean=50, sd=sqrt(20))
    ## [1] 0.1317762

Thus the correct answer is 2.

Question IV.2 (10)

Since we don't want to assume anything about the distribution, we want to find a non-parametric bootstrap 95% confidence interval (as described in Section 4.3.2), and we use Method 4.19 as done in Example 4.20. Thus

    quantile(replicate(1000, mean(sample(x, replace = TRUE))), c(0.025, 0.975))

is correct, since we need to resample the data with replacement, then take the mean, and replicate this 1000 times, and finally calculate the 0.025 and 0.975 quantiles of these 1000 values. Thus the correct answer is 5.

Question IV.3 (11)

We have to calculate the probability $P(X = 0)$ in a hypergeometric distribution:

    dhyper(x=0, m=2, n=23, k=6)
    ## [1] 0.57

Hence the correct answer is 1. The other candidate calls were:

    dbinom(x=0, size=6, p=2/25)
    1-dhyper(x=0, m=2, n=23, k=6)
    1-dbinom(x=0, size=6, p=2/25)
    dpois(0, 2, 23, 6)
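The hypergeometric probability in Question IV.3 can be checked by direct counting, and the bootstrap call from Question IV.2 can be tried on a small data set. This is a sketch only: the vector `x` below is hypothetical data invented for illustration (the exam data are not reproduced in the solution), and the seed is set purely for reproducibility.

```r
# Question IV.3: P(X = 0) by direct counting, compared with dhyper()
# (drawing 6 items from 25, of which 2 are "marked" and 23 are not)
p_count <- choose(23, 6) / choose(25, 6)
p_dhyper <- dhyper(x = 0, m = 2, n = 23, k = 6)

# Question IV.2: percentile bootstrap interval for the mean
set.seed(1)  # only to make the sketch reproducible
x <- c(12.1, 9.8, 11.4, 10.3, 13.0, 8.7, 10.9, 11.8, 9.5, 12.4)  # hypothetical data
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
ci_boot <- quantile(boot_means, c(0.025, 0.975))

print(c(p_count, p_dhyper))
print(ci_boot)
```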

Exercise V: ANOVA

Question V.1 (12)

If you look at the analysis of variance table, you see that A and B are degrees of freedom. There are 4 machines and 3 time periods, so:

$$A = k - 1 = 4 - 1 = 3$$

and

$$B = (k - 1)(l - 1) = (4 - 1)(3 - 1) = 3 \cdot 2 = 6$$

This leaves two of the suggested options. To calculate C you need:

$$SST = SS(Tr) + SS(Bl) + SSE$$

In this example the factor "Time period" is the block, so we get:

$$C = SS(Tr) = SST - SS(Bl) - SSE = 11900.6660 - 9324.9717 - 114.6739 = 2461.0204$$

So the correct answer is 1: A: 3, B: 6, C: 2461.0204, D: 4662.4859.

Question V.2 (13)

We are using a significance level $\alpha = 0.05$, and the relevant distribution for testing in an ANOVA is an F-distribution. The degrees of freedom for this test are:

$$k - 1 = 4 - 1 = 3$$

and

$$(k - 1)(l - 1) = (4 - 1)(3 - 1) = 3 \cdot 2 = 6$$

So the correct answer is 3: $F_{0.95}(3, 6) = 4.757$.

Question V.3 (14)

If we want to test the hypothesis of no difference between machines while time of day is taken into account, then we have to calculate the F-test statistic

$$F = \frac{SS(Tr)/(k - 1)}{SSE/((k - 1)(l - 1))} = \frac{820.3401}{19.1123} = 42.92$$

The value U is the p-value from the F(3,6) distribution:

$$P(F > 42.92) = 0.00019$$

The hypothesis of no difference between machines is rejected, we have a significant effect, and the correct answer is 5: Yes, since the value U is less than 0.05.
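The missing entries of the two-way ANOVA table in Exercise V follow from the sums of squares by simple arithmetic. The following is a minimal sketch, assuming the sums of squares quoted in the solution; the variable names are illustrative.

```r
# Design: k = 4 machines (treatments), l = 3 time periods (blocks)
k <- 4
l <- 3

# Sums of squares quoted in the solution
sst   <- 11900.6660
ss_bl <- 9324.9717
sse   <- 114.6739
ss_tr <- sst - ss_bl - sse  # treatment sum of squares by subtraction

# Degrees of freedom
df_tr <- k - 1
df_bl <- l - 1
df_e  <- (k - 1) * (l - 1)

# Mean squares, F statistic for machines, p-value and critical value
ms_tr <- ss_tr / df_tr
ms_bl <- ss_bl / df_bl
mse   <- sse / df_e
f_obs <- ms_tr / mse
p_value <- pf(f_obs, df_tr, df_e, lower.tail = FALSE)
f_crit  <- qf(0.95, df_tr, df_e)

print(c(ss_tr = ss_tr, ms_bl = ms_bl, f_obs = f_obs,
        f_crit = f_crit, p_value = p_value))
```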

Exercise VI: ANOVA

Question VI.1 (15)

If you want to test the hypothesis of no difference between the four treatments, then you need to calculate the F-test statistic

$$C = \frac{A}{B} = \frac{SS(Tr)/(k - 1)}{SSE/(n - k)} = \frac{43.726/3}{63.537/24} = 5.506$$

The value D is the p-value from the F(3,24) distribution:

$$P(F > 5.506) = 0.005$$

This p-value is smaller than 0.01, so the hypothesis of no difference between treatments is rejected. The correct answer is 4: No, since the value D is smaller than 0.01.

Question VI.2 (16)

Using Method 8.9 from the e-notes, a 99% confidence interval for the difference between treatments A and B can be calculated as:

$$\bar{x}_A - \bar{x}_B \pm t_{1-0.01/2} \sqrt{\frac{SSE}{n - k}\left(\frac{1}{7} + \frac{1}{7}\right)}$$

where $t_{1-0.01/2}$ is a quantile from a t-distribution with $n - k$ degrees of freedom. In this case $n = 28$ and $k = 4$, so $t_{1-0.01/2} = 2.797$. This quantile can be looked up in a table or found using R:

    qt(1-0.01/2,(28-4))
    ## [1] 2.79694

Using the expression for the confidence interval with the current values plugged in:

$$7.82 - 4.90 \pm 2.797 \sqrt{\frac{63.537}{24} \cdot \frac{2}{7}} \approx [0.48;\ 5.35]$$

The correct answer is 5: $2.92 \pm 2.797 \sqrt{63.537/24 \cdot 2/7} \approx [0.48;\ 5.35]$.
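The F-test and the pairwise confidence interval in Exercise VI use the same mean square error, so they can be computed together. This is a minimal sketch, assuming the sums of squares, group means and group size of 7 quoted in the solution; the variable names are illustrative.

```r
# One-way ANOVA summary values from the solution
n <- 28            # total number of observations
k <- 4             # number of treatments
n_group <- 7       # observations per treatment
ss_tr <- 43.726
sse   <- 63.537

# F statistic and p-value (Question VI.1)
mse <- sse / (n - k)
f_obs <- (ss_tr / (k - 1)) / mse
p_value <- pf(f_obs, k - 1, n - k, lower.tail = FALSE)

# 99% confidence interval for the difference A - B (Question VI.2)
t_q <- qt(1 - 0.01 / 2, df = n - k)
ci_diff <- (7.82 - 4.90) + c(-1, 1) * t_q * sqrt(mse * (1 / n_group + 1 / n_group))

print(c(f_obs = f_obs, p_value = p_value))
print(ci_diff)
```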

Exercise VII: Regressions

Question VII.1 (17)

First note that the empirical correlation is always between -1 and 1, so 1) is excluded. From the plot it is clear that all correlations are larger than 0, which excludes 4) and 5). It is also clear that the largest correlation is between D and V, which excludes 2). The only possible answer is therefore 3): as already stated, $r_{DV} > r_{HV}$ and $r_{DV} > r_{HD}$. It is not completely clear from the plot that $r_{HV} > r_{HD}$, but it is clearly possible. Hence the correct answer is 3.

Question VII.2 (18)

The test of a hypothesis on a parameter by hand calculation is stated in Theorem 5.9 (eq. 5-75):

$$T_{\beta_1} = \frac{\hat{\beta}_1 - \beta_{1,0}}{\hat{\sigma}_{\beta_1}} \tag{1}$$

The numbers in the expression can be identified from the R output as $\hat{\beta}_1 = 2.19$ and $\hat{\sigma}_{\beta_1} = 0.0911$. The null hypothesis is $\beta_{1,0} = 2$, and we get

$$T_{\beta_1} = \frac{2.19 - 2}{0.0911} \tag{2}$$

The test statistic should be compared with quantiles of the t-distribution, in this case $\pm t_{0.975}$ based on 29 degrees of freedom. In R we get

    (2.19-2)/0.0911
    ## [1] 2.08562
    qt(0.975,df=29)
    ## [1] 2.04523

Since $T_{\beta_1} > t_{0.975}$ we reject the null hypothesis, and the correct answer is 4.

Question VII.3 (19)

The general formula for a prediction interval is given in Theorem 5.13 as

$$\hat{\beta}_0 + \hat{\beta}_1 x_{new} \pm t_{1-\alpha/2}\, \hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}} \tag{3}$$

In our case we have $\hat{\beta}_0 = 3.34$, $\hat{\beta}_1 = 2.19$, $x_{new} = \log(0.336)$, $\alpha = 0.05$, $\hat{\sigma} = 0.117$, $n = 31$, $\bar{x} = -1.12$. Further we will need $S_{xx}$, which can be calculated from $s_x$ as $S_{xx} = (n - 1) s_x^2 = 30 \cdot 0.234^2$.

Inserting in the formula gives

$$3.34 + 2.19 \log(0.336) \pm t_{0.975} \cdot 0.117 \sqrt{1 + \frac{1}{31} + \frac{(\log(0.336) - (-1.12))^2}{30 \cdot 0.234^2}} \tag{4}$$

In R we get

    3.34 + 2.19 * log(0.336) + c(-1,1) * qt(0.975,df=29) * 0.117 * sqrt(1 + 1/31 + (log(0.336) - (-1.12) )^2 / (30 * 0.234^2))
    ## [1] 0.7083068 1.1946719

which is answer 4).

Question VII.4 (20)

From Figure 2.D there is no evidence against the residuals being normally distributed, and Figure 2.B suggests that log(h) should be included in the model, so answer 1 is correct. Answer 2 is not correct, since the conclusion should be the opposite (data should not be transformed). Answer 3 is not correct, since there is no evidence that log(d) should be included in the model. Answer 4 is not correct, since Figure 2.A only indicates that there is no violation of the assumption of iid residuals. Answer 5 is not correct, since Figure 2.D suggests that the data are actually normally distributed. The correct answer is 1.

Question VII.5 (21)

The estimates of $\beta_0$, $\beta_1$ and $\beta_2$ are found in the first column of the coefficient matrix, and these are $\hat{\beta}_0 = 0.524$, $\hat{\beta}_1 = 1.97$ and $\hat{\beta}_2 = 1.15$, while $\hat{\sigma} = 0.0813$ is the residual standard error. Hence the correct answer is 1).

Question VII.6 (22)

Answer 1) is not correct, since the $\beta_i$ are not assumed to be random variables but constants; furthermore, the correlation between the estimators is not in general zero. Answer 2) is correct, as this is the standard assumption on the residuals. Answer 3) is not correct: $\beta_1$ is a slope of the regression model (which is related to the correlation but not equal to the correlation). Answer 4) is not correct, since $\beta_0$ is the expected value of Y when $x_1 = x_2 = \ldots = 0$, and this will in general be different from $\bar{y}$. Answer 5) is not correct, since $Y_i$ is not assumed independent of $x_{i,j}$; also, $x_j$ is not assumed to follow a normal distribution.

Answer 2) is the correct answer.

Exercise VIII

Question VIII.1 (23)

To investigate whether the support for Rød blok (the red bloc) has been constant over the time period, a $\chi^2$ test for a 2 × 8 contingency table is carried out. The correct answer is therefore 1: A $\chi^2$ test for a 2 × 8 contingency table.

Exercise IX

Question IX.1 (24)

Let X be a random variable denoting the number of RPL players in the sample for the SFI survey. Assume $X \sim B(8153, p)$, where p is the probability (or proportion) that a randomly chosen individual is an RPL player. The 90% confidence interval for the proportion of RPL players (p) is determined by (cf. Method 7.3):

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Since $1 - \alpha = 90\%$ we have $\alpha = 0.10$ and $1 - \alpha/2 = 0.95$. The 95% quantile of the normal distribution is found as $z_{0.95}$ = qnorm(0.95) = 1.6449. In addition, $n = 8153$ and $\hat{p} = 0.023$. Inserting these values, the correct solution is 4:

$$0.023 \pm 1.6449 \sqrt{\frac{0.023 \cdot 0.977}{8153}} = [0.0221;\ 0.0239]$$

Exercise X

Question X.1 (25)

Let X be a random variable denoting the number of respondents in the sample from the Bjerg K and Megafon survey who live in the Capital Region (Region Hovedstaden). Assume that $X \sim B(1005, p)$, where p is the probability that a randomly chosen individual is from the Capital Region. We want to examine whether the sample is representative with respect to the regional distribution, corresponding to the following hypothesis:

$$H_0: p = 0.3139, \quad H_1: p \neq 0.3139$$

$H_0$ is tested by a two-sided test at $\alpha = 0.05$. The test statistic is determined by (cf. Method 7.10):

$$z_{obs} = \frac{x - n p_0}{\sqrt{n p_0 (1 - p_0)}} = \frac{x/n - p_0}{\sqrt{n p_0 (1 - p_0)}/n} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}$$

With $x = 309$, $n = 1005$, $p_0 = 0.3139$ and $\hat{p} = x/n = 309/1005 = 0.3075$, insertion gives $z_{obs} = -0.44$.

The p-value is determined by $p = 2 \cdot P(Z > |z_{obs}|)$ = 2*(1 - pnorm(0.44)) = 0.6599.

In total, the correct solution is 1:

$$z_{obs} = \frac{0.3075 - 0.3139}{\sqrt{\frac{0.3139 \cdot 0.6861}{1005}}} = -0.44$$

Since the normal distribution is the relevant one to use, the p-value is $2 P(Z \ge |-0.44|) = 0.6599$, and $H_0$ is accepted, i.e. based on the data at hand the sample is assessed to be representative with respect to region.

Alternatively, all calculations can be done with the R command:

    prop.test(309, 1005, p = 0.3139, correct = FALSE)

This, however, computes the test statistic $\chi^2 = z_{obs}^2$; any differences are due to rounding.

Question X.2 (26)

Let X be a random variable denoting the number of RPL players in the sample. Assume that $X \sim B(1005, p)$, where p is the proportion (or probability) that a randomly chosen individual is an RPL player. $\hat{p} = 0.0755$ is a good guess at the value of p. If the 95% confidence interval has a width of 0.0066, the corresponding margin of error is $ME = 0.0033$, and the sample size can then be determined by the formula (Method 7.12, formula 7-4):

$$n = p (1 - p) \left(\frac{z_{1-\alpha/2}}{ME}\right)^2$$

With $n = 1005$ in the current sample and $1 - \alpha = 0.95$, we have $\alpha = 0.05$ and $z_{1-\alpha/2} = z_{0.975}$ = qnorm(0.975) = 1.96. Inserting these values, the correct solution is 3:

$$n = 0.0755 \cdot 0.9245 \cdot \left(\frac{1.96}{0.0066/2}\right)^2 = 24622.8$$

which rounded up gives n = 24623.

Question X.3 (27)

Let Y be a random variable denoting the number of compulsive gamblers (ludomaner) among the students. Regarding the students as a random sample from the adult Danish population, we have $Y \sim B(717, p)$, where p is the probability that a randomly chosen individual in the adult Danish population is a compulsive gambler, i.e. $p = 0.0109$. The point probability in a binomial distribution is given by (cf. Definition 2.16, formula 2-20):

$$P(Y = y) = \binom{n}{y} p^y (1 - p)^{n - y} = \text{dbinom}(y, n, p)$$

With $n = 717$, $y = 7$ and $p = 0.0109$, insertion gives that the correct solution is 2:

$$P(Y = 7) = \binom{717}{7}\, 0.0109^7 \cdot 0.9891^{710} = 0.1432$$

Question X.4 (28)

Consider the contingency table of frequency by gender. This is a 6 × 2 contingency table, and we consider the independence hypothesis:

$$H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j}, \quad i = 1, 2, \ldots, 6, \quad j = 1, 2$$

where i denotes the frequency response and j denotes gender. The test of the independence hypothesis is carried out using Theorem 7.23/Method 7.21, and the expected values are determined by:

$$e_{ij} = \frac{(i\text{th row total}) \cdot (j\text{th column total})}{\text{total}}$$

Insertion gives:

$$e_{\text{Never, woman}} = e_{6,1} = \frac{255 \cdot 510}{1003} = 129.661$$

Thus the correct solution is 2: $e_{\text{Never, woman}} = 129.661$.

Question X.5 (29)

Consider the contingency table of frequency by age. This is a 6 × 6 contingency table, and we consider the independence hypothesis:

$$H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j}, \quad i = 1, 2, \ldots, 6, \quad j = 1, 2, \ldots, 6$$

where i denotes the frequency response and j denotes age. The test of the independence hypothesis is carried out using Theorem 7.23/Method 7.21, and the contribution $q_{ij}$ from cell (i,j) to the test statistic $\chi^2_{obs}$ is determined by:

$$q_{ij} = \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

where $o_{ij}$ is the observed value for cell (i,j) and $e_{ij}$ is the expected value for cell (i,j). We have $o_{\text{Once a week, 50-59 years}} = o_{1,4} = 41$ and

$$e_{\text{Once a week, 50-59 years}} = e_{1,4} = \frac{166 \cdot 165}{997} = 27.4724$$

By insertion we find:

$$q_{\text{Once a week, 50-59 years}} = q_{1,4} = \frac{(41 - 27.4724)^2}{27.4724} = 6.661$$

The correct solution is therefore 3: $q_{\text{Once a week, 50-59 years}} = 6.661$.

Question X.6 (30)

We want to compare the proportion of RPL players in the SFI survey and in the Bjerg K and Megafon survey, corresponding to the hypothesis:

$$H_0: p_1 = p_2, \quad H_1: p_1 \neq p_2$$

and we must determine the test statistic, the p-value and the conclusion when $\alpha = 0.05$. The test statistic is determined by (cf. Method 7.17):

$$z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

The p-value is determined by $p = 2 \cdot P(Z \ge |z_{obs}|)$ = 2*(1 - pnorm(abs(z_obs))).

With $n_1 = 8153$, $\hat{p}_1 = 0.023$, $n_2 = 1005$ and $\hat{p}_2 = 0.0755$, insertion gives:

$$\hat{p} = \frac{8153 \cdot 0.023 + 1005 \cdot 0.0755}{8153 + 1005} = 0.0288$$

$$z_{obs} = \frac{0.023 - 0.0755}{\sqrt{0.0288 \cdot 0.9712 \cdot \left(\frac{1}{8153} + \frac{1}{1005}\right)}} = -9.390$$

The p-value is now determined by:

$$p = 2 \cdot P(Z \ge |-9.390|) = 6.02 \cdot 10^{-21}$$

It follows that the correct solution is 5: $z_{obs} = (0.023 - 0.0755)/\sqrt{0.0288 \cdot 0.9712 \cdot (1/8153 + 1/1005)} = -9.390$. Since $p = P(Z \ge 9.390) = 3.01 \cdot 10^{-21}$, $H_0$ is rejected, i.e. there is a significant increase in the proportion of RPL players.
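The proportion calculations in Exercise X can be sketched in a few lines of R, using only the counts and proportions quoted in the solution. Two points are assumptions of this sketch rather than part of the exam. First, the pooled proportion is left unrounded, so the two-sample statistic differs slightly from -9.390. Second, the tail probability is computed as `2 * pnorm(-abs(z))`, because the form `2 * (1 - pnorm(abs(z)))` evaluates to exactly 0 in double precision for a statistic this extreme.

```r
# Question X.1: one-sample test of a proportion
x <- 309; n <- 1005; p0 <- 0.3139
z_one <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_one <- 2 * pnorm(-abs(z_one))

# Question X.3: binomial point probability
p_seven <- dbinom(7, size = 717, prob = 0.0109)

# Question X.6: comparison of two proportions with a pooled estimate
n1 <- 8153; p1_hat <- 0.023
n2 <- 1005; p2_hat <- 0.0755
p_pool <- (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
z_two <- (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

p_naive  <- 2 * (1 - pnorm(abs(z_two)))  # loses all precision: 1 - 1 = 0
p_stable <- 2 * pnorm(-abs(z_two))       # lower tail evaluated directly

print(c(z_one = z_one, p_one = p_one, p_seven = p_seven))
print(c(z_two = z_two, p_naive = p_naive, p_stable = p_stable))
```

An equivalent stable form is `2 * pnorm(abs(z_two), lower.tail = FALSE)`.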