Linear regression in SAS
Linear regression in SAS

- Simple linear regression
- Graphical model checking
- Multiple linear regression

SAS procedures:

- PROC REG
- PROC GPLOT
Fitness data

The ability to consume oxygen is expensive to measure! We want to predict this ability from covariates:

Obs  Age  Weight  Oxygen  RunTime  RestPulse  RunPulse  MaxPulse
  1   44   89.47  44.609    11.37         62       178       182
  2   40   75.07  45.313    10.07         62       185       185
  3   44   85.84  54.297     8.65         45       156       168
  4   42   68.15  59.571     8.17         40       166       172
  5   38   89.02  49.874     9.22         55       178       180
  6   47   77.45  44.811    11.63         58       176       176
  7   40   75.98  45.681    11.95         70       176       180
  8   43   81.19  49.091    10.85         64       162       170
  9   44   81.42  39.442    13.08         63       174       176
 10   38   81.87  60.055     8.63         48       170       186
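The later slides assume a SAS data set called fitness already exists. As a minimal sketch, the ten rows listed above could be read in with a DATA step like the one below. The variable names are taken from the table header and the later MODEL statements. Note that the regression output in this deck reports N = 31, so a fit on these ten rows alone would not reproduce the printed numbers.

```sas
/* Sketch: read the ten listed observations into a data set named fitness. */
/* Assumption: only these 10 of the 31 observations are available here.    */
DATA fitness;
  INPUT Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse;
  DATALINES;
44 89.47 44.609 11.37 62 178 182
40 75.07 45.313 10.07 62 185 185
44 85.84 54.297  8.65 45 156 168
42 68.15 59.571  8.17 40 166 172
38 89.02 49.874  9.22 55 178 180
47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180
43 81.19 49.091 10.85 64 162 170
44 81.42 39.442 13.08 63 174 176
38 81.87 60.055  8.63 48 170 186
;
RUN;
```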
Simple linear regression

Model:

$$\text{Oxygen}_i = \alpha + \beta \cdot \text{RunTime}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

The model is fitted in SAS:

    PROC REG DATA=fitness;
      MODEL Oxygen=RunTime;
    RUN;
Simple linear regression, output (1)

The REG Procedure
Model: MODEL1
Dependent Variable: Oxygen

Analysis of Variance

                           Sum of        Mean
Source            DF      Squares      Square   F Value   Pr > F
Model              1    632.90010   632.90010     84.01   <.0001
Error             29    218.48144     7.53384
Corrected Total   30    851.38154

Root MSE          2.74478   R-Square   0.7434
Dependent Mean   47.37581   Adj R-Sq   0.7345
Coeff Var         5.79364
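As a cross-check, the summary statistics can be recomputed from the sums of squares in the ANOVA table. The subscripted names $SS$ and $MS$ are shorthand introduced here for the "Sum of Squares" and "Mean Square" columns; they do not appear on the slide.

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{632.90010}{851.38154} \approx 0.7434 \\[4pt]
F   &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
     = \frac{632.90010}{7.53384} \approx 84.01 \\[4pt]
\text{Root MSE} &= \sqrt{MS_{\text{Error}}}
     = \sqrt{7.53384} \approx 2.7448
\end{align*}
```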
Simple linear regression, output (2)

Parameter Estimates

                  Parameter   Standard
Variable    DF     Estimate      Error   t Value   Pr > |t|
Intercept    1     82.42177    3.85530     21.38     <.0001
RunTime      1     -3.31056    0.36119     -9.17     <.0001

Estimates (standard errors in parentheses): $\hat\alpha = 82.42\ (3.86)$, $\hat\beta = -3.31\ (0.36)$, $s = 2.74$

F-test for the model without covariates / t-test for $\beta = 0$:

$F = 84.01$, $p < 0.0001$
$T = -9.17$, $p < 0.0001$
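The t value in the table is the estimate divided by its standard error. With a single covariate, the squared t statistic for the slope equals the F statistic from the ANOVA table, which is why the two tests are listed together. A short check using the printed numbers (agreement is up to rounding):

```latex
\begin{align*}
T   &= \frac{\hat\beta}{\operatorname{SE}(\hat\beta)}
     = \frac{-3.31056}{0.36119} \approx -9.17 \\[4pt]
T^2 &\approx 84.01 = F
\end{align*}
```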
Diagnostics from PROC REG

The data set diagnostics contains predicted values, residuals and standardized residuals:

    PROC REG DATA=fitness;
      MODEL Oxygen=RunTime;
      OUTPUT OUT=diagnostics PRED=predicted RESIDUAL=res STUDENT=st_res;
    RUN;

Other diagnostics are available; see the SAS manual if needed.
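To inspect the new columns, the output data set can be listed. A minimal sketch, assuming the OUTPUT statement above has been run and the data set is named diagnostics as on the slide; the OBS=6 limit is an arbitrary choice made here to keep the listing short:

```sas
/* Sketch: list the first six rows of the diagnostics data set, */
/* including the added variables predicted, res and st_res.     */
PROC PRINT DATA=diagnostics(OBS=6);
RUN;
```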
Diagnostics, output

The data set diagnostics:

Obs  Age  Weight  Oxygen  RunTime  RestPulse  RunPulse  MaxPulse  predicted       res    st_res
  1   44   89.47  44.609    11.37         62       178       182    44.7808  -0.17176  -0.06396
  2   40   75.07  45.313    10.07         62       185       185    49.0845  -3.77148  -1.40011
  3   44   85.84  54.297     8.65         45       156       168    53.7855   0.51153   0.19614
  4   42   68.15  59.571     8.17         40       166       172    55.3745   4.19646   1.64230
  5   38   89.02  49.874     9.22         55       178       180    51.8985  -2.02445  -0.76260
  6   47   77.45  44.811    11.63         58       176       176    43.9200   0.89099   0.33324
Diagnostics, residual plot

Option 1: Use PROC GPLOT on the data set diagnostics.

Option 2: Use PROC REG directly:

    PROC REG DATA=fitness;
      MODEL Oxygen=RunTime;
      PLOT Oxygen*RunTime student.*predicted.;
    RUN;
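The slide gives code only for the PROC REG route. Option 1 might look like the sketch below, assuming the data set diagnostics with variables st_res and predicted was created by an OUTPUT statement in PROC REG. The horizontal reference line at zero (VREF=0) is an addition made here, not something the slide specifies.

```sas
/* Sketch: residual plot from the diagnostics data set.           */
/* I=NONE plots the points only, with no line joining them.       */
SYMBOL1 V=CIRCLE I=NONE;
PROC GPLOT DATA=diagnostics;
  PLOT st_res*predicted / VREF=0;  /* reference line at zero (our choice) */
RUN;
QUIT;
```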
Scatter plot with regression line

[Figure: Oxygen plotted against RunTime with the fitted line Oxygen = 82.422 - 3.3106 RunTime. Plot annotations: N = 31, Rsq = 0.7434, AdjRsq = 0.7345, RMSE = 2.7448.]
Residual plot

[Figure: Studentized Residual plotted against Predicted Value for the model Oxygen = 82.422 - 3.3106 RunTime. Plot annotations: N = 31, Rsq = 0.7434, AdjRsq = 0.7345, RMSE = 2.7448.]
Plots

Which method should you use for plotting?

- An OUTPUT statement in PROC REG combined with PROC GPLOT gives by far the most options.
- Plotting directly with PROC REG alone is quick and easy, but the options are limited.
Plots using PROC GPLOT

Regression line drawn in:

    SYMBOL1 V=CIRCLE I=RL;
    PROC GPLOT DATA=fitness;
      PLOT Oxygen*RunTime;
    RUN;

95% confidence interval for the regression line:

    SYMBOL1 V=CIRCLE I=RLCLM95;

95% prediction interval:

    SYMBOL1 V=CIRCLE I=RLCLI95;
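The difference between the two kinds of limits can be made explicit with the standard formulas for simple linear regression. The notation $x_0$ (the RunTime value at which the band is evaluated), $\bar{x}$, $S_{xx}$ and $t_{0.975,\,n-2}$ is introduced here and is not on the slides; $\hat\alpha$, $\hat\beta$ and $s$ are the estimates and Root MSE from the PROC REG output.

```latex
\begin{align*}
\text{Confidence limits for the line (CLM):}\quad
  & \hat\alpha + \hat\beta x_0 \;\pm\; t_{0.975,\,n-2}\; s
    \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[6pt]
\text{Prediction limits for a new observation (CLI):}\quad
  & \hat\alpha + \hat\beta x_0 \;\pm\; t_{0.975,\,n-2}\; s
    \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[6pt]
\text{where}\quad
  & S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\end{align*}
```

The extra 1 under the square root accounts for the variance of a single new observation, so the prediction limits are always wider than the confidence limits for the line.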
Regression line

[Figure: Oxygen (axis 30 to 70) plotted against RunTime (axis 8 to 15) with the fitted regression line.]
Regression line with confidence limits

[Figure: Oxygen (axis 30 to 70) plotted against RunTime (axis 8 to 15) with the fitted regression line and confidence limits.]
Regression line with prediction limits

[Figure: Oxygen (axis 30 to 70) plotted against RunTime (axis 8 to 15) with the fitted regression line and prediction limits.]
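Multiple linear regression

Model:

$$\text{Oxygen}_i = \alpha + \beta_1 \text{RunTime}_i + \beta_2 \text{Age}_i + \beta_3 \text{Weight}_i + \beta_4 \text{RunPulse}_i + \beta_5 \text{RestPulse}_i + \beta_6 \text{MaxPulse}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

The model is fitted in SAS:

    PROC REG DATA=fitness;
      MODEL Oxygen=RunTime Age Weight RunPulse RestPulse MaxPulse;
    RUN;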
Multiple linear regression, output (1)

The REG Procedure
Model: MODEL1
Dependent Variable: Oxygen

Analysis of Variance

                           Sum of        Mean
Source            DF      Squares      Square   F Value   Pr > F
Model              6    722.54361   120.42393     22.43   <.0001
Error             24    128.83794     5.36825
Corrected Total   30    851.38154

Root MSE          2.31695   R-Square   0.8487
Dependent Mean   47.37581   Adj R-Sq   0.8108
Coeff Var         4.89057
Multiple linear regression, output (2)

Parameter Estimates

                  Parameter   Standard
Variable    DF     Estimate      Error   t Value   Pr > |t|
Intercept    1    102.93448   12.40326      8.30     <.0001
RunTime      1     -2.62865    0.38456     -6.84     <.0001
Age          1     -0.22697    0.09984     -2.27     0.0322
Weight       1     -0.07418    0.05459     -1.36     0.1869
RunPulse     1     -0.36963    0.11985     -3.08     0.0051
RestPulse    1     -0.02153    0.06605     -0.33     0.7473
MaxPulse     1      0.30322    0.13650      2.22     0.0360

Estimates (standard errors in parentheses): $\hat\alpha = 102.93\ (12.40)$, $\hat\beta_1 = -2.63\ (0.38)$, $s = 2.32$

F-test for the model without covariates: $F = 22.43$, $p < 0.0001$

t-test for $\beta_1 = 0$: $T = -6.84$, $p < 0.0001$
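To illustrate how the estimates are used for prediction, the fitted equation can be evaluated by hand for observation 1 of the fitness data (Age 44, Weight 89.47, RunTime 11.37, RestPulse 62, RunPulse 178, MaxPulse 182). This worked example is not on the slides, and it uses the estimates as printed in the table, so the last digits are subject to rounding.

```latex
\begin{align*}
\widehat{\text{Oxygen}}_1
  &= 102.93448 - 2.62865 \cdot 11.37 - 0.22697 \cdot 44 - 0.07418 \cdot 89.47 \\
  &\quad - 0.36963 \cdot 178 - 0.02153 \cdot 62 + 0.30322 \cdot 182 \\
  &\approx 44.48
\end{align*}
```

The observed value is 44.609, so the residual for this observation is about 0.13.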
Summary of syntax

    PROC REG DATA=data;
      MODEL respons=a b c;
      BY d;  /* Requires the data to be sorted by d */
      PLOT respons*a student.*predicted.;
      OUTPUT OUT=diagnostics PRED=predicted RESIDUAL=res STUDENT=st_res;
    RUN;

    SYMBOL1 V=CIRCLE I=RL /* RLCLM95/RLCLI95 */;
    PROC GPLOT DATA=data;
      PLOT respons*a;
    RUN;
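The BY statement needs its input sorted by the BY variable. A minimal sketch of that preparatory step, reusing the slide's placeholder names data and d (stand-ins for a real data set and grouping variable):

```sas
/* Sketch: sort the data set by d before running PROC REG with BY d. */
/* "data" and "d" are the placeholder names from the syntax summary. */
PROC SORT DATA=data;
  BY d;
RUN;
```

With no OUT= option, PROC SORT replaces the input data set with the sorted version.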