WPS / R day
Rune Juhl
DTU Compute, Department of Applied Mathematics and Computer Science
DTU, Technical University of Denmark
11th December 2013
About me
Background:
- B.Sc. Medicine and Technology, DTU and KU (University of Copenhagen)
- Pre-graduate research year, KU
- Brain research, Rigshospitalet and Hvidovre Hospital
- M.Sc. Mathematical Modelling and Computing, DTU
Now: PhD in mathematical statistics (SDE), DTU Compute
About R
- Open-source, free programming language for statistics
- Created by Ross Ihaka and Robert Gentleman, University of Auckland, New Zealand
- First version in 1993
- Large online community
- Many extension packages (CRAN)
- Where to get R: www.r-project.org
RStudio IDE
A free and very popular IDE: www.rstudio.com
Basic R
R is a programming language with
- variables of different data types
- functions that take several arguments and return one result
Data types in R include numeric, integer, character, logical, matrix, list and data.frame.
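A quick way to see these types is `class()`. The minimal sketch below creates one arbitrary example value of each kind listed above and asks R for its class. It also defines a small hypothetical function, `scale_and_shift`, to illustrate "several arguments in, one result out"; the name is made up for this example.

```r
# One arbitrary example value per data type from the slide
examples <- list(
  numeric    = 3.14,
  integer    = 3L,                          # the L suffix makes an integer
  character  = "a",
  logical    = TRUE,
  matrix     = matrix(1:6, nrow = 2),
  list       = list(a = 1, b = "b"),
  data.frame = data.frame(a = 1:2, b = c("x", "y"))
)

# In R >= 4.0 class() of a matrix is c("matrix", "array"), so keep the first entry
types <- vapply(examples, function(v) class(v)[1], character(1))
print(types)

# Hypothetical function: several arguments (two with defaults), one result
scale_and_shift <- function(x, a = 1, b = 0) a * x + b
scale_and_shift(2, a = 3, b = 1)   # returns 7
```

Note that a plain `3` is numeric (double); `3L` is needed to get an integer.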
Basic operations
The assignment operator is <-
x <- 1
x
## [1] 1
Creating a vector:
c(-1, 0, 1, 2)
## [1] -1  0  1  2
-1:2
## [1] -1  0  1  2
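`c(-1, 0, 1, 2)` and `-1:2` print the same thing, but they are not quite the same object: `:` returns an integer vector when its endpoints are whole numbers, whereas `c()` with plain numbers gives a numeric (double) vector. A small check:

```r
a <- c(-1, 0, 1, 2)   # numeric (double) vector
b <- -1:2             # integer vector; unary minus binds tighter than ':', so this is (-1):2

class(a)              # "numeric"
class(b)              # "integer"
all(a == b)           # TRUE: the values are equal
identical(a, b)       # FALSE: the storage types differ
```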
Basic operations
If the step size is not 1, use seq():
(x <- seq(-1, 2, by = 0.5))
## [1] -1.0 -0.5  0.0  0.5  1.0  1.5  2.0
Inspecting the variable x:
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -1.00   -0.25    0.50    0.50    1.25    2.00
str(x)
##  num [1:7] -1 -0.5 0 0.5 1 1.5 2
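`seq()` can also be told how many points to produce instead of the step size, and `quantile()` returns the quartiles that `summary()` reports. A minimal sketch:

```r
x <- seq(-1, 2, by = 0.5)          # step of 0.5
y <- seq(-1, 2, length.out = 7)    # same vector, specified by the number of points

all.equal(x, y)                    # TRUE
length(x)                          # 7
quantile(x)                        # 0%, 25%, 50%, 75%, 100%: -1, -0.25, 0.5, 1.25, 2
```

The wrapping parentheses used on the slide, `(x <- ...)`, assign and print in one step.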
Basic operations
Be careful when combining different data types:
(x <- c(1, 2, "a"))
## [1] "1" "2" "a"
x is now a character vector:
str(x)
##  chr [1:3] "1" "2" "a"
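When types are mixed, R coerces everything to the most general type; among the types above the order is logical < integer < numeric < character. The sketch shows each step, and what happens when the character vector is converted back to numbers:

```r
class(c(TRUE, 2L))     # "integer"   (TRUE becomes 1L)
class(c(2L, 2.5))      # "numeric"
class(c(2.5, "a"))     # "character"

x <- c(1, 2, "a")
# Only elements that look like numbers can be converted back;
# "a" becomes NA (R warns about this, silenced here)
y <- suppressWarnings(as.numeric(x))
y                      # 1 2 NA
```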
Data frames
data.frame is an important data type in R:
dat <- data.frame(Values = c(20.2, 19.0, 18.7, 20.1),
                  Sex = rep(c("M", "F"), each = 2),
                  stringsAsFactors = TRUE)  # needed from R 4.0.0 to get a factor
dat
##   Values Sex
## 1   20.2   M
## 2   19.0   M
## 3   18.7   F
## 4   20.1   F
str(dat)
## 'data.frame': 4 obs. of  2 variables:
##  $ Values: num  20.2 19 18.7 20.1
##  $ Sex   : Factor w/ 2 levels "F","M": 2 2 1 1
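Because a data.frame keeps variables of different types side by side, per-group summaries are short to write. A minimal sketch computing the mean of `Values` for each level of `Sex` (the data frame is rebuilt so the block runs on its own):

```r
dat <- data.frame(Values = c(20.2, 19.0, 18.7, 20.1),
                  Sex = rep(c("M", "F"), each = 2),
                  stringsAsFactors = TRUE)

dim(dat)                                          # 4 rows, 2 columns
levels(dat$Sex)                                   # "F" "M"
tapply(dat$Values, dat$Sex, mean)                 # F: 19.4, M: 19.6
aggregate(Values ~ Sex, data = dat, FUN = mean)   # same, returned as a data.frame
```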
Data frames - extracting variables
Variables in a data.frame are accessed with $:
dat$Values
## [1] 20.2 19.0 18.7 20.1
dat$Sex
## [1] M M F F
## Levels: F M
dat[, "Sex"]
## [1] M M F F
## Levels: F M
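Besides `$`, columns can be extracted with `[[ ]]` or `[ ]`. The difference matters: `dat["Sex"]` returns a one-column data.frame, while `$`, `[[ ]]` and `dat[, "Sex"]` return the column itself. Column names are also case sensitive. A small sketch (rebuilding `dat`):

```r
dat <- data.frame(Values = c(20.2, 19.0, 18.7, 20.1),
                  Sex = rep(c("M", "F"), each = 2),
                  stringsAsFactors = TRUE)

class(dat$Sex)       # "factor":     the column itself
class(dat[["Sex"]])  # "factor":     same, with the name given as a string
class(dat["Sex"])    # "data.frame": a one-column data.frame
class(dat[, "Sex"])  # "factor":     a single column via [row, column] drops to a vector

# A wrongly capitalised name gives NULL, not an error
is.null(dat$sex)     # TRUE
```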
Subsetting
First, generate 10 random numbers from a standard normal distribution (your numbers will differ):
(x <- rnorm(10))
##  [1] -0.25752 -0.57893  0.80306 -0.63223  0.65494 -0.06964 …
##  [8]  0.98033  0.82180  0.55078
Find the elements greater than 0 (keep is a logical vector):
(keep <- x > 0)
##  [1] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
x[keep]
## [1] 0.8031 0.6549 0.2186 0.9803 0.8218 0.5508
The elements that are not greater than 0:
x[!keep]
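To make a random example reproducible, fix the random seed first. The sketch below (the seed value 1 is arbitrary) also shows a few common companions to logical indexing: `which()` gives the positions of the `TRUE` elements, and `sum()` counts them because `TRUE` is treated as 1.

```r
set.seed(1)            # fixes the random numbers so the result is reproducible
x <- rnorm(10)
keep <- x > 0          # logical vector of length 10

which(keep)            # positions of the TRUE elements
sum(keep)              # number of positive values
x[keep]                # the positive values
x[!keep]               # the rest
x[x > 0 & x < 1]       # conditions can be combined directly inside [ ]
```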
Subsetting in data frames - extracting rows
dat is the data frame from before. Extract the women:
dat[dat$Sex == "F", ]
##   Values Sex
## 3   18.7   F
## 4   20.1   F
subset(dat, Sex == "F")
##   Values Sex
## 3   18.7   F
## 4   20.1   F
More complicated selections can be made with & (and) and | (or).
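Combining conditions with `&` and `|` works the same way in `[ ]` and in `subset()`. A minimal sketch on the same small data frame (rebuilt here): first the women with a value above 19, then the rows that are either men or have a value below 19.

```r
dat <- data.frame(Values = c(20.2, 19.0, 18.7, 20.1),
                  Sex = rep(c("M", "F"), each = 2),
                  stringsAsFactors = TRUE)

# & : both conditions must hold -> only row 4 (20.1, F)
dat[dat$Sex == "F" & dat$Values > 19, ]

# | : at least one condition must hold -> rows 1, 2 and 3
subset(dat, Sex == "M" | Values < 19)
```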
Reading data
R easily reads text files with read.table:
data <- read.table("data.csv", header = TRUE, sep = ";")
data is now a data.frame.
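Danish-style CSV files often use `;` as separator and `,` as decimal mark. `read.table` handles this through its `dec` argument, and `read.csv2` has exactly these defaults. A self-contained sketch that first writes a small file with made-up contents to a temporary location (the file and its values are assumptions for the example):

```r
# Write a small example file (made-up data) to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Values;Sex",
             "20,2;M",
             "19,0;M",
             "18,7;F",
             "20,1;F"), tmp)

d1 <- read.table(tmp, header = TRUE, sep = ";", dec = ",")
d2 <- read.csv2(tmp)   # defaults: header = TRUE, sep = ";", dec = ","

str(d1)                # Values is read as numeric thanks to dec = ","
```

Without `dec = ","`, `read.table` would read the `Values` column as text.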
Linear models
Linear models are fitted with lm. The model is specified by a formula:
response ~ explanatory.variables
The intercept is included by default and can be removed by adding -1 or 0 + to the formula, e.g. y ~ x - 1 or y ~ 0 + x.
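The two ways of removing the intercept give the same fit. A minimal sketch using the built-in iris data, comparing the coefficient names with and without an intercept:

```r
# With intercept (the default)
fit1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Two equivalent ways of removing the intercept
fit0a <- lm(Sepal.Length ~ 0 + Sepal.Width, data = iris)
fit0b <- lm(Sepal.Length ~ Sepal.Width - 1, data = iris)

names(coef(fit1))                     # "(Intercept)" "Sepal.Width"
names(coef(fit0a))                    # "Sepal.Width"
all.equal(coef(fit0a), coef(fit0b))   # TRUE
```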
Linear models
A simple linear model on the iris data set that ships with R:
lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
##
## Coefficients:
## (Intercept)  Sepal.Width  Petal.Width
##       3.457        0.399        0.972
Not recommended:
lm(iris$Sepal.Length ~ iris$Sepal.Width + iris$Petal.Width)
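One practical reason to prefer the `data =` form: `predict()` can then match new observations by column name. With the `iris$...` form the model terms are literally `iris$Sepal.Width` and so on, so `newdata` cannot be matched and R falls back to the 150 original rows (with a warning). A sketch with two hypothetical new flowers whose measurements are made up:

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)

# Two hypothetical new flowers (made-up measurements, cm)
new_flowers <- data.frame(Sepal.Width = c(3.0, 3.5), Petal.Width = c(0.2, 1.8))
pred <- predict(fit, newdata = new_flowers)   # one prediction per new flower

# The same model written the not-recommended way
fit_bad <- lm(iris$Sepal.Length ~ iris$Sepal.Width + iris$Petal.Width)
# newdata is not matched to the terms, so all 150 fitted values come back
pred_bad <- suppressWarnings(predict(fit_bad, newdata = new_flowers))

length(pred)       # 2
length(pred_bad)   # 150
```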
Visualizing data
R has many options for plotting data. Basic functions:
- plot
- hist
- pairs
plot(iris$Sepal.Length, iris$Sepal.Width)
Visualizing data
plot(iris$Sepal.Length, iris$Sepal.Width)
[Figure: scatter plot of iris$Sepal.Width (y axis) against iris$Sepal.Length (x axis)]
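The three basic functions can be tried together; the sketch below writes the plots to a temporary PDF file instead of the screen, which is handy when running scripts. Colouring by `iris$Species` uses the factor codes to pick colours from the current palette.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)                                    # send plots to a PDF file instead of the screen

plot(iris$Sepal.Length, iris$Sepal.Width)   # the scatter plot from the slide
plot(Sepal.Width ~ Sepal.Length, data = iris,
     col = iris$Species, pch = 19)          # formula interface, coloured by species
hist(iris$Sepal.Length, main = "Sepal length", xlab = "cm")
pairs(iris[, 1:4], col = iris$Species)      # all pairwise scatter plots

dev.off()                                   # close the file
file.exists(out)
```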
Exercise
- Install R
- Install RStudio
- Import data
- Iris: summary, plot, lm
R's package system
R is extended daily by people from all over the world.
The standard repository is CRAN (Comprehensive R Archive Network), with 5006 available packages (November 2013).
Installing and using packages:
install.packages("ggplot2")
library(ggplot2)
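`install.packages()` downloads a package every time it is called, so scripts often install only what is missing. The helper below, `ensure_packages`, is hypothetical (not part of base R); it is a minimal sketch using `requireNamespace()` to test whether a package is installed. It is called here with base packages, so nothing is downloaded.

```r
# Hypothetical helper: install only the missing packages, then attach them all
ensure_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)   # needs internet access and a CRAN mirror
  }
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(to_install)
}

# Base packages are always installed, so nothing is downloaded here;
# in practice you would call e.g. ensure_packages("ggplot2")
newly_installed <- ensure_packages(c("stats", "utils"))
```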
Overview of the packages
CRAN Task Views: http://cran.r-project.org/web/views/
Alternative plotting options
Possible packages for plotting data:
- ggplot2
- lattice
- GGobi
- rgl (3D)
Recommended: ggplot2, written by Hadley Wickham, Rice University
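lattice is a recommended package that comes with standard R installations, so it can usually be tried without installing anything. A minimal sketch of the iris scatter plot split into one panel per species, written to a temporary PDF file:

```r
library(lattice)

# Conditioning with | draws one panel per species
p <- xyplot(Sepal.Width ~ Sepal.Length | Species, data = iris,
            xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

out <- tempfile(fileext = ".pdf")
pdf(out)
print(p)    # lattice plots are objects; they are drawn when printed
dev.off()
```

Note the explicit `print(p)`: inside scripts and functions a lattice plot is not drawn automatically.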
ggplot2
ggplot2 example
library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width,
                        shape = Species, colour = Petal.Length)) +
  geom_point() +
  ggtitle("Questionable Plot\nFlowers")
ggplot2 example
[Figure: "Questionable Plot / Flowers": scatter plot of Sepal.Width against Sepal.Length, point shape by Species (setosa, versicolor, virginica), colour by Petal.Length]
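ggplot2 builds a plot from layers added with `+`, so splitting the same data into panels is one extra layer. A sketch, assuming ggplot2 is installed (the block is guarded so it does nothing otherwise), that facets by species and saves the result with `ggsave()` to a temporary file:

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
    geom_point() +
    facet_wrap(~ Species) +                       # one panel per species
    labs(x = "Sepal length (cm)", y = "Sepal width (cm)")

  out <- tempfile(fileext = ".pdf")
  ggsave(out, plot = p, width = 8, height = 3)    # width and height in inches
} else {
  message("ggplot2 is not installed; see install.packages(\"ggplot2\")")
}
```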
GLM in R
# Dobson (1990), page 93: Randomized Controlled Trial
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
head(d.ad <- data.frame(treatment, outcome, counts))
##   treatment outcome counts
## 1         1       1     18
## 2         1       2     17
## 3         1       3     15
## 4         2       1     20
## 5         2       2     10
## 6         2       3     20
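`gl()` ("generate levels") builds the two factors: `gl(n, k, length)` repeats each of the `n` levels `k` times and recycles the result up to `length`. A sketch comparing it with the equivalent `factor(rep(...))` calls:

```r
outcome   <- gl(3, 1, 9)   # 1 2 3 1 2 3 1 2 3
treatment <- gl(3, 3)      # 1 1 1 2 2 2 3 3 3

# The same factors written with rep()
outcome2   <- factor(rep(1:3, times = 3))
treatment2 <- factor(rep(1:3, each = 3))

identical(outcome, outcome2)       # TRUE
identical(treatment, treatment2)   # TRUE
table(treatment, outcome)          # each combination occurs exactly once
```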
GLM in R
(glm.d93 <- glm(counts ~ outcome + treatment, family = poisson()))
##
## Call:  glm(formula = counts ~ outcome + treatment, family = poisson())
##
## Coefficients:
## (Intercept)     outcome2     outcome3   treatment2   treatment3
##    3.04e+00    -4.54e-01    -2.93e-01     1.59e-16    -3.55e…
##
## Degrees of Freedom: 8 Total (i.e. Null);  4 Residual
## Null Deviance:      10.6
## Residual Deviance: 5.13   AIC: 56.8
GLM in R
summary(glm.d93)
##
## Call:
## glm(formula = counts ~ outcome + treatment, family = poisson())
##
## Deviance Residuals:
##       1        2        3        4        5        6  …
## -0.6712   0.9627  -0.1696  -0.2200  -0.9555   1.0494   0.847…
##       9
## -0.9666
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  3.04e+00   1.71e-01   17.81   <2e-16 ***
## outcome2    -4.54e-01   2.02e-01   -2.25    0.025 *
## outcome3    -2.93e-01   1.93e-01   -1.52    0.128
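A few common follow-ups once the model is fitted. With the log link, `exp(coef())` gives multiplicative effects on the expected count; `fitted()` gives the expected counts; and for a Poisson GLM with log link and an intercept, the fitted counts sum to the observed total (here 150). `anova(..., test = "Chisq")` gives sequential deviance tests. A minimal sketch that refits the model so it runs on its own:

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
glm.d93   <- glm(counts ~ outcome + treatment, family = poisson())

exp(coef(glm.d93))               # multiplicative effects (rate ratios) on the expected count
fitted(glm.d93)                  # expected counts under the model
sum(fitted(glm.d93))             # equals sum(counts) = 150, up to convergence tolerance
anova(glm.d93, test = "Chisq")   # sequential deviance tests for outcome and treatment
```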