Advanced Statistical Computing Week 5: EM Algorithm

Aad van der Vaart, Fall 2012

Contents: EM Algorithm, Mixtures, Hidden Markov models

EM Algorithm

EM-algorithm

SETTING: Observation $X$ with likelihood $\theta \mapsto p_\theta(X)$ that is hard to maximize (so the MLE $\hat\theta$ is hard to find). $X$ can be viewed as the first coordinate of $(X,Y)$ with density $(x,y) \mapsto p_\theta(x,y)$:

$$p_\theta(x) = \int p_\theta(x,y)\,d\mu(y).$$

EM-ALGORITHM: GIVEN $\theta_0$, REPEAT
E-step: compute the function $\theta \mapsto E_{\theta_i}\bigl(\log p_\theta(X,Y) \mid X\bigr)$.
M-step: $\theta_{i+1}$ := point of maximum of this function.

The sequence $\theta_0, \theta_1, \ldots$ often tends to the MLE, but it may not converge, may converge slowly, or may converge to a local maximum.

[ $Y$ may be missing data, or augmented data invented for convenience. ]
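The two steps can be sketched on a toy missing-data problem that is not taken from the slides: exponential lifetimes with rate θ, right-censored at a known time. The names `cens`, `t_full`, `is_cens`, the true rate 2 and the starting value are assumptions chosen for illustration. Here Y plays the role of the unobserved part of the censored lifetimes, and because the complete-data log likelihood is linear in the lifetimes, the E-step reduces to replacing each censored value by its conditional mean.

```r
set.seed(1)
n <- 200
cens <- 1.5                      # assumed known censoring time
t_full <- rexp(n, rate = 2)      # complete lifetimes (partly unobserved)
is_cens <- t_full > cens
x <- pmin(t_full, cens)          # observed data

theta <- 1                       # starting value theta_0
for (iter in 1:500) {
  # E-step: the complete log likelihood n*log(theta) - theta*sum(T_i) is
  # linear in T_i, so only E(T_i | data) is needed; by memorylessness this
  # equals cens + 1/theta for a censored observation.
  t_expected <- ifelse(is_cens, cens + 1 / theta, x)
  # M-step: maximize n*log(theta) - theta*sum(t_expected) over theta.
  theta_new <- n / sum(t_expected)
  converged <- abs(theta_new - theta) < 1e-12
  theta <- theta_new
  if (converged) break
}

# Direct MLE for censored exponential data, for comparison.
mle <- sum(!is_cens) / sum(x)
c(em = theta, mle = mle)
```

In this toy model the MLE is available in closed form, so EM is not needed in practice; the point is only that the iteration settles on the same value.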

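EM-Algorithm increases target

LEMMA: The sequence $\theta_0, \theta_1, \ldots$ generated by the EM-algorithm satisfies $p_{\theta_0}(X) \le p_{\theta_1}(X) \le \cdots$.

PROOF: Since $p_\theta(x,y) = p_\theta(y \mid x)\,p_\theta(x)$,

$$E_{\theta_i}\bigl(\log p_\theta(X,Y) \mid X\bigr) = E_{\theta_i}\bigl(\log p_\theta(Y \mid X) \mid X\bigr) + \log p_\theta(X).$$

Because $\theta_{i+1}$ maximizes the left side over $\theta$, it suffices to show

$$E_{\theta_i}\bigl(\log p_{\theta_{i+1}}(Y \mid X) \mid X\bigr) \le E_{\theta_i}\bigl(\log p_{\theta_i}(Y \mid X) \mid X\bigr).$$

Equivalently, $E_p \log(q/p)(Y) \le 0$ for $p = p_{\theta_i}(\cdot \mid X)$ and $q = p_{\theta_{i+1}}(\cdot \mid X)$. This holds because the Kullback-Leibler divergence $K(p;q) = E_p \log(p/q)(Y)$ is nonnegative for any $p$, $q$.

This does not prove that $\theta_i$ converges to the MLE!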
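The lemma can be checked numerically. The following is a minimal sketch on an assumed toy problem (a two-component normal mixture with unit variances, simulated data and arbitrary starting values); it records the observed-data log likelihood after every EM iteration and verifies that it never decreases.

```r
set.seed(2)
x <- c(rnorm(60, mean = 0), rnorm(40, mean = 3))   # assumed toy data
p <- c(0.5, 0.5)                                   # starting proportions
mu <- c(-1, 1)                                     # starting means

# Observed-data log likelihood of the two-component mixture.
loglik <- function(p, mu) {
  sum(log(p[1] * dnorm(x, mu[1]) + p[2] * dnorm(x, mu[2])))
}

ll <- loglik(p, mu)
for (iter in 1:50) {
  # E-step: posterior probabilities of the two components.
  a <- cbind(p[1] * dnorm(x, mu[1]), p[2] * dnorm(x, mu[2]))
  a <- a / rowSums(a)
  # M-step: weighted proportions and weighted means.
  p <- colMeans(a)
  mu <- colSums(a * x) / colSums(a)
  ll <- c(ll, loglik(p, mu))
}

# TRUE if the log likelihood is nondecreasing (up to rounding error).
all(diff(ll) >= -1e-10)
```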

EM-Algorithm linear convergence

The convergence of the EM-algorithm is linear, and it is slow if the augmented model is statistically much more informative than the data model.
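A small experiment illustrates the linear rate. It uses an assumed toy model that is not in the slides: exponential lifetimes right-censored at a known time, where one EM step is `theta <- n / sum(E(T_i | data))`. In this particular model the ratio of successive errors tends to the fraction of censored observations, so heavier censoring (more missing information) gives slower convergence. The function name `last_ratio`, the censoring times and the starting value are hypothetical.

```r
set.seed(3)
t_full <- rexp(500, rate = 1)    # assumed toy lifetimes

# Run EM for censoring time `cens` and return the last ratio of successive
# errors |theta_i - MLE|, together with the censored fraction.
last_ratio <- function(cens, theta = 3, iters = 25) {
  is_cens <- t_full > cens
  x <- pmin(t_full, cens)
  mle <- sum(!is_cens) / sum(x)
  err <- numeric(iters)
  for (i in 1:iters) {
    # One EM step: censored lifetimes are replaced by cens + 1/theta.
    theta <- length(x) / sum(ifelse(is_cens, cens + 1 / theta, x))
    err[i] <- abs(theta - mle)
  }
  c(censored_fraction = mean(is_cens),
    error_ratio = err[iters] / err[iters - 1])
}

res <- rbind(light = last_ratio(1), heavy = last_ratio(0.2))
res
```

Each row compares the censored fraction with the observed contraction factor per iteration; the two should be close, and the heavily censored case should contract more slowly.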

Mixtures

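Mixtures

SETTING: The observations are a random sample $X_1, \ldots, X_n$ from the density

$$p_\theta(x) = \sum_{j=1}^k p_j f(x;\eta_j), \qquad \theta = (p_1, \ldots, p_k, \eta_1, \ldots, \eta_k).$$

AUGMENTED DATA:

$$P(Y_i = j) = p_j, \qquad X_i \mid Y_i = j \sim f(\cdot\,;\eta_j), \qquad i = 1, \ldots, n.$$

Full likelihood:

$$p_\theta(X_1, \ldots, X_n, Y_1, \ldots, Y_n) = \prod_{i=1}^n \prod_{j=1}^k \bigl(p_j f(X_i;\eta_j)\bigr)^{1\{Y_i = j\}}.$$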

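Mixtures E-step, M-step

E-step: given the current values $(\tilde p, \tilde\eta)$:

$$E_{\tilde p,\tilde\eta}\Bigl(\log \prod_{i=1}^n \prod_{j=1}^k \bigl(p_j f(X_i;\eta_j)\bigr)^{1\{Y_i = j\}} \Bigm| X_1, \ldots, X_n\Bigr) = \sum_{i=1}^n \sum_{j=1}^k \log\bigl(p_j f(X_i;\eta_j)\bigr)\,\alpha_{i,j},$$

where

$$\alpha_{i,j} := P_{\tilde p,\tilde\eta}(Y_i = j \mid X_i) = \frac{\tilde p_j f(X_i;\tilde\eta_j)}{\sum_c \tilde p_c f(X_i;\tilde\eta_c)}.$$

The right side can be split as

$$\sum_{j=1}^k \Bigl[\log p_j \sum_{i=1}^n \alpha_{i,j}\Bigr] + \sum_{j=1}^k \Bigl[\sum_{i=1}^n \log f(X_i;\eta_j)\,\alpha_{i,j}\Bigr].$$

M-step: for $j = 1, \ldots, k$:

$$p_j^{new} = \frac{1}{n} \sum_{i=1}^n \alpha_{i,j}, \qquad \eta_j^{new} = \operatorname*{argmax}_\eta \sum_{i=1}^n \log f(X_i;\eta)\,\alpha_{i,j}.$$

[ If the $f(\cdot\,;\eta)$ have a common parameter, then the computation of the $\eta_j$ does not separate as it does here. ]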
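One iteration of these updates can be sketched generically. The function `em_step` and its arguments (`logf`, a log density in the single parameter η, and `eta_range`, a search interval) are hypothetical names introduced here; the η-update is done numerically with base R's `optimize`, which assumes a scalar η with its maximizer inside `eta_range`.

```r
em_step <- function(x, p, eta, logf, eta_range) {
  k <- length(p)
  # E-step: alpha[i, j] = p_j f(x_i; eta_j) / sum_c p_c f(x_i; eta_c)
  alpha <- sapply(1:k, function(j) p[j] * exp(logf(x, eta[j])))
  alpha <- alpha / rowSums(alpha)
  # M-step: closed form for the proportions ...
  p_new <- colMeans(alpha)
  # ... and a one-dimensional weighted maximization for each eta_j.
  eta_new <- sapply(1:k, function(j) {
    optimize(function(e) sum(alpha[, j] * logf(x, e)),
             interval = eta_range, maximum = TRUE)$maximum
  })
  list(p = p_new, eta = eta_new, alpha = alpha)
}

# Usage with f(.; eta) = N(eta, 1) on assumed toy data.
set.seed(4)
x <- c(rnorm(50, 0), rnorm(50, 4))
logf <- function(x, eta) dnorm(x, mean = eta, log = TRUE)
step <- em_step(x, p = c(0.5, 0.5), eta = c(-1, 5),
                logf = logf, eta_range = c(-10, 10))
step$p
step$eta
```

When a closed-form maximizer exists it is preferable to the numerical search; the generic version is only meant to mirror the argmax in the M-step.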

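Mixtures Example

EXAMPLE: If $f(\cdot\,;\eta) \sim N(\eta, 1)$, then

$$\sum_{i=1}^n \log f(X_i;\eta)\,\alpha_{i,j} = -\frac{1}{2} \sum_{i=1}^n \alpha_{i,j}(X_i - \eta)^2 + \text{Const}, \qquad \eta_j^{new} = \frac{\sum_{i=1}^n \alpha_{i,j} X_i}{\sum_{i=1}^n \alpha_{i,j}}.$$

EXAMPLE: If $f(\cdot\,;\eta) \sim \Gamma(r, \eta)$, then

$$\sum_{i=1}^n \log f(X_i;\eta)\,\alpha_{i,j} = \sum_{i=1}^n (r \log\eta - \eta X_i)\,\alpha_{i,j} + \text{Const}, \qquad \eta_j^{new} = \frac{r \sum_{i=1}^n \alpha_{i,j}}{\sum_{i=1}^n \alpha_{i,j} X_i}.$$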
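For completeness, the two updates follow by setting the derivative with respect to $\eta$ to zero; both criteria are concave in $\eta$, so the stationary point is the maximum. The display below is a worked version of that step, using the same symbols as the examples.

```latex
% Normal case, f(.;\eta) = N(\eta,1)
\frac{\partial}{\partial\eta}\Bigl[-\tfrac12\sum_{i=1}^n \alpha_{i,j}(X_i-\eta)^2\Bigr]
  = \sum_{i=1}^n \alpha_{i,j}(X_i-\eta) = 0
  \quad\Longrightarrow\quad
  \eta_j^{new} = \frac{\sum_{i=1}^n \alpha_{i,j}X_i}{\sum_{i=1}^n \alpha_{i,j}}.

% Gamma case, f(.;\eta) = \Gamma(r,\eta) with known shape r and rate \eta
\frac{\partial}{\partial\eta}\sum_{i=1}^n (r\log\eta-\eta X_i)\,\alpha_{i,j}
  = \frac{r}{\eta}\sum_{i=1}^n \alpha_{i,j} - \sum_{i=1}^n \alpha_{i,j}X_i = 0
  \quad\Longrightarrow\quad
  \eta_j^{new} = \frac{r\sum_{i=1}^n \alpha_{i,j}}{\sum_{i=1}^n \alpha_{i,j}X_i}.
```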

R

[Figure: plot of the simulated sample; horizontal axis 0 to 15, vertical axis 0.00 to 0.30.]

> n=100
> shape=c(2,2,2); eta=c(1,6,.2); prob=c(1/4,1/8,5/8)
> component=sample(c(1,2,3),n,replace=TRUE,prob=prob)
> x=rgamma(n,shape=shape[component],rate=eta[component])

R EM, known shape

> k=3; a=matrix(0,n,k); p=c(1/3,1/3,1/3); eta=c(1,2,3); change=1
> while (change>0.0001){
+ for (j in 1:k) a[,j]=p[j]*dgamma(x,2,eta[j])
+ a=diag(1/apply(a,1,sum))%*%a
+ etanew=2*apply(a,2,sum)/matrix(x,1,n)%*%a
+ pnew=apply(a,2,mean)
+ change=sum(abs(etanew-eta)+abs(pnew-p))
+ print(rbind(pnew,etanew))
+ eta=etanew; p=pnew}
[ --- output deleted ---- ]
          [,1]      [,2]       [,3]
pnew 0.6259239 0.3161804 0.05789564
     0.2157931 1.7430514 7.57683781

[Figure: horizontal axis 0 to 15, vertical axis 0.00 to 0.30.]

R packages

> library(mixtools)
> mod=gammamixEM(x,k=3)
number of iterations= 323
> summary(mod)
Error in summary.mixEM(mod) : Unknown mixEM object of type gammamixEM
> mod[[2]]; mod[[3]]
[1] 0.37441469 0.57523322 0.05035209
         comp.1   comp.2     comp.3
alpha 1.6203475 2.092346 20.9880430
beta  0.6184701 4.126267  0.7926715

[Figure: horizontal axis 0 to 15, vertical axis 0.00 to 0.30.]

[ Besides package mixtools, there is also flexmix, and... (?)]

Mixtures warnings

Not all mixtures are identifiable from the data: multiple parameter vectors may give the same mixture.

Maximum likelihood may work only if the parameter set is restricted. (Notable example: in location-scale mixtures the likelihood may tend to infinity if a scale parameter approaches zero.)

EM tends to be slow for large data sets, and might get stuck in local maxima (?)
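The unbounded likelihood can be made concrete with a short sketch on assumed toy data: a two-component normal location-scale mixture in which one component is centred exactly at the first observation and its scale `sigma` is sent to zero, while the other component is held at N(0, 1). The mixing weights 0.5 and the grid of scales are arbitrary choices.

```r
set.seed(5)
x <- rnorm(20)    # assumed toy sample

# Log likelihood when component 1 is N(x[1], sigma^2) and component 2 is N(0, 1).
loglik <- function(sigma) {
  sum(log(0.5 * dnorm(x, mean = x[1], sd = sigma) + 0.5 * dnorm(x, 0, 1)))
}

sigmas <- 10^-(1:8)
ll <- sapply(sigmas, loglik)
cbind(sigma = sigmas, loglik = ll)

# For small sigma each tenfold decrease adds about log(10) to the log likelihood.
diff(ll)
```

The contribution of the first observation grows like -log(sigma) while the other observations keep a finite contribution from the second component, so the log likelihood has no finite maximum unless the scales are bounded away from zero.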

Hidden Markov models

Hidden Markov model

[Diagram: chain $Y_1 \to Y_2 \to Y_3 \to \cdots \to Y_{n-1} \to Y_n$, with an arrow from each $Y_i$ to the output $X_i$.]

Markov chain of hidden states $Y_1, Y_2, \ldots$; only the outputs $X_1, X_2, \ldots$ are observed. $X_i$ given $Y_i$ is conditionally independent of all other variables.

EXAMPLES
- speech recognition: states abstract, outputs Fourier coding of sounds.
- genomics: states are introns/exons, outputs nucleotides.
- genomics: states are # chromosomal duplicates, outputs noisy measurements.
- genetics: states inheritance vectors, outputs measured markers.
- cell biology: states of ion channels, outputs current or no current.
- economics: state of economy, output # firms in default.

Hidden Markov model: parameters and likelihood

Parameters
- density $\pi$ of $Y_1$.
- transition density $p(y_i \mid y_{i-1})$ of the Markov chain.
- output density $q(x_i \mid y_i)$.

Full likelihood

$$\pi(y_1)\,p(y_2 \mid y_1) \cdots p(y_n \mid y_{n-1})\; q(x_1 \mid y_1) \cdots q(x_n \mid y_n).$$

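HMM E and M-step

E-step: given the current values $(\tilde\pi, \tilde p, \tilde q)$:

$$E_{\tilde\pi,\tilde p,\tilde q}\Bigl(\log\Bigl[\pi(Y_1) \prod_{i=2}^n p(Y_i \mid Y_{i-1}) \prod_{i=1}^n q(X_i \mid Y_i)\Bigr] \Bigm| X_1, \ldots, X_n\Bigr)$$
$$= E_{\tilde\pi,\tilde p,\tilde q}\bigl(\log \pi(Y_1) \mid X_1, \ldots, X_n\bigr) + \sum_{i=2}^n E_{\tilde\pi,\tilde p,\tilde q}\bigl(\log p(Y_i \mid Y_{i-1}) \mid X_1, \ldots, X_n\bigr) + \sum_{i=1}^n E_{\tilde\pi,\tilde p,\tilde q}\bigl(\log q(X_i \mid Y_i) \mid X_1, \ldots, X_n\bigr).$$

M-step: depends on the specification of the models for $\pi$, $p$, $q$.
- If the state space is finite, $p$ is typically left free.
- Only the current estimate of the law of $(Y_{i-1}, Y_i)$ given $X_1, \ldots, X_n$ is needed; it is computed using the forward and backward algorithm.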

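Baum-Welch

The EM-algorithm for the HMM with finite state space and completely unspecified distributions $\pi$, $p$, $q$ is called the Baum-Welch algorithm.

If $\pi$ and $p$ are left free:

$$\pi^{new}(y) = p^{\,Y_1 \mid X_1, \ldots, X_n}_{\tilde\pi,\tilde p,\tilde q}(y), \qquad p^{new}(v \mid u) = \frac{\sum_{i=2}^n p^{\,Y_{i-1}, Y_i \mid X_1, \ldots, X_n}_{\tilde\pi,\tilde p,\tilde q}(u, v)}{\sum_{i=2}^n p^{\,Y_{i-1} \mid X_1, \ldots, X_n}_{\tilde\pi,\tilde p,\tilde q}(u)}.$$

If $q$ is also left free (possible for a finite output space $\mathcal{X}$, but not often the case):

$$q^{new}(x \mid y) = \frac{\sum_{i: X_i = x} p^{\,Y_i \mid X_1, \ldots, X_{i-1}, X_i = x, X_{i+1}, \ldots, X_n}_{\tilde\pi,\tilde p,\tilde q}(y)}{\sum_{x \in \mathcal{X}} \sum_{i: X_i = x} p^{\,Y_i \mid X_1, \ldots, X_{i-1}, X_i = x, X_{i+1}, \ldots, X_n}_{\tilde\pi,\tilde p,\tilde q}(y)}.$$

[ To compute these expressions one needs the density of $(Y_{i-1}, Y_i)$ given $X_1, \ldots, X_n$. This is computed using the forward and backward algorithm. ]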
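The conditional laws in these formulas can be obtained with a scaled forward-backward recursion. The following is a minimal base-R sketch for a finite state space, not the implementation used by any package; the function name `forward_backward` and the argument `emis` (a matrix with `emis[i, u]` equal to q(x_i | u) under the current estimates) are hypothetical. The usage part simulates data from a two-state chain with binomial(5, p) outputs and performs one update of π and p, keeping q fixed.

```r
forward_backward <- function(delta, Pi, emis) {
  n <- nrow(emis); m <- length(delta)
  fwd <- matrix(0, n, m)     # fwd[i, u] = P(Y_i = u | X_1, ..., X_i)
  bwd <- matrix(1, n, m)     # scaled backward variables
  scale <- numeric(n)        # scale[i] = p(X_i | X_1, ..., X_{i-1})
  f <- delta * emis[1, ]
  scale[1] <- sum(f); fwd[1, ] <- f / scale[1]
  for (i in 2:n) {
    f <- as.vector(fwd[i - 1, ] %*% Pi) * emis[i, ]
    scale[i] <- sum(f); fwd[i, ] <- f / scale[i]
  }
  for (i in (n - 1):1) {
    bwd[i, ] <- as.vector(Pi %*% (emis[i + 1, ] * bwd[i + 1, ])) / scale[i + 1]
  }
  gamma <- fwd * bwd         # gamma[i, u] = P(Y_i = u | X_1, ..., X_n)
  xi_sum <- matrix(0, m, m)  # sum over i >= 2 of P(Y_{i-1} = u, Y_i = v | X_1, ..., X_n)
  for (i in 2:n) {
    xi_sum <- xi_sum + (fwd[i - 1, ] %o% (emis[i, ] * bwd[i, ])) * Pi / scale[i]
  }
  list(gamma = gamma, xi_sum = xi_sum, loglik = sum(log(scale)))
}

# Usage on assumed simulated data: two states, binomial(5, p) outputs.
set.seed(6)
Pi <- matrix(c(0.7, 0.3, 0.2, 0.8), 2, 2, byrow = TRUE)
delta <- c(0.3, 0.7)
n <- 100
y <- numeric(n)
y[1] <- sample(1:2, 1, prob = delta)
for (i in 2:n) y[i] <- sample(1:2, 1, prob = Pi[y[i - 1], ])
x <- rbinom(n, size = 5, prob = c(0.3, 0.8)[y])
emis <- cbind(dbinom(x, 5, 0.3), dbinom(x, 5, 0.8))

fb <- forward_backward(delta, Pi, emis)
delta_new <- fb$gamma[1, ]                 # update of pi
Pi_new <- fb$xi_sum / rowSums(fb$xi_sum)   # update of p(v | u)
Pi_new
```

The scaling by `scale[i]` keeps the recursions from underflowing for long sequences and gives the observed-data log likelihood as a by-product.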

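Viterbi

[Diagram: chain $Y_1 \to Y_2 \to Y_3 \to \cdots \to Y_{n-1} \to Y_n$, with an arrow from each $Y_i$ to the output $X_i$.]

The Viterbi algorithm computes the most likely state path given the outcomes:

$$\operatorname*{argmax}_{y_1, \ldots, y_n} P(Y_1 = y_1, \ldots, Y_n = y_n \mid X_1, \ldots, X_n).$$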
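The maximization can be carried out by dynamic programming over the chain. Below is a minimal base-R sketch for a finite state space, working with log probabilities; the function name `viterbi_path` and the argument `emis` (with `emis[i, u]` equal to q(x_i | u)) are hypothetical, and the short output sequence in the usage part is made up for illustration.

```r
viterbi_path <- function(delta, Pi, emis) {
  n <- nrow(emis); m <- length(delta)
  logPi <- log(Pi)
  score <- matrix(-Inf, n, m)  # best log probability of a path ending in state v at time i
  back <- matrix(0L, n, m)     # best predecessor of state v at time i
  score[1, ] <- log(delta) + log(emis[1, ])
  for (i in 2:n) {
    for (v in 1:m) {
      cand <- score[i - 1, ] + logPi[, v]
      back[i, v] <- which.max(cand)
      score[i, v] <- max(cand) + log(emis[i, v])
    }
  }
  # Trace back from the best final state.
  path <- integer(n)
  path[n] <- which.max(score[n, ])
  for (i in (n - 1):1) path[i] <- back[i + 1, path[i + 1]]
  path
}

# Usage: two states, binomial(5, p) outputs with p = 0.3 and p = 0.8.
Pi <- matrix(c(0.7, 0.3, 0.2, 0.8), 2, 2, byrow = TRUE)
delta <- c(0.3, 0.7)
x <- c(1, 0, 4, 5, 3, 0)       # assumed short output sequence
emis <- cbind(dbinom(x, 5, 0.3), dbinom(x, 5, 0.8))
viterbi_path(delta, Pi, emis)
```

Maximizing the joint probability of states and outputs is equivalent to maximizing the conditional probability of the states given the outputs, since the two differ by a factor that does not depend on the path.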

R

> library(HiddenMarkov)
> Pi=matrix(c(0.7,0.3,0.2,0.8),2,2,byrow=TRUE); delta=c(0.3,0.7)
> n=100; pn=list(size=rep(5,n)); pm=list(prob=c(0.3,0.8))
> myhmm=dthmm(NULL,Pi=Pi,delta=delta,distn="binom",pn=pn,pm=pm)
> x=simulate(myhmm,n)
>
> plot(1:n,x$x,type="s",xlab="",ylab="")
> lines(1:n,x$y-1,col=2,type="s")

[ Markov chain with two states, transition matrix Π, initial distribution δ. Outputs are from the binomial(5,p)-distribution, with p = 0.3 from state 1 and p = 0.8 from state 2. Figure: horizontal axis 0 to 100, vertical axis 0 to 5. Red: states, Black: outputs.]

R

> mod=BaumWelch(x); mod$Pi; mod$pm
[---- output deleted ---]
          [,1]      [,2]
[1,] 0.6287149 0.3712851
[2,] 0.2637289 0.7362711
$prob
[1] 0.3173456 0.8313127

[ Markov chain with two states, transition matrix Π with rows (0.7, 0.3) and (0.2, 0.8), initial distribution δ = (0.3, 0.7). Outputs are from the binomial(5,p)-distribution, with p = 0.3 from state 1 and p = 0.8 from state 2.]

R

> Viterbi(x)
[1] 2 2 2 2 2 2 2 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1
[36] 2 2 2 2 2 2 2 2 2 1 1 1 2 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 1
[71] 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 1
> lines(1:n,Viterbi(x)-1,col=3,lwd=2)

[ Figure: horizontal axis 0 to 100, vertical axis 0 to 5. Red: true states, Black: outputs; Green: reconstructed states.]