Fast Sparse Bayesian Learning Algorithm for Ordinal Regression

Nagashima Kazuhisa (a), Inoue Masato (b)

(a) kazuhisa@suou.waseda.jp
(b) asato.inoue@eb.waseda.ac.jp

Abstract: Ordinal regression is a multiclass classification problem in which the classes have an order relation. One sparse approach to ordinal regression, proposed by Chang et al., uses an automatic relevance determination (ARD) prior. The idea is similar to that of the relevance vector machine (RVM) for regression and classification problems proposed by Tipping in 2001. A fast algorithm for training RVMs was also proposed by Tipping et al. in 2003; it greatly reduces the computational complexity of RVMs. In this manuscript, we introduce a new model for ordinal regression that is easy to handle, and we propose a fast sparse learning method modeled on that of RVMs. Numerical experiments illustrate that the proposed method runs remarkably faster than existing methods, with equivalent or better precision.

Keywords: ordinal regression, Bayesian estimation, automatic relevance determination prior, sparse learning, Taylor approximation.
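2. Preliminaries

2.1 Notation

In a density such as $p(x \mid a; b)$, the quantity $b$ after the semicolon is a parameter. For $x \in \mathbb{R}^N$, the Gaussian density with mean $\mu$ and covariance $\Sigma$ is

$$\mathcal{N}(x; \mu, \Sigma) = |2\pi\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}[x-\mu]^\top \Sigma^{-1} [x-\mu]\right).$$

For $x \in \mathbb{R}$, the sigmoid function is $\sigma(x) = 1/(1+e^{-x})$, and $\sigma'$, $\sigma''$ denote its first and second derivatives. The indicator $\mathbb{1}_{c}$ equals 1 if the condition $c$ is true and 0 otherwise. $\mathbb{R}_+$ denotes the set of positive real numbers, $[\cdot]_{i,j}$ the $(i,j)$ element of a matrix, $[\cdot]_i$ the $i$-th element of a vector, $I$ the identity matrix, and $\mathrm{diag}(\cdot)$ a diagonal matrix.

2.2 Problem setting

Each input $x$ has an ordinal target $t \in \{0, 1, 2, \ldots, K\}$. Given $N$ training pairs $\{x_n, t_n\}_{n=1}^{N}$, the task is to predict the target $t_{N+1}$ of the $(N+1)$-th input $x_{N+1}$. We write $t = [t_1, \ldots, t_N]^\top$. The classes are ordered as class 0 < class 1 < ... < class K.

2.3 Basis functions

$M$ basis functions $\phi_m(x) \in \mathbb{R}$, $m = 1, 2, \ldots, M$, are applied to the input $x$.

3. Existing method

3.1 Model

Let $\phi_n = [\phi_1(x_n), \ldots, \phi_M(x_n)]^\top \in \mathbb{R}^M$. Following [3], $K$ thresholds $b = [b_1, b_2, \ldots, b_K]^\top \in \mathbb{R}^K$ with $b_1 < b_2 < \cdots < b_K$ are introduced, and the target is generated as

$$p(t_n \mid \epsilon_n, w; b) = \mathbb{1}_{b_{t_n} \le \phi_n^\top w + \epsilon_n < b_{t_n+1}}, \tag{2}$$

$$p(\epsilon_n) = \mathcal{N}(\epsilon_n; 0, 1), \tag{3}$$

where $w = [w_1, \ldots, w_M]^\top \in \mathbb{R}^M$, $b_0 = -\infty$, and $b_{K+1} = +\infty$. Marginalizing the noise gives

$$p(t_n \mid w; b) = \int_{-\infty}^{+\infty} p(t_n \mid \epsilon_n, w; b)\, p(\epsilon_n)\, d\epsilon_n = \mathrm{cum}(b_{t_n+1} - \phi_n^\top w) - \mathrm{cum}(b_{t_n} - \phi_n^\top w), \tag{4}$$

where $\mathrm{cum}(\cdot)$ is the cumulative distribution function of the standard normal distribution.

In [4], the sigmoid function is used instead. The probability that the target $t_n$ of the input $x_n$ is at most $k$ is

$$p(t_n \le k \mid w; b) = \sigma(b_{k+1} - \phi_n^\top w), \tag{5}$$

so that

$$p(t_n \mid w; b) = \sigma(b_{t_n+1} - \phi_n^\top w) - \sigma(b_{t_n} - \phi_n^\top w). \tag{6}$$

This corresponds to (4) with the noise replaced by a distribution with mean 0 and variance $\pi^2/3$:

$$p(\epsilon_n) = \frac{e^{-\epsilon_n}}{\left(1 + e^{-\epsilon_n}\right)^2}. \tag{7}$$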
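The class probabilities of the sigmoid model (6) can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation; the function names `sigmoid` and `class_probabilities` and the numeric inputs are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def class_probabilities(phi, w, b):
    """Return [p(t=0), ..., p(t=K)] for one input under Eq. (6).

    phi: basis-function values of the input, length M
    w:   weights, length M
    b:   thresholds b_1 < ... < b_K, length K
    """
    a = sum(p * wi for p, wi in zip(phi, w))  # phi^T w
    # Cumulative values p(t <= k) from Eq. (5), padded with the values
    # implied by b_0 = -inf (0.0) and b_{K+1} = +inf (1.0).
    cdf = [0.0] + [sigmoid(bk - a) for bk in b] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(b) + 1)]

# Hypothetical toy input: M = 2 basis functions, K = 3 thresholds.
probs = class_probabilities(phi=[1.0, 2.0], w=[0.5, 0.25], b=[-1.0, 0.0, 1.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Because the cumulative values increase with the ordered thresholds, every class probability is non-negative and the probabilities sum to one without explicit normalization.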
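3.2 Prior

For $w$ in (4) or (6), a prior with hyperparameters $\alpha = [\alpha_1, \ldots, \alpha_M]^\top \in \mathbb{R}_+^M$ is introduced:

$$p(w; \alpha) = \prod_{m=1}^{M} \mathcal{N}(w_m; 0, \alpha_m^{-1}). \tag{8}$$

This is the automatic relevance determination (ARD) prior used in Tipping's relevance vector machine (RVM) [6]. The joint distribution is

$$p(t_{N+1}, t, w; b, \alpha) = \left[\prod_{n=1}^{N+1} p(t_n \mid w; b)\right] p(w; \alpha). \tag{9}$$

3.3 Estimation

The following uses (6); for (4), replace $\sigma$ with $\mathrm{cum}$. Bayesian prediction of $t_{N+1}$ is based on the predictive distribution

$$p(t_{N+1} \mid t; b, \alpha). \tag{10}$$

The parameters $b$ and $\alpha$ are estimated by maximizing the marginal likelihood:

$$\{\hat{b}, \hat{\alpha}\} = \arg\max_{b, \alpha}\, p(t; b, \alpha). \tag{11}$$

When $\hat{\alpha}_m \to \infty$, the posterior of $w_m$ becomes a Dirac delta at $w_m = 0$, and the basis function $\phi_m$ is effectively removed. The weight $w$ is estimated by maximum a posteriori (MAP) estimation:

$$\hat{w} = \arg\max_{w}\, p(w \mid t; \hat{b}, \hat{\alpha}). \tag{12}$$

The predictive distribution of $t_{N+1}$ is approximated by

$$p(t_{N+1} \mid \hat{w}; \hat{b}), \tag{13}$$

that is, by plugging in the MAP estimate of $w$. The prediction

$$\hat{t}_{N+1} = \arg\max_{t_{N+1}}\, p(t_{N+1} \mid t; \hat{b}, \hat{\alpha}) \tag{14}$$

is therefore approximated by

$$\hat{t}_{N+1} = \arg\max_{t_{N+1}}\, p(t_{N+1} \mid \hat{w}; \hat{b}), \tag{15}$$

which is the MAP estimate of $t_{N+1}$.

3.4 Algorithm

Starting from initial values $b^0$ and $\alpha^0$, the quantities $w^i$ and $\{b^i, \alpha^i\}$ are computed for $i = 0, 1, \ldots$ by alternately updating $w$ and $\{b, \alpha\}$. With $b$ and $\alpha$ fixed, $w$ is updated by

$$w^{i+1} = \arg\max_{w}\, p(w \mid t; b^i, \alpha^i). \tag{16}$$

This is solved by the Newton-Raphson method. Set $w^{i,0} = w^i$ and, for $j = 0, 1, \ldots$, iterate

$$w^{i,j+1} = \left[ w - H(w; b, \alpha)^{-1}\, \nabla h(w; b, \alpha) \right]_{w = w^{i,j},\; b = b^i,\; \alpha = \alpha^i}, \tag{17}$$

where, with $A = \mathrm{diag}(\alpha)$,

$$h(w; b, \alpha) = -\ln p(t, w; b, \alpha), \tag{18}$$

$$\nabla h(w; b, \alpha) = \frac{\partial h(w; b, \alpha)}{\partial w} = \sum_{n=1}^{N} \frac{s_1}{s_0}\, \phi_n + A w, \tag{19}$$

$$H(w; b, \alpha) = \frac{\partial^2 h(w; b, \alpha)}{\partial w\, \partial w^\top} \tag{20}$$

$$= \sum_{n=1}^{N} \left\{ \left[\frac{s_1}{s_0}\right]^2 - \frac{s_2}{s_0} \right\} \phi_n \phi_n^\top + A, \tag{21}$$

$$s_0 = \sigma(b_{t_n+1} - \phi_n^\top w) - \sigma(b_{t_n} - \phi_n^\top w), \tag{22}$$

$$s_1 = \sigma'(b_{t_n+1} - \phi_n^\top w) - \sigma'(b_{t_n} - \phi_n^\top w), \tag{23}$$

$$s_2 = \sigma''(b_{t_n+1} - \phi_n^\top w) - \sigma''(b_{t_n} - \phi_n^\top w). \tag{24}$$

The Hessian $H(w; b, \alpha)$ of (18) is positive definite in $w$ (the data term is built from $[\phi_1, \ldots, \phi_N] \in \mathbb{R}^{M \times N}$, and $A$ is positive definite), so $w^{i+1}$ is uniquely determined.

To update $b$ and $\alpha$, the marginal likelihood $p(t; b, \alpha)$ is approximated by Laplace's method, in which a function $f(w)$, $w \in \mathbb{R}^M$, is replaced by its second-order Taylor expansion.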
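The combination of a Newton-Raphson search and Laplace's method can be illustrated in one dimension. The sketch below is not taken from the paper: it swaps in a toy function $f(w) = w - a \ln w$ (an assumption chosen because the exact integral of $e^{-f(w)}$ over $w > 0$ is $\Gamma(a+1)$), finds its minimizer with a scalar Newton iteration analogous to (17), and compares the Laplace value $e^{-f(\mu)}\sqrt{2\pi / f''(\mu)}$ with the exact one. The names `newton_minimize` and `laplace_integral` are hypothetical.

```python
import math

def newton_minimize(grad, hess, w0, tol=1e-12, max_iter=100):
    """Scalar Newton-Raphson: w <- w - grad(w) / hess(w)."""
    w = w0
    for _ in range(max_iter):
        step = grad(w) / hess(w)
        w -= step
        if abs(step) < tol:
            break
    return w

def laplace_integral(f, grad, hess, w0):
    """Approximate the integral of exp(-f(w)) by Laplace's method (1-D)."""
    mu = newton_minimize(grad, hess, w0)  # minimizer of f
    return math.exp(-f(mu)) * math.sqrt(2.0 * math.pi / hess(mu))

a = 10.0                                   # toy parameter (assumption)
f = lambda w: w - a * math.log(w)          # exp(-f(w)) = w**a * exp(-w)
grad = lambda w: 1.0 - a / w               # f'(w)
hess = lambda w: a / w ** 2                # f''(w)

mu = newton_minimize(grad, hess, w0=5.0)
approx = laplace_integral(f, grad, hess, w0=5.0)
exact = math.gamma(a + 1.0)                # exact integral over (0, inf)
ratio = approx / exact
```

For this toy function the Laplace value reproduces Stirling's formula, so the ratio is slightly below one. The approximation is exact only when $f$ is quadratic, which is why it is used here as an approximation of the marginal likelihood rather than as an identity.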
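Laplace's method gives

$$\int_{\mathbb{R}^M} e^{-f(w)}\, dw \approx \int_{\mathbb{R}^M} e^{-f(\mu) - \frac{1}{2}[w-\mu]^\top F [w-\mu]}\, dw = e^{-f(\mu)}\, \left|2\pi F^{-1}\right|^{1/2},$$

where $\mu = \arg\min_{w} f(w)$ and $F = \left.\dfrac{\partial^2 f(w)}{\partial w\, \partial w^\top}\right|_{w=\mu}$. Applied to the marginal likelihood of $b$ and $\alpha$, this yields

$$p(t; b, \alpha) = \int_{\mathbb{R}^M} p(t, w; b, \alpha)\, dw \approx s(b, \alpha), \tag{25}$$

$$s(b, \alpha) = p(t, w^{i+1}; b, \alpha)\, \left|2\pi H(w^{i+1}; b, \alpha)^{-1}\right|^{1/2}. \tag{26}$$

The Taylor expansion in Laplace's method is taken around $w^{i+1}$, where the first-order term is 0. The parameters $b$ and $\alpha$ are then updated by

$$\alpha^{i+1} = \arg\max_{\alpha}\, s(b^i, \alpha), \tag{27}$$

$$b^{i+1} = \arg\max_{b}\, s(b, \alpha^{i+1}). \tag{28}$$

For $\alpha$, the following update is used:

$$\alpha_m^{i+1} = \frac{1 - \alpha_m^i \left[H(w^{i+1}; b^i, \alpha^i)^{-1}\right]_{m,m}}{\left[w_m^{i+1}\right]^2}. \tag{29}$$

For $b$, a gradient method is used. Set $b^{i,0} = b^i$ and, for $j = 0, 1, \ldots$, iterate

$$b_k^{i,j+1} = \left[ b_k + \eta \sum_{n=1}^{N} \frac{\left(\mathbb{1}_{t_n+1=k} - \mathbb{1}_{t_n=k}\right) \sigma'(b_k - \phi_n^\top w)}{\sigma(b_{t_n+1} - \phi_n^\top w) - \sigma(b_{t_n} - \phi_n^\top w)} \right]_{w = w^{i+1},\; b = b^{i,j}}, \tag{30}$$

where $\eta > 0$ is the step size.

4. Proposed method

4.1 Model

With $w = [w_1, \ldots, w_M]^\top \in \mathbb{R}^M$ and $b = [b_1, \ldots, b_K]^\top \in \mathbb{R}^K$, the proposed model is

$$p(t_n \mid w, b) = \frac{1}{Z_n} \left[\prod_{k=1}^{t_n} y_{n,k}\right] \left[\prod_{k=t_n+1}^{K} \left(1 - y_{n,k}\right)\right], \tag{31}$$

$$Z_n = \sum_{t_n=0}^{K} \left[\prod_{k=1}^{t_n} y_{n,k}\right] \left[\prod_{k=t_n+1}^{K} \left(1 - y_{n,k}\right)\right], \tag{32}$$

$$y_{n,k} = \sigma(\phi_n^\top w + b_k), \tag{33}$$

where $Z_n$ is the normalizing constant.

4.2 Prior

Following [7], the ARD prior (8) on $w$ with $\alpha = [\alpha_1, \ldots, \alpha_M]^\top \in \mathbb{R}_+^M$ is extended to $b$ with $\beta = [\beta_1, \ldots, \beta_K]^\top \in \mathbb{R}_+^K$. Writing

$$\omega = [w^\top, b^\top]^\top = [\omega_1, \ldots, \omega_{M+K}]^\top, \tag{34}$$

$$\chi = [\alpha^\top, \beta^\top]^\top = [\chi_1, \ldots, \chi_{M+K}]^\top, \tag{35}$$

the prior is

$$p(\omega; \chi) = \prod_{m=1}^{M+K} \mathcal{N}(\omega_m; 0, \chi_m^{-1}), \tag{36}$$

and we define $C = \mathrm{diag}(\chi)$. The joint distribution is

$$p(t_{N+1}, t, \omega; \chi) = \left[\prod_{n=1}^{N+1} p(t_n \mid \omega)\right] p(\omega; \chi). \tag{37}$$

4.3 Estimation

As in Section 3.3, $\chi$ is estimated by maximizing the marginal likelihood, $\omega$ by MAP estimation, and $t_{N+1}$ by plugging in the MAP estimate of $\omega$:

$$\hat{\chi} = \arg\max_{\chi}\, p(t; \chi), \tag{38}$$

$$\hat{\omega} = \arg\max_{\omega}\, p(\omega \mid t; \hat{\chi}), \tag{39}$$

$$\hat{t}_{N+1} = \arg\max_{t_{N+1}}\, p(t_{N+1} \mid \hat{\omega}). \tag{40}$$

4.4 Algorithm

Starting from an initial value $\chi^0$, the quantities $\omega^i$ and $\chi^i$ are computed for $i = 0, 1, \ldots$ by alternately updating $\omega$ and $\chi$.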
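The model (31)-(33) and the prediction rule (40) can be sketched in a few lines. This is an illustrative sketch in plain Python, assuming the products in (31) run over $k \le t_n$ and $k > t_n$ as written above; the names `proposed_class_probabilities` and `predict` and the numeric inputs are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def proposed_class_probabilities(phi, w, b):
    """Return ([p(t=0), ..., p(t=K)], Z) for one input under Eqs. (31)-(33)."""
    a = sum(p * wi for p, wi in zip(phi, w))   # phi^T w
    y = [sigmoid(a + bk) for bk in b]          # y_k, Eq. (33)
    K = len(b)
    unnormalized = []
    for t in range(K + 1):
        value = 1.0
        for k in range(1, K + 1):
            # Factor y_k for k <= t and (1 - y_k) for k > t, Eq. (31).
            value *= y[k - 1] if k <= t else 1.0 - y[k - 1]
        unnormalized.append(value)
    z = sum(unnormalized)                      # normalizer, Eq. (32)
    return [u / z for u in unnormalized], z

def predict(phi, w, b):
    """Most probable class, as in Eq. (40) with a given estimate of (w, b)."""
    probs, _ = proposed_class_probabilities(phi, w, b)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical toy values: with all y_k = 0.5 and K = 2, each of the three
# classes gets unnormalized mass 0.25, so Z = 0.75.
probs, z = proposed_class_probabilities(phi=[1.0], w=[0.0], b=[0.0, 0.0])
label = predict(phi=[1.0], w=[2.0], b=[0.0, -1.0])
```

Since $Z_n$ does not depend on $t_n$, the arg max in (40) can be taken over the unnormalized products; the normalization is only needed when the probabilities themselves are reported.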
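With $\chi$ fixed, $\omega$ is updated by

$$\omega^{i+1} = \arg\max_{\omega}\, p(\omega \mid t; \chi^i). \tag{41}$$

This is solved by the Newton-Raphson method. Set $\omega^{i,0} = \omega^i$ and, for $j = 0, 1, \ldots$, iterate

$$\omega^{i,j+1} = \left[ \omega - H(\omega; \chi)^{-1}\, \nabla h(\omega; \chi) \right]_{\omega = \omega^{i,j},\; \chi = \chi^i}, \tag{42}$$

where

$$h(\omega; \chi) = -\ln p(t, \omega; \chi), \tag{43}$$

$$\nabla h(\omega; \chi) = \frac{\partial h(\omega; \chi)}{\partial \omega} = \Psi^\top d + C\omega, \tag{44}$$

$$H(\omega; \chi) = \frac{\partial^2 h(\omega; \chi)}{\partial \omega\, \partial \omega^\top} = \Psi^\top D \Psi + C, \tag{45}$$

$$d = \begin{bmatrix} \left[\sum_{k=1}^{K} y_{n,k} - t_n\right]_{n=1}^{N} \\[4pt] \left[\sum_{n=1}^{N} \left(y_{n,k} - \mathbb{1}_{k \le t_n}\right)\right]_{k=1}^{K} \end{bmatrix}, \tag{46}$$

$$\Psi = \begin{bmatrix} \Phi & 0_{N,K} \\ 0_{K,M} & I_K \end{bmatrix}, \qquad \Phi = \begin{bmatrix} \phi_1^\top \\ \vdots \\ \phi_N^\top \end{bmatrix}, \tag{47}$$

$$D = \begin{bmatrix} D_N & D_{N,K} \\ D_{N,K}^\top & D_K \end{bmatrix}, \tag{48}$$

$$[D_{N,K}]_{n,k} = y'_{n,k} = \sigma'(\phi_n^\top w + b_k), \tag{49}$$

$$D_N = \mathrm{diag}\left(\left[\sum_{k=1}^{K} y'_{n,k}\right]_{n=1}^{N}\right), \tag{50}$$

$$D_K = \mathrm{diag}\left(\left[\sum_{n=1}^{N} y'_{n,k}\right]_{k=1}^{K}\right). \tag{51}$$

These expressions correspond to differentiating the product in (31) with $Z_n$ held fixed. The Hessian of (43) is positive definite, since $D$ is positive semidefinite and $C$ is positive definite. The quantities $y_{n,k}$ and $y'_{n,k}$ in $d$ and $D$ depend on $\omega$ through (33).

With $\omega$ fixed, $\chi$ is updated. To this end, $\ln p(t \mid \omega)$ is approximated by its second-order Taylor expansion around $\omega = \omega^{i+1}$. Let

$$g(\omega) = -\ln p(t \mid \omega), \tag{52}$$

$$\nabla g(\omega) = \frac{\partial g(\omega)}{\partial \omega} = \Psi^\top d, \tag{53}$$

$$G(\omega) = \frac{\partial^2 g(\omega)}{\partial \omega\, \partial \omega^\top} = \Psi^\top D \Psi. \tag{54}$$

Then

$$g(\omega) \approx g(\check{\omega}) + [\omega - \check{\omega}]^\top \nabla g(\check{\omega}) + \tfrac{1}{2} [\omega - \check{\omega}]^\top G(\check{\omega}) [\omega - \check{\omega}] \tag{55}$$

$$= -\ln \mathcal{N}(\check{\tau}; \Psi\omega, \check{D}^{-1}) + \mathrm{const}, \tag{56}$$

$$\check{\tau} = \Psi\check{\omega} - \check{D}^{-1} \check{d}, \tag{57}$$

where the check mark denotes a value evaluated at $\omega = \omega^{i+1}$ and const does not depend on $\omega$. The marginal likelihood of $\chi$ is therefore approximated, up to a factor $c$ that does not depend on $\chi$, by

$$p(t; \chi) = \int_{\mathbb{R}^{M+K}} p(t \mid \omega)\, p(\omega; \chi)\, d\omega \approx c \int_{\mathbb{R}^{M+K}} \mathcal{N}(\check{\tau}; \Psi\omega, \check{D}^{-1})\, \mathcal{N}(\omega; 0, C^{-1})\, d\omega = c\, \mathcal{N}(\check{\tau}; 0, \check{D}^{-1} + \Psi C^{-1} \Psi^\top), \tag{58}$$

and $\chi$ is updated by

$$\chi^{i+1} = \arg\max_{\chi}\, \mathcal{N}(\check{\tau}; 0, \check{D}^{-1} + \Psi C^{-1} \Psi^\top). \tag{59}$$

As in [7], this maximization can be carried out one component $\chi_m$ at a time:

$$\chi_m^{i+1} = \begin{cases} \dfrac{s_m^2}{q_m^2 - s_m} & \text{if } q_m^2 > s_m, \\[6pt] \infty & \text{otherwise}, \end{cases} \tag{60}$$

$$s_m = \frac{\chi_m^i\, \psi_m^\top E^{-1} \psi_m}{\chi_m^i - \psi_m^\top E^{-1} \psi_m}, \tag{61}$$

$$q_m = \frac{\chi_m^i\, \psi_m^\top E^{-1} \check{\tau}}{\chi_m^i - \psi_m^\top E^{-1} \psi_m}, \tag{62}$$

$$E = \check{D}^{-1} + \Psi \left[C^i\right]^{-1} \Psi^\top, \tag{63}$$

where $\psi_m$ is the $m$-th column of $\Psi$. In (61) and (62), $E$ in (63) is computed from $\chi^i$ through $C^i = \mathrm{diag}(\chi^i)$ and from $\omega^{i+1}$ through $\check{D}$. The updates of $\omega$ and $\chi$ are alternated until convergence.

4.5 Fast computation using sparsity

When $\chi_m = \infty$, the corresponding $\omega_m$ is 0 and the corresponding column of $\Psi$ can be removed. Define the active set

$$S^i = \left\{ m \in \{1, 2, \ldots, K+M\} : \chi_m^i < \infty \right\}, \tag{64}$$

and let $\chi_S$, $C_S$, $\Psi_S$, and $\omega_S$ denote the quantities restricted to $S = S^i$, with $M_S = |S^i|$. The cost per iteration of the algorithm in Section 4.4 is $O\!\left((M+K)^3 + (M+K)^2 N\right)$; restricted to the active set it becomes $O\!\left(M_S^3 + M_S^2 N\right)$.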
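The component-wise precision update (60)-(62) and the active set (64) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the inputs `S_m` $= \psi_m^\top E^{-1} \psi_m$ and `Q_m` $= \psi_m^\top E^{-1} \check{\tau}$ are assumed to be computed elsewhere, the numbers are hypothetical, and the handling of $\chi_m = \infty$ (where $s_m$ and $q_m$ reduce to `S_m` and `Q_m`) follows the limit of (61) and (62).

```python
import math

def sparsity_quality(chi_m, S_m, Q_m):
    """Eqs. (61)-(62): s_m and q_m from the current precision chi_m."""
    if math.isinf(chi_m):
        # Limit chi_m -> inf: the component is currently pruned.
        return S_m, Q_m
    denom = chi_m - S_m
    return chi_m * S_m / denom, chi_m * Q_m / denom

def update_chi(s_m, q_m):
    """Eq. (60): finite precision if q_m^2 > s_m, otherwise prune."""
    if q_m * q_m > s_m:
        return s_m * s_m / (q_m * q_m - s_m)
    return math.inf

def active_set(chi):
    """Eq. (64): indices with finite precision (0-based in this sketch)."""
    return [m for m, c in enumerate(chi) if math.isfinite(c)]

# Hypothetical values for three components.
chi = [4.0, math.inf, 2.0]
S = [2.0, 0.5, 1.0]
Q = [3.0, 0.1, 0.5]

new_chi = []
for chi_m, S_m, Q_m in zip(chi, S, Q):
    s_m, q_m = sparsity_quality(chi_m, S_m, Q_m)
    new_chi.append(update_chi(s_m, q_m))

kept = active_set(new_chi)
```

In this toy run only the first component keeps a finite precision, so the subsequent update of $\omega$ would involve a single active column instead of three.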
Restricted to the active set, the Newton-Raphson update (42) becomes

$$\omega_S^{i,j+1} = \left[\Psi_S^\top \check{D} \Psi_S + C_S\right]^{-1} \left[\Psi_S^\top \check{D} \Psi_S\, \check{\omega}_S - \Psi_S^\top \check{d}\right], \tag{65}$$

where the check mark here denotes a value evaluated at $\omega = \omega^{i,j}$. As the iteration $i$ proceeds and components are pruned, $M_S$ decreases, in contrast to the method of [4].

5. Experiments

The proposed method of Section 4 is compared with GPOR [3], the ARD-based ORSB [4], and an ARD-based SoftMax classifier, using cross validation.

5.1 Artificial data

Artificial data are generated with $N$ samples, $M$ basis functions of which $M_S$ are relevant, and $K+1$ classes.

[Figure: validation error and learning time (sec) of proposed, GPOR, ORSB, and SoftMax on the artificial data.]

5.2 Real data

The real datasets are those used in [3].

Table: results on the real datasets.

Method     Diabetes     Pyrimidines   Triazines    Wisconsin
proposed   0.49±0.0     0.5±0.09      0.56±0.04    0.69±0.05
GPOR       0.86±0.9     0.60±0.3      0.8±0.6      0.68±0.05
ORSB       0.57±0.4     0.8±0.7       0.70±0.04    0.88±0.07
SoftMax    0.49±0.04    -             -            -

References

[1] Peter McCullagh, "Regression Models for Ordinal Data," Journal of the Royal Statistical Society, Series B (Methodological), pp. 109-142, 1980.
[2] Cande V. Ananth and David G. Kleinbaum, "Regression models for ordinal responses: a review of methods and applications," International Journal of Epidemiology, 1997.
[3] Wei Chu and Zoubin Ghahramani, "Gaussian Processes for Ordinal Regression," Technical Report, UCL, UK, 2004.
[4] Xiao Chang, Qinghua Zheng, and Peng Lin, "Ordinal Regression with Sparse Bayesian," Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence, Lecture Notes in Computer Science, Volume 5755, pp. 591-599, 2009.
[5] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[6] Michael E. Tipping, "Sparse Bayesian learning and the relevance vector machine," The Journal of Machine Learning Research, Volume 1, 2001.
[7] Michael E. Tipping and Anita C. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," Microsoft Research, Cambridge, U.K., 2003.