Adaptive Algorithms for Blind Separation of Dependent Sources

George V. Moustakides
INRIA, Sigma 2
Outline

- Problem definition and motivation
- Existing adaptive scheme: independence
- General adaptive scheme: dependence
- Imposing the desired limits
- Imposing symmetric behavior
- Non-separable sources
- Performance measure
- Optimum algorithms
- Conclusion
Problem Definition and Motivation

$$\begin{bmatrix} x_1(n)\\ x_2(n)\\ \vdots\\ x_L(n)\end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1L}\\ a_{21} & a_{22} & \cdots & a_{2L}\\ \vdots & \vdots & \ddots & \vdots\\ a_{L1} & a_{L2} & \cdots & a_{LL}\end{bmatrix}
\begin{bmatrix} s_1(n)\\ s_2(n)\\ \vdots\\ s_L(n)\end{bmatrix},
\qquad X(n) = A\,S(n)$$

Given the observation sequence X(n), estimate the source sequence S(n); equivalently, estimate B = A^{-1}. We are interested in adaptive techniques: at every time n, X(n) becomes available and we form an estimate B(n) of A^{-1}.
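As a minimal numerical sketch of the model (the sizes and matrices below are illustrative choices, not from the talk): when A is known, B = A^{-1} recovers the sources exactly; the blind problem is to approximate this B adaptively from X(n) alone.

```python
import numpy as np

rng = np.random.default_rng(0)

L, N = 3, 1000                       # 3 sources, 1000 time samples (illustrative)
A = rng.normal(size=(L, L))          # unknown mixing matrix (invertible w.h.p.)
S = rng.uniform(-1, 1, size=(L, N))  # source sequence S(n)
X = A @ S                            # observations X(n) = A S(n)

B = np.linalg.inv(A)                 # the ideal unmixing matrix B = A^{-1}
S_hat = B @ X                        # perfect recovery when A is known
print(np.allclose(S_hat, S))         # True
```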
Why Dependent Sources?

Because it is analytically challenging. There is also a practical reason motivating such an analysis, the ε-contamination model:

$$f(s_1,\ldots,s_L) = 0.99\prod_{i=1}^{L} f_i(s_i) + 0.01\prod_{i=1}^{L} g_i(s_i)$$

Here f(s_1, ..., s_L) no longer corresponds to independent sources! Question: do BSS algorithms still work under such a mild divergence from the independence assumption?
Existing Adaptive Technique

Consider L = 2 sources and let B(n) denote the estimate of A^{-1} at time n. At time n, B(n-1) and X(n) are available:

$$\hat S(n) = B(n-1)\,X(n)$$
$$B(n) = B(n-1) - \mu\,H\big(\hat S(n)\big)\,B(n-1), \qquad B(0) = I$$

μ is a constant step size with 0 < μ ≪ 1, and

$$H(Z) = \begin{bmatrix} z_1^2 - 1 & z_1 z_2\\ z_1 z_2 & z_2^2 - 1\end{bmatrix} +
\begin{bmatrix} 0 & g_1(z_1)\,z_2 - g_2(z_2)\,z_1\\ -\,g_1(z_1)\,z_2 + g_2(z_2)\,z_1 & 0\end{bmatrix}$$

where the g_i(z) are odd univariate nonlinear functions (i.e. g_i(-z) = -g_i(z)).
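A runnable sketch of this scheme in Python/NumPy. The mixing matrix, step size, sample count, and the cubic choice g_1 = g_2 = z^3 (the common choice mentioned later in the talk) are illustrative assumptions; the cubic nonlinearity is a standard pairing with sub-Gaussian sources such as the uniform ones used here. After adaptation, C = B A should be close to one of the acceptable limits, a signed scaled permutation.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(z):
    """Odd nonlinearity; the cubic g(z) = z**3 is a common choice."""
    return z**3

def H(s):
    """The 2x2 matrix H(Z) of the slide, evaluated at Z = (s1, s2)."""
    z1, z2 = s
    anti = g(z1) * z2 - g(z2) * z1
    return np.array([[z1**2 - 1, z1 * z2 + anti],
                     [z1 * z2 - anti, z2**2 - 1]])

def adapt(X, mu=0.002):
    """B(n) = B(n-1) - mu * H(S_hat(n)) B(n-1), starting from B(0) = I."""
    B = np.eye(2)
    for x in X.T:
        B = B - mu * H(B @ x) @ B
    return B

# Illustrative setup: two independent unit-variance uniform sources.
N = 100_000
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
B = adapt(A @ S)
C = B @ A   # should approach a signed (scaled) permutation matrix
```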
Acceptable Solution(s)

$$\hat S(n) = B(n-1)\,X(n) = B(n-1)\,A\,S(n) = C(n-1)\,S(n)$$

$$C(n)\,S(n) \approx \begin{bmatrix} \pm c_1 & 0\\ 0 & \pm c_2 \end{bmatrix} S(n) = \begin{bmatrix} \pm c_1 s_1(n)\\ \pm c_2 s_2(n)\end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix} 0 & \pm c_2\\ \pm c_1 & 0 \end{bmatrix} S(n) = \begin{bmatrix} \pm c_2 s_2(n)\\ \pm c_1 s_1(n)\end{bmatrix}$$
Does C(n) = B(n)A converge to one of the eight matrices

$$\begin{bmatrix} \pm c_1 & 0\\ 0 & \pm c_2\end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 0 & \pm c_2\\ \pm c_1 & 0\end{bmatrix}\ ?$$

Theorem 1: If the sources s_i(n) are independent random variables with symmetric probability density functions, and at most one of them is Gaussian, then

$$\lim_{n\to\infty} E\{C(n)\} = \begin{bmatrix} \pm c_1 & 0\\ 0 & \pm c_2\end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 0 & \pm c_2\\ \pm c_1 & 0\end{bmatrix}$$
$$\lim_{n\to\infty} E\big\{\big(C(n) - E\{C(\infty)\}\big)^2\big\} = O(\mu)$$
Observations

No knowledge of the exact pdfs is required; only symmetry (and independence?).

$$H(z_1, z_2) = \begin{bmatrix} z_1^2 - 1 & z_1 z_2\\ z_1 z_2 & z_2^2 - 1\end{bmatrix} +
\begin{bmatrix} 0 & g_1(z_1)\,z_2 - g_2(z_2)\,z_1\\ -\,g_1(z_1)\,z_2 + g_2(z_2)\,z_1 & 0\end{bmatrix},
\qquad g_i(z) = \alpha_i z^3 \;\;\text{(common choice)}$$

Two or more Gaussian sources are NOT separable, in the sense that for ANY adaptive algorithm there is a continuum of limits instead of the eight desirable ones.
Dependent Source Statistics

If the sources are dependent and the joint pdf is known, in principle we can construct adaptive algorithms that solve the source separation problem. We are interested in dependent sources for which knowledge of the pdfs is not required.

All bivariate densities must satisfy the following quadrantal symmetry:

$$f_{ij}(s_i, s_j) = f_{ij}(-s_i, s_j) = f_{ij}(s_i, -s_j)$$

The ε-contamination model satisfies it whenever the f_i and g_i are symmetric:

$$f(s_1,\ldots,s_L) = 0.99\prod_{i=1}^{L} f_i(s_i) + 0.01\prod_{i=1}^{L} g_i(s_i)$$
General Adaptive Scheme

$$\hat S(n) = B(n-1)\,X(n), \qquad X(n) = A\,S(n)$$
$$B(n) = B(n-1) - \mu\,H\big(\hat S(n)\big)\,B(n-1), \qquad B(0) = I$$
$$H(z_1, z_2) = \begin{bmatrix} h(z_1) & q(z_2, z_1)\\ q(z_1, z_2) & h(z_2)\end{bmatrix}$$

In terms of C(n) = B(n)A:

$$\hat S(n) = C(n-1)\,S(n)$$
$$C(n) = C(n-1) - \mu\,H\big(\hat S(n)\big)\,C(n-1) = C(n-1) - \mu\,H\big(C(n-1)S(n)\big)\,C(n-1), \qquad C(0) = A$$
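The generalized scheme only changes how H is built. A sketch (the helper name make_H is hypothetical) that assembles H(z1, z2) from a user-supplied pair (h, q) and recovers the existing scheme as a special case:

```python
import numpy as np

def make_H(h, q):
    """Assemble H(z1, z2) = [[h(z1), q(z2, z1)], [q(z1, z2), h(z2)]]."""
    def H(z1, z2):
        return np.array([[h(z1), q(z2, z1)],
                         [q(z1, z2), h(z2)]])
    return H

# The existing scheme as a special case: h(z) = z^2 - 1 and
# q(z1, z2) = z1 z2 - g(z1) z2 + g(z2) z1, with the odd choice g(z) = z^3.
g = lambda z: z**3
h = lambda z: z**2 - 1
q = lambda z1, z2: z1 * z2 - g(z1) * z2 + g(z2) * z1

H = make_H(h, q)
print(H(0.5, -2.0))
```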
Imposing the Desired Limits

From the theory of adaptive algorithms we know that an algorithm of the form

$$C(n) = C(n-1) - \mu\,H\big(C(n-1)S(n)\big)\,C(n-1)$$

converges in the mean, lim_{n→∞} E{C(n)} = C, where C is an equilibrium matrix satisfying

$$C = C - \mu\,E\big\{H\big(C\,S(n)\big)\big\}\,C$$

or equivalently

$$E\big\{H\big(C\,S(n)\big)\big\} = 0.$$
Since

$$H(z_1, z_2) = \begin{bmatrix} h(z_1) & q(z_2, z_1)\\ q(z_1, z_2) & h(z_2)\end{bmatrix},$$

the relation E{H(C S(n))} = 0 is equivalent to a system of four (in general nonlinear) equations

$$E\big\{h\big([C S(n)]_i\big)\big\} = 0, \qquad E\big\{q\big([C S(n)]_i,\, [C S(n)]_j\big)\big\} = 0, \qquad i \ne j,$$

where [C S(n)]_i denotes the i-th component of C S(n), in four unknowns (the four elements of the matrix C) that can be solved to identify the equilibrium matrices.
If we want a specific matrix C_0 to become an equilibrium, then the elements of H(Z) must satisfy

$$E\big\{h\big([C_0 S(n)]_i\big)\big\} = E\big\{q\big([C_0 S(n)]_i,\, [C_0 S(n)]_j\big)\big\} = 0.$$

If in particular we would like the following eight matrices C_0,

$$\begin{bmatrix} \pm c_1 & 0\\ 0 & \pm c_2\end{bmatrix}, \qquad \begin{bmatrix} 0 & \pm c_2\\ \pm c_1 & 0\end{bmatrix},$$

to become equilibria, then

$$E\{h(\pm c_i s_i)\} = E\{q(\pm c_i s_i,\, \pm c_j s_j)\} = 0.$$
Imposing Symmetric Behavior

For every trajectory converging to one limit we require the existence of symmetric trajectories converging to the other seven limits with the same probability.

$$H(z_1, z_2) = \begin{bmatrix} h(z_1) & q(z_2, z_1)\\ q(z_1, z_2) & h(z_2)\end{bmatrix}$$
$$h(-z) = h(z), \qquad q(z_1, -z_2) = -q(z_1, z_2)$$

It then suffices to impose

$$E\{h(c_1 s_1)\} = E\{h(c_2 s_2)\} = 0,$$

while, by the quadrantal symmetry of the bivariate densities,

$$E\{q(c_1 s_1, c_2 s_2)\} = E\{q(c_2 s_2, c_1 s_1)\} = 0 \quad\text{(for free)}.$$
Necessary Conditions

1. $H(z_1, z_2) = \begin{bmatrix} h(z_1) & q(z_2, z_1)\\ q(z_1, z_2) & h(z_2)\end{bmatrix}$
2. $h(-z) = h(z)$
3. $q(z_1, -z_2) = -q(z_1, z_2)$
4. $E\{h(c_1 s_1)\} = E\{h(c_2 s_2)\} = 0$ (existence of $c_1, c_2$)

The existing scheme is a member of this class:

$$H(z_1, z_2) = \begin{bmatrix} z_1^2 - 1 & z_1 z_2 + g(z_1)\,z_2 - g(z_2)\,z_1\\ z_1 z_2 - g(z_1)\,z_2 + g(z_2)\,z_1 & z_2^2 - 1\end{bmatrix}$$
Theorem 2: If the sources s_i(n) are random variables with symmetric bivariate densities, and no bivariate density is of the form

$$f_{ij}(s_i, s_j) = \beta\big(K_i s_i^2 + K_j s_j^2\big),$$

that is, no bivariate density exhibits either elliptical or hyperbolic symmetry (with respect to the two axes), then the generalized algorithm has exactly the same convergence properties for the dependent class as the existing algorithm has for the independent class.
Quadrantally Symmetric Bivariate pdfs

[Figure: examples of quadrantally symmetric bivariate pdfs]
Example

Existing scheme:

$$H(z_1, z_2) = \begin{bmatrix} z_1^2 - 1 & z_1 z_2 + g(z_1)\,z_2 - g(z_2)\,z_1\\ z_1 z_2 - g(z_1)\,z_2 + g(z_2)\,z_1 & z_2^2 - 1\end{bmatrix}$$

Select the sign nonlinearity g(z) = sgn(z):

$$H(z_1, z_2) = \begin{bmatrix} z_1^2 - 1 & z_1 z_2 + \mathrm{sgn}(z_1)\,z_2 - \mathrm{sgn}(z_2)\,z_1\\ z_1 z_2 - \mathrm{sgn}(z_1)\,z_2 + \mathrm{sgn}(z_2)\,z_1 & z_2^2 - 1\end{bmatrix}$$

There is no whitening involved!
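A quick numerical check of this selection (reading the slide's choice as the odd nonlinearity g(z) = sgn(z); the test points below are arbitrary): h is even and q is odd in its second argument, so the necessary conditions hold.

```python
import numpy as np

g = np.sign                          # the odd nonlinearity chosen on this slide
h = lambda z: z**2 - 1
q = lambda z1, z2: z1 * z2 - g(z1) * z2 + g(z2) * z1

def H(z1, z2):
    return np.array([[h(z1), q(z2, z1)],
                     [q(z1, z2), h(z2)]])

# Verify the parity conditions at a few test points.
for z1 in (-1.3, 0.4, 2.0):
    for z2 in (-0.7, 1.1):
        assert np.isclose(h(-z1), h(z1))             # h is even
        assert np.isclose(q(z1, -z2), -q(z1, z2))    # q is odd in 2nd argument
```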
$$f(s_1, s_2) = 0.5\,\frac{e^{-s_1^2/2}}{\sqrt{2\pi}}\cdot\frac{e^{-s_2^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma} \;+\; 0.5\,\frac{e^{-s_1^2/2a^2\sigma^2}}{\sqrt{2\pi}\,a\sigma}\cdot\frac{e^{-s_2^2/2a^2}}{\sqrt{2\pi}\,a}$$

If σ ≠ 1 the two sources are separable; if σ = 1 the two sources are not separable. Example values: a = 1.5, σ = 0.8.
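Sampling this mixture density is straightforward (a sketch; the sample count and tolerances are my own choices): pick the first or the second Gaussian product component with probability 0.5 each. The resulting pair is uncorrelated yet dependent, and its variances match the closed-form values.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_sources(n, a=1.5, sigma=0.8):
    """Draw n samples (s1, s2) from the two-component Gaussian mixture."""
    comp = rng.random(n) < 0.5        # mixture-component indicator
    s1 = np.where(comp, rng.normal(0, 1, n), rng.normal(0, a * sigma, n))
    s2 = np.where(comp, rng.normal(0, sigma, n), rng.normal(0, a, n))
    return s1, s2

n = 200_000
s1, s2 = sample_sources(n)
print(np.var(s1), np.var(s2), np.mean(s1 * s2))
# variances near 0.5*(1 + a^2 sigma^2) = 1.22 and 0.5*(sigma^2 + a^2) = 1.445;
# the cross moment is near 0, yet s1 and s2 are dependent.
```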
[Figure: trajectories of the entries of C(n) for a = 1.5, σ = 0.8 (separable case); horizontal axis: time n, ×10^5]
[Figure: trajectories of the entries of C(n) for a = 1.5, σ = 1 (non-separable case); horizontal axis: time n, ×10^5]
Performance Measure

Define a performance measure for the algorithmic class

$$C(n) = C(n-1) - \mu\,H\big(C(n-1)S(n)\big)\,C(n-1)$$

with H(z_1, z_2) satisfying the necessary conditions. For performance there are two important quantities:

- Rate of convergence
- Steady-state estimation error power

Both quantities depend on μ. Combine the two quantities to define a fair performance measure.
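The tradeoff can be illustrated numerically (a sketch reusing the existing scheme with cubic g on independent uniform sources; every parameter here is my own illustrative choice): a larger μ converges faster but fluctuates more in steady state, which is why a fair comparison must fix one of the two quantities before comparing the other.

```python
import numpy as np

rng = np.random.default_rng(3)

def g(z):
    return z**3

def run(X, A, mu):
    """Run the adaptive scheme; return the trajectory of C(n) = B(n) A."""
    B = np.eye(2)
    Cs = []
    for x in X.T:
        s = B @ x
        anti = g(s[0]) * s[1] - g(s[1]) * s[0]
        H = np.array([[s[0]**2 - 1, s[0] * s[1] + anti],
                      [s[0] * s[1] - anti, s[1]**2 - 1]])
        B = B - mu * H @ B
        Cs.append(B @ A)
    return np.array(Cs)

def steady_state_power(Cs):
    """Mean squared deviation of C(n) from its steady-state mean (last third)."""
    tail = Cs[-len(Cs) // 3:]
    return np.mean((tail - tail.mean(axis=0))**2)

N = 60_000
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S

p_fast = steady_state_power(run(X, A, mu=0.01))
p_slow = steady_state_power(run(X, A, mu=0.001))
# larger mu: faster convergence but larger steady-state error power, O(mu)
```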
$$\big\|C(n) - E\{C(\infty)\}\big\|^2 = \sum_{i,j}\big(C_{ij}(n) - E\{C_{ij}(\infty)\}\big)^2$$

[Figure: estimation error power in dB versus time n]
[Figure: estimation error power in dB versus time n, classical versus optimum algorithm]

Adjust the step sizes μ so that the algorithms have the same steady-state error power, then compare their convergence rates.
It is possible to define an analytic performance measure, the Efficacy, that characterizes an adaptive algorithm:

$$\text{Efficacy} = \text{function}\big(h(z),\; q(z_1, z_2),\; f_{ij}(s_i, s_j)\big)$$

The important properties of the Efficacy are the following:

- Comparing the Efficacies of two algorithms is equivalent to comparing the two algorithms fairly!
- The optimum algorithm is the one that maximizes the Efficacy over h(z), q(z_1, z_2).
Optimum Adaptive Algorithms

When the bivariate pdfs are different, the optimization problem is still open. When the sources are independent with the same pdf f(z), then

$$h(z) = a\big(z\,l(z) - 1\big), \qquad q(z_1, z_2) = b\,z_1 l(z_2) + c\,z_2 l(z_1),$$

where a, b, c are proper constants and

$$l(z) = -\frac{f'(z)}{f(z)}.$$

Existing scheme:

$$h(z) = z^2 - 1, \qquad q(z_1, z_2) = z_1 z_2 + z_1\,g(z_2) - z_2\,g(z_1)$$

Maximizing the Efficacy with respect to g(z) yields

$$q(z_1, z_2) = z_1 z_2 + d\,\big(z_1 l(z_2) - z_2 l(z_1)\big).$$
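The quantity l(z) is the score function of the density. A small numerical check (an illustration assuming f is known; not from the talk): for the standard Gaussian, l(z) = -f'(z)/f(z) = z, so the optimal h(z) = a(z l(z) - 1) reduces to the familiar a(z^2 - 1) of the existing scheme.

```python
import numpy as np

def score(f, z, eps=1e-6):
    """Numerical score l(z) = -f'(z)/f(z) via a central difference."""
    df = (f(z + eps) - f(z - eps)) / (2 * eps)
    return -df / f(z)

gauss = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

for z in (-1.5, 0.3, 2.0):
    assert abs(score(gauss, z) - z) < 1e-5   # Gaussian score is l(z) = z

# Optimal pair for a known common pdf f (a and d are free constants):
a, d = 1.0, 1.0
h = lambda z: a * (z * score(gauss, z) - 1)
q = lambda z1, z2: z1 * z2 + d * (z1 * score(gauss, z2) - z2 * score(gauss, z1))
```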
Conclusion

- We have presented a rich class of adaptive algorithms that can solve the blind source separation problem for dependent sources with symmetries.
- We have identified the sources that cannot be separated with the specific combination of source statistics and algorithmic model.
- We have proposed an analytic performance measure that can be optimized to yield the fastest converging algorithm in the equidistributed-source case.
Questions please?

FIN