
### 1. Regularized Discriminant Analysis and Reduced-Rank LDA

Jia Li, Department of Statistics, The Pennsylvania State University
Email: jiali@stat.psu.edu
http://www.stat.psu.edu/∼jiali

### 2. Regularized Discriminant Analysis

A compromise between LDA and QDA: shrink the separate covariances of QDA toward a common covariance as in LDA. The regularized covariance matrices are

$$\hat{\Sigma}_k(\alpha) = \alpha \hat{\Sigma}_k + (1-\alpha)\hat{\Sigma}.$$

The quadratic discriminant function $\delta_k(x)$ is defined using the shrunken covariance matrices $\hat{\Sigma}_k(\alpha)$. The parameter $\alpha$ controls the complexity of the model.
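As a minimal sketch (not from the lecture), the shrinkage above is a convex combination of each per-class covariance with the pooled one; the function name and the example matrices are illustrative assumptions:

```python
import numpy as np

def regularized_covariances(sigma_hat_k, sigma_hat, alpha):
    """Shrink each per-class covariance toward the pooled covariance:
    Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma."""
    return [alpha * s + (1.0 - alpha) * sigma_hat for s in sigma_hat_k]

# alpha = 1 recovers QDA's separate covariances; alpha = 0 recovers LDA's pooled one.
pooled = np.eye(2)
per_class = [np.array([[2.0, 0.5], [0.5, 1.0]])]
print(regularized_covariances(per_class, pooled, 0.5)[0])
```

Note that each $\hat{\Sigma}_k(\alpha)$ stays positive definite whenever $\hat{\Sigma}_k$ and $\hat{\Sigma}$ are, since the set of positive definite matrices is convex.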

### 3.

*(figure)*

### 4. Computations for LDA

Discriminant function:

$$\delta_k(x) = -\frac{1}{2}\log|\hat{\Sigma}_k| - \frac{1}{2}(x-\mu_k)^T \hat{\Sigma}_k^{-1}(x-\mu_k) + \log \pi_k.$$

Eigen-decomposition of $\hat{\Sigma}_k$: $\hat{\Sigma}_k = U_k D_k U_k^T$, where $D_k$ is diagonal with elements $d_{kl}$, $l = 1, 2, \ldots, p$, and $U_k$ is $p \times p$ orthonormal. Then

$$(x - \hat{\mu}_k)^T \hat{\Sigma}_k^{-1} (x - \hat{\mu}_k) = [U_k^T(x-\mu_k)]^T D_k^{-1} [U_k^T(x-\mu_k)] = [D_k^{-\frac{1}{2}} U_k^T(x-\mu_k)]^T [D_k^{-\frac{1}{2}} U_k^T(x-\mu_k)],$$

$$\log|\hat{\Sigma}_k| = \sum_l \log d_{kl}.$$
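The identities above can be checked numerically. This sketch (function name and test values are my own, not the lecture's) evaluates $\delta_k(x)$ through the eigen-decomposition and compares it with the direct formula using an explicit inverse:

```python
import numpy as np

def quad_discriminant(x, mu, sigma, prior):
    """delta_k(x) = -1/2 log|Sigma_k| - 1/2 (x-mu)^T Sigma_k^{-1} (x-mu) + log pi_k,
    computed via the eigendecomposition Sigma_k = U D U^T."""
    d, U = np.linalg.eigh(sigma)          # D has diagonal entries d_l; U is orthonormal
    z = (U.T @ (x - mu)) / np.sqrt(d)     # D^{-1/2} U^T (x - mu)
    return -0.5 * np.sum(np.log(d)) - 0.5 * z @ z + np.log(prior)

x = np.array([1.0, 2.0])
mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
direct = (-0.5 * np.log(np.linalg.det(sigma))
          - 0.5 * (x - mu) @ np.linalg.inv(sigma) @ (x - mu) + np.log(0.5))
print(np.isclose(quad_discriminant(x, mu, sigma, 0.5), direct))  # True
```

The eigen route avoids forming $\hat{\Sigma}_k^{-1}$ explicitly, which is the point of the slide's derivation.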

### 5.

LDA, $\hat{\Sigma} = U D U^T$:

- Sphere the data: $D^{-\frac{1}{2}} U^T X \rightarrow X^*$ and $D^{-\frac{1}{2}} U^T \mu_k \rightarrow \mu_k^*$.
- For the transformed data and class centroids, classify $x^*$ to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities $\pi_k$.
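A small sketch of the sphering step (the synthetic data and seed are assumptions): after the transform $X^* = D^{-1/2}U^T X$, the sample covariance becomes the identity, so Euclidean distance to centroids is the right metric.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.0], [1.0, 2.0]], size=2000)

# Sphere with the estimated covariance Sigma = U D U^T: X* = D^{-1/2} U^T X.
sigma_hat = np.cov(X.T)
d, U = np.linalg.eigh(sigma_hat)
X_star = (X @ U) / np.sqrt(d)   # row i is D^{-1/2} U^T x_i

# The sphered data have identity covariance (up to floating point).
print(np.round(np.cov(X_star.T), 2))
```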

### 6.

The geometric illustration of LDA. Left: original data in the two classes; the ellipses represent the two estimated covariance matrices. Right: the class-mean-removed data and the estimated common covariance matrix.

### 7.

The geometric illustration of LDA. Left: the sphered, mean-removed data. Right: the sphered data in the two classes, the sphered means, and the decision boundary.

### 8. Reduced-Rank LDA

Binary classification: the decision boundary is given by the linear equation

$$\log\frac{\pi_1}{\pi_2} - \frac{1}{2}(\mu_1+\mu_2)^T \Sigma^{-1}(\mu_1-\mu_2) + x^T \Sigma^{-1}(\mu_1-\mu_2) = 0.$$

Only the projection of $X$ on the direction $\Sigma^{-1}(\mu_1-\mu_2)$ matters. If the data are sphered, only the projection of $X^*$ on $\mu_1^* - \mu_2^*$ is needed.
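To make the boundary equation concrete, here is a sketch with a hypothetical two-class setup (means, covariance, and the helper `classify` are all illustrative assumptions): the sign of the left-hand side decides the class, and only $x^T \Sigma^{-1}(\mu_1-\mu_2)$ varies with $x$.

```python
import numpy as np

# Hypothetical two-class setup with a shared covariance.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
pi1 = pi2 = 0.5

w = np.linalg.solve(sigma, mu1 - mu2)            # direction Sigma^{-1}(mu1 - mu2)
b = np.log(pi1 / pi2) - 0.5 * (mu1 + mu2) @ w    # intercept terms of the boundary

def classify(x):
    # Boundary: b + x^T Sigma^{-1}(mu1 - mu2) = 0; only x's projection on w matters.
    return 1 if b + x @ w > 0 else 2

print(classify(mu1), classify(mu2))  # each centroid falls on its own side: 1 2
```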

### 9.

Suppose the data are sphered. The subspace spanned by the $K$ centroids has rank at most $K-1$; denote it by $H_{K-1}$. The data can be viewed in $H_{K-1}$ without losing any information. When $K > 3$, we might want to find a subspace $H_L \subseteq H_{K-1}$ that is optimal for LDA in some sense.

### 10. Optimization Criterion

Fisher's optimization criterion: the projected centroids should be spread out as much as possible relative to the variance. Find the linear combination $Z = a^T X$, where $a = (a_1, a_2, \ldots, a_p)^T$, such that the between-class variance is maximized relative to the within-class variance. Assume the within-class covariance matrix of $X$ is $W$, i.e., the common covariance matrix of the classes.

### 11.

The between-class covariance matrix is $B$. Suppose $\mu_k$ is a column vector denoting the mean vector of class $k$:

$$\mu = \sum_{k=1}^{K} \pi_k \mu_k,$$

$$B = \sum_{k=1}^{K} \pi_k (\mu_k - \mu)(\mu_k - \mu)^T.$$

Note $\pi_k$ is the percentage of class $k$ samples in the entire data set.

### 12.

*(figure)*

### 13.

For the linear combination $Z$, the between-class variance is $a^T B a$ and the within-class variance is $a^T W a$. Fisher's optimization becomes

$$\max_a \frac{a^T B a}{a^T W a}.$$

Eigen-decomposition of $W$: $W = V_W D_W V_W^T$. Write $W = (W^{\frac{1}{2}})^T W^{\frac{1}{2}}$, where $W^{\frac{1}{2}} = D_W^{\frac{1}{2}} V_W^T$. Define $b = W^{\frac{1}{2}} a$, so that $a = W^{-\frac{1}{2}} b$. The optimization becomes

$$\max_b \frac{b^T (W^{-\frac{1}{2}})^T B W^{-\frac{1}{2}} b}{b^T b}.$$

Define $B^* = (W^{-\frac{1}{2}})^T B W^{-\frac{1}{2}}$.

### 14.

Eigen-decomposition of $B^*$: $B^* = V^* D_B V^{*T}$, with $V^* = (v_1^*, v_2^*, \ldots, v_p^*)$. The maximization is achieved by $b = v_1^*$, the first eigenvector of $B^*$. Similarly, one can find the next direction $b_2 = v_2^*$ that is orthogonal to $b_1 = v_1^*$ and maximizes $b_2^T B^* b_2 / b_2^T b_2$. Since $a = W^{-\frac{1}{2}} b$, converting back to the original problem gives

$$a_l = W^{-\frac{1}{2}} v_l^*.$$

The $a_l$ (also denoted $v_l$ in the textbook) are referred to as discriminant coordinates or canonical variates.

### 15.

Summary of obtaining the discriminant coordinates:

1. Find the centroids of all the classes.
2. Find the between-class covariance matrix $B$ using the centroid vectors.
3. Find the within-class covariance matrix $W$, i.e., $\hat{\Sigma}$.
4. By eigen-decomposition, $W = (W^{\frac{1}{2}})^T W^{\frac{1}{2}} = (D_W^{\frac{1}{2}} V_W^T)^T (D_W^{\frac{1}{2}} V_W^T)$.
5. Compute $B^* = (W^{-\frac{1}{2}})^T B W^{-\frac{1}{2}} = D_W^{-\frac{1}{2}} V_W^T B V_W D_W^{-\frac{1}{2}}$.
6. Eigen-decompose $B^*$: $B^* = V^* D_B V^{*T}$.
7. The discriminant coordinates are $a_l = W^{-\frac{1}{2}} v_l^*$.
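The steps above can be sketched in NumPy. The function name and the small two-class sanity check are illustrative assumptions, not from the lecture; note that $W^{-\frac{1}{2}} = V_W D_W^{-\frac{1}{2}}$ under the slides' convention $W^{\frac{1}{2}} = D_W^{\frac{1}{2}} V_W^T$.

```python
import numpy as np

def discriminant_coords(means, priors, W):
    """Whiten with W, eigendecompose the whitened between-class
    covariance B*, then map the eigenvectors back by W^{-1/2}."""
    mu = priors @ means                              # overall mean
    diffs = means - mu
    B = (priors[:, None] * diffs).T @ diffs          # between-class covariance
    dw, Vw = np.linalg.eigh(W)
    W_inv_half = Vw / np.sqrt(dw)                    # V_W D_W^{-1/2}
    B_star = W_inv_half.T @ B @ W_inv_half
    db, Vs = np.linalg.eigh(B_star)
    order = np.argsort(db)[::-1]                     # largest eigenvalue first
    return W_inv_half @ Vs[:, order]                 # columns a_l = W^{-1/2} v_l*

# Sanity check: two spherical classes separated along the x-axis, so the
# first discriminant coordinate should be proportional to (1, 0).
A = discriminant_coords(np.array([[1.0, 0.0], [-1.0, 0.0]]),
                        np.array([0.5, 0.5]), np.eye(2))
print(np.abs(A[:, 0]))  # [1. 0.]
```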

### 16. Simulation

- Three classes with equal prior probabilities $1/3$.
- Input is two dimensional. The class-conditional density of $X$ is a normal distribution.
- The common covariance matrix is $\Sigma = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}$.
- The three mean vectors are
  $$\mu_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} -3 \\ 2 \end{pmatrix}, \quad \mu_3 = \begin{pmatrix} -1 \\ -3 \end{pmatrix}.$$
- A total of 450 samples are drawn, 150 in each class, for training. Another 450 samples, 150 in each class, are drawn for testing.
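This setup is easy to reproduce. A sketch of generating one 450-sample set with the lecture's means and common covariance (the seed is arbitrary, so the draws will not match the lecture's numbers exactly):

```python
import numpy as np

# Three equiprobable Gaussian classes, common covariance I, 150 samples each.
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [-3.0, 2.0], [-1.0, -3.0]])
X = np.vstack([rng.multivariate_normal(m, np.eye(2), size=150) for m in means])
y = np.repeat([1, 2, 3], 150)
print(X.shape, y.shape)  # (450, 2) (450,)
```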

### 17.

The scatter plot of the test data. Red: class 1. Blue: class 2. Magenta: class 3.

### 18. LDA Result

Priors: $\hat{\pi}_1 = \hat{\pi}_2 = \hat{\pi}_3 = 150/450 = 0.3333$. The three estimated mean vectors are

$$\hat{\mu}_1 = \begin{pmatrix} -0.0757 \\ -0.0034 \end{pmatrix}, \quad \hat{\mu}_2 = \begin{pmatrix} -2.8310 \\ 1.9847 \end{pmatrix}, \quad \hat{\mu}_3 = \begin{pmatrix} -0.9992 \\ -2.9005 \end{pmatrix}.$$

Estimated covariance matrix: $\hat{\Sigma} = \begin{pmatrix} 0.9967 & 0.0020 \\ 0.0020 & 1.0263 \end{pmatrix}$.

Decision boundaries:

- Between class 1 (red) and 2 (blue): $5.9480 + 2.7684 X_1 - 1.9427 X_2 = 0$.
- Between class 1 (red) and 3 (magenta): $4.5912 + 0.9209 X_1 + 2.8211 X_2 = 0$.
- Between class 2 (blue) and 3 (magenta): $-1.3568 - 1.8475 X_1 + 4.7639 X_2 = 0$.

### 19.

Classification error rate on the test data set: 7.78%.

### 20. Discriminant Coordinates

Between-class covariance matrix:

$$B = \begin{pmatrix} 1.3111 & -1.3057 \\ -1.3057 & 4.0235 \end{pmatrix}.$$

Within-class covariance matrix:

$$W = \begin{pmatrix} 0.9967 & 0.0020 \\ 0.0020 & 1.0263 \end{pmatrix}.$$

$$W^{\frac{1}{2}} = \begin{pmatrix} -0.0686 & -1.0108 \\ 0.9960 & -0.0676 \end{pmatrix}.$$

$$B^* = (W^{-\frac{1}{2}})^T B W^{-\frac{1}{2}} = \begin{pmatrix} 3.7361 & 1.4603 \\ 1.4603 & 1.5050 \end{pmatrix}.$$

### 21.

Eigen-decomposition of $B^*$: $B^* = V^* D_B V^{*T}$, with

$$V^* = \begin{pmatrix} 0.8964 & 0.4432 \\ 0.4432 & -0.8964 \end{pmatrix}, \quad D_B = \begin{pmatrix} 4.4582 & 0 \\ 0 & 0.7830 \end{pmatrix}.$$

### 22.

The two discriminant coordinates are

$$v_1 = W^{-\frac{1}{2}} v_1^* = \begin{pmatrix} -0.0668 & 0.9994 \\ -0.9848 & -0.0678 \end{pmatrix} \begin{pmatrix} 0.8964 \\ 0.4432 \end{pmatrix} = \begin{pmatrix} 0.3831 \\ -0.9128 \end{pmatrix},$$

$$v_2 = W^{-\frac{1}{2}} v_2^* = \begin{pmatrix} -0.9255 \\ -0.3757 \end{pmatrix}.$$

Project the data onto $v_1$ and classify using only this 1-D data. The projected data are $x_i^T v_1$, $i = 1, \ldots, N$.
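The first discriminant coordinate can be recomputed from the $B$ and $W$ reported on the preceding slides. This sketch uses an eigen-based $W^{-\frac{1}{2}} = V_W D_W^{-\frac{1}{2}}$; eigenvector sign and order may differ from the lecture's, but the resulting direction agrees up to sign:

```python
import numpy as np

# B and W copied from the lecture's reported values (4 decimal places).
B = np.array([[1.3111, -1.3057], [-1.3057, 4.0235]])
W = np.array([[0.9967, 0.0020], [0.0020, 1.0263]])

dw, Vw = np.linalg.eigh(W)
W_inv_half = Vw / np.sqrt(dw)                 # a valid W^{-1/2}
db, Vs = np.linalg.eigh(W_inv_half.T @ B @ W_inv_half)
v1 = W_inv_half @ Vs[:, np.argmax(db)]        # first discriminant coordinate
print(np.round(v1, 4))  # matches the lecture's (0.3831, -0.9128) up to sign
```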

### 23.

Solid line: first discriminant coordinate (DC). Dashed line: second DC.

### 24. Projection on the First DC

Projection of the training data on the first discriminant coordinate.

### 25.

Perform LDA on the projected data. The classification rule is

$$\hat{G}(x) = \begin{cases} 1 & -1.4611 \le x^T v_1 \le 1.1195 \\ 2 & x^T v_1 \le -1.4611 \\ 3 & x^T v_1 \ge 1.1195 \end{cases}$$

Error rate on the test data: 12.67%.
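The 1-D rule is just two cutoffs on the projection $x^T v_1$. A sketch using the lecture's $v_1$ and cutoffs (the helper name is an assumption); the three estimated class centroids from the LDA-result slide each land in their own interval:

```python
import numpy as np

v1 = np.array([0.3831, -0.9128])  # first discriminant coordinate (from the slides)

def classify_1d(x, lo=-1.4611, hi=1.1195):
    """Classify by the 1-D projection x^T v1, with the slides' cutoffs."""
    z = x @ v1
    if z < lo:
        return 2
    if z > hi:
        return 3
    return 1

print(classify_1d(np.array([-0.0757, -0.0034])),   # centroid of class 1
      classify_1d(np.array([-2.8310, 1.9847])),    # centroid of class 2
      classify_1d(np.array([-0.9992, -2.9005])))   # centroid of class 3 -> 1 2 3
```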

### 26. Principal Component Direction

Find the covariance matrix of $X$, or do a singular value decomposition of the mean-removed $X$, to find the principal component directions. Denote the covariance matrix by $T$:

$$T = \begin{pmatrix} 2.3062 & -1.3066 \\ -1.3066 & 5.0542 \end{pmatrix}.$$

Eigen-decomposition of $T$: $T = V_T D_T V_T^T$, with

$$V_T = \begin{pmatrix} 0.3710 & -0.9286 \\ -0.9286 & -0.3710 \end{pmatrix}, \quad D_T = \begin{pmatrix} 5.5762 & 0 \\ 0 & 1.7842 \end{pmatrix}.$$
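This decomposition can be reproduced directly from the reported $T$ (the variable names are assumptions; eigenvector signs may differ from the slide):

```python
import numpy as np

# Total covariance T as reported on the slide.
T = np.array([[2.3062, -1.3066], [-1.3066, 5.0542]])
dt, Vt = np.linalg.eigh(T)
pc1 = Vt[:, np.argmax(dt)]              # eigenvector of the largest eigenvalue
print(np.round(np.sort(dt)[::-1], 4))   # eigenvalues ~ (5.5762, 1.7842)
print(np.round(pc1, 4))                 # ~ +/-(0.3710, -0.9286)
```

Note the contrast with the discriminant coordinate: the first PC maximizes total variance, with no regard to class separation.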

### 27.

Solid line: first principal component direction (PCD). Dashed line: second PCD.

### 28. Results Based on the First PC

Projection of the data on the first principal component. The boundaries between classes are shown.

### 29.

Perform LDA on the projected data, where $v_1$ is now the first principal component direction. The classification rule is

$$\hat{G}(x) = \begin{cases} 1 & -1.4592 \le x^T v_1 \le 1.1489 \\ 2 & x^T v_1 \le -1.4592 \\ 3 & x^T v_1 \ge 1.1489 \end{cases}$$

Error rate on the test data: 13.11%.