From Genomics to NLP: One Algorithm to Rule Them All

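Singular value decomposition (SVD) is a matrix factorization technique that is widely used in genetics, natural language processing (NLP), social network analysis, and other fields. All of these application areas produce very large matrices with millions of rows and features.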

1.From Genomics to NLP: One algorithm to rule them all Santi Adavani/Vinay Rao Rocketml.net @rocketml #AI2SAIS

3.Outline • Introduction • Making sense of SVD • One algorithm to rule them all • Scale out SVD

5.Singular Value Decomposition (SVD): $A = U \Sigma V'$, where $\Sigma$ is a diagonal matrix with singular values and $U$, $V$ are orthonormal
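
The properties on this slide can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd`; the 6 x 4 random matrix is a made-up stand-in, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # hypothetical 6 x 4 data matrix

# Thin SVD: U is 6 x 4, s holds the 4 singular values, Vt is V' (4 x 4).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)                       # diagonal matrix of singular values

reconstruction_ok = np.allclose(A, U @ Sigma @ Vt)   # A = U Sigma V'
U_orthonormal = np.allclose(U.T @ U, np.eye(4))      # columns of U are orthonormal
V_orthonormal = np.allclose(Vt @ Vt.T, np.eye(4))    # columns of V are orthonormal
sorted_desc = bool(np.all(np.diff(s) <= 0))          # singular values come sorted
```

NumPy returns `V'` directly (named `Vt` here), so no extra transpose is needed when rebuilding `A`.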

6.Others in the same family • Principal component analysis (PCA) • Eigenvalue decomposition • Latent Semantic Indexing (LSI) • Latent Semantic Analysis (LSA)
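
One way to see why these methods sit in the same family: for a mean-centered matrix, the eigenvalues of the covariance matrix (what PCA computes) equal the squared singular values divided by the number of samples minus one. A small sketch on made-up data, assuming rows are samples:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))             # hypothetical data, 50 samples x 5 features
A_centered = A - A.mean(axis=0)              # PCA works on mean-centered columns
num_samples = A_centered.shape[0]

# Route 1: SVD of the centered data.
s = np.linalg.svd(A_centered, compute_uv=False)
variances_from_svd = s**2 / (num_samples - 1)

# Route 2: eigenvalue decomposition of the covariance matrix.
cov = A_centered.T @ A_centered / (num_samples - 1)
eigvals = np.linalg.eigh(cov)[0][::-1]       # eigh returns ascending order; reverse it
```

Both routes give the same component variances up to floating-point error; the SVD route avoids forming the covariance matrix explicitly.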

7.Outline • Introduction • Making sense of SVD • One algorithm to rule them all • Scale out SVD

8.Making sense of SVD

9.Dimensionality reduction: $A = U_{450 \times 400}\, \Sigma_{400 \times 400}\, V'_{400 \times 400}$

10.Reconstruct the matrix: $A \approx U_{450 \times k}\, \Sigma_{k \times k}\, V'_{k \times 400}$, keeping only the $k$ largest singular values
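
The truncation on slides 9 and 10 can be sketched as follows. The helper name `truncated_reconstruction`, the rank symbol `k`, and the small random matrix are illustrative choices, not taken from the talk.

```python
import numpy as np

def truncated_reconstruction(A, k):
    """Rebuild A from its k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((45, 40))            # small stand-in for an image matrix
k = 10
A_k = truncated_reconstruction(A, k)

# The spectral-norm error of the best rank-k approximation is the
# (k+1)-th singular value (index k with zero-based indexing).
s = np.linalg.svd(A, compute_uv=False)
spectral_error = np.linalg.norm(A - A_k, 2)
```

Here `spectral_error` matches `s[k]`, which is why fast-decaying singular values mean little is lost by truncating.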

11.Singular values and cumulative sum

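12.Low-rank reconstructions of $A$: rank 2, $U_{450 \times 2}\, \Sigma_{2 \times 2}\, V'_{2 \times 400}$, next to a second low-rank reconstruction $U_{450 \times r}\, \Sigma_{r \times r}\, V'_{r \times 400}$ (the value of $r$ is not recoverable from the extracted text)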

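13.Reconstructions of $A$ (180000 entries): rank 20, $U_{20}\, \Sigma_{20 \times 20}\, V'_{20}$, uses 17020 numbers (9.4%); rank 100, $U_{100}\, \Sigma_{100 \times 100}\, V'_{100}$, uses 85100 numbers (47%)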

14.Reconstruction of $A$ (180000 entries): rank 200, $U_{200}\, \Sigma_{200 \times 200}\, V'_{200}$, uses 170200 numbers (94%)
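
The counts 17020, 85100 and 170200 against 180000 are consistent with a 450 x 400 image stored as rank-20, rank-100 and rank-200 factors, if each rank costs one column of U, one singular value and one column of V. That reading is an inference from the numbers, not something the slides state; the sketch below only reproduces the arithmetic under that assumption.

```python
M, N = 450, 400                      # assumed image size: 450 * 400 = 180000 entries

def truncated_svd_storage(M, N, n):
    """Numbers kept for a rank-n factorization: U (M x n), n singular values, V (N x n)."""
    return n * (M + N + 1)

full_storage = M * N
counts = {n: truncated_svd_storage(M, N, n) for n in (20, 100, 200)}
ratios = {n: count / full_storage for n, count in counts.items()}
```

Under this count the factorization only saves space while `n * (M + N + 1) < M * N`, which explains why the rank-200 case is barely smaller than the original.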

15.Key points • If eigenvalues are decaying fast then there is scope for dimensionality reduction • Identify components that contribute the most • Drop components that are noisy or do not contribute • Reduce dimension without losing information
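
One simple way to act on these points is to pick the smallest rank whose singular values reach a chosen share of the cumulative sum. The helper name, the 0.9 threshold and the singular values below are made up for illustration.

```python
import numpy as np

def rank_for_share(s, threshold=0.9):
    """Smallest k such that the k largest singular values reach `threshold` of their total sum."""
    cumulative = np.cumsum(s) / np.sum(s)
    # searchsorted returns the first index whose cumulative share is >= threshold.
    return int(np.searchsorted(cumulative, threshold) + 1)

s = np.array([10.0, 5.0, 2.5, 1.25, 0.625, 0.3125])   # made-up, fast-decaying values
k = rank_for_share(s, 0.9)                            # k == 4 for these values
```

With these values the first three components cover about 89% of the sum and the first four about 95%, so four components are kept and the remaining two are dropped.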

16.Outline • Introduction • Making sense of SVD • One algorithm to rule them all • Scale out SVD

17.Supervised Learning • Unsupervised Learning • Inverse problems

19.Supervised learning: data matrix $A$ with $M$ samples (rows) and $N$ features (columns); label vector $y$ with $M$ labels

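20.Compute $A = U \Sigma V'$ and keep $n \ll N$ singular features. The reduced matrix $A' = AV$ still has $M$ samples but only $n$ columns, and $y$ still has $M$ labels (here $A'$ names the new matrix, while $V'$ is the transpose of $V$). Use the new matrix $A'$ to solve supervised learning problems using SVM, Logistic Regression, Decision Trees, Neural Networks etc.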
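
A minimal sketch of this pipeline on synthetic data. The data, the sizes, and the classifier are assumptions made for illustration: a least-squares linear classifier stands in for the SVM, logistic regression, and other models named on the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, n = 200, 50, 5                              # samples, features, singular features

# Synthetic data with n underlying directions plus small noise (illustration only).
latent = rng.standard_normal((M, n))
A = latent @ rng.standard_normal((n, N)) + 0.01 * rng.standard_normal((M, N))
y = (latent[:, 0] > 0).astype(float)              # M binary labels

_, _, Vt = np.linalg.svd(A, full_matrices=False)
V_n = Vt[:n].T                                    # first n right singular vectors, N x n
A_reduced = A @ V_n                               # the slide's A' = A V, shape M x n

# Least-squares linear classifier on the reduced matrix (stand-in for SVM etc.).
X = np.column_stack([A_reduced, np.ones(M)])      # add an intercept column
w, *_ = np.linalg.lstsq(X, y, rcond=None)
predictions = (X @ w > 0.5).astype(float)
accuracy = float((predictions == y).mean())
```

The model is trained on 5 columns instead of 50; any other classifier could be fitted on `A_reduced` in the same way.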

21.Supervised Learning • Unsupervised Learning • Inverse problems

22.Unsupervised learning: data matrix $A$ with $M$ samples (rows) and $N$ features (columns), no labels

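23.Compute $A = U \Sigma V'$ and keep $n \ll N$ singular features. The reduced matrix $A' = AV$ still has $M$ samples but only $n$ columns (here $A'$ names the new matrix, while $V'$ is the transpose of $V$). Use the new matrix $A'$ for clustering, nearest-neighbors, anomaly detection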
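
As an illustration of the nearest-neighbors case, the sketch below compares neighbors found in the full feature space with those found in the reduced space. The data is synthetic and exactly rank `n`, an assumption chosen so that the projection loses nothing; on real data the neighbors would only approximately agree.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, n = 100, 30, 3
A = rng.standard_normal((M, n)) @ rng.standard_normal((n, N))   # exactly rank n

_, _, Vt = np.linalg.svd(A, full_matrices=False)
A_reduced = A @ Vt[:n].T                        # the slide's A' = A V, shape M x n

def nearest_neighbor(X, i):
    """Index of the row of X closest to row i, excluding row i itself."""
    distances = np.linalg.norm(X - X[i], axis=1)
    distances[i] = np.inf
    return int(np.argmin(distances))

neighbors_full = [nearest_neighbor(A, i) for i in range(M)]
neighbors_reduced = [nearest_neighbor(A_reduced, i) for i in range(M)]
```

Each distance computation in the reduced space touches 3 numbers per sample instead of 30, which is where the saving comes from at scale.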

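24.SVD on streaming data (diagram): $A$ is $M \times N$; the sketch $B$ is $2n \times N$. Randomized SVD of $B$ yields $U$, $\Sigma$, $V'$. $B$ is then rebuilt as $B = \alpha V'$ ($n$ rows) followed by $n$ rows of zeros, and the zero rows are filled with $n$ rows from $A$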
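
The diagram resembles the Frequent Directions sketching scheme, so the sketch below follows that scheme. Several things here are assumptions: the slide does not define `α`, so the Frequent Directions shrinkage is used for it; an exact `np.linalg.svd` stands in for the randomized SVD; and the function name and sizes are made up. It also assumes `N >= 2n`.

```python
import numpy as np

def streaming_sketch(A, n):
    """Frequent-Directions-style sketch B (2n x N) of A (M x N), fed n rows at a time.

    Assumes N >= 2n. Exact SVD is used here in place of a randomized SVD.
    """
    M, N = A.shape
    B = np.zeros((2 * n, N))
    for start in range(0, M, n):
        block = A[start:start + n]
        B[n:n + len(block)] = block                 # fill the zero half with rows from A
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[n] ** 2                           # shrink by the (n+1)-th singular value
        alpha = np.sqrt(np.maximum(s[:n] ** 2 - delta, 0.0))
        B[:] = 0.0
        B[:n] = alpha[:, None] * Vt[:n]             # B = alpha V' in n rows, zeros below
    return B

rng = np.random.default_rng(5)
A = rng.standard_normal((500, 40))                  # stand-in for a data stream
n = 10
B = streaming_sketch(A, n)

# With this shrinkage, A'A - B'B is bounded in spectral norm by ||A||_F^2 / (n + 1).
sketch_error = np.linalg.norm(A.T @ A - B.T @ B, 2)
error_bound = np.linalg.norm(A, "fro") ** 2 / (n + 1)
```

Only the `2n x N` matrix `B` is held in memory, so the cost per step does not grow with the number of rows `M` seen so far.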

26.Anomaly detection (figure with the anomaly marked)

27.Steps: Create a data matrix A using normal images (Image 1, Image 2, ..., Image N) and compute SVD

28.Anomaly Score: For a new image $y$ compute anomaly score $= \lVert (I - V V')\, y \rVert$. Intuition: Find the closest point to $y$ that can be formed as a linear combination of vectors in $V$; the score is the distance from $y$ to that point
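
A minimal sketch of this score. It assumes each row of the data matrix is one flattened normal image and that `V` holds the leading right singular vectors as columns; the 64-pixel synthetic "images" and the function name are made up.

```python
import numpy as np

def anomaly_score(V, y):
    """Norm of (I - V V') y: distance from y to the span of the columns of V."""
    return float(np.linalg.norm(y - V @ (V.T @ y)))

rng = np.random.default_rng(6)
patterns = rng.standard_normal((3, 64))              # 3 hidden patterns, 64 "pixels"
A = rng.standard_normal((200, 3)) @ patterns         # rows = flattened normal images

_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:3].T                                         # 64 x 3, orthonormal columns

normal_image = rng.standard_normal(3) @ patterns     # built from the same patterns
odd_image = normal_image + 5.0 * rng.standard_normal(64)

score_normal = anomaly_score(V, normal_image)        # close to zero
score_odd = anomaly_score(V, odd_image)              # clearly larger
```

Computing `y - V @ (V.T @ y)` avoids forming the `N x N` matrix `I - V V'`, which matters when images have many pixels. In practice a threshold on the score would be chosen from the scores of known normal images.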

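29.Anomaly detection with Randomized SVD (same streaming diagram as slide 24, with the anomaly marked): $A$ is $M \times N$; the sketch $B$ is $2n \times N$. Randomized SVD of $B$ yields $U$, $\Sigma$, $V'$. $B$ is then rebuilt as $B = \alpha V'$ ($n$ rows) followed by $n$ rows of zeros, and the zero rows are filled with $n$ rows from $A$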