K-means++ and K-means||

1. K-means++ and K-means Parallel (K-means||), Jun Wang

2. Review of K-means: simple and fast. Choose k centers randomly; assign each point to its nearest center; update the centers; repeat the last two steps.
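
As a reminder of the loop, here is a minimal NumPy sketch of this procedure (Lloyd's algorithm); the function name, seed, and empty-cluster handling are illustrative choices, not code from the talk:

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Choose k centers randomly from the data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels
```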

3. K-means++: you are probably already using it. Spend some extra time on choosing the k centers (seeding) to save time on the clustering itself.
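
This is why "you are using it": for example, scikit-learn's KMeans seeds with k-means++ by default (init="k-means++"). The data below is a stand-in, not the data from the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(1024, 10)   # placeholder data: 1024 points in 10 dimensions

# k-means++ seeding (scikit-learn's default)
km_pp = KMeans(n_clusters=6, init="k-means++", n_init=10, random_state=0).fit(X)
# plain random seeding, for comparison
km_rand = KMeans(n_clusters=6, init="random", n_init=10, random_state=0).fit(X)
```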

4. K-means++ algorithm. Seeding: choose one center from X at random; then, k-1 times, sample one new center from X with probability P_i and update the center matrix. Clustering: proceed as in standard K-means.

5. d_i^2 = min_j ||x_i - c_j||^2, the squared Euclidean distance from x_i to its nearest center chosen so far.

6. How to choose the k centers

7. Choose a point from X at random.

8. Calculate all d_i^2.

9. Calculate P_i: D = d_1^2 + d_2^2 + d_3^2 + ... + d_n^2 and P_i = d_i^2 / D, so that the P_i sum to 1. Points farther away from the red point (the current center) have a better chance of being chosen.
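
A tiny worked example with made-up distances shows why far-away points are favored:

```latex
d^2 = (1,\; 4,\; 25) \;\Rightarrow\; D = 30, \qquad
P = \left(\tfrac{1}{30},\; \tfrac{4}{30},\; \tfrac{25}{30}\right),
```

so the farthest point is picked with probability about 0.83.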

10. Pick a point with probability P_i.

11. Keep repeating: update the center matrix, recalculate d_i^2, recalculate P_i, and sample the next center, until k centers are found.
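
Putting slides 4-11 together, here is a hedged NumPy sketch of the k-means++ seeding loop (function and variable names are illustrative):

```python
import numpy as np

def kmeanspp_seed(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # First center: a uniformly random point of X
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # d_i^2: squared distance from each point to its nearest chosen center
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum()                          # P_i = d_i^2 / D
        # Sample the next center with probability P_i and update the center matrix
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers)
```

The returned centers can then be fed to the Lloyd loop from slide 2.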

12. K-means|| algorithm. Seeding: choose a small subset C from X, assign a weight to each point in C, then cluster C to get the k centers. Clustering: proceed as in standard K-means.

13. Choose the subset C from X. Let D = sum of squared distances = d_1^2 + d_2^2 + d_3^2 + ... + d_n^2, and let L = f(k), e.g. 0.2k or 1.5k (the oversampling factor). For about ln(D) rounds: pick each point in X independently (a Bernoulli trial) with P(chosen) = L * d_i^2 / D, and add the chosen points to C.
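
A hedged sketch of this oversampling step, assuming the Bernoulli probabilities above (the helper name, the default L = 1.5k, and capping the probabilities at 1 are my choices, not the talk's):

```python
import numpy as np

def kmeans_parallel_oversample(X, k, L=None, seed=0):
    rng = np.random.default_rng(seed)
    L = 1.5 * k if L is None else L              # oversampling factor, e.g. 0.2k or 1.5k
    C = [X[rng.integers(len(X))]]                # start from one random point of X

    def nearest_d2(C):
        # squared distance from each point of X to its nearest candidate in C
        return ((X[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2).min(axis=1)

    n_rounds = max(1, int(np.ceil(np.log(nearest_d2(C).sum()))))   # roughly ln(D) rounds
    for _ in range(n_rounds):
        d2 = nearest_d2(C)
        p = np.minimum(1.0, L * d2 / d2.sum())   # Bernoulli: P(chosen) = L * d_i^2 / D
        C.extend(X[rng.random(len(X)) < p])      # each point is kept independently
    return np.array(C)
```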

14. How many points are in C?

15. How many points are in C? There are ln(D) rounds, and in each round the expected number of chosen points is 1*P_1 + 1*P_2 + ... + 1*P_n = L. In total, about ln(D) * L points.
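
The per-round count follows from linearity of expectation over the Bernoulli draws of slide 13:

```latex
E\bigl[\#\text{points chosen per round}\bigr]
  = \sum_{i=1}^{n} \frac{L\, d_i^2}{D}
  = \frac{L}{D} \sum_{i=1}^{n} d_i^2
  = L ,
\qquad
E\bigl[\lvert C\rvert\bigr] \approx \ln(D)\cdot L .
```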

16. Cluster the subset C: the red points in the figure are the points of C.

17. Cluster the sample C: compute the distances from point A to the other points in C and take the smallest one; in this case, d_c1.

18. Cluster the sample C: compute the distances between point A and all points in X, giving the d_xi.

19. Cluster the sample C: compare each d_xi to d_c1 and let W_A = the number of d_xi < d_c1, i.e. the number of points of X that lie closer to A than A's nearest neighbour in C. Doing this for every point of C gives the weight vector W. Cluster C with weights W into k clusters to obtain the k centers.
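
A hedged sketch of this weighting and re-clustering step, using the standard K-means|| weighting (each candidate weighted by the number of points of X assigned to it), which is close to what the slide describes; the use of scikit-learn's sample_weight is my choice for brevity, not the talk's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_parallel_seed(X, C, k, seed=0):
    # W[j]: number of points of X whose nearest candidate in C is C[j]
    dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # shape (n, |C|)
    nearest = dists.argmin(axis=1)
    W = np.bincount(nearest, minlength=len(C)).astype(float)
    # Cluster the small weighted set C into k clusters; its centers seed the final run
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(C, sample_weight=W)
    return km.cluster_centers_
```

The returned centers can then be used as the initial centers for a full K-means run on X, e.g. KMeans(n_clusters=k, init=centers, n_init=1).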

20. Differences among the three methods.
Seeding: K-means chooses k centers randomly; K-means++ chooses the k centers proportionally to d_i^2; K-means|| chooses a subset C and gets the k centers from C.
Clustering: (compared on the next slides).

21. Hypothesis (seeding speed).
Seeding: K-means chooses k centers randomly (fast); K-means++ chooses the k centers proportionally (slow); K-means|| chooses a subset C and gets the k centers from C (slower).
Clustering: (next slide).

22. Hypothesis (seeding and clustering speed).
Seeding: K-means, random (fast); K-means++, proportional (slow); K-means||, subset C (slower).
Clustering: K-means (slow); K-means++ (fast); K-means|| (faster).

23. Test the hypothesis on four data sets: toy data one (very small), cloud data (small), spam data (moderate), toy data two (very large).

24. Simple data set: N=100, d=2, k=2, 100 iterations.
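
A sketch of the kind of timing comparison run here, on a generated 2-D toy set with N=100 and k=2 (the data, the reading of "100 iterations" as 100 restarts, and the printed numbers are placeholders, not the results from the slides):

```python
import time
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated 2-D blobs, N=100 points total
X = np.vstack([np.random.randn(50, 2) + 3, np.random.randn(50, 2) - 3])

for init in ("random", "k-means++"):
    t0 = time.perf_counter()
    km = KMeans(n_clusters=2, init=init, n_init=100, random_state=0).fit(X)
    # inertia_ = within-cluster sum of squares, a measure of the clustering's total scatter
    print(init, round(time.perf_counter() - t0, 3), round(km.inertia_, 2))
```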

25. Execution time

26. Cloud data: 1024 points in 10 dimensions; k=6.

27. (figure only)

28. Execution time (in seconds)

29. Total scatter