Research Article
An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division
Input:
    key: the index of the cluster,
    medi: the list of the local sums from different clusters.
Output: ⟨index, value3⟩ pairs,
    index: the index of the cluster,
    value3: the new cluster center.
(1) Construct a counter Num to record the total number of samples belonging to the same cluster;
(2) Construct an array sum_v to record the sum of the values of different dimensions of the samples in the same cluster (i.e., the samples in the list medi);
(3) Construct the sample instance to extract the data objects from medi.next(), and the dimensions to obtain the dimension of the original data object;
(4) Num = 0;
(5) while (medi.hasNext()) do
(6)     CurrentPoint = medi.next();
(7)     Num += num_s;
(8)     for i = 1 to dimensions do
(9)         sum_v[i] += CurrentPoint.point[i];
(10)    end for
(11)    for i = 1 to dimensions do
(12)        mean[i] = sum_v[i]/Num;
(13)        //Obtain the new cluster center
(14)    end for
(15) end while
(16) index = key;
(17) Construct value3 as a string composed of the new cluster center;
(18) return ⟨index, value3⟩ pairs;
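The reduce step listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PartialSum` record, the function name `reduce_cluster`, and the comma-separated string format of the returned center are assumptions made for illustration, and `num_s` is assumed to be the number of samples aggregated into each local sum. The sketch computes the mean once after all local sums have been merged; this yields the same final center as recomputing it on every iteration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class PartialSum:
    """Hypothetical record for one entry of medi: a local per-dimension sum
    and the number of samples it aggregates (num_s in the listing)."""
    point: List[float]
    num_s: int


def reduce_cluster(key: int, medi: Iterable[PartialSum]) -> Tuple[int, str]:
    """Merge the local sums of one cluster and return (index, new center)."""
    num = 0                    # total number of samples in the cluster (Num)
    sum_v: List[float] = []    # per-dimension running sum (sum_v)

    for current_point in medi:
        if not sum_v:
            # Size the accumulator from the first local sum seen.
            sum_v = [0.0] * len(current_point.point)
        num += current_point.num_s
        for i, x in enumerate(current_point.point):
            sum_v[i] += x

    if num == 0:
        raise ValueError("cluster has no samples")

    # New cluster center: per-dimension sum divided by the sample count.
    mean = [s / num for s in sum_v]
    # Assumed output format: the center as a comma-separated string.
    value3 = ",".join(repr(m) for m in mean)
    return key, value3
```

For example, merging the local sums `[2.0, 4.0]` (2 samples) and `[4.0, 8.0]` (1 sample) for cluster 3 gives `reduce_cluster(3, [PartialSum([2.0, 4.0], 2), PartialSum([4.0, 8.0], 1)])`, which returns `(3, "2.0,4.0")`.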