Research Article

An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

Algorithm 2

Combiner(key1, medi).
Input:
 key1: the index of the cluster,
 medi: the list of the samples assigned to the same cluster.
Output: <key1', value1'>,
 key1': the index of the cluster,
 value1': the sum of the values of the samples belonging to the same cluster and the number of samples.
(1) Construct a counter num_s to record the number of samples in the same cluster;
(2) Construct an array sum_v to record the per-dimension sums of the values of the samples belonging
 to the same cluster (i.e., the samples in the list medi);
(3) Construct the sample object examples to hold each data object extracted by medi.next(), and the variable
 dimensions to hold the dimensionality of the original data objects;
(4) num_s = 0;
(5) while (medi.hasNext()) do
(6)   CurrentPoint = medi.next();
(7)   num_s++;
(8)   for i = 0 to dimensions - 1 do
(9)     sum_v[i] += CurrentPoint.point[i];
(10)    //Accumulate the per-dimension sum of the sample values
(11)  end for
(12) end while
(13) for i = 0 to dimensions - 1 do
(14)   mean[i] = sum_v[i]/num_s;
(15)   //Compute the mean value of the samples for each cluster
(16) end for
(17) index = key1; //index is emitted as the output key key1'
(18) Construct value1' as a string containing the sum of the values of each dimension sum_v[] and
 the number of samples num_s;
(19) return <key1', value1'> pairs;
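
The local aggregation in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Hadoop implementation: the function names combiner and merge, the plain-list sample representation, and the "sums;count" string format are hypothetical choices made for the example. The merge function is included only to show how a downstream reducer could recover the cluster mean from several combiner outputs.

```python
from typing import Iterable, List, Sequence, Tuple


def combiner(key1: int, medi: Iterable[Sequence[float]]) -> Tuple[int, str]:
    """Locally aggregate the samples of one cluster into (index, "sums;count")."""
    num_s = 0                    # number of samples seen for this cluster
    sum_v: List[float] = []      # per-dimension running sums

    for current_point in medi:
        if not sum_v:
            # Size the accumulator from the first sample's dimensionality.
            sum_v = [0.0] * len(current_point)
        num_s += 1
        for i, x in enumerate(current_point):
            sum_v[i] += x

    # Assumed wire format: comma-separated sums, then ';' and the count.
    value = ",".join(repr(s) for s in sum_v) + ";" + str(num_s)
    return key1, value


def merge(values: Iterable[str]) -> List[float]:
    """Reducer-side sketch: merge combiner outputs into the cluster mean."""
    total: List[float] = []
    count = 0
    for v in values:
        sums_part, n_part = v.split(";")
        if not sums_part:
            continue             # a combiner that saw no samples
        sums = [float(s) for s in sums_part.split(",")]
        if not total:
            total = [0.0] * len(sums)
        for i, s in enumerate(sums):
            total[i] += s
        count += int(n_part)
    return [t / count for t in total]


if __name__ == "__main__":
    _, part_a = combiner(3, [[1.0, 2.0], [3.0, 4.0]])
    _, part_b = combiner(3, [[5.0, 6.0]])
    print(merge([part_a, part_b]))
```

Emitting the per-dimension sums together with the sample count, rather than a local mean alone, is what allows partial results to be merged: averaging local means would weight every partition equally regardless of how many samples it contains, whereas summing the sums and the counts and dividing once gives the exact mean over all samples of the cluster.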