optimization - Optimizing clustering in Python


I wrote my own clustering algorithm (bad, I know) for my problem. It works well, but it could work faster.

The algorithm takes a list of values (1D) as input, and works like this:

  1. For each cluster, calculate the distance to its closest neighbor cluster.
  2. Select the cluster a that has the smallest distance to its neighbor b.
  3. If the distance between a and b is not less than the threshold, return.
  4. Combine a and b.
  5. Go to 1.

I'm sure I reinvented the wheel here.

This is my brute-force code. How can I make it faster? I have SciPy and NumPy installed, in case there is something ready-made.

```python
# Cluster center is a simple average value
def cluster_center(cluster):
    return sum(cluster) / len(cluster)

# Distance between clusters
def cluster_distance(a, b):
    return abs(cluster_center(a) - cluster_center(b))

# Assumed starting point (not shown in the original post): one cluster per
# input value, e.g. clusters = [[v] for v in values]
while True:
    cluster_distances = []

    # If there is nothing to cluster, we are ready
    if len(clusters) < 2:
        break

    # Go through all clusters, calculate the shortest distance to a neighbor.
    # "is not" compares identity, so clusters with equal contents are not skipped.
    for cluster in clusters:
        nearest = min(((cluster_distance(cluster, c), c)
                       for c in clusters if c is not cluster),
                      key=lambda pair: pair[0])
        cluster_distances.append((cluster, nearest))

    # Find out the closest pair (key= replaces the Python 2-only cmp= argument)
    cluster_distances.sort(key=lambda item: item[1][0])

    # Check if the distance is under the threshold 15
    if cluster_distances[0][1][0] < 15:
        a = cluster_distances[0][0]
        b = cluster_distances[0][1][1]
        # Combine the clusters (combine the lists)
        a.extend(b)
        # Form the new cluster list
        clusters = [c[0] for c in cluster_distances if c[0] is not b]
    else:
        break
```

Usually, the term "cluster analysis" is used for multivariate partitions. That is because in 1D you can sort your data, and solve many of these problems in a much easier way.

So to speed up your approach, sort your data! And reconsider what you need to do.
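One way to read this advice, as a minimal sketch rather than a definitive implementation: once the values are sorted, every cluster is a contiguous run, the cluster centers stay in sorted order, and the nearest neighbor of any cluster is one of its two adjacent clusters. Each pass then only has to compare adjacent centers instead of all pairs. The function name `cluster_1d` and the `threshold` parameter are hypothetical; the default of 15 is taken from the question's code, and ties may be broken differently than in the original loop.

```python
def cluster_1d(values, threshold=15):
    """Merge the closest adjacent clusters until the smallest gap reaches threshold."""
    # After sorting, each cluster is a contiguous run of the sorted values.
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > 1:
        centers = [sum(c) / len(c) for c in clusters]
        # Centers are in sorted order, so the closest pair is always adjacent.
        gaps = [centers[i + 1] - centers[i] for i in range(len(centers) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] >= threshold:
            break
        # Replace the two adjacent clusters with their union.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters
```

For example, `cluster_1d([52, 1, 100, 3, 50, 2])` groups the values into `[1, 2, 3]`, `[50, 52]` and `[100]`. Each pass is linear in the number of clusters; keeping running sums and counts per cluster would avoid recomputing the centers, at the cost of a little more bookkeeping.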

As for a more advanced method: use kernel density estimation, and look for local minima of the density as splitting points.
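The density-based idea could be sketched like this, assuming a Gaussian kernel with a hand-picked bandwidth evaluated on an evenly spaced grid. The function name `kde_split` and the parameters `bandwidth` and `grid_size` are hypothetical, and the result depends strongly on the bandwidth chosen. The kernel sum is written out with NumPy here; `scipy.stats.gaussian_kde` is a ready-made estimator that could take its place.

```python
import numpy as np

def kde_split(values, bandwidth=5.0, grid_size=512):
    """Split non-empty 1D values at local minima of a Gaussian kernel density estimate."""
    x = np.sort(np.asarray(values, dtype=float))
    grid = np.linspace(x[0], x[-1], grid_size)
    # Unnormalised Gaussian KDE on the grid; scaling does not move the minima.
    # This builds a grid_size x len(x) array, so it suits moderate input sizes.
    density = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    # An interior grid point is a local minimum if it is lower than its left
    # neighbour and not higher than its right neighbour (tolerates flat valleys).
    inner = density[1:-1]
    is_min = (inner < density[:-2]) & (inner <= density[2:])
    cuts = grid[1:-1][is_min]
    # Cut the sorted values at the minima and drop any empty pieces.
    pieces = np.split(x, np.searchsorted(x, cuts))
    return [piece.tolist() for piece in pieces if piece.size]
```

With `bandwidth=5.0`, the values `[1, 2, 3, 50, 52, 100]` are split into three groups at the two density valleys. A larger bandwidth smooths the density and yields fewer, wider clusters; a smaller one yields more splits.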

