使受这些约束的组方差总和最小化的动态程序是简单且相当快的,特别是对于如此狭窄的组大小范围。它重现了发布的解决方案。

数据被绘制为点符号。这些组采用颜色编码并由垂直线分隔。组均值绘制为水平线。
注释R
代码如下。它递归地计算解决方案,通过在执行过程中缓存结果来提高效率。该程序通过在所有可行的长度窗口中搜索从index开始,找到(并记录)从indexcluster(x,i)
开始的最佳解决方案。它返回到目前为止找到的最佳值(并且,在全局变量中,留下了每个组开始的索引的指示符)。它可以在几秒钟内处理数千个元素的数组,具体取决于范围的大小。对于较大的问题,必须改进它以包括一些分支定界启发式算法以限制搜索量。i
x
n.min
n.max
i
cache$Breaks
n.max-n.min
#
# Univariate minimum-variance clustering with constraints.
# Requires a global data structure `cache`.
#
cluster <- function(x, i) {
#
# Cluster x[i:length(x)] recursively.
# Begin with the terminal cases.
#
if (i > cache$Length) return(0) # Nothing to process $
cache$Breaks[i] <<- FALSE # Unmark this break $
if (i + cache$n.min - 1 > cache$Length) return(Inf)# Interval is too short
if (!is.na(v <- cache$Cache[i])) return(v) # Use the cached value $
n.min <- cache$n.min + i-1 # Start of search $
n.max <- min(cache$n.max + i-1, cache$Length) # End of search
if (n.max < n.min) return(0) # Prevents `R` errors
#
# The recursion: accumulate the best total within-group variances.
# To implement other objective functions, replace `var` by any measure of
# within-group homogeneity.
#
values <- sapply(n.min:n.max, function(k) var(x[i:k]) + cluster(x, k+1))
#
# Find and store the best result.
#
j <- which.min(values)
cache$Breaks[n.min + j] <<- TRUE # Mark this as a good break $
cache$Cache[i] <<- values[j] # Cache the result $
return(values[j]) # Pass it to the caller
}
#
# The data.
#
x <- c(3,2,1,3,4,5,0,0,0,1,2,3,2,8,9,10,9,8,2,3,4,9,5,3)
#
# Initialize `cache` to specify the constraints; and run the clustering.
#
system.time({
n <- length(x)
cache <- list(n.min=4, n.max=10, # The length constraints
Cache=rep(NA, n), # Values already found
Breaks=rep(FALSE, n+1), # Group start indexes
Length=n) # Cache size
cluster(x, 1) # I.e., process x[1:n]
cache$Breaks[1] <- TRUE # Indicate the start of the first group $
})
#
# Display the results.
#
breaks <- (1:(n+1))[cache$Breaks] # Group start indexes $
groups <- cumsum(cache$Breaks[-(n+1)]) # Group identifiers
averages <- tapply(x, groups, mean) # Group summaries
colors <- terrain.colors(max(groups)) # Group plotting colors
plot(x, pch=21, bg=colors[groups], ylab="Rating")
abline(v = breaks-1/2, col="Gray")
invisible(mapply(function(left, right, height, color) {
lines(c(left, right)-1/2, c(height, height), col=color, lwd=2)
}, breaks[-length(breaks)], breaks[-1], averages, colors))