数据生成过程是:
让是一个序列至长度和成为相应的因素. 采取所有可能的组合计算:
使用(未居中的)B样条基础对于每个级别通过统一属性分区(行总和为 1)将不可行。这样的模型将无法识别(即使没有截距)。
示例:(设置:5 个内部节点区间(均匀分布),2 次 B-Spline,spline
-function 是自定义的)
# drawing the sequence
n <- 100
x <- seq(-4,4,length.out=n)
z <- seq(-4,4,length.out=n)
d <- as.factor(0:1)
data <- CJ(x=x,z=z,d=d)
set.seed(100)
# setting up the model
data[,y := sin(x+I(d==0)) + sin(x+4*I(d==1)) + I(d==0)*z^2 + 3*I(d==1)*z^2 + rnorm(n,0,1)]
# creating the uncentered B-Spline-Basis for x and z
X <- data[,spline(x,min(x),max(x),5,2,by=d,intercept=FALSE)]
> head(X)
x.1d0 x.2d0 x.3d0 x.4d0 x.5d0 x.6d0 x.7d0 x.1d1 x.2d1 x.3d1 x.4d1 x.5d1 x.6d1 x.7d1
[1,] 0.5 0.5 0 0 0 0 0 0.0 0.0 0 0 0 0 0
[2,] 0.0 0.0 0 0 0 0 0 0.5 0.5 0 0 0 0 0
[3,] 0.5 0.5 0 0 0 0 0 0.0 0.0 0 0 0 0 0
Z <- data[,spline(z,min(z),max(z),5,2,by=d)]
head(Z)
z.1d0 z.2d0 z.3d0 z.4d0 z.5d0 z.6d0 z.7d0 z.1d1 z.2d1 z.3d1 z.4d1 z.5d1 z.6d1
[1,] 0.5000000 0.5000000 0.00000000 0 0 0 0 0.0000000 0.0000000 0.00000000 0 0 0
[2,] 0.0000000 0.0000000 0.00000000 0 0 0 0 0.5000000 0.5000000 0.00000000 0 0 0
[3,] 0.4507703 0.5479543 0.00127538 0 0 0 0 0.0000000 0.0000000 0.00000000 0 0 0
z.7d1
[1,] 0
[2,] 0
[3,] 0
# lm will drop one spline-column for each factor
lm(y ~ -1+X+Z,data=data)
Call:
lm(formula = y ~ -1 + X + Z, data = data)
Coefficients:
Xx.1d0 Xx.2d0 Xx.3d0 Xx.4d0 Xx.5d0 Xx.6d0 Xx.7d0 Xx.1d1 Xx.2d1 Xx.3d1 Xx.4d1 Xx.5d1
23.510 19.912 18.860 22.177 23.080 19.794 18.727 68.572 69.185 67.693 67.082 68.642
Xx.6d1 Xx.7d1 Zz.1d0 Zz.2d0 Zz.3d0 Zz.4d0 Zz.5d0 Zz.6d0 Zz.7d0 Zz.1d1 Zz.2d1 Zz.3d1
69.159 67.496 1.381 -11.872 -19.361 -21.835 -19.698 -11.244 NA -1.329 -38.449 -62.254
Zz.4d1 Zz.5d1 Zz.6d1 Zz.7d1
-69.993 -61.438 -39.754 NA
为了克服这个问题,Wood,Generalized Additive Models: An Introduction with R,第 163-164 页提出了总和(或均值)中心约束:
这可以通过重新参数化来完成,如果一个矩阵发现这样
-matrix 可以通过约束矩阵的 QR 分解找到.
注意是通过统一属性的划分。
我的 B-Spline-Matrix 的居中/约束版本是:
X <- data[,spline(x,min(x),max(x),5,2,by=d,intercept=TRUE)]
head(X)
x.1d0 x.2d0 x.3d0 x.4d0 x.5d0 x.6d0 x.1d1 x.2d1 x.3d1 x.4d1
[1,] 0.2271923 -0.3225655 -0.3225655 -0.3225655 -0.2728077 -0.05790256 0.0000000 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.2271923 -0.3225655 -0.3225655 -0.3225655
[3,] 0.2271923 -0.3225655 -0.3225655 -0.3225655 -0.2728077 -0.05790256 0.0000000 0.0000000 0.0000000 0.0000000
x.5d1 x.6d1
[1,] 0.0000000 0.00000000
[2,] -0.2728077 -0.05790256
[3,] 0.0000000 0.00000000
Z <- data[,spline(z,min(z),max(z),5,2,by=d,intercept=TRUE)]
head(Z)
z.1d0 z.2d0 z.3d0 z.4d0 z.5d0 z.6d0 z.1d1 z.2d1 z.3d1 z.4d1
[1,] 0.2271923 -0.3225655 -0.3225655 -0.3225655 -0.2728077 -0.05790256 0.0000000 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.2271923 -0.3225655 -0.3225655 -0.3225655
[3,] 0.2875283 -0.3066501 -0.3079255 -0.3079255 -0.2604260 -0.05527458 0.0000000 0.0000000 0.0000000 0.0000000
z.5d1 z.6d1
[1,] 0.0000000 0.00000000
[2,] -0.2728077 -0.05790256
[3,] 0.0000000 0.00000000
我的问题是:即使拟合非常相似,为什么我的受约束 B 样条列与 gam 提供的不同?我错过了什么?
# comparing with gam from mgcv
mod.gam <- gam(y~d+s(x,bs="ps",by=d,k=7)+s(z,bs="ps",by=d,k=7),data=data)
X.gam <- model.matrix(mod.gam)
head(X.gam)
(Intercept) d1 s(x):d0.1 s(x):d0.2 s(x):d0.3 s(x):d0.4 s(x):d0.5 s(x):d0.6 s(x):d1.1 s(x):d1.2
1 1 0 0.5465301 -0.05732768 -0.2351708 -0.2259983 -0.1201207 -0.01043987 0.0000000 0.00000000
2 1 1 0.0000000 0.00000000 0.0000000 0.0000000 0.0000000 0.00000000 0.5465301 -0.05732768
3 1 0 0.5465301 -0.05732768 -0.2351708 -0.2259983 -0.1201207 -0.01043987 0.0000000 0.00000000
s(x):d1.3 s(x):d1.4 s(x):d1.5 s(x):d1.6 s(z):d0.1 s(z):d0.2 s(z):d0.3 s(z):d0.4 s(z):d0.5
1 0.0000000 0.0000000 0.0000000 0.00000000 0.5465301 -0.057327680 -0.2351708 -0.2259983 -0.1201207
2 -0.2351708 -0.2259983 -0.1201207 -0.01043987 0.0000000 0.000000000 0.0000000 0.0000000 0.0000000
3 0.0000000 0.0000000 0.0000000 0.00000000 0.5471108 -0.031559945 -0.2302910 -0.2213227 -0.1176356
s(z):d0.6 s(z):d1.1 s(z):d1.2 s(z):d1.3 s(z):d1.4 s(z):d1.5 s(z):d1.6
1 -0.01043987 0.0000000 0.000000000 0.0000000 0.0000000 0.0000000 0.00000000
2 0.00000000 0.5465301 -0.057327680 -0.2351708 -0.2259983 -0.1201207 -0.01043987
3 -0.01022388 0.0000000 0.000000000 0.0000000 0.0000000 0.0000000 0.00000000
虚线对应我的适合,直线对应游戏版本