r - Nonparametric bootstrapping at the highest level of clustered data using the boot() function from {boot} in R


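I have two-level hierarchical data and am trying to perform nonparametric bootstrap sampling at the highest level, that is, randomly sampling the highest-level clusters with replacement while keeping the original data within each cluster intact.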

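I want to do this with the boot() function from the {boot} package, because I would then like to build BCa confidence intervals with boot.ci(), which requires a boot object.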

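Below is my unsuccessful attempt. Running the debugger on the boot call showed that the random sampling does not happen at the cluster level (= subject).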


### create a very simple two-level dataset with 'subject' as clustering variable

rho <- 0.4
dat <- expand.grid(
  trial = factor(1:5),
  subject = factor(1:3)
)
sig <- rho * tcrossprod(model.matrix(~ 0 + subject, dat))
diag(sig) <- 1
set.seed(17); dat$value <- chol(sig) %*% rnorm(15, 0, 1)

### my statistic function (adapted from here: http://biostat.mc.vanderbilt.edu/wiki/Main/HowToBootstrapCorrelatedData)

resamp.mean <- function(data, i){
  cluster <- c('subject', 'trial')

  # sample the clustering factor
  cls <- unique(data[[cluster[1]]])[i]

  # subset on the sampled clustering factors
  sub <- lapply(cls, function(b) subset(data, data[[cluster[1]]]==b))

  sub.2 <- do.call(rbind, sub) # join and return samples
  mean((sub.2$value)) # calculate the statistic
}

debugonce(boot)
set.seed(17); dat.boot <- boot(data = dat, statistic = resamp.mean, 4)

### stepping through the debugger until object 'i' was assigned
### investigating 'i'
# Browse[2]> head(i)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]    3    7   12   13   10   14   14   15   12    12    12     4     5     9    10
[2,]   15    9    3   13    4   10    2    4    6    11    10     4     9     4     3
[3,]    8    4    7   15   10   12    9    8    9    12     4    15    14    10     4
[4,]   12    3    1   15    8   13    9    1    4    13     9    13     2    11     2

### which is not what I was hoping for.

### I would like something that looks like this, supposing indices = c(2, 2, 1) for the first resample:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]    6    7    8    9   10    6    7    8    9    10     1     2     3     4     5

Any help would be greatly appreciated.
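For reference, the index matrix seen in the debugger is what one would expect when a data frame is passed as data: boot() treats each row as one observation, so every index vector has nrow(dat) = 15 entries drawn from 1 to 15. The sketch below illustrates this on toy data of the same shape. The helper record.idx and the list seen are made-up names for this illustration, not part of {boot}, and the values are plain rnorm() draws rather than the correlated data above.

```r
library(boot)

# Toy data with the same shape as above: 3 subjects x 5 trials = 15 rows
dat <- expand.grid(trial = factor(1:5), subject = factor(1:3))
set.seed(17)
dat$value <- rnorm(nrow(dat))

# Hypothetical helper (not part of {boot}): store every index vector
# that boot() hands to the statistic
seen <- list()
record.idx <- function(data, i) {
  seen[[length(seen) + 1]] <<- i
  mean(data$value[i])
}

b <- boot(data = dat, statistic = record.idx, R = 4)

lengths(seen)        # every index vector has nrow(dat) = 15 entries
range(unlist(seen))  # values lie between 1 and 15, i.e. they address rows

# Indexing the 3 unique subjects with row-level indices runs off the end:
unique(dat$subject)[c(3, 7, 12)]  # subject 3, then NA, NA
```

So unique(data[[cluster[1]]])[i] in the attempt above mostly yields NA, which is why the resamples do not correspond to whole subjects.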


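I think the problem stems from the modified statistic function (specifically, the cls object inside it). Could you try this? Uncomment the print statement to see which subjects have been sampled. It does not use the index argument that boot expects; instead it uses sample(), as in the original function.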


resamp.mean <- function(dat,
                        indices,
                        cluster = c('subject', 'trial'),
                        replace = TRUE){
  # boot expects an indices argument but the sampling happens
  # via sample() as in the original source of the function

  # sample the clustering factor
  cls <- sample(unique(dat[[cluster[1]]]), replace=replace)

  # subset on the sampled clustering factors
  sub <- lapply(cls, function(b) subset(dat, dat[[cluster[1]]]==b))

  # join and return samples
  sub <- do.call(rbind, sub)

  # UNCOMMENT HERE TO SEE SAMPLED SUBJECTS
  # print(sub)

  mean(sub$value)
}

A resample inside resamp.mean, just before the mean of value is computed, looks like this:


    trial subject      value
1       1       1 -1.1581291
2       2       1 -0.1458287
3       3       1 -0.2134525
4       4       1 -0.5796521
5       5       1  0.6501587
11      1       3  2.6678441
12      2       3  1.3945740
13      3       3  1.4849435
14      4       3  0.4086737
15      5       3  1.3399146
111     1       1 -1.1581291
121     2       1 -0.1458287
131     3       1 -0.2134525
141     4       1 -0.5796521
151     5       1  0.6501587

...
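One thing to keep in mind with the sample()-based function: because it ignores indices, the value boot() stores as t0 is itself a random resample, and the indices boot() generates no longer describe the resamples actually drawn. A possible alternative, sketched here under assumptions, is to hand boot() the subject IDs as data so that its own indices are already at the cluster level. The names by.subject, ids, cluster.mean and the extra argument groups are made up for this sketch, and R = 999 is an arbitrary choice.

```r
library(boot)

# Same toy data as in the question; as.vector() keeps 'value' a plain vector
rho <- 0.4
dat <- expand.grid(trial = factor(1:5), subject = factor(1:3))
sig <- rho * tcrossprod(model.matrix(~ 0 + subject, dat))
diag(sig) <- 1
set.seed(17)
dat$value <- as.vector(chol(sig) %*% rnorm(15, 0, 1))

# Split once by cluster; boot() will then resample the cluster IDs
by.subject <- split(dat, dat$subject)
ids <- names(by.subject)

# Statistic: 'i' indexes the ID vector, so it selects whole subjects
cluster.mean <- function(ids, i, groups) {
  resampled <- do.call(rbind, unname(groups[ids[i]]))
  mean(resampled$value)
}

set.seed(17)
dat.boot <- boot(data = ids, statistic = cluster.mean, R = 999,
                 groups = by.subject)

# t0 is computed with i = 1:3, i.e. on the original data
all.equal(dat.boot$t0, mean(dat$value))

# dat.boot is a regular boot object, so boot.ci() accepts it
boot.ci(dat.boot, type = "perc")
```

Other interval types can be requested through the type argument of boot.ci() in the same way, though with only three clusters any interval will be rough.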