December 3, 2014

Conditional inference trees: generating sample data and fitting a model

Survival analysis with a conditional inference tree (a kind of decision tree)

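The tree splits the data on the explanatory variables so that the lifetime distribution differs between the resulting groups.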

For details, see http://cran.r-project.org/web/packages/party/party.pdf.
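To get a feel for what "the lifetime distribution differs between groups" means, here is a minimal sketch that compares two groups with a plain two-sample log-rank test (survdiff from the survival package). This is a swapped-in illustration, not the statistic that ctree itself computes. The seed, sample size, and the variable names g, time and event are made up for this example.

```r
library(survival)  # provides Surv() and survdiff()

set.seed(1)  # hypothetical seed, only for this illustration
n <- 200     # hypothetical sample size

# Binary grouping variable and Weibull lifetimes that depend on it
g <- rbinom(n, size = 1, prob = 0.5)
time <- ifelse(g == 0,
               rweibull(n, shape = 2, scale = 200),
               rweibull(n, shape = 3, scale = 300))
event <- rep(1, n)  # all events observed, no censoring

# Two-sample log-rank test: do the survival curves differ by g?
fit <- survdiff(Surv(time, event) ~ g)
p.value <- pchisq(fit$chisq, df = 1, lower.tail = FALSE)

print(fit)
print(p.value)
```

A small p-value says that splitting on g separates two different lifetime distributions, which is the kind of variable a tree would want to split on.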

Data generation

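set.seed(0)  # fix the seed so the results are reproducible
n <- 1000
# binary covariate: x = 1 with probability 1/3
x <- rbinom(n, size=1, prob=1/3)
# lifetimes: Weibull(shape=2, scale=200) when x == 0, Weibull(shape=3, scale=300) when x == 1
data.df <- data.frame(x = x,
                      tt = ifelse(x==0, rweibull(n, scale=200, shape=2), rweibull(n, scale=300, shape=3)),
                      status = rep(1,n)  # every event is observed (no censoring)
                      )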
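Before fitting anything, it may help to know how far apart the two groups are in theory. The sketch below computes the mean and median of each Weibull distribution from the standard formulas and compares the means with a sample regenerated the same way as above. The helper names weibull.mean and weibull.median are hypothetical, introduced only for this example.

```r
# Theoretical mean and median of a Weibull(shape, scale) distribution
weibull.mean <- function(shape, scale) scale * gamma(1 + 1 / shape)
weibull.median <- function(shape, scale) scale * log(2)^(1 / shape)

theory <- data.frame(
  x      = c(0, 1),
  mean   = c(weibull.mean(2, 200), weibull.mean(3, 300)),
  median = c(weibull.median(2, 200), weibull.median(3, 300))
)
print(theory)

# Regenerate the sample with the same recipe as the post
set.seed(0)
n <- 1000
x <- rbinom(n, size = 1, prob = 1/3)
tt <- ifelse(x == 0,
             rweibull(n, scale = 200, shape = 2),
             rweibull(n, scale = 300, shape = 3))

# Sample means per group, to compare against the theoretical means
sample.means <- tapply(tt, x, mean)
print(sample.means)
```

Note that ifelse() evaluates both rweibull() calls in full and then picks one value per element, so each group really does follow its own Weibull distribution.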

Fitting

> library(party)

> library(survival)  # provides Surv(), in case party does not attach it

> (model <- ctree(Surv(tt, status)~x, data=data.df))



     Conditional inference tree with 2 terminal nodes



Response:  Surv(tt, status) 

Input:  x 

Number of observations:  1000 



1) x <= 0; criterion = 1, statistic = 150.161

  2)*  weights = 664 

1) x > 0

  3)*  weights = 336 

> data.df$node <- predict(model, type="node")
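Once every observation has a node label, the two terminal nodes can be compared with Kaplan-Meier estimates. The sketch below is self-contained and uses only the survival package, so the node column is rebuilt by hand from the printed split rule (x <= 0 goes to node 2, x > 0 goes to node 3) instead of calling predict() on the tree. That shortcut is an assumption that only holds for this single-split tree.

```r
library(survival)  # provides Surv() and survfit()

# Same data-generating recipe as in the post
set.seed(0)
n <- 1000
x <- rbinom(n, size = 1, prob = 1/3)
data.df <- data.frame(
  x = x,
  tt = ifelse(x == 0,
              rweibull(n, scale = 200, shape = 2),
              rweibull(n, scale = 300, shape = 3)),
  status = rep(1, n)
)

# Assumption: mimic the tree's terminal nodes using the printed split rule
data.df$node <- ifelse(data.df$x <= 0, 2L, 3L)

# Node sizes, to compare with the "weights" lines in the tree printout
node.sizes <- table(data.df$node)
print(node.sizes)

# Kaplan-Meier fit per node; the summary table includes the median lifetime
km <- survfit(Surv(tt, status) ~ node, data = data.df)
km.table <- summary(km)$table
print(km.table)
```

If the random number stream matches the one used in the post, the node sizes should agree with the weights shown in the tree printout, and node 3 should show the longer median lifetime.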

Verification

> library(MASS)  # fitdistr() comes from MASS

> fitdistr(data.df[data.df$node==2,]$tt, "weibull")

      shape          scale    

    1.98999386   202.76429299 

 (  0.06034834) (  4.16490819)

Warning message:

In densfun(x, parm[1], parm[2], ...) : NaNs produced

> fitdistr(data.df[data.df$node==3,]$tt, "weibull")

     shape        scale   

    2.974397   306.660949 

 (  0.127542) (  5.919310)
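One way to read these numbers is to check whether the true parameters used in the data generation fall inside approximate 95% intervals (estimate plus or minus about 1.96 standard errors). The sketch below copies the estimates and standard errors from the output above. The normal approximation is an assumption made for this check, not something fitdistr reports.

```r
# Estimates and standard errors copied from the fitdistr() output above,
# together with the true values used when generating the data
est <- data.frame(
  node     = c(2, 2, 3, 3),
  param    = c("shape", "scale", "shape", "scale"),
  estimate = c(1.98999386, 202.76429299, 2.974397, 306.660949),
  se       = c(0.06034834, 4.16490819, 0.127542, 5.919310),
  truth    = c(2, 200, 3, 300)
)

# Approximate 95% Wald intervals (assumes asymptotic normality of the MLE)
z <- qnorm(0.975)
est$lower <- est$estimate - z * est$se
est$upper <- est$estimate + z * est$se

# Does each interval contain the true parameter?
est$covers <- est$lower <= est$truth & est$truth <= est$upper
print(est)
```

All four intervals contain the true value, so the two terminal nodes recover the two Weibull distributions that generated the data.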

© gepuro 2013