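Using the iris dataset, we carry out some simple exploration and preprocessing:

  • Visualize the linear correlations between variables
  • Copy the Petal.Length variable and name the copy Petal.Length.Copy
  • Change the input (column) position of Petal.Length.Copy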
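The three data frames used in the model calls later on can be sketched as follows. The names `df_raw`, `df_raw_copy` and `df_raw_copy_sort` come from the `Call:` lines in the output; the column positions are inferred from the `[-5]` and `[-6]` indices there, and placing the copy in the very first column is an assumption (the output only implies that it precedes Petal.Length).

```r
# Original data: four numeric measurements plus Species in column 5
df_raw <- iris

# Append an exact copy of Petal.Length; it lands in column 6, after Species
df_raw_copy <- df_raw
df_raw_copy$Petal.Length.Copy <- df_raw_copy$Petal.Length

# Reorder so the copy precedes Petal.Length (assumed: first column);
# Species then sits in column 6
df_raw_copy_sort <- df_raw_copy[c("Petal.Length.Copy", names(df_raw))]

str(df_raw_copy_sort)
```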

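Preliminary conclusions:

  • C5.0 is not sensitive to highly (linearly) correlated variables
  • Among perfectly correlated variables, it selects the one that comes first in the input order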
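The second conclusion can be illustrated with a toy calculation. This is not C5.0's actual implementation (C5.0 has its own split criterion and internals); it only shows that an exact copy scores identically on an entropy-based criterion, so any "first best wins" rule would pick whichever column comes first. The helpers `entropy` and `info_gain` are hypothetical names introduced for this sketch, and the threshold 1.9 is taken from the root split in the trees shown later.

```r
# Shannon entropy of a class vector (hypothetical helper)
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]               # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain of the binary split x <= threshold (hypothetical helper)
info_gain <- function(x, y, threshold) {
  left <- x <= threshold
  entropy(y) - (mean(left) * entropy(y[left]) + mean(!left) * entropy(y[!left]))
}

# The copy is listed first, mirroring the reordered input
gains <- c(
  Petal.Length.Copy = info_gain(iris$Petal.Length, iris$Species, 1.9),
  Petal.Length      = info_gain(iris$Petal.Length, iris$Species, 1.9)
)

# which.max() returns the first maximum, so a tie goes to the earlier column
first_best <- names(gains)[which.max(gains)]
print(gains)
print(first_best)
```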

3 Visualizing correlations
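A minimal sketch of how the linear correlations could be inspected, using base R only. The plotting function actually used is not stated in the text, so `pairs()` here is an assumed choice.

```r
# Pearson correlation matrix of the four numeric columns (Species dropped)
cor_mat <- cor(iris[-5])
print(round(cor_mat, 2))

# Scatterplot matrix coloured by species (assumed choice of plot)
pairs(iris[-5], col = as.integer(iris$Species), pch = 19)
```

Petal.Length and Petal.Width are strongly correlated in this matrix, which is the pair relevant to the first conclusion above.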

4 Building the C5.0 decision trees

4.1 Raw
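The output in this section corresponds to a call of the following form, reconstructed from its `Call:` line. It assumes the C50 package is installed; the object name `model_raw` is hypothetical.

```r
library(C50)  # provides C5.0() and a summary() method for fitted models

df_raw <- iris

# Predictors are all columns except Species (column 5)
model_raw <- C5.0(x = df_raw[-5], y = df_raw$Species)

# Prints the tree, the training confusion matrix and attribute usage
summary(model_raw)
```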

## 
## Call:
## C5.0.default(x = df_raw[-5], y = df_raw$Species)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Mon May 20 23:15:39 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 150 cases (5 attributes) from undefined.data
## 
## Decision tree:
## 
## Petal.Length <= 1.9: setosa (50)
## Petal.Length > 1.9:
## :...Petal.Width > 1.7: virginica (46/1)
##     Petal.Width <= 1.7:
##     :...Petal.Length <= 4.9: versicolor (48/1)
##         Petal.Length > 4.9: virginica (6/2)
## 
## 
## Evaluation on training data (150 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       4    4( 2.7%)   <<
## 
## 
##     (a)   (b)   (c)    <-classified as
##    ----  ----  ----
##      50                (a): class setosa
##            47     3    (b): class versicolor
##             1    49    (c): class virginica
## 
## 
##  Attribute usage:
## 
##  100.00% Petal.Length
##   66.67% Petal.Width
## 
## 
## Time: 0.0 secs

4.2 Copy

## 
## Call:
## C5.0.default(x = df_raw_copy[-5], y = df_raw_copy$Species)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Mon May 20 23:15:39 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 150 cases (6 attributes) from undefined.data
## 
## Decision tree:
## 
## Petal.Length <= 1.9: setosa (50)
## Petal.Length > 1.9:
## :...Petal.Width > 1.7: virginica (46/1)
##     Petal.Width <= 1.7:
##     :...Petal.Length <= 4.9: versicolor (48/1)
##         Petal.Length > 4.9: virginica (6/2)
## 
## 
## Evaluation on training data (150 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       4    4( 2.7%)   <<
## 
## 
##     (a)   (b)   (c)    <-classified as
##    ----  ----  ----
##      50                (a): class setosa
##            47     3    (b): class versicolor
##             1    49    (c): class virginica
## 
## 
##  Attribute usage:
## 
##  100.00% Petal.Length
##   66.67% Petal.Width
## 
## 
## Time: 0.0 secs

4.3 Sort

## 
## Call:
## C5.0.default(x = df_raw_copy_sort[-6], y = df_raw_copy_sort$Species)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Mon May 20 23:15:39 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 150 cases (6 attributes) from undefined.data
## 
## Decision tree:
## 
## Petal.Length.Copy <= 1.9: setosa (50)
## Petal.Length.Copy > 1.9:
## :...Petal.Width > 1.7: virginica (46/1)
##     Petal.Width <= 1.7:
##     :...Petal.Length.Copy <= 4.9: versicolor (48/1)
##         Petal.Length.Copy > 4.9: virginica (6/2)
## 
## 
## Evaluation on training data (150 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       4    4( 2.7%)   <<
## 
## 
##     (a)   (b)   (c)    <-classified as
##    ----  ----  ----
##      50                (a): class setosa
##            47     3    (b): class versicolor
##             1    49    (c): class virginica
## 
## 
##  Attribute usage:
## 
##  100.00% Petal.Length.Copy
##   66.67% Petal.Width
## 
## 
## Time: 0.0 secs
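To compare the three fits side by side rather than reading three summaries, the attribute usage can be pulled out with `C5imp()` from the C50 package. This is a sketch under the same assumptions as before: the data frames are rebuilt as inferred from the `Call:` lines, and the list name `fits` is hypothetical.

```r
library(C50)

# Rebuild the three inputs (column positions inferred from the Call: lines)
df_raw <- iris
df_raw_copy <- df_raw
df_raw_copy$Petal.Length.Copy <- df_raw_copy$Petal.Length
df_raw_copy_sort <- df_raw_copy[c("Petal.Length.Copy", names(df_raw))]

fits <- list(
  raw  = C5.0(x = df_raw[-5],           y = df_raw$Species),
  copy = C5.0(x = df_raw_copy[-5],      y = df_raw_copy$Species),
  sort = C5.0(x = df_raw_copy_sort[-6], y = df_raw_copy_sort$Species)
)

# Percentage of training cases that pass through a split on each predictor
usage <- lapply(fits, C5imp, metric = "usage")
print(usage)
```

In the summaries above, the duplicated column is used only when it precedes Petal.Length in the input, and the tree structure and training errors are the same in all three cases.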