r_eda

Sankey Diagram Demo

Jiaxiang Li 2019-03-12

Reference: www.data-imaginist.com

suppressMessages(library(tidyverse))
library(ggforce)
titanic <- reshape2::melt(Titanic)
titanic %>% head
##   Class    Sex   Age Survived value
## 1   1st   Male Child       No     0
## 2   2nd   Male Child       No     0
## 3   3rd   Male Child       No    35
## 4  Crew   Male Child       No     0
## 5   1st Female Child       No     0
## 6   2nd Female Child       No     0
titanic %>% dim
## [1] 32  5

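This is the Titanic dataset that ships with R. It has four categorical columns (Class, Sex, Age, Survived), which are the variables studied here; the value column holds the count for each combination of levels.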

library(magrittr)
## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract
titanic <- gather_set_data(titanic, 1:4)
titanic %>% head
##   Class    Sex   Age Survived value id     x    y
## 1   1st   Male Child       No     0  1 Class  1st
## 2   2nd   Male Child       No     0  2 Class  2nd
## 3   3rd   Male Child       No    35  3 Class  3rd
## 4  Crew   Male Child       No     0  4 Class Crew
## 5   1st Female Child       No     0  5 Class  1st
## 6   2nd Female Child       No     0  6 Class  2nd
titanic %>% dim
## [1] 128   8
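  1. Once the dataset is clean, it only needs three extra columns, id, x and y, before a Sankey diagram can be drawn. gather_set_data handles this step: each of the 32 original rows is repeated once per selected column, giving 32 × 4 = 128 rows and 5 + 3 = 8 columns.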
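To make the reshaping step concrete, here is a minimal base-R sketch of the same idea. It is an illustration under assumptions, not necessarily how ggforce implements gather_set_data: it starts from as.data.frame(Titanic) instead of reshape2::melt, and the names tit, set_cols and long are made up for this example.

```r
# Start from the built-in Titanic table; name the count column "value"
# so the layout matches the melted data used in this note.
tit <- as.data.frame(Titanic, responseName = "value", stringsAsFactors = FALSE)

# Step 1: give every original row an id.
tit$id <- seq_len(nrow(tit))

# Step 2: stack one copy of the data per categorical column.
set_cols <- c("Class", "Sex", "Age", "Survived")
long <- do.call(rbind, lapply(set_cols, function(col) {
  out <- tit
  out$x <- col          # which column (axis) this copy stands for
  out$y <- tit[[col]]   # the level the row takes in that column
  out
}))
rownames(long) <- NULL

dim(long)       # 128 rows, 8 columns, as in the output above
unique(long$x)  # the four axis names
```

Each id appears four times in long, once per axis, which is what lets the plotting layer trace one combination of levels across all axes.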
titanic %$% x %>% unique
## [1] "Class"    "Sex"      "Age"      "Survived"
titanic %$% y %>% unique
##  [1] 1st    2nd    3rd    Crew   Male   Female Child  Adult  No     Yes   
## Levels: 1st 2nd 3rd Crew Male Female Child Adult No Yes
  1. x is the stage in the flow, i.e. the name of the source column
  2. y is the level that the row takes in that column
ggplot(titanic, aes(x, id = id, split = y, value = value)) +
    # rivers: ribbons flowing between neighbouring axes, filled by Sex
    geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
    # blocks: one bar per level on each axis
    geom_parallel_sets_axes(axis.width = 0.1) +
    # names: the level label printed on each block
    geom_parallel_sets_labels(colour = 'white') +
    theme_minimal()

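  1. Drawing the Sankey diagram with this package (ggforce) keeps the code fairly readable: each layer adds one visual element, as the comments show.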
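As a sanity check on the plot, the height of each block should correspond to the total count for that level, assuming the axes layer sums value per level on each axis. A small base-R sketch computes those totals straight from the built-in table; the name totals is made up for this example.

```r
# Marginal totals of the Titanic table, one vector per dimension.
totals <- lapply(1:4, function(i) margin.table(Titanic, i))
names(totals) <- names(dimnames(Titanic))

totals$Class          # counts per passenger class
sapply(totals, sum)   # every axis adds up to the same 2201 people
```

Because every axis sums to the same total, the four stacks of blocks in the diagram have equal overall height.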