r_eda

WHO disease dataset EDA

Based on Strayer (2019).

suppressMessages(library(tidyverse))

Getting familiar with the dataset

who_disease <- read_csv('datasets/who_disease.csv')
## Parsed with column specification:
## cols(
##   region = col_character(),
##   countryCode = col_character(),
##   country = col_character(),
##   disease = col_character(),
##   year = col_double(),
##   cases = col_double()
## )
# print dataframe to inspect
who_disease %>% head
## # A tibble: 6 x 6
##   region countryCode country             disease  year cases
##   <chr>  <chr>       <chr>               <chr>   <dbl> <dbl>
## 1 EMR    AFG         Afghanistan         measles  2016   638
## 2 EUR    ALB         Albania             measles  2016    17
## 3 AFR    DZA         Algeria             measles  2016    41
## 4 EUR    AND         Andorra             measles  2016     0
## 5 AFR    AGO         Angola              measles  2016    53
## 6 AMR    ATG         Antigua and Barbuda measles  2016     0
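The column specification that read_csv() printed above can also be passed in explicitly through col_types, which silences the message and pins down the expected types. The following is a minimal sketch, not the course's code: the temporary file and its two rows are a hypothetical stand-in copied from the printed output, so that the example runs without datasets/who_disease.csv.

```r
suppressMessages(library(tidyverse))

# Hypothetical two-row sample written to a temporary file (assumption:
# same header as who_disease.csv, rows copied from the printed output)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("region,countryCode,country,disease,year,cases",
    "EMR,AFG,Afghanistan,measles,2016,638",
    "EUR,ALB,Albania,measles,2016,17"),
  tmp
)

# Declare the column types up front instead of letting readr guess them
sample_df <- read_csv(
  tmp,
  col_types = cols(
    region      = col_character(),
    countryCode = col_character(),
    country     = col_character(),
    disease     = col_character(),
    year        = col_double(),
    cases       = col_double()
  )
)

# Compact overview of column names, types and first values
glimpse(sample_df)
```

With the types declared, a value that does not fit (for example text in the cases column) should surface as a parsing warning rather than silently changing the column type.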
# set x aesthetic to region column
ggplot(who_disease, aes(region)) +
    geom_bar()

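By default, geom_bar() uses stat = "count": it counts the rows in each region, so the bar heights show the number of records per region, not the number of cases.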
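To make the difference between counting rows and summing cases concrete, here is a small sketch on hypothetical toy data (the toy tibble and the names row_counts, case_totals, p_rows and p_cases are assumptions for illustration, not part of the WHO dataset or the course material). It computes what geom_bar() counts, and contrasts it with a geom_col() plot of summed cases.

```r
suppressMessages(library(tidyverse))

# Hypothetical toy data with the two columns relevant here
toy <- tibble(
  region = c("EMR", "EUR", "EUR", "AFR", "AFR", "AMR"),
  cases  = c(638, 17, 0, 41, 53, 0)
)

# What geom_bar() computes by default: number of rows per region
row_counts <- count(toy, region)

# A different summary: total cases per region
case_totals <- toy %>%
  group_by(region) %>%
  summarise(total_cases = sum(cases))

# Bar heights = row counts (stat = "count" is applied to the raw data)
p_rows <- ggplot(toy, aes(region)) +
  geom_bar()

# Bar heights = summed cases, drawn as-is from the pre-summarised data
p_cases <- ggplot(case_totals, aes(region, total_cases)) +
  geom_col()
```

Printing p_rows and p_cases side by side shows why the choice matters: a region with many records but few cases looks large in the first plot and small in the second.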

Strayer, Nick. 2019. “Visualization Best Practices in R.” DataCamp. 2019. <https://www.datacamp.com/courses/visualization-best-practices-in-r>.