
JSON Manipulation using R

See Schouwenaars (2016).

The name jsonlite is modeled on SQLite.

# Load the jsonlite package (glue is also needed for the glue() call below)
library(jsonlite)
library(glue)

# wine_json is a JSON
wine_json <- '{"name":"Chateau Migraine", "year":1997, "alcohol_pct":12.4, "color":"red", "awarded":false}'
wine_json_bracket <- glue('[{wine_json}]')

# Convert wine_json into a list: wine
wine <- fromJSON(wine_json)
wine_bracket <- fromJSON(wine_json_bracket)

# Print structure of wine and wine_bracket
str(wine)
str(wine_bracket)
## List of 5
##  $ name       : chr "Chateau Migraine"
##  $ year       : int 1997
##  $ alcohol_pct: num 12.4
##  $ color      : chr "red"
##  $ awarded    : logi FALSE
## 'data.frame':    1 obs. of  5 variables:
##  $ name       : chr "Chateau Migraine"
##  $ year       : int 1997
##  $ alcohol_pct: num 12.4
##  $ color      : chr "red"
##  $ awarded    : logi FALSE

JSON is built on two structures: objects and arrays (DataCamp).


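  1. {} marks an object. When it is the outermost structure, fromJSON returns a list.
  2. [] marks an array. Wrapping the object in [] lets fromJSON recognize the data as a table (data.frame).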
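
As a minimal sketch of the two rules above, the snippet below parses the same object once bare and once wrapped in brackets. The variable names (obj_json, arr_json, as_list, as_table) are illustrative and not part of the original notes.

```r
library(jsonlite)

# The same object, bare and wrapped in an array (illustrative names)
obj_json <- '{"name":"Chateau Migraine", "year":1997}'
arr_json <- paste0("[", obj_json, "]")

as_list  <- fromJSON(obj_json)   # top-level {} gives a named list
as_table <- fromJSON(arr_json)   # [ {...} ] gives a one-row data.frame

is.data.frame(as_list)    # FALSE
is.data.frame(as_table)   # TRUE
```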


JSON Structure

json1 <- fromJSON('[1, 2, 3, 4, 5, 6]');class(json1)
## [1] "integer"
json2 <- fromJSON('{"a": [1, 2, 3], "b": [4, 5, 6]}');class(json2)
## [1] "list"
json3 <- fromJSON('[[1, 2], [3, 4]]');class(json3)
## [1] "matrix"
json4 <- fromJSON('[{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]');class(json4)
## [1] "data.frame"

If the elements sit in an array, for example [1, 2, 3, 4, 5, 6], they share one data type, so they can be stored in a vector or a matrix rather than a data.frame or a list. On this point, see Reference.

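If the elements sit inside {}, jsonlite assumes by default that they may have different data types, so they become the columns of a data.frame (two columns in the example above) or the elements of a list.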
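
The difference can be checked directly. This is a small sketch under the default fromJSON settings; the JSON strings and variable names are made up for illustration. A mixed-type array is still forced into a single atomic type, while an object keeps one type per field.

```r
library(jsonlite)

# Illustrative inputs, not from the original notes
same_type    <- fromJSON('[1, 2, 3]')
mixed_array  <- fromJSON('[1, "a", true]')
mixed_object <- fromJSON('{"n": 1, "s": "a", "b": true}')

class(same_type)            # an atomic integer vector
class(mixed_array)          # one shared type: everything becomes character
sapply(mixed_object, class) # a list keeps a separate type for each field
```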

Given the explanation above, the way toJSON converts R's four main data structures

  1. vectors
  2. matrix
  3. data.frame
  4. list

into JSON follows the same logic.
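
A short sketch of that reverse direction, using toJSON with its default arguments on one small example of each structure. The objects are made up for illustration; the comments show the JSON each call produces.

```r
library(jsonlite)

vec_json <- toJSON(1:3)                                   # [1,2,3]
mat_json <- toJSON(matrix(1:4, nrow = 2))                 # [[1,3],[2,4]], one inner array per row
df_json  <- toJSON(data.frame(a = 1:2, b = c("x", "y")))  # [{"a":1,"b":"x"},{"a":2,"b":"y"}]
lst_json <- toJSON(list(a = 1, b = "x"))                  # {"a":[1],"b":["x"]}
```

By default, length-one vectors inside a list are written as one-element arrays. Passing auto_unbox = TRUE to toJSON writes them as scalars instead.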

prettify or minify JSON

JSON can come in different formats. The same content can be written in a minified format or in a pretty format with indentation, whitespace and new lines (DataCamp).

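JSON generally comes in two formats:

  1. The minified format is very compact, with little whitespace or indentation, which makes it efficient to store and transmit.
  2. The pretty format has whitespace and indentation, which makes it easier for people to read.
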
# pretty_json <- prettify(toJSON(mtcars[1:5,1:2]))
pretty_json <- toJSON(mtcars[1:5,1:2], pretty = TRUE)

# Print pretty_json
pretty_json
## [
##   {
##     "mpg": 21,
##     "cyl": 6,
##     "_row": "Mazda RX4"
##   },
##   {
##     "mpg": 21,
##     "cyl": 6,
##     "_row": "Mazda RX4 Wag"
##   },
##   {
##     "mpg": 22.8,
##     "cyl": 4,
##     "_row": "Datsun 710"
##   },
##   {
##     "mpg": 21.4,
##     "cyl": 6,
##     "_row": "Hornet 4 Drive"
##   },
##   {
##     "mpg": 18.7,
##     "cyl": 8,
##     "_row": "Hornet Sportabout"
##   }
## ]
# Minify pretty_json: mini_json
mini_json <- minify(pretty_json)

# Print mini_json
mini_json
## [{"mpg":21,"cyl":6,"_row":"Mazda RX4"},{"mpg":21,"cyl":6,"_row":"Mazda RX4 Wag"},{"mpg":22.8,"cyl":4,"_row":"Datsun 710"},{"mpg":21.4,"cyl":6,"_row":"Hornet 4 Drive"},{"mpg":18.7,"cyl":8,"_row":"Hornet Sportabout"}]
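
To confirm that the two formats differ only in whitespace, the sketch below converts between them and compares the parsed results. The variable names (compact, pretty, roundtrip) are illustrative.

```r
library(jsonlite)

compact   <- toJSON(mtcars[1:2, 1:2])  # minified by default
pretty    <- prettify(compact)         # add indentation and new lines
roundtrip <- minify(pretty)            # strip the whitespace again

# Both formats parse to the same R object
identical(fromJSON(pretty), fromJSON(compact))

# The pretty version is longer only because of the added whitespace
nchar(pretty) > nchar(roundtrip)
```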


# Definition of quandl_url
quandl_url <- "https://www.quandl.com/api/v3/datasets/WIKI/FB/data.json?auth_token=i83asDsiWUUyfoypkgMz"

# Import Quandl data: quandl_data
quandl_data <- fromJSON(quandl_url)

# Print structure of quandl_data
str(quandl_data)
## List of 1
##  $ dataset_data:List of 10
##   ..$ limit       : NULL
##   ..$ transform   : NULL
##   ..$ column_index: NULL
##   ..$ column_names: chr [1:13] "Date" "Open" "High" "Low" ...
##   ..$ start_date  : chr "2012-05-18"
##   ..$ end_date    : chr "2018-03-27"
##   ..$ frequency   : chr "daily"
##   ..$ data        : chr [1:1472, 1:13] "2018-03-27" "2018-03-26" "2018-03-23" "2018-03-22" ...
##   ..$ collapse    : NULL
##   ..$ order       : NULL
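
The structure above holds the table as a character matrix (data) plus a separate vector of column names (column_names). A possible way to turn that into a typed data.frame is sketched below. To stay runnable offline it uses a tiny hand-written JSON string that only mimics the shape of the response; the string, its values, and all variable names are hypothetical, not real Quandl data.

```r
library(jsonlite)

# Hypothetical stand-in for the API response: same shape, made-up values
sample_json <- '{"dataset_data": {
  "column_names": ["Date", "Open", "Close"],
  "data": [["2018-03-27", 10.5, 11.25],
           ["2018-03-26", 10.0, 10.5]]
}}'

parsed <- fromJSON(sample_json)
dd <- parsed$dataset_data

# Rows mix strings and numbers, so data arrives as a character matrix
prices <- as.data.frame(dd$data, stringsAsFactors = FALSE)
names(prices) <- dd$column_names

# Restore the intended column types
prices$Date  <- as.Date(prices$Date)
prices$Open  <- as.numeric(prices$Open)
prices$Close <- as.numeric(prices$Close)

str(prices)
```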

# Print wine, the list parsed from wine_json earlier
wine
## $name
## [1] "Chateau Migraine"
## $year
## [1] 1997
## $alcohol_pct
## [1] 12.4
## $color
## [1] "red"
## $awarded
## [1] FALSE

Schouwenaars, Filip. 2016. “Importing Data in R (Part 2).” <https://www.datacamp.com/courses/importing-data-in-r-part-2>.