
JSON Manipulation using R

Reference: Schouwenaars (2016).

The jsonlite package takes its name from SQLite.

# Load the jsonlite package
library(jsonlite)
library(glue)

# wine_json is a JSON
wine_json <- '{"name":"Chateau Migraine", "year":1997, "alcohol_pct":12.4, "color":"red", "awarded":false}'
wine_json_bracket <- glue('[{wine_json}]')

# Convert wine_json into a list: wine
wine <- fromJSON(wine_json)
wine_bracket <- fromJSON(wine_json_bracket)

# Print structure of wine
str(wine)
## List of 5
##  $ name       : chr "Chateau Migraine"
##  $ year       : int 1997
##  $ alcohol_pct: num 12.4
##  $ color      : chr "red"
##  $ awarded    : logi FALSE
str(wine_bracket)
## 'data.frame':    1 obs. of  5 variables:
##  $ name       : chr "Chateau Migraine"
##  $ year       : int 1997
##  $ alcohol_pct: num 12.4
##  $ color      : chr "red"
##  $ awarded    : logi FALSE

"JSON is built on two structures: objects and arrays." (DataCamp)

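From the example above, we can see that:

  1. When {} is the outermost structure, fromJSON returns a list.
  2. Wrapping the object in [] makes fromJSON recognize it as a table and return a data.frame.

A more systematic explanation follows.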

JSON Structure

json1 <- fromJSON('[1, 2, 3, 4, 5, 6]');class(json1)
## [1] "integer"
json2 <- fromJSON('{"a": [1, 2, 3], "b": [4, 5, 6]}');class(json2)
## [1] "list"
json3 <- fromJSON('[[1, 2], [3, 4]]');class(json3)
## [1] "matrix"
json4 <- fromJSON('[{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]');class(json4)
## [1] "data.frame"

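If the elements sit in an array, for example [1, 2, 3, 4, 5, 6], they are treated as sharing one data type and can be stored in a vector or a matrix, in contrast to a data.frame or a list. See Reference for details.

If the elements sit in an object {}, the values are assumed to possibly have different data types, so they are handled as columns of a data.frame (two columns in the example above) or as elements of a list.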
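The two cases above can be sketched as follows. This is a minimal illustration, assuming jsonlite's default simplification; the JSON strings and the names df and raw_list are made up for this example.

```r
library(jsonlite)

# An array of objects: each key becomes a column, and columns may differ in type
df <- fromJSON('[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]')
sapply(df, class)

# Switching simplification off keeps the array as a plain list
# instead of collapsing it into an atomic vector
raw_list <- fromJSON('[1, 2, 3]', simplifyVector = FALSE)
class(raw_list)
```

With simplifyVector = FALSE the vector, matrix, and data.frame shortcuts are skipped, which can help when the structure of an unfamiliar JSON needs to be inspected first.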

With this explanation in mind, the way toJSON converts the four main R structures

  1. vectors
  2. matrix
  3. data.frame
  4. list

into JSON also follows a clear logic.
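The reverse direction can be checked with a short sketch. The inputs below are made up to mirror the JSON Structure examples; the outputs in the comments assume toJSON's default settings.

```r
library(jsonlite)

# vector -> JSON array
toJSON(c(1, 2, 3))
## [1,2,3]

# matrix -> array of arrays, one inner array per row
toJSON(matrix(1:4, nrow = 2, byrow = TRUE))
## [[1,2],[3,4]]

# data.frame -> array of objects, one object per row
toJSON(data.frame(a = c(1, 3), b = c(2, 4)))
## [{"a":1,"b":2},{"a":3,"b":4}]

# named list -> object
toJSON(list(a = 1:3, b = 4:6))
## {"a":[1,2,3],"b":[4,5,6]}
```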

prettify or minify JSON

As DataCamp puts it: "JSONs can come in different formats. Take these two JSONs, that are in fact exactly the same: the first one is in a minified format, the second one is in a pretty format with indentation, whitespace and new lines."

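JSON usually comes in two formats:

  1. The minified format is very compact, with almost no whitespace or indentation, which suits storage and transfer.
  2. The pretty format adds whitespace, indentation, and new lines, which makes it easy for people to read.

# pretty_json <- prettify(toJSON(mtcars[1:5,1:2]))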
pretty_json <- toJSON(mtcars[1:5,1:2], pretty = TRUE)

# Print pretty_json
pretty_json
## [
##   {
##     "mpg": 21,
##     "cyl": 6,
##     "_row": "Mazda RX4"
##   },
##   {
##     "mpg": 21,
##     "cyl": 6,
##     "_row": "Mazda RX4 Wag"
##   },
##   {
##     "mpg": 22.8,
##     "cyl": 4,
##     "_row": "Datsun 710"
##   },
##   {
##     "mpg": 21.4,
##     "cyl": 6,
##     "_row": "Hornet 4 Drive"
##   },
##   {
##     "mpg": 18.7,
##     "cyl": 8,
##     "_row": "Hornet Sportabout"
##   }
## ]
# Minify pretty_json: mini_json
mini_json <- minify(pretty_json)

# Print mini_json
mini_json
## [{"mpg":21,"cyl":6,"_row":"Mazda RX4"},{"mpg":21,"cyl":6,"_row":"Mazda RX4 Wag"},{"mpg":22.8,"cyl":4,"_row":"Datsun 710"},{"mpg":21.4,"cyl":6,"_row":"Hornet 4 Drive"},{"mpg":18.7,"cyl":8,"_row":"Hornet Sportabout"}]
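As a quick check that the two formats carry the same data, both strings can be parsed back into R and compared. This is a small sketch that rebuilds pretty_json and mini_json as above; the names same_data and is_shorter are introduced here for illustration.

```r
library(jsonlite)

pretty_json <- toJSON(mtcars[1:5, 1:2], pretty = TRUE)
mini_json <- minify(pretty_json)

# Both strings parse to the same data.frame
same_data <- identical(fromJSON(pretty_json), fromJSON(mini_json))
same_data

# The minified string is shorter because the layout whitespace is gone
is_shorter <- nchar(mini_json) < nchar(pretty_json)
is_shorter
```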

fromJSON can read from URLs and local files

# Definition of quandl_url
quandl_url <- "https://www.quandl.com/api/v3/datasets/WIKI/FB/data.json?auth_token=i83asDsiWUUyfoypkgMz"

# Import Quandl data: quandl_data
quandl_data <- fromJSON(quandl_url)

# Print structure of quandl_data
str(quandl_data)
## List of 1
##  $ dataset_data:List of 10
##   ..$ limit       : NULL
##   ..$ transform   : NULL
##   ..$ column_index: NULL
##   ..$ column_names: chr [1:13] "Date" "Open" "High" "Low" ...
##   ..$ start_date  : chr "2012-05-18"
##   ..$ end_date    : chr "2018-03-27"
##   ..$ frequency   : chr "daily"
##   ..$ data        : chr [1:1472, 1:13] "2018-03-27" "2018-03-26" "2018-03-23" "2018-03-22" ...
##   ..$ collapse    : NULL
##   ..$ order       : NULL
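In the structure above, data is a character matrix and column_names holds the matching headers, so the two can be combined into a data.frame. The sketch below uses a tiny mock JSON with the same shape so that it runs offline; the values and the column subset ("Date", "Close") are made up and are not actual Quandl data.

```r
library(jsonlite)

# Mock response with the same layout as quandl_data (values are illustrative only)
mock_json <- '{"dataset_data": {
  "column_names": ["Date", "Close"],
  "data": [["2018-03-27", "100.5"], ["2018-03-26", "101.25"]]
}}'
mock_data <- fromJSON(mock_json)$dataset_data

# Turn the character matrix into a data.frame and attach the column names
prices <- as.data.frame(mock_data$data, stringsAsFactors = FALSE)
names(prices) <- mock_data$column_names

# Every cell arrives as text, so convert the columns explicitly
prices$Date <- as.Date(prices$Date)
prices$Close <- as.numeric(prices$Close)
str(prices)
```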
fromJSON("datasets/wine_json.txt")
## $name
## [1] "Chateau Migraine"
## 
## $year
## [1] 1997
## 
## $alcohol_pct
## [1] 12.4
## 
## $color
## [1] "red"
## 
## $awarded
## [1] FALSE
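Reading from a file can be reproduced without the datasets folder. The following sketch writes a shortened version of the wine JSON to a temporary file and reads it back; the name wine_path and the use of a temporary file are assumptions for illustration.

```r
library(jsonlite)

# Write a small JSON document to a temporary file
wine_path <- tempfile(fileext = ".json")
writeLines('{"name":"Chateau Migraine", "year":1997, "awarded":false}', wine_path)

# fromJSON accepts the file path directly
wine_from_file <- fromJSON(wine_path)
str(wine_from_file)

# Clean up the temporary file
unlink(wine_path)
```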

Schouwenaars, Filip. 2016. “Importing Data in R (Part 2).” DataCamp. <https://www.datacamp.com/courses/importing-data-in-r-part-2>.