I’m currently trying to create a set of variables by extracting values from a string of characters in R.
I have a variable with the following format:
{"name":"John Smith","employee_id":"AB001","email":"johnsmith@abc.com"},{"name":"Sarah John","employee_id":"AB002","email":"sarahjohn@abc.com"},{"name":"Chris Lee","employee_id":"AB003","email":"chrislee@abc.com"},{"name":"Brian Johns","employee_id":"AB004","email":"brianjohns@abc.com"}
I would like extract all the employee_id(s) and create a varaible for each of them.
What I would like from the above example,
| Employee ID1 | Employee ID2 | Employee ID3 | Employee ID4 |
|---|---|---|---|
| AB001 | AB002 | AB003 | AB004 |
Any suggestions/directions would be much appreciated!
Many thanks in advance
>Solution :
Your data is in JSON format, so you should not use regex to parse that data. There are packages in R that you can use to parse JSON data. I’ll show you how to handle your data using the packages jsonlite.
The first step is to make sure that your data is valid data. The pieces in curly brackets are so-called JSON objects and since you have several of them, you should combine them into an array. This can be done by enclosing your string in square brackets:
data <- '{"name":"John Smith","employee_id":"AB001","email":"johnsmith@abc.com"},{"name":"Sarah John","employee_id":"AB002","email":"sarahjohn@abc.com"},{"name":"Chris Lee","employee_id":"AB003","email":"chrislee@abc.com"},{"name":"Brian Johns","employee_id":"AB004","email":"brianjohns@abc.com"}'
json_data <- paste0("[", data, "]")
This can now easily be parsed with jsonlite:
library(jsonlite)
table <- fromJSON(json_data)
table
## name employee_id email
## 1 John Smith AB001 johnsmith@abc.com
## 2 Sarah John AB002 sarahjohn@abc.com
## 3 Chris Lee AB003 chrislee@abc.com
## 4 Brian Johns AB004 brianjohns@abc.com
In order to get at the employee IDs, just extract the column:
table$employee_id
## [1] "AB001" "AB002" "AB003" "AB004"