Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R regular expression – extracting a multiple specific values after a specific word

I’m currently trying to create a set of variables by extracting values from a string of characters in R.

I have a variable with the following format:

{"name":"John Smith","employee_id":"AB001","email":"johnsmith@abc.com"},{"name":"Sarah John","employee_id":"AB002","email":"sarahjohn@abc.com"},{"name":"Chris Lee","employee_id":"AB003","email":"chrislee@abc.com"},{"name":"Brian Johns","employee_id":"AB004","email":"brianjohns@abc.com"}

I would like extract all the employee_id(s) and create a varaible for each of them.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

What I would like from the above example,

Employee ID1 Employee ID2 Employee ID3 Employee ID4
AB001 AB002 AB003 AB004

Any suggestions/directions would be much appreciated!

Many thanks in advance

>Solution :

Your data is in JSON format, so you should not use regex to parse that data. There are packages in R that you can use to parse JSON data. I’ll show you how to handle your data using the packages jsonlite.

The first step is to make sure that your data is valid data. The pieces in curly brackets are so-called JSON objects and since you have several of them, you should combine them into an array. This can be done by enclosing your string in square brackets:

data <- '{"name":"John Smith","employee_id":"AB001","email":"johnsmith@abc.com"},{"name":"Sarah John","employee_id":"AB002","email":"sarahjohn@abc.com"},{"name":"Chris Lee","employee_id":"AB003","email":"chrislee@abc.com"},{"name":"Brian Johns","employee_id":"AB004","email":"brianjohns@abc.com"}'
json_data <- paste0("[", data, "]")

This can now easily be parsed with jsonlite:

library(jsonlite)
table <- fromJSON(json_data)
table
##          name employee_id              email
## 1  John Smith       AB001  johnsmith@abc.com
## 2  Sarah John       AB002  sarahjohn@abc.com
## 3   Chris Lee       AB003   chrislee@abc.com
## 4 Brian Johns       AB004 brianjohns@abc.com

In order to get at the employee IDs, just extract the column:

table$employee_id
## [1] "AB001" "AB002" "AB003" "AB004"
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading