Split large JSON file by using jq and awk

I have a large file called Metadata_01.json. It consists of blocks that follow this structure:

[
 {
  "Participant_id": "P04_00001",
  "no_of_people": "Multiple",
  "apparent_gender": "F",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Side View",
  "indoor_outdoor_env": "Indoors",
  "lighting_condition": "Bright",
  "Occluded": 1,
  "category": "Two Person",
  "camera_movement": "Still",
  "action": "No action",
  "indoor_outdoor_in_moving_car_or_train": "Indoor",
  "daytime_nighttime": "Nighttime"
 },
 {
  "Participant_id": "P04_00002",
  "no_of_people": "Single",
  "apparent_gender": "M",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Frontal View",
  "indoor_outdoor_env": "Outdoors",
  "lighting_condition": "Bright",
  "Occluded": "None",
  "category": "Animals",
  "camera_movement": "Still",
  "action": "Small action",
  "indoor_outdoor_in_moving_car_or_train": "Outdoor",
  "daytime_nighttime": "Daytime"
 },

And so on… thousands of them.

I am using the following command:

jq -cr '.[]' Metadata_01.json | awk '{print > (NR ".json")}'

It more or less does the job: from the large file structured as shown above, I get lots of files named 1.json, 2.json and so on, each holding one object on a single line.

Instead of those results, I need each JSON file to be named after its "Participant_id" (e.g. P04_00002.json), and I want to preserve the JSON structure so that each file looks like this:

{
  "Participant_id": "P04_00002",
  "no_of_people": "Single",
  "apparent_gender": "M",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Frontal View",
  "indoor_outdoor_env": "Outdoors",
  "lighting_condition": "Bright",
  "Occluded": "None",
  "category": "Animals",
  "camera_movement": "Still",
  "action": "Small action",
  "indoor_outdoor_in_moving_car_or_train": "Outdoor",
  "daytime_nighttime": "Daytime"
}

What adjustments should I make to the command above to achieve this?
Or maybe there’s an easier way to do this? Thank you!

Solution:

I would recommend PowerShell, since working with objects tends to be easier overall. Its ConvertFrom-Json cmdlet converts the JSON text into PowerShell objects, so you can reference properties with dot notation (.Participant_id). Then you just convert each object back to JSON and write it out. Here I use New-Item to create each file with that output, but piping to Out-File would work as well.

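# Read the whole file as one string and parse it into an array of objects
$json = Get-Content -Path '.\Metadata_01.json' -Raw | ConvertFrom-Json
foreach ($json_object in $json)
{
    # One file per object, named after its Participant_id; -Force overwrites an existing file
    New-Item -Path ".\Desktop\" -Name "$($json_object.Participant_id).json" -Value (ConvertTo-Json -InputObject $json_object) -ItemType 'File' -Force
}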
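
If PowerShell is not available, the same parse-then-write-per-object idea can be approximated with the jq and awk tools from the question. This is only a sketch under stated assumptions: split_by_id is a hypothetical helper name, jq is assumed to print with its default two-space indentation, and every "Participant_id" is assumed to be a plain string that is safe as a file name (no quotes or slashes).

```shell
# Sketch: split a JSON array into one pretty-printed file per element,
# named after the element's top-level "Participant_id".
# split_by_id is a hypothetical helper name; requires jq and awk.
split_by_id() {
  src=$1          # input file containing a JSON array of objects
  out=${2:-.}     # output directory, defaults to the current directory
  mkdir -p "$out" || return 1
  # jq '.[]' pretty-prints each element; a top-level object starts with "{"
  # and ends with "}" in column 1, and its keys are indented by two spaces.
  jq '.[]' "$src" | awk -v out="$out" '
    # Remember the id from the top-level Participant_id line
    /^  "Participant_id":/ { split($0, q, "\""); id = q[4] }
    # Collect every line of the current object
    { buf = buf $0 "\n" }
    # Closing brace in column 1: write the object and start over
    /^}/ {
      file = out "/" id ".json"
      printf "%s", buf > file
      close(file)   # avoid keeping thousands of files open
      buf = ""
    }
  '
}
```

Calling split_by_id Metadata_01.json out should then produce files such as out/P04_00002.json, each indented like the original. The close() call matters with thousands of outputs, because some awk implementations limit how many files can be open at once.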

The issue you will probably run into with this PowerShell script is running out of memory: given the size of that file, reading all of it into a variable first may be too much. There are ways around that, but this is for demonstration purposes.
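
One possible way around the memory concern, staying with jq, is its streaming parser, which reads the array incrementally instead of loading it whole. The sketch below assumes jq 1.5 or later; split_streaming is a hypothetical helper name, and each "Participant_id" is again assumed to be safe as a file name.

```shell
# Sketch: same split, but with jq's streaming parser so the whole array
# does not have to be held in memory at once.
# split_streaming is a hypothetical helper name; requires jq 1.5 or later.
split_streaming() {
  src=$1          # input file containing a JSON array of objects
  out=${2:-.}     # output directory, defaults to the current directory
  mkdir -p "$out" || return 1
  # Rebuild each top-level array element from the event stream and
  # print it as one compact line.
  jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' "$src" |
    while IFS= read -r obj; do
      # Look up the id, then re-indent the object into its own file
      id=$(printf '%s\n' "$obj" | jq -r '.Participant_id')
      printf '%s\n' "$obj" | jq '.' > "$out/$id.json"
    done
}
```

This starts two short jq processes per object, so it trades speed for a small memory footprint.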
