Split a large JSON file using jq and awk

I have a large file called Metadata_01.json.

It consists of an array of objects with the following structure:

[
 {
  "Participant_id": "P04_00001",
  "no_of_people": "Multiple",
  "apparent_gender": "F",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Side View",
  "indoor_outdoor_env": "Indoors",
  "lighting_condition": "Bright",
  "Occluded": 1,
  "category": "Two Person",
  "camera_movement": "Still",
  "action": "No action",
  "indoor_outdoor_in_moving_car_or_train": "Indoor",
  "daytime_nighttime": "Nighttime"
 },
 {
  "Participant_id": "P04_00002",
  "no_of_people": "Single",
  "apparent_gender": "M",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Frontal View",
  "indoor_outdoor_env": "Outdoors",
  "lighting_condition": "Bright",
  "Occluded": "None",
  "category": "Animals",
  "camera_movement": "Still",
  "action": "Small action",
  "indoor_outdoor_in_moving_car_or_train": "Outdoor",
  "daytime_nighttime": "Daytime"
 },

And so on… thousands of them.

I am using the following command:

jq -cr '.[]' Metadata_01.json | awk '{print > (NR ".json")}'

It partly works: starting from the large file above, it produces thousands of files named 1.json, 2.json, 3.json, and so on, each containing a single object compacted onto one line.
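A minimal reproduction of the original pipeline shows where both the numeric names and the one-line layout come from: jq's -c flag prints each array element as one compact line, and awk's NR (the current line number) becomes the file name. The two-object sample.json below is hypothetical and trimmed to two fields; jq must be installed.

```shell
set -e
cd "$(mktemp -d)"   # work in a scratch directory

# Hypothetical, trimmed stand-in for Metadata_01.json
cat > sample.json <<'EOF'
[
 {"Participant_id": "P04_00001", "no_of_people": "Multiple"},
 {"Participant_id": "P04_00002", "no_of_people": "Single"}
]
EOF

# jq -c: one compact line per array element
# awk:   line number NR is used as the file name, so 1.json, 2.json, ...
jq -cr '.[]' sample.json | awk '{print > (NR ".json")}'

cat 2.json   # {"Participant_id":"P04_00002","no_of_people":"Single"}
```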

Instead, I need each JSON file to be named after its "Participant_id" (e.g. P04_00002.json), and I want each file to keep the indented, multi-line JSON layout, like this:

{
  "Participant_id": "P04_00002",
  "no_of_people": "Single",
  "apparent_gender": "M",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Frontal View",
  "indoor_outdoor_env": "Outdoors",
  "lighting_condition": "Bright",
  "Occluded": "None",
  "category": "Animals",
  "camera_movement": "Still",
  "action": "Small action",
  "indoor_outdoor_in_moving_car_or_train": "Outdoor",
  "daytime_nighttime": "Daytime"
}

What adjustments should I make to the command above to achieve this?
Or maybe there’s an easier way to do this? Thank you!

Solution:

I’d recommend PowerShell here, since working with objects rather than raw text is generally easier. PowerShell’s ConvertFrom-Json cmdlet converts the text returned by Get-Content into PowerShell objects, so you can reference each property with dot notation (.Participant_id). Then you just convert each object back to JSON and write it to a file. Here I use New-Item to create each file, but piping to Out-File would work as well.

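# Read the whole file as one string and parse it into an array of objects
$json = Get-Content -Path '.\Metadata_01.json' -Raw | ConvertFrom-Json
foreach ($json_object in $json)
{
    # Name each file after its Participant_id (e.g. P04_00002.json) and write the
    # object back out as indented JSON; Out-Null hides the per-file listing
    New-Item -Path ".\Desktop\" -Name "$($json_object.Participant_id).json" -Value (ConvertTo-Json -InputObject $json_object) -ItemType 'File' -Force | Out-Null
}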
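If you’d rather keep the jq and awk pipeline from the question, one possible adjustment is sketched below (an alternative to the PowerShell route, not the answerer’s method). jq prints a marker line holding the target file name before each object, then prints the object itself pretty-printed rather than compacted; awk closes the current file and opens the next one whenever it sees a marker. The "### " marker and sample.json are made up for illustration, and the sketch assumes every Participant_id is unique and safe to use as a file name.

```shell
set -e
cd "$(mktemp -d)"   # work in a scratch directory

# Hypothetical, trimmed stand-in for Metadata_01.json
cat > sample.json <<'EOF'
[
 {"Participant_id": "P04_00001", "no_of_people": "Multiple"},
 {"Participant_id": "P04_00002", "no_of_people": "Single"}
]
EOF

# jq: for each element, a raw marker line "### <id>.json", then the element
#     itself pretty-printed (no -c, so jq's default 2-space indentation)
# awk: a marker line closes the previous file and names the next one;
#      every other line goes into the current file
jq -r '.[] | "### \(.Participant_id).json", .' sample.json |
  awk '/^### / { if (out) close(out); out = substr($0, 5); next }
       { print > out }'

cat P04_00002.json
```

On the real data, replace sample.json with Metadata_01.json. Closing each finished file keeps awk from hitting its open-file limit when there are thousands of outputs.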

One issue you’ll probably run into with the PowerShell version is memory: it reads the entire file into a variable ($json) before writing anything, so a very large file has to fit in memory all at once. There are ways around that, but this version is kept simple for demonstration purposes.
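One such workaround, if you stay with jq: its --stream option parses the input incrementally into [path, value] events instead of building the whole array in memory, and the idiom fromstream(1 | truncate_stream(inputs)) reassembles the top-level array elements one at a time. The sketch below pairs that with an awk file-switching step; it assumes jq 1.6 or later, a hypothetical sample.json, and unique Participant_id values. Streaming tends to be slower than a normal parse, so it is mainly worth it when the file does not fit in memory.

```shell
set -e
cd "$(mktemp -d)"   # work in a scratch directory

# Hypothetical, trimmed stand-in for Metadata_01.json
cat > sample.json <<'EOF'
[
 {"Participant_id": "P04_00001", "no_of_people": "Multiple"},
 {"Participant_id": "P04_00002", "no_of_people": "Single"}
]
EOF

# --stream + -n: read [path, value] events via inputs instead of one big value.
# 1 | truncate_stream(...) strips the leading array index from each path, and
# fromstream(...) rebuilds one array element at a time.
jq -rn --stream '
  fromstream(1 | truncate_stream(inputs))
  | "### \(.Participant_id).json", .
' sample.json |
  awk '/^### / { if (out) close(out); out = substr($0, 5); next }
       { print > out }'

cat P04_00001.json
```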
