Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Data Cleaning Badly Designed JSON Data — Need advice on best strategy

I have over 1,000 JSON files representing game elements. This is a typical example:

{
    "name": "Reeve Hunter",
    "edition": 3,
    "revision": "2021 v1",
    "keywords": [
        "Circle",
        "Wolf Sworn",
        "Reeve of Orboros",
        "Solo"
    ],
    "point cost": 4,
    "field allowance": 2,
    "models": [
        {
            "Hunter": {
                "spd": 6,
                "str": 6,
                "mat": 6,
                "rat": 7,
                "def": 13,
                "arm": 13,
                "cmd": 7,
                "boxes": 5,
                "advantages": [
                    "Advance Deploy",
                    "Assault",
                    "Pathfinder"
                ],
                "base size": 30,
                "attacks": [
                    {
                        "Double Crossbow": {
                            "type": "ranged",
                            "rng": 12,
                            "rof": 2,
                            "pow": 10
                        }
                    },
                    {
                        "Battle Blade": {
                            "type": "melee",
                            "rng": 0.5,
                            "pow": 3,
                            "p+s": 9
                        }
                    },
                    {
                        "Cleft Sword": {
                            "type": "melee",
                            "rng": 1,
                            "pow": 5,
                            "p+s": 11,
                            "weapon qualities": ["Weapon Master"],
                            "abilities": ["Powerful Charge"]
                        }
                    }
                ],
                "abilities": [
                    "Hunter",
                    "Leadership [Reeves of Orboros]",
                    "Quickwork",
                    "Sprint"
                ]
            }
        }
    ]
}

While the above is valid JSON, once I got all the JSON files into a MongoDB database I realized I couldn’t analyze my data properly because every sub-document in the models and attacks arrays had a unique name as the key.

I want to transform each JSON document like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

{
  "name": "Reeve Hunter",
  "edition": 3,
  "revision": "2021 v1",
  "keywords": [
    "Circle",
    "Wolf Sworn",
    "Reeve of Orboros",
    "Solo"
  ],
  "point cost": 4,
  "field allowance": 2,
  "models": [
    {
      "model name": "Hunter",
      "spd": 6,
      "str": 6,
      "mat": 6,
      "rat": 7,
      "def": 13,
      "arm": 13,
      "cmd": 7,
      "boxes": 5,
      "advantages": [
        "Advance Deploy",
        "Assault",
        "Pathfinder"
      ],
      "base size": 30,
      "attacks": [
        {
          "attack name": "Double Crossbow",
          "type": "ranged",
          "rng": 12,
          "rof": 2,
          "pow": 10
        },
        {
          "attack name": "Battle Blade",
          "type": "melee",
          "rng": 0.5,
          "pow": 3,
          "p+s": 9
        },
        {
          "attack name": "Cleft Sword",
          "type": "melee",
          "rng": 1,
          "pow": 5,
          "p+s": 11,
          "weapon qualities": [
            "Weapon Master"
          ],
          "abilities": [
            "Powerful Charge"
          ]
        }
      ],
      "abilities": [
        "Hunter",
        "Leadership [Reeves of Orboros]",
        "Quickwork",
        "Sprint"
      ]
    }
  ]
}

I’m an old hat at using awk and sed for cleaning up text, but this transformation is complex and features enough "gotchas" that I feel like I need a different strategy.

I’ve been using the wonderful tool jq to manipulate and query JSON files, and I feel like there has to be a way to leverage its power to do what I need.

Does anyone have any advice on the best strategy to write a script that can transform all of my data into the cleaner format I require?

>Solution :

You could use to_entries to decompose the nested objects into .key and .value, then rearrange and update using |=:

<file.json jq '.models |= map(to_entries[] | {"model name": .key} + .value)
  | .models[].attacks |= map(to_entries[] | {"attack name": .key} + .value)'
{
  "name": "Reeve Hunter",
  "edition": 3,
  "revision": "2021 v1",
  "keywords": [
    "Circle",
    "Wolf Sworn",
    "Reeve of Orboros",
    "Solo"
  ],
  "point cost": 4,
  "field allowance": 2,
  "models": [
    {
      "model name": "Hunter",
      "spd": 6,
      "str": 6,
      "mat": 6,
      "rat": 7,
      "def": 13,
      "arm": 13,
      "cmd": 7,
      "boxes": 5,
      "advantages": [
        "Advance Deploy",
        "Assault",
        "Pathfinder"
      ],
      "base size": 30,
      "attacks": [
        {
          "attack name": "Double Crossbow",
          "type": "ranged",
          "rng": 12,
          "rof": 2,
          "pow": 10
        },
        {
          "attack name": "Battle Blade",
          "type": "melee",
          "rng": 0.5,
          "pow": 3,
          "p+s": 9
        },
        {
          "attack name": "Cleft Sword",
          "type": "melee",
          "rng": 1,
          "pow": 5,
          "p+s": 11,
          "weapon qualities": [
            "Weapon Master"
          ],
          "abilities": [
            "Powerful Charge"
          ]
        }
      ],
      "abilities": [
        "Hunter",
        "Leadership [Reeves of Orboros]",
        "Quickwork",
        "Sprint"
      ]
    }
  ]
}

Demo

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading