Filtering a JSON array in a Python JSON reading and parsing program

I am writing a Python program that reads and parses JSON returned from a URL, and I would like to filter the results. I have searched and watched videos for ways to pick specific results out of JSON data, but this response is fairly deeply nested:

 {
  "collection": {
    "version": "1.0",
    "href": "http://images-api.nasa.gov/search?q=galaxy&page=1",
    "items": [
      {
        "href": "https://images-assets.nasa.gov/image/PIA04921/collection.json",
        "data": [
          {
            "center": "JPL",
            "title": "Andromeda Galaxy",
            "nasa_id": "PIA04921",
            "media_type": "image",
            "keywords": [
              "Galaxy Evolution Explorer GALEX"
            ],
            "date_created": "2003-12-10T22:41:32Z",
            "description_508": "This image is from NASA Galaxy Evolution Explorer is an observation of the large galaxy in Andromeda, Messier 31. The Andromeda galaxy is the most massive in the local group of galaxies that includes our Milky Way.",
            "secondary_creator": "NASA/JPL/California Institute of Technology",
            "description": "This image is from NASA Galaxy Evolution Explorer is an observation of the large galaxy in Andromeda, Messier 31. The Andromeda galaxy is the most massive in the local group of galaxies that includes our Milky Way."
          }
        ],
        "links": [
          {
            "href": "https://images-assets.nasa.gov/image/PIA04921/PIA04921~thumb.jpg",
            "rel": "preview",
            "render": "image"
          }
        ]
      },
      {
        "href": "https://images-assets.nasa.gov/image/PIA04634/collection.json",
        "data": [ (different galaxies with their data etc.) 

The full JSON is too long to show here. I am trying to keep only the galaxies with a specific title, such as Sombrero. Since each parent array has so many nested children, how would I implement this? I have tried the following:

from urllib.request import urlopen
import json

url = "https://images-api.nasa.gov/search?q=galaxy&page=1"

response = urlopen(url)
data_json = json.loads(response.read())

list(list(filter(lambda x:x["title"]=="Sombrero", data_json)))

Solution:

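You can use a list comprehension. Note that data_json is a dictionary, so filter() over it iterates its top-level keys (here just the string "collection"), and x["title"] on a string raises a TypeError. The galaxies are in data_json['collection']['items'], and each title sits in the first element of that item's 'data' list:

galaxies = [i for i in data_json['collection']['items'] if i['data'][0]['title'] == 'Andromeda Galaxy']

To filter for partial string matches:

galaxies = [i for i in data_json['collection']['items'] if 'Andromeda' in i['data'][0]['title']]

To filter for partial string matches regardless of capitalization (lower() is a string method, so call it on the title and compare against a lowercase search term):

galaxies = [i for i in data_json['collection']['items'] if 'andromeda' in i['data'][0]['title'].lower()]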
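If some items might lack a 'data' list or a 'title', the same filter can be wrapped in a small helper that skips them instead of raising. This is only a sketch: filter_by_title is a hypothetical name, and the sample below is a hand-written stand-in for the API response (the "Sombrero Galaxy" entry and its ID are made up for illustration), so it runs without a network call.

```python
import json


def filter_by_title(payload, needle):
    """Return items whose first 'data' entry has a title containing needle.

    The match is case-insensitive. Items with a missing or empty 'data'
    list, or without a 'title', are skipped rather than raising.
    """
    needle = needle.casefold()
    matches = []
    for item in payload.get("collection", {}).get("items", []):
        # Fall back to a single empty dict when 'data' is missing or empty.
        entries = item.get("data") or [{}]
        title = entries[0].get("title", "")
        if needle in title.casefold():
            matches.append(item)
    return matches


# Hand-written sample with the same nesting as the response in the question.
# The second and third entries are hypothetical, not real API results.
sample = json.loads("""
{"collection": {"items": [
  {"data": [{"title": "Andromeda Galaxy", "nasa_id": "PIA04921"}]},
  {"data": [{"title": "Sombrero Galaxy", "nasa_id": "HYPOTHETICAL-ID"}]},
  {"data": []}
]}}
""")

for item in filter_by_title(sample, "sombrero"):
    print(item["data"][0]["title"])  # prints "Sombrero Galaxy"
```

With the real response, filter_by_title(data_json, "sombrero") would be called the same way on the dictionary returned by json.loads.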
