How to properly access Pandas DataFrame generated from xarray Dataset

I have an xarray dataset created and converted to pandas like so:

import xarray as xr

arr = xr.Dataset(
    coords={
        "test1": range(20000,60000+1,2500),
        "test2": range(10, 100+1),
        "test3": range(1,10+1),
        "count_at_1": 0,
        "count_at_5": 0,
        "count_at_10": 0,
    }
)

df = arr.to_dataframe()

The dataframe looks like this, which seems to be exactly what I want:

                   count_at_1  count_at_5  count_at_10
test1 test2 test3                                     
20000 10    1               0           0            0
            2               0           0            0
            3               0           0            0
            4               0           0            0
            5               0           0            0
...                       ...         ...          ...
60000 100   6               0           0            0
            7               0           0            0
            8               0           0            0
            9               0           0            0
            10              0           0            0

However, when I try to access a specific value inside this dataframe it causes some issues:

print(df["count_at_1"][50000][70][5]) # works fine, prints 0 as it should

df.loc["count_at_1"][50000][70][5] = 10 # does not work, KeyError: 'count_at_1'
df.at["count_at_1"][50000][70][5] = 10 # does not work, gives TypeError

I would also like to print out all the count_at_x values for a certain test1, test2, test3. Should look something like this:

print(df[50000][70][5])
count_at_1  count_at_5  count_at_10
         0           0            0

Solution:

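You just have the wrong indexing syntax. Given a single key, .loc and .at look it up among the row labels, not the columns, which is why df.loc["count_at_1"] raises a KeyError. Pass the row label and the column label together instead, as [row, column]. Because the rows here have a three-level MultiIndex, the row label is the tuple (test1, test2, test3):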

df.loc[(50000, 70, 5), "count_at_1"] = 11
df.at[(50000, 70, 5), "count_at_1"] = 12

Use the same syntax to read the value back. Either of these works:

print(df.loc[(50000, 70, 5), "count_at_1"])
print(df.at[(50000, 70, 5), "count_at_1"])
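Here is a self-contained sketch of the same pattern that runs without xarray. It assumes a frame built directly with pd.MultiIndex.from_product from the ranges in the question. That construction is only a stand-in for arr.to_dataframe(), not a claim about how xarray builds its result.

```python
import pandas as pd

# Stand-in for arr.to_dataframe(): same index levels and columns as the question.
index = pd.MultiIndex.from_product(
    [range(20000, 60000 + 1, 2500), range(10, 100 + 1), range(1, 10 + 1)],
    names=["test1", "test2", "test3"],
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])

row = (50000, 70, 5)  # one label per index level, in level order

df.loc[row, "count_at_1"] = 11  # label-based assignment
df.at[row, "count_at_5"] = 12   # scalar-only accessor, same [row, column] form

value_loc = df.loc[row, "count_at_1"]
value_at = df.at[row, "count_at_5"]
print(value_loc, value_at)
```

.at only ever addresses a single cell, so it is the natural choice when updating one counter at a time. .loc also accepts slices and lists of labels.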

To get all the values on this row, leave out the column label. Either of these works, depending on whether you want a Series or a DataFrame:

>>> df.loc[(50000, 70, 5)]  # Single row = Series
count_at_1     12
count_at_5      0
count_at_10     0
Name: (50000, 70, 5), dtype: int64

>>> df.loc[[(50000, 70, 5)]]  # Selection of one row = df
                   count_at_1  count_at_5  count_at_10
test1 test2 test3                                     
50000 70    5              12           0            0
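The difference between the two forms can be checked directly. This is a minimal sketch on a smaller, hypothetical frame with the same three index levels. The last selection, which passes only the first two levels, is an extra that the answer above does not cover.

```python
import pandas as pd

# Small hypothetical frame with the same three index levels as the question.
index = pd.MultiIndex.from_product(
    [[50000, 52500], [70, 71], [4, 5]],
    names=["test1", "test2", "test3"],
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])

as_series = df.loc[(50000, 70, 5)]   # full key: a Series, one entry per column
as_frame = df.loc[[(50000, 70, 5)]]  # list holding one key: a one-row DataFrame
partial = df.loc[(50000, 70), :]     # first two levels only: every matching test3 row

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
print(len(partial))
```

The one-row DataFrame keeps the column layout, which is closer to the printout the question asks for.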

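I’m not terribly familiar with xarray, but part of your confusion might stem from the fact that Pandas DataFrames are fundamentally 2D: there are rows and there are columns. The three coordinates test1, test2 and test3 are not separate dimensions here. They are levels of the row MultiIndex and together form a single row label, so chaining one [] per coordinate does not address a cell the way it would in an N-dimensional array.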
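To see why the chained read in the question works at all, it can help to look at what each [] returns. This is a rough sketch on a small hypothetical frame; the step0 to step3 names exist only for illustration.

```python
import pandas as pd

# Small hypothetical frame with the same three index levels as the question.
index = pd.MultiIndex.from_product(
    [[50000, 52500], [70, 71], [4, 5]],
    names=["test1", "test2", "test3"],
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])

step0 = df["count_at_1"]  # column lookup: Series with a 3-level index
step1 = step0[50000]      # partial key on test1: 2 levels left
step2 = step1[70]         # partial key on test2: 1 level left
step3 = step2[5]          # final label: a scalar

levels = [step0.index.nlevels, step1.index.nlevels, step2.index.nlevels]
print(levels, step3)
```

Each step produces a new intermediate object, so assigning through such a chain is not a reliable way to update df. A single .loc or .at call with [row, column] addresses the cell in the original frame in one step.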

For more info, see the Pandas user guide.
