Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I create a multiindex matrix from a 4D xarray dataset?

I currently have a 4D dataset ds in xarray that looks like this:

<xarray.Dataset>
Dimensions:  (lat: 60, lon: 78, time: 216, pres: 395)
Coordinates:
  * lat      (lat) float32 0.5 1.5 2.5 3.5 4.5 5.5 ... 55.5 56.5 57.5 58.5 59.5
  * lon      (lon) float32 -45.5 -44.5 -43.5 -42.5 ... -69.5 -75.5 -74.5 -76.5
  * time     (time) float32 7.32e+05 7.32e+05 7.32e+05 ... 7.385e+05 7.385e+05
  * pres     (pres) float64 2.5 7.5 12.5 17.5 ... 1.962e+03 1.968e+03 1.972e+03
Data variables:
    var       (pres, lat, lon, time) float64 2.03e+03 2.03e+03 ... nan nan>

My goal is to turn it into a pandas df that looks like this:

id   time  pres param  20.5-70.5  20.5-71.5  20.5-72.5
0     0     0   var       2085       2073       2057
1     0     1   var       2114       2156       2054
2     0     2   var       2039       2006       2179
3     1     0   var       2199       2144       2033
4     1     1   var       2056       2102       2191
5     1     2   var       2062       2033       2052
6     2     0   var       2001       2153       2170
7     2     1   var       2187       2120       2100
8     2     2   var       2138       2076       2002

where I have some sort of multi-index with time and pres, a param column (because I will likely have more than one variable at a time), and each pixel (so paired lat-lon) as a column header, so that for each pixel column I have the var value corresponding to the time and pres. I need this format for the next part of my analysis that includes some vectorization.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried a few things, including stacked = ds.stack(coordinates=["lat", "lon"]) which I think is the start of what I want to do, to then do stacked.to_dataframe() but then the latter unstacks my paired coordinates. I think I’m missing something here but I’m not too sure how to go about this?

Any help is greatly appreciated!

Thanks

>Solution :

(Since you don’t provide a sample, you should adapt this example)

Use stack/unstack to reshape your dataset:

import xarray as xr
import pandas as pd

ds = xr.tutorial.load_dataset('air_temperature')
df = ds.to_dataframe().rename_axis(columns='param').stack('param').unstack(['lat', 'lon'])
df.columns = [f"{lat}-{lon}" for lat, lon in df.columns]

Output:

>>> df
                           75.0-200.0  75.0-202.5  75.0-205.0  75.0-207.5  ...  15.0-322.5  15.0-325.0  15.0-327.5  15.0-330.0
time                param                                                  ...                                                
2013-01-01 00:00:00 air    241.199997  242.500000  243.500000  244.000000  ...  297.600006  296.899994  296.790009  296.600006
2013-01-01 06:00:00 air    242.099991  242.699997  243.099991  243.389999  ...  296.899994  296.399994  296.399994  296.600006
2013-01-01 12:00:00 air    242.299988  242.199997  242.299988  242.500000  ...  297.600006  297.000000  297.000000  296.790009
2013-01-01 18:00:00 air    241.889999  241.799988  241.799988  242.099991  ...  298.199982  297.790009  298.000000  297.899994
2013-01-02 00:00:00 air    243.199997  243.099991  243.099991  243.299988  ...  297.699982  297.100006  297.399994  297.399994
...                               ...         ...         ...         ...  ...         ...         ...         ...         ...
2014-12-30 18:00:00 air    243.089996  243.389999  243.689987  243.789993  ...  297.989990  297.389984  296.889984  296.089996
2014-12-31 00:00:00 air    242.489990  242.389999  242.189987  241.689987  ...  297.290009  296.589996  295.989990  295.489990
2014-12-31 06:00:00 air    243.489990  242.989990  242.089996  240.689987  ...  297.089996  296.089996  295.790009  295.790009
2014-12-31 12:00:00 air    245.789993  244.789993  243.489990  241.889999  ...  296.589996  295.690002  295.489990  295.190002
2014-12-31 18:00:00 air    245.089996  244.289993  243.289993  242.189987  ...  297.190002  296.489990  296.190002  295.690002

[2920 rows x 1325 columns]

>>> ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading