Python – Converting a dataframe with columns x, y and a variable "A" into a netCDF file

My (simplified) data structure is as follows:

x = [1,1,2,2,3,3,4,4,…n,n]

y = [1,2,1,2,1,2,1,2,…1,2]

A = [7,5,6,5,4,6,2,5,…4,3]

"A" is a variable linked to the coordinates x and y. The dataframe has three columns (x, y and A), and the rows are ordered top down: starting at x = 1, y runs from 1 to y_max, then x = 2 with y again from 1 to y_max, then x = 3, and so on. So this is two-dimensional data: each value of "A" has its x and y coordinates in the same row of the dataframe.

However, when I convert this directly to netCDF with

Data.to_netcdf("filename.nc")

I get a massive number of x and y values (the dimension ends up being an index from 1 to n). For example, if my x coordinate goes from 1 to 5 like 1,1,1,2,2,2,3,3,3,4,4,4,5,5,5, the netCDF file will have 15 x coordinates, while I would like it to have only 5. The same happens with the y coordinates. I have tried many other approaches, but none of them produced anything useful.

I would like to have a netCDF file with "A" as a variable and x and y as dimensions, without the coordinate values being repeated. My real dataset has more than a hundred x values and nearly a hundred y values, so every x value is repeated once per y value, and vice versa.
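To make the mismatch concrete, here is a minimal sketch in pandas that compares the number of rows with the number of unique coordinate values, using the 1-to-5 x example from above. The y values (1 to 3) and the values of A are placeholders assumed for illustration only; they are not from the real dataset.

```python
import pandas as pd

# x follows the 1..5 example above; y and A are placeholder values (assumed).
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "y": [1, 2, 3] * 5,
    "A": list(range(15)),
})

n_rows = len(df)          # length of the default integer index
n_x = df["x"].nunique()   # number of distinct x coordinates
n_y = df["y"].nunique()   # number of distinct y coordinates
print(n_rows, n_x, n_y)   # 15 5 3

# A complete grid has exactly one row per (x, y) pair.
is_full_grid = n_rows == n_x * n_y and not df.duplicated(["x", "y"]).any()
print(is_full_grid)       # True
```

A direct conversion keeps the 15-element row index as the only dimension, whereas the desired file has a 5 by 3 grid.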

Solution:

If I understand correctly, you could set x and y as the index, convert to an xarray Dataset, and then write that to netCDF:

import pandas as pd
import xarray as xr

df = pd.DataFrame({'x': [1,1,2,2,3,3,4,4],
                   'y': [1,2,1,2,1,2,1,2],
                   'A': [7,5,6,5,4,6,2,5],
                   })

xr.Dataset.from_dataframe(df.set_index(['x', 'y'])).to_netcdf('filename.nc')

Dataset:

<xarray.Dataset>
Dimensions:  (x: 4, y: 2)
Coordinates:
  * x        (x) int32 1 2 3 4
  * y        (y) int32 1 2
Data variables:
    A        (x, y) int32 ...

Underlying A:

array([[7, 5],
       [6, 5],
       [4, 6],
       [2, 5]])
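As a quick check of the approach above, the following sketch builds the Dataset in memory (without writing a file) and inspects it. It uses `DataFrame.to_xarray()`, which gives the same result as `xr.Dataset.from_dataframe` here. The last part is an assumption about the real data: it shows what happens if one (x, y) pair is missing from the dataframe, which the question does not mention.

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame({"x": [1, 1, 2, 2, 3, 3, 4, 4],
                   "y": [1, 2, 1, 2, 1, 2, 1, 2],
                   "A": [7, 5, 6, 5, 4, 6, 2, 5]})

# The (x, y) MultiIndex is unstacked into two separate dimensions.
ds = df.set_index(["x", "y"]).to_xarray()
print(dict(ds.sizes))                # {'x': 4, 'y': 2}
print(ds["A"].sel(x=3, y=2).item())  # 6

# Hypothetical case: the row for (x=4, y=2) is missing.
# The grid is still 4 x 2; the missing cell is filled with NaN.
ds_gap = df.iloc[:-1].set_index(["x", "y"]).to_xarray()
print(int(ds_gap["A"].isnull().sum()))  # 1
```

Note that each (x, y) pair must be unique for the index to be unstacked into a grid, so duplicated coordinate pairs would have to be aggregated or dropped first.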