My (simplified) data structure is as follows:
x = [1,1,2,2,3,3,4,4,…n,n]
y = [1,2,1,2,1,2,1,2,…1,2]
A = [7,5,6,5,4,6,2,5,…4,3]
"A" is a variable linked to the coordinates x and y, so the dataframe has three columns. The rows are ordered top down: starting at x = 1, y runs from 1 to y_max, then x = 2 with y again from 1 to y_max, then x = 3, and so on. The data is therefore two-dimensional: each value of A sits in the same row of the dataframe as its x and y coordinates.
However, when I convert this directly to netCDF with
Data.to_netcdf("filename.nc")
I get a massive number of x and y values (the dimension ends up being an index from 1 to n). For example, if my x coordinate runs from 1 to 5 as 1,1,1,2,2,2,3,3,3,4,4,4,5,5,5, the netCDF will have 15 x coordinates, while I would like it to have only 5. The same happens with the y coordinates. I have tried many other approaches, but none of them produced anything useful.
I would like a netCDF file with "A" as a variable and x and y as dimensions, without the coordinate values being repeated. My real dataset has more than a hundred x values and nearly a hundred y values, so every x value is repeated once per y value and vice versa.
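The behaviour described above can be reproduced with a small sketch. It assumes that the object written to disk was built from the dataframe without setting an index first (for example via `df.to_xarray()`); the question does not show how `Data` was created, so that step is a guess.

```python
import pandas as pd

# Small stand-in for the data described above (values from the example lists).
df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3, 4, 4],
    "y": [1, 2, 1, 2, 1, 2, 1, 2],
    "A": [7, 5, 6, 5, 4, 6, 2, 5],
})

# Assumption: the conversion used the dataframe's default integer index.
flat = df.to_xarray()

# The only dimension is the row index, so x, y and A are all
# one-dimensional variables with one entry per dataframe row.
print(dict(flat.sizes))      # {'index': 8}
print(list(flat.data_vars))  # ['x', 'y', 'A']
```

With this layout x and y are ordinary data variables of length n rather than dimensions, which matches the repeated coordinates seen in the output file.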
>Solution :
IIUC, you could set x and y as the index, convert to xarray, and then write to netCDF:
import pandas as pd
import xarray as xr

df = pd.DataFrame({'x': [1,1,2,2,3,3,4,4],
                   'y': [1,2,1,2,1,2,1,2],
                   'A': [7,5,6,5,4,6,2,5],
                   })
# x/y become a MultiIndex, which xarray expands into the two dimensions
xr.Dataset.from_dataframe(df.set_index(['x', 'y'])).to_netcdf('filename.nc')
Dataset:
<xarray.Dataset>
Dimensions:  (x: 4, y: 2)
Coordinates:
  * x        (x) int32 1 2 3 4
  * y        (y) int32 1 2
Data variables:
    A        (x, y) int32 ...
Underlying A:
array([[7, 5],
       [6, 5],
       [4, 6],
       [2, 5]])
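As a quick in-memory check of the reshaping step, the sketch below uses `DataFrame.to_xarray()`, which should be equivalent to `xr.Dataset.from_dataframe` here, and looks up a value by coordinate. The second part is an illustration only, not part of the original data: it drops one row to show what happens when an (x, y) combination is missing. The names `ds`, `gappy` and `n_missing` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 1, 2, 2, 3, 3, 4, 4],
    "y": [1, 2, 1, 2, 1, 2, 1, 2],
    "A": [7, 5, 6, 5, 4, 6, 2, 5],
})

# Index by (x, y) so that each level becomes its own dimension.
ds = df.set_index(["x", "y"]).to_xarray()
print(dict(ds.sizes))  # {'x': 4, 'y': 2}

# Values can now be selected by coordinate instead of by row number.
value = int(ds["A"].sel(x=3, y=2))
print(value)  # 6

# Hypothetical case: the row for (x=4, y=2) is missing from the dataframe.
# The grid is still 4 x 2, and the missing cell is filled with NaN.
gappy = df.drop(index=7).set_index(["x", "y"]).to_xarray()
n_missing = int(gappy["A"].isnull().sum())
print(n_missing)  # 1
```

Calling `ds.to_netcdf('filename.nc')` on the result would then write the file as in the answer. Note that a variable with NaN-filled gaps is stored as floating point rather than integer.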