Is it impossible to select a single dataframe row without unwanted type conversion?


I’m iterating row-by-row over an existing dataframe, and I need to select the contents of one row, preserving all of its properties, and then append new columns to it. The augmented row is then to be appended to a new dataframe. For various reasons, I can’t do a bulk operation on the entire dataframe, because complex logic goes into adding the contents of the new columns, and that logic depends on the contents of the original columns as well as on external data.

My problem is that I can’t seem to operate on a single row in a way that preserves the original types of each column; it always gets converted to a numpy float64 object:

print('Chunk dtypes:')
print(chunk.dtypes)

for i in range(len(chunk)):
    row = chunk.iloc[i]
    print('chunk: ', chunk)
    print('row:  ', row)
    print('row dtype: ', row.dtype)

which gives the following output

Chunk dtypes:
dt             int64
lat          float32
lon          float32
isfc           uint8
isst          uint16
itpw           uint8
iali           uint8

chunk:             dt    lat         lon  isfc  isst  itpw  iali
0          1393980240  33.93 -109.330002    10   279     8    99

dt           1.393980e+09
lat          3.393000e+01
lon         -1.093300e+02
isfc         1.000000e+01
isst         2.790000e+02
itpw         8.000000e+00
iali         9.900000e+01
Name: 0, dtype: float64

row dtype:  float64

How can I operate on a single row at a time and concatenate it to a new dataframe without the unwanted type conversions, and ideally without having to retroactively reapply dtypes to columns? This is especially concerning for columns that are intended to contain datetime-like objects.

>Solution :

The reason is that at least one value is a float and all values are numeric, so Pandas automatically converts the Series to a float dtype. This behavior is intentional: it ensures that all of the values in the Series have a consistent data type.
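This upcasting is easy to reproduce with a minimal sketch (hypothetical two-column frame, not the asker's data): mixing an int64 and a float64 column and then selecting one row with `.iloc[i]` promotes everything to float64.

```python
import pandas as pd

# One int64 column and one float64 column
df = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})

# Selecting a single row as a Series forces a common dtype
row = df.iloc[0]
print(row.dtype)   # float64: the integer 1 is now stored as 1.0
```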

If one value of a column is object (e.g. a string), it works as expected:

chunk = pd.DataFrame({'a': [10],
                      'b': [7.45],
                      'c': [4.7],
                      'd': ['dd']})

#view DataFrame
    a     b    c   d
0  10  7.45  4.7  dd

for i in range(len(chunk)):
    row = chunk.iloc[i]
    print (row)
    a      10
    b    7.45
    c     4.7
    d      dd
    Name: 0, dtype: object

One idea is to use double [] to get a one-row DataFrame; alternatively, if possible, convert the values to a list of dictionaries and loop:

chunk = pd.DataFrame({'a': [10, 5],
                      'b': [7.45, 45.0],
                      'c': [4.7, 0.4],
                      'd': [78, 8]})

#view DataFrame
    a      b    c   d
0  10   7.45  4.7  78
1   5  45.00  0.4   8

for i in range(len(chunk)):
    row = chunk.iloc[[i]]
    print (row)
        a     b    c   d
    0  10  7.45  4.7  78
       a     b    c  d
    1  5  45.0  0.4  8
for x in chunk.to_dict('records'):
    print (x)
    {'a': 10, 'b': 7.45, 'c': 4.7, 'd': 78}
    {'a': 5, 'b': 45.0, 'c': 0.4, 'd': 8}
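Building on the double-bracket idea, here is a hedged sketch of the full workflow the question describes: augment each one-row DataFrame with a new column (here `new_col` is a hypothetical derived value, not from the original post) and concatenate at the end. Each `.iloc[[i]]` slice keeps the original column dtypes, so the concatenated result does too.

```python
import pandas as pd

chunk = pd.DataFrame({'a': [10, 5], 'd': [78, 8]})

# Collect augmented one-row DataFrames, then concatenate once at the end
rows = []
for i in range(len(chunk)):
    row = chunk.iloc[[i]].copy()       # one-row DataFrame, dtypes preserved
    row['new_col'] = row['a'] * 2      # hypothetical per-row logic
    rows.append(row)

out = pd.concat(rows)
print(out.dtypes)   # 'a' and 'd' keep their original int64 dtype
```

Concatenating once at the end is also much cheaper than appending to a DataFrame inside the loop, which copies all accumulated data on every iteration.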
