Is it impossible to select a single dataframe row without unwanted type conversion?

I’m iterating row-by-row over an existing dataframe, and I need to select the contents of one row, preserving all of its properties, and then append new columns to it. The augmented row is then to be appended to a new dataframe. For various reasons, I can’t do a bulk operation on the entire dataframe, because complex logic goes into adding the contents of the new columns, and that logic depends on the contents of the original columns as well as on external data.

My problem is that I can’t seem to operate on a single row in a way that preserves the original types of each column; it always gets converted to a numpy float64 object:

print('Chunk dtypes:')
print(chunk.dtypes)

for i in range(len(chunk)):
    row = chunk.iloc[i]
    
    print('chunk: ',chunk)
    print()
    print('row:  ', row)
    print()
    print('row dtype: ', row.dtype)

which gives the following output


Chunk dtypes:
dt             int64
lat          float32
lon          float32
isfc           uint8
isst          uint16
itpw           uint8
iali           uint8

chunk:            dt    lat         lon  isfc  isst  itpw  iali
0         1393980240  33.93 -109.330002    10   279     8    99

row:   
dt           1.393980e+09
lat          3.393000e+01
lon         -1.093300e+02
isfc         1.000000e+01
isst         2.790000e+02
itpw         8.000000e+00
iali         9.900000e+01
...    
Name: 0, dtype: float64

row dtype:  float64

How can I operate on a single row at a time and concatenate it to a new dataframe without the unwanted type conversions, and ideally without having to retroactively reapply dtypes to columns? This is especially concerning for columns that are intended to contain datetime-like objects.

>Solution :

The reason is that when all of the values in a row are numeric and at least one of them is a float, pandas automatically converts the resulting Series to the float64 dtype. This behavior is intentional: a Series can only hold a single dtype, so `iloc[i]` upcasts every value to a common type to keep them consistent.
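A minimal sketch of this upcasting, using made-up values:

```python
import pandas as pd

# All columns are numeric but have different dtypes; selecting one
# row with iloc[i] returns a Series, and a Series holds a single
# dtype, so pandas upcasts every value to float64.
chunk = pd.DataFrame({'a': [10], 'b': [7.45]})
print(chunk.dtypes['a'])   # int64
row = chunk.iloc[0]
print(row.dtype)           # float64 -- the int64 value was upcast
```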

If at least one column has the object dtype (e.g. contains a string), the row works as expected, because the Series falls back to the object dtype and each value keeps its original type:

chunk = pd.DataFrame({'a': [10],
                      'b': [7.45],
                      'c': [4.7],
                      'd': ['dd']})

#view DataFrame
print(chunk)
    a     b    c   d
0  10  7.45  4.7  dd

for i in range(len(chunk)):
    row = chunk.iloc[i]
    print(row)

a      10
b    7.45
c     4.7
d      dd
Name: 0, dtype: object

One approach is to use double brackets (`iloc[[i]]`) to select a one-row DataFrame, which preserves each column's dtype. Alternatively, if possible, convert the rows to a list of dictionaries and loop over them:

chunk = pd.DataFrame({'a': [10, 5],
                      'b': [7.45, 45],
                      'c': [4.7, 0.4],
                      'd': [78, 8]})

#view DataFrame
print(chunk)
    a      b    c   d
0  10   7.45  4.7  78
1   5  45.00  0.4   8

for i in range(len(chunk)):
    row = chunk.iloc[[i]]
    print(row)

    a     b    c   d
0  10  7.45  4.7  78
   a     b    c  d
1  5  45.0  0.4  8

for x in chunk.to_dict('records'):
    print(x)

{'a': 10, 'b': 7.45, 'c': 4.7, 'd': 78}
{'a': 5, 'b': 45.0, 'c': 0.4, 'd': 8}
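For the original goal of appending augmented rows to a new DataFrame, the one-row `iloc[[i]]` selection can be combined with `pd.concat`. A sketch with made-up data (the new-column logic here is hypothetical, standing in for the question's complex logic):

```python
import pandas as pd

chunk = pd.DataFrame({'a': [10, 5], 'b': [7.45, 45.0]})

pieces = []
for i in range(len(chunk)):
    # One-row DataFrame: column dtypes are preserved, unlike iloc[i]
    row = chunk.iloc[[i]].copy()
    # Hypothetical per-row logic that depends on the row's contents
    row['new_col'] = row['a'].iloc[0] * 2
    pieces.append(row)

result = pd.concat(pieces)
print(result.dtypes)   # 'a' is still int64, not upcast to float64
```

Because every piece carries the original dtypes, the concatenated result keeps them too, with no need to reapply dtypes afterwards.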