I’m iterating row-by-row over an existing dataframe, and I need to select the contents of one row, preserving all of its properties, and then append new columns to it. The augmented row is then to be appended to a new dataframe. For various reasons, I can’t do a bulk operation on the entire dataframe, because complex logic goes into adding the contents of the new columns, and that logic depends on the contents of the original columns as well as on external data.
My problem is that I can’t seem to operate on a single row in a way that preserves the original types of each column; it always gets converted to a numpy float64 object:
print('Chunk dtypes:')
print(chunk.dtypes)
for i in range(len(chunk)):
    row = chunk.iloc[i]
    print('chunk: ', chunk)
    print()
    print('row: ', row)
    print()
    print('row dtype: ', row.dtype)
which gives the following output:
Chunk dtypes:
dt int64
lat float32
lon float32
isfc uint8
isst uint16
itpw uint8
iali uint8
chunk:            dt    lat         lon  isfc  isst  itpw  iali
       0  1393980240  33.93 -109.330002    10   279     8    99
row:
dt 1.393980e+09
lat 3.393000e+01
lon -1.093300e+02
isfc 1.000000e+01
isst 2.790000e+02
itpw 8.000000e+00
iali 9.900000e+01
...
Name: 0, dtype: float64
row dtype: float64
How can I operate on a single row at a time and concatenate it to a new dataframe without the unwanted type conversions, and ideally without having to retroactively reapply dtypes to columns? This is especially concerning for columns that are intended to contain datetime-like objects.
> Solution:
The reason is that selecting a single row returns a Series, and a Series has exactly one dtype. Because at least one value in the row is a float and all values are numeric, pandas automatically upcasts the whole Series to float64. This behavior is intentional: it keeps all values in the Series consistent in data type.
If one value in the row is an object (e.g. a string), it works as expected:
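A minimal demonstration of the upcast (the column names here are illustrative):

```python
import pandas as pd

# One int column and one float column -- both numeric
chunk = pd.DataFrame({'a': [10], 'b': [7.45]})

# Selecting a single row returns a Series, which has exactly one dtype,
# so the int is upcast to float64 alongside the float
row = chunk.iloc[0]
print(row.dtype)  # float64
```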
chunk = pd.DataFrame({'a': [10],
                      'b': [7.45],
                      'c': [4.7],
                      'd': ['dd']})

#view DataFrame
print(chunk)

    a     b    c   d
0  10  7.45  4.7  dd
for i in range(len(chunk)):
    row = chunk.iloc[i]
    print(row)

a       10
b     7.45
c      4.7
d       dd
Name: 0, dtype: object
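Note that even though the Series dtype is object here, each element still carries its original value and type; what is lost in the all-numeric case is the per-column dtype, not the values themselves. A small check (a sketch, reusing the frame above):

```python
import pandas as pd
import numpy as np

chunk = pd.DataFrame({'a': [10], 'b': [7.45], 'c': [4.7], 'd': ['dd']})
row = chunk.iloc[0]  # dtype: object, because of the string column

# Individual elements keep their numeric/string types
assert isinstance(row['a'], (int, np.integer))
assert isinstance(row['d'], str)
```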
One idea is to use double brackets [[ ]] to select each row as a one-row DataFrame, which keeps the per-column dtypes; alternatively, if possible, convert the values to a list of dictionaries and loop over those:
chunk = pd.DataFrame({'a': [10, 5],
                      'b': [7.45, 45],
                      'c': [4.7, 0.4],
                      'd': [78, 8]})

#view DataFrame
print(chunk)

    a      b    c   d
0  10   7.45  4.7  78
1   5  45.00  0.4   8
for i in range(len(chunk)):
    row = chunk.iloc[[i]]
    print(row)

    a     b    c   d
0  10  7.45  4.7  78
   a     b    c  d
1  5  45.0  0.4  8
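Assuming the goal from the question -- add derived columns per row and collect the rows into a new DataFrame -- the one-row-DataFrame approach can be sketched like this (the column e and its formula are invented for illustration; collecting the pieces in a list and calling pd.concat once at the end is also much faster than concatenating inside the loop):

```python
import pandas as pd

chunk = pd.DataFrame({'a': [10, 5],
                      'b': [7.45, 45],
                      'c': [4.7, 0.4],
                      'd': [78, 8]})

pieces = []
for i in range(len(chunk)):
    row = chunk.iloc[[i]].copy()   # one-row DataFrame, dtypes preserved
    # hypothetical per-row logic: the new column depends on existing values
    row['e'] = row['a'].iat[0] * 2
    pieces.append(row)

new_df = pd.concat(pieces, ignore_index=True)
print(new_df.dtypes)  # a, d, e stay int64; b, c stay float64
```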
for x in chunk.to_dict('records'):
    print(x)

{'a': 10, 'b': 7.45, 'c': 4.7, 'd': 78}
{'a': 5, 'b': 45.0, 'c': 0.4, 'd': 8}
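The dictionary route can serve the same goal: augment each record, then build the new DataFrame in one shot. Plain int/float dtypes are re-inferred correctly from the Python scalars; if the original frame used narrower dtypes such as float32 or uint8, they can be restored afterwards with .astype(chunk.dtypes). A sketch (the column e is again invented for the example):

```python
import pandas as pd

chunk = pd.DataFrame({'a': [10, 5],
                      'b': [7.45, 45],
                      'c': [4.7, 0.4],
                      'd': [78, 8]})

records = []
for rec in chunk.to_dict('records'):
    rec['e'] = rec['a'] + rec['d']   # hypothetical derived column
    records.append(rec)

new_df = pd.DataFrame(records)
# Optional: restore the original (possibly narrower) column dtypes;
# columns not listed in chunk.dtypes (here: e) are left as inferred
new_df = new_df.astype(chunk.dtypes)
print(new_df)
```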