I’m looking a way to use groupby on a dataframe that contains datetime64 format on some columns. But im having this error:
dfsum = df.groupby(['City']).sum()
TypeError: datetime64 type does not support sum operations
I tried using:
desired = df.select_dtypes(include=[int, float, object])
dfsum_city = desired.groupby(['City']).sum()
dfsum_city.reset_index(inplace=True)
But i dont think it is the right way to solve the problem. Any advice?
| Order ID | Product | Quantity Ordered | Price Each | Order Date | Purchase Address | day | month | year | Total Price | City |
|---|---|---|---|---|---|---|---|---|---|---|
| 141234 | iPhone | 1 | 700 | 2019-01-22 21:25:00 | 944 Walnut St, Boston, MA 02215 | 22 | 1 | 2019 | 700 | Boston (MA) |
| 141235 | Lightning Charging Cable | 1 | 14.95 | 2019-01-28 14:15:00 | 185 Maple St, Portland, OR 97035 | 28 | 1 | 2019 | 14.95 | Portland (OR) |
| 141236 | Wired Headphones | 2 | 11.99 | 2019-01-17 13:33:00 | 538 Adams St, San Francisco, CA 94016 | 17 | 1 | 2019 | 23.98 | San Francisco (CA) |
| 141237 | 27in FHD Monitor | 1 | 149.99 | 2019-01-05 20:33:00 | 738 10th St, Los Angeles, CA 90001 | 5 | 1 | 2019 | 149.99 | Los Angeles (CA) |
| 141238 | Wired Headphones | 1 | 11.99 | 2019-01-25 11:59:00 | 387 10th St, Austin, TX 73301 | 25 | 1 | 2019 | 11.99 | Austin (TX) |
>Solution :
.groupby.sum has a numeric_only parameter you could switch to True, but in this case, I think you want to select specific columns, since summing the Order ID and day, month, and year doesn’t make a lot of sense.
df.groupby(['City'])[['Quantity Ordered', 'Price Each', 'Total Price']].sum()
Quantity Ordered Price Each Total Price
City
Austin (TX) 1 11.99 11.99
Boston (MA) 1 700.00 700.00
Los Angeles (CA) 1 149.99 149.99
Portland (OR) 1 14.95 14.95
San Francisco (CA) 2 11.99 23.98