I have a dataframe in the following form:
index                      client_ip    http_response_code
2022-07-23 05:10:10+00:00  172.19.0.1   300
2022-07-23 06:13:26+00:00  192.168.0.1  400
...                        ...          ...
I need to group by client_ip and count how many 4xx codes appear in the http_response_code column, i.e. the number of integer values that start with 4.
What I have tried is the following:
df.groupby('client_ip')['http_response_code'].apply(lambda x: (str(x).startswith(str(4))).sum())
But I get the following error:
AttributeError: 'bool' object has no attribute 'sum'
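To see where the error comes from, here is a minimal sketch using a small hypothetical Series in place of one client's group. It shows that str(x) produces one string for the whole Series (the printed representation, which even begins with the index label 0), so startswith returns a single bool:

```python
import pandas as pd

# Hypothetical sample group, standing in for one client's response codes
codes = pd.Series([400, 301, 404])

# str() turns the WHOLE Series into one printable string (index, values, dtype line)
whole = str(codes)
print(type(whole))

# startswith on that single string returns one Python bool, not a Series;
# here it is False because the printed text starts with the index label "0"
flag = whole.startswith("4")
print(type(flag), flag)

# A plain bool has no .sum() method, hence the AttributeError
print(hasattr(flag, "sum"))
```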
However, if I only need to count occurrences of 400, the following works without error, even though the comparison also produces booleans:
df.groupby('client_ip')['http_response_code'].apply(lambda x: (x==400).sum())
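The 400 version works because x == 400 is an element-wise comparison that returns a boolean Series, and summing booleans counts the True values. A minimal sketch with a hypothetical Series:

```python
import pandas as pd

# Hypothetical sample group of response codes
codes = pd.Series([400, 301, 400])

# Element-wise comparison: one boolean per row, still a Series
mask = codes == 400
print(mask.tolist())

# Summing booleans counts the True values (True counts as 1, False as 0)
print(mask.sum())
```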
Any idea of what is wrong here?
Solution:
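Your function receives a Series (one group's values) as input. Comparing that Series against a value, as in x == 400, gives a Series of booleans, which can be summed. By contrast, str(x) converts the whole Series into a single string, so .startswith returns a single bool, which has no .sum method. Use .astype(str) to convert each value to str rather than the whole Series, then the .str accessor applies startswith element-wise, for example: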
import pandas as pd

df = pd.DataFrame({"User": ["A", "A", "B"], "Status": [400, 301, 302]})
# Convert each value to str, test startswith element-wise, and sum the booleans per group
grouped = df.groupby("User")["Status"].apply(lambda x: x.astype(str).str.startswith("4").sum())
print(grouped)
Output:
User
A 1
B 0
Name: Status, dtype: int64
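As an alternative that avoids string conversion and apply, a 4xx code can be detected numerically: for integer codes, code // 100 == 4 holds exactly for 400 to 499. This is a sketch assuming http_response_code is stored as integers (convert with .astype(int) first if it holds strings); the third row of sample data is made up for illustration:

```python
import pandas as pd

# Sample data shaped like the question's frame (third row is illustrative)
df = pd.DataFrame(
    {
        "client_ip": ["172.19.0.1", "192.168.0.1", "192.168.0.1"],
        "http_response_code": [300, 400, 404],
    },
    index=pd.to_datetime(
        [
            "2022-07-23 05:10:10+00:00",
            "2022-07-23 06:13:26+00:00",
            "2022-07-23 07:00:00+00:00",
        ]
    ),
)

# 4xx codes satisfy code // 100 == 4, giving a boolean Series
is_4xx = df["http_response_code"].floordiv(100).eq(4)

# Group the boolean mask by client and sum: each True counts as one 4xx response
counts = is_4xx.groupby(df["client_ip"]).sum()
print(counts)
```

Because the mask and the grouping key share the same index, pandas aligns them row by row, and the sum runs as a vectorized groupby aggregation rather than a Python function per group.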