Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Increasing performance for search pandas df, Count occurrences of starting string grouped by unique identifier

Current dataframe is as follows:

df = pd.read_csv('filename.csv', delimiter=',')

print(df)
idx   uniqueID      String 

0        1           'hello'
1        1           'goodbye'
2        1           'happy'
3        2           'hello'
4        2           'happy'
5        3           'goodbye' 
6        3           'hello'
7        3           'hello'
8        4           'goodbye'
9        5           'goodbye'

Expected Output:

{ 'hello': 2, 'goodbye' : 3} 
Where hello was counted by idx 0 & 3, and goodbye was counted by idx 5 & 8 & 9. 
In the actual dataset there are more than two starting strings. 



I'm thinking of potentially using pandas .groupby() && .where(), 
to filter out for the first time a uniqueId occurs, then group 
by the string? Not entirely sure. 

Question: How do I get the counts of the ‘starting string’, only when uniqueID is occurring for the first time with increased performance.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thus far, I’m doing with a simple for loop of the data and checking with if/else statements. But this is incredibly slow with a large dataframe.

I’m curious if there are any functions built in pandas, or another library out there, that would reduce the overall time it takes.

>Solution :

Achieving better than O(N) is not possible.

You can drop_duplicates, then value_counts:

out = df.drop_duplicates('uniqueID')['String'].value_counts()

output:

goodbye    3
hello      2
Name: String, dtype: int64

As dictionary:

df.drop_duplicates('uniqueID')['String'].value_counts().to_dict()

output: {'goodbye': 3, 'hello': 2}

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading