group by removing column I'd like to group by in pandas

I'm trying to load a list of lists into pandas and then sum the weights by one value.

My list of lists:

[['she', 'walked', 4],
 ['she', 'my', 3],
 ['she', 'dog', 2],
 ['she', 'to', 1],
 ['sniffed', 'I', 5],
 ['sniffed', 'walked', 4],
 ['sniffed', 'my', 3],
 ['sniffed', 'dog', 2],
 ['sniffed', 'to', 1]]

I create the dataframe:

import pandas as pd
df = pd.DataFrame(distanceList, columns = ['word1', 'word2', 'weight']) 

The result looks odd (it has an extra index column for some reason):

      word1  word2   weight
0     I      walked  5
1     I      my      4
2     I      dog     3
3     I      to      2
4     I      the     1
...   ...    ...     ...
1135  I      walked  5
1136  I      my      4
1137  I      dog     3
1138  I      to      2
1139  I      the     1
1140 rows × 3 columns

But when I sum it, it seems to combine the words instead.
I used this:

df.groupby('weight').sum()

word1   word2
weight      
1   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   thethethethethetotototototototototototototothe...
2   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   totototodogdogdogdogdogdogdogdogdogdogdogdogdo...
3   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   dogdogdogmymymymymymymymymymymymymymymymydogdo...
4   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   mymywalkedwalkedwalkedwalkedwalkedwalkedwalked...
5   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   walkedIIIIIIIIIIIIIIIIIIwalkedIIIIIIIIIIIIIIII...

What I want is if I have:

dog, cat, 1
dog, cat, 5
dog, rabbit, 1

then the result is:

dog, cat, 6
dog, rabbit, 1

>Solution :

The code you want is as follows.

df.groupby('word1')['weight'].sum()

This sums weight for each distinct value of word1.

Your code does the reverse: it groups by weight and sums word1 and word2. Summing strings concatenates them, which is why the output contains long joined strings (e.g., Iwalkedmydogtotheparkandshesniffedgrassthenrol).
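To make the difference concrete, here is a minimal sketch using the three-row dog/cat/rabbit example from the question. The variable names (rows, concatenated, by_word1) are made up for illustration and are not from the original post.

```python
import pandas as pd

# The small example from the question (hypothetical variable names).
rows = [["dog", "cat", 1], ["dog", "cat", 5], ["dog", "rabbit", 1]]
df = pd.DataFrame(rows, columns=["word1", "word2", "weight"])

# Adding Python strings joins them end to end, which is what
# "summing" a text column amounts to.
concatenated = "dog" + "dog"

# Selecting 'weight' before calling sum() keeps the text columns
# out of the aggregation, so only the numbers are added up.
by_word1 = df.groupby("word1")["weight"].sum()
print(by_word1)
```

Note that grouping on word1 alone gives one total per first word, so the cat and rabbit rows are merged into a single figure for dog.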

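Edit
I was confused by the example data.
To get one total per (word1, word2) pair, group by both columns. Passing as_index = False keeps word1 and word2 as ordinary columns instead of moving them into the index.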

df.groupby(['word1', 'word2'], as_index = False)['weight'].sum()

The result is as follows.

    word1   word2   weight
0   dog      cat      6
1   dog      rabbit   1
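
For reference, a self-contained sketch that goes from the list of lists to the grouped result. It assumes the small dog/cat/rabbit example from the question in place of the real distanceList, and the names rows, result and indexed are illustrative only.

```python
import pandas as pd

# Stand-in for distanceList, using the example rows from the question.
rows = [["dog", "cat", 1], ["dog", "cat", 5], ["dog", "rabbit", 1]]
df = pd.DataFrame(rows, columns=["word1", "word2", "weight"])

# Group on both word columns; as_index=False returns them as regular columns.
result = df.groupby(["word1", "word2"], as_index=False)["weight"].sum()
print(result)

# Without as_index=False the group keys become a MultiIndex on a Series;
# reset_index() turns that back into the same three-column frame.
indexed = df.groupby(["word1", "word2"])["weight"].sum().reset_index()
```

The unnamed leftmost column (0, 1, ...) in the printed output is the DataFrame's default integer index rather than a data column, which is also what the question's "extra index column" is.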