Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to do groupby on multi index dataframe based on condition

I have a multi index dataframe, and I want to combine rows based on certain conditions and I want to combine rows per index.

import pandas as pd

#  data
data = {
    'date': ['01/01/17', '02/01/17', '03/01/17', '01/01/17', '02/01/17', '03/01/17'],
    'language': ['python', 'python', 'python', 'r', 'r', 'r'],
    'ex_complete': [6, 5, 10, 8, 8, 8]
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Convert DataFrame to JSON
json_data = df.to_json(orient='records')

# Convert JSON data back to DataFrame
df_from_json = pd.read_json(json_data, orient='records')

# Set date and language as multi-index
df_from_json.set_index(['date', 'language'], inplace=True)

df_from_json.sort_index(inplace= True)

df_from_json

1st Problem:

I want to combine the dates ’01/01/17′, ’02/01/17′ and rename as ‘1_2’, this should give me 4 rows:
2 rows for ‘1_2′ – (Python and R) and 2 rows for ’03/01/17’ (Python and R)

2nd Problem:

I want to combine Python and R rows and rename as Python_R, this should give 3 rows for 3 dates.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Any guidance or pointer will be hugely appreciated.

>Solution :

IIUC use DataFrame.rename with aggregate, e.g. sum:

out = (df_from_json.rename({pd.Timestamp('01/01/17'):'1_2',
                            pd.Timestamp('02/01/17'):'1_2'}, level=0)
                   .groupby(level=[0,1]).sum())
print (out)
                              ex_complete
date                language             
2017-03-01 00:00:00 python             10
                    r                   8
1_2                 python             11
                    r                  16

out = (df_from_json.rename({'python':'Python_R', 'r':'Python_R'}, level=1)
                    .groupby(level=[0,1]).sum())
print (out)
                     ex_complete
date       language             
2017-01-01 Python_R           14
2017-02-01 Python_R           13
2017-03-01 Python_R           18
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading