I have a multi index dataframe, and I want to combine rows based on certain conditions and I want to combine rows per index.
import pandas as pd
# data
data = {
'date': ['01/01/17', '02/01/17', '03/01/17', '01/01/17', '02/01/17', '03/01/17'],
'language': ['python', 'python', 'python', 'r', 'r', 'r'],
'ex_complete': [6, 5, 10, 8, 8, 8]
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Convert DataFrame to JSON
json_data = df.to_json(orient='records')
# Convert JSON data back to DataFrame
df_from_json = pd.read_json(json_data, orient='records')
# Set date and language as multi-index
df_from_json.set_index(['date', 'language'], inplace=True)
df_from_json.sort_index(inplace= True)
df_from_json
1st Problem:
I want to combine the dates ’01/01/17′, ’02/01/17′ and rename as ‘1_2’, this should give me 4 rows:
2 rows for ‘1_2′ – (Python and R) and 2 rows for ’03/01/17’ (Python and R)
2nd Problem:
I want to combine Python and R rows and rename as Python_R, this should give 3 rows for 3 dates.
Any guidance or pointer will be hugely appreciated.
>Solution :
IIUC use DataFrame.rename with aggregate, e.g. sum:
out = (df_from_json.rename({pd.Timestamp('01/01/17'):'1_2',
pd.Timestamp('02/01/17'):'1_2'}, level=0)
.groupby(level=[0,1]).sum())
print (out)
ex_complete
date language
2017-03-01 00:00:00 python 10
r 8
1_2 python 11
r 16
out = (df_from_json.rename({'python':'Python_R', 'r':'Python_R'}, level=1)
.groupby(level=[0,1]).sum())
print (out)
ex_complete
date language
2017-01-01 Python_R 14
2017-02-01 Python_R 13
2017-03-01 Python_R 18