I have the some data that looks like this:
City 1994 1995 1996 1997
Rome X NaN NaN NaN
New York NaN NaN NaN NaN
Paris NaN X NaN NaN
Sidney NaN NaN NaN NaN
The crosses represent that an event happened in that city in that specific year.
I want to add new years to my data frame, where the data looks like this:
City 1998
Rome X
Sidney X
London X
Madrid X
Basically the new data indicates where events happened in a new different year (so it’ll always have ‘X’).
The result I’m looking for should look like this:
City 1994 1995 1996 1997 1998
Rome X NaN NaN NaN X
New York NaN NaN NaN NaN NaN
Paris NaN X NaN NaN NaN
Sidney NaN NaN NaN NaN X
London NaN NaN NaN NaN X
Madrid NaN NaN NaN NaN X
So it should add the new non existing cities to the city list and put NaN for the already present years for that city and also put an X in the newly added year for the cities where the event occurred.
I tried using df.append() but I get repeaded values
City 1994 1995 1996 1997 1998
Rome X NaN NaN NaN NaN
New York NaN NaN NaN NaN NaN
Paris NaN X NaN NaN NaN
Sidney NaN NaN NaN NaN NaN
Rome NaN NaN NaN NaN X
Sidney NaN NaN NaN NaN X
London NaN NaN NaN NaN X
Madrid NaN NaN NaN NaN X
I also tried using df.merge() but I didn’t get any better results. Any suggestions on how to do it using pandas? I can write some code to manually does it but it defeats the purpose of using such a powerful tool.
>Solution :
You can use df.merge() to achieve this.
import pandas as pd
data = [['Rome','X', None, None, None],['New York' ,None, None, None, None], ['Paris',None, 'X', None, None]]
df = pd.DataFrame(data, columns = ['City','1994', '1995', '1996' , '1997' ])
df.head()
City 1994 1995 1996 1997
0 Rome X None None None
1 New York None None None None
2 Paris None X None None
df2 = pd.DataFrame([['Rome', 'X', ],['Sydney','X'], ['London','X']], columns = ['City', '1998'])
City 1998
0 Rome X
1 Sydney X
2 London X
Then you can merge the two DataFrame
df.merge(df2, how='outer')
City 1994 1995 1996 1997 1998
0 Rome X None None None X
1 New York None None None None NaN
2 Paris None X None None NaN
3 Sydney NaN NaN NaN NaN X
4 London NaN NaN NaN NaN X
You need to specify outer as type of merge to be performed.
outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.