Im very new to pandas
I have a CSV that contains 43 states and a count of how many times something has happened in that state.
STATE,Count
AL,1
AK,4
AZ,7
My CSV does not contain every state, how can I ensure that every state is accounted for? If its not in the original dataframe it should have a Count
of 0.
Heres what I have so far, but its giving me Count_x
and Count_y
and its still not got all 50 states.
# Original CSV only has 43 states
states = pd.read_csv("states.csv")
# Create a new dataframe with all states and count set to 0
all_states = [["AL", 0], ["AK", 0], ["AZ", 0], ["AR", 0], ["CA", 0], ["CO", 0], ["CT", 0], ["DE", 0], ["FL", 0], ["GA", 0], ["HI", 0], ["ID", 0], ["IL", 0], ["IN", 0], ["IA", 0], ["KS", 0], ["KY", 0], ["LA", 0], ["ME", 0], ["MD", 0], ["MA", 0], ["MI", 0], ["MN", 0], ["MS", 0], ["MO", 0], ["MT", 0], ["NE", 0], ["NV", 0], ["NH", 0], ["NJ", 0], ["NM", 0], ["NY", 0], ["NC", 0], ["ND", 0], ["OH", 0], ["OK", 0], ["OR", 0], ["PA", 0], ["RI", 0], ["SC", 0], ["SD", 0], ["TN", 0], ["TX", 0], ["UT", 0], ["VT", 0], ["VA", 0], ["WA", 0], ["WV", 0], ["WI", 0], ["WY", 0]]
all_states = pd.DataFrame(all_states, columns=["STATE", "Count"])
# Merge the two Dataframes
new_df = states.merge(all_states, on="STATE")
# Still only has 43 states
new_df
Notice AK
is still missing (and a few other states)
>Solution :
You are merging in the wrong direction; you need to merge states
with all_states
, not the other way around. Use a left
merge and then fillna
with 0
:
new_df = all_states.merge(states, on='STATE', how='left').fillna(0)
Note you may decide to get rid of the Count_x
column as it will be all 0
; you can do that using drop
:
new_df.drop('Count_x', axis=1, inplace=True)