Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I ensure every us state is accounted for in a pandas dataframe?

Im very new to pandas

I have a CSV that contains 43 states and a count of how many times something has happened in that state.

STATE,Count
AL,1
AK,4
AZ,7

My CSV does not contain every state, how can I ensure that every state is accounted for? If its not in the original dataframe it should have a Count of 0.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Heres what I have so far, but its giving me Count_x and Count_y and its still not got all 50 states.

# Original CSV only has 43 states
states = pd.read_csv("states.csv")

# Create a new dataframe with all states and count set to 0
all_states = [["AL", 0], ["AK", 0], ["AZ", 0], ["AR", 0], ["CA", 0], ["CO", 0], ["CT", 0], ["DE", 0], ["FL", 0], ["GA", 0], ["HI", 0], ["ID", 0], ["IL", 0], ["IN", 0], ["IA", 0], ["KS", 0], ["KY", 0], ["LA", 0], ["ME", 0], ["MD", 0], ["MA", 0], ["MI", 0], ["MN", 0], ["MS", 0], ["MO", 0], ["MT", 0], ["NE", 0], ["NV", 0], ["NH", 0], ["NJ", 0], ["NM", 0], ["NY", 0], ["NC", 0], ["ND", 0], ["OH", 0], ["OK", 0], ["OR", 0], ["PA", 0], ["RI", 0], ["SC", 0], ["SD", 0], ["TN", 0], ["TX", 0], ["UT", 0], ["VT", 0], ["VA", 0], ["WA", 0], ["WV", 0], ["WI", 0], ["WY", 0]]
all_states = pd.DataFrame(all_states, columns=["STATE", "Count"])

# Merge the two Dataframes
new_df = states.merge(all_states, on="STATE")

# Still only has 43 states
new_df

Notice AK is still missing (and a few other states)
States

>Solution :

You are merging in the wrong direction; you need to merge states with all_states, not the other way around. Use a left merge and then fillna with 0:

new_df = all_states.merge(states, on='STATE', how='left').fillna(0)

Note you may decide to get rid of the Count_x column as it will be all 0; you can do that using drop:

new_df.drop('Count_x', axis=1, inplace=True)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading