Masking the Zip Codes

January 30, 2023

I’m taking a course and I need to solve the following assignment:
"In this part, you should write a for loop, updating the df_users dataframe.

Go through each user, and update their zip code, to Safe Harbor specifications:

If the user is from a zip code for which the “Geographic Subdivision” is less than or equal to 20,000, change the zip code in df_users to ‘0’ (as a string)
Otherwise, zip should be only the first 3 numbers of the full zip code
Do all this by directly updating the zip column of the df_users DataFrame
Hints:

This will be several lines of code, looping through the DataFrame, getting each zip code, checking the geographic subdivision with the population in zip_dict, and setting the zip_code accordingly.
Be very aware of your variable types when working with zip codes here."

Here you can find all the data necessary to understand the context:

https://raw.githubusercontent.com/DataScienceInPractice/Data/master/

assignment: ‘A4’

data_files: user_dat.csv, zip_pop.csv

After cleaning the data from user_dat.csv so that only the columns ‘age’, ‘zip’ and ‘gender’ remain, and creating a dictionary from zip_pop.csv that maps the first 3 digits of each zip code to its population, I wrote this code:

# Loop through the dataframe to get each zipcode
for zipcode in df_users['zip']:
    # Check whether the first 3 digits of the zipcode correspond to a population of 20,000 or fewer
    if zip_dict[zipcode[:len(zipcode) - 2]] <= 20000:
        # If so, change the zipcode value to the string '0'.
        df_users.loc[df_users['zip'] == zipcode, 'zip'] = '0'
    else:
        # Otherwise, keep only the first 3 digits of the zipcode.
        df_users.loc[df_users['zip'] == zipcode, 'zip'] = zipcode[:len(zipcode) - 2]

This code works halfway and I don’t understand why.
It changes the zipcode to ‘0’ if the population is 20,000 or fewer, and also shortens the first zipcodes (up to the ones that start with ‘078’), but then it raises this error:

KeyError                                  Traceback (most recent call last)
/var/folders/95/4vh4zhc1273fgmfs4wyntxn00000gn/T/ipykernel_44758/1429192050.py in <module>
      1 for zipcode in df_users['zip']:
----> 2     if zip_dict[zipcode[:len(zipcode) - 2]] <= 20000:
      3         df_users.loc[df_users['zip'] == zipcode, 'zip'] = '0'
      4     else:
      5         df_users.loc[df_users['zip'] == zipcode, 'zip'] = str(zipcode[:len(zipcode) - 2])

KeyError: '0'

I get that the problem is in the last line of code, because I ran the code one line at a time and each line worked until I added that last one. And if I just print the zipcodes instead of running that last line, it also works!

Can anyone help me understand why my code is wrong?

Solution:

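You’re modifying a collection of values (i.e. df_users['zip']) whilst you’re iterating over it. This is a common anti-pattern. In your case, once a row has been shortened to three digits (e.g. '078'), a later iteration that reaches that value slices it again with zipcode[:len(zipcode) - 2], which yields '0', and '0' is not a key in zip_dict; hence the KeyError. If a loop is absolutely required, then you could consider iterating over df_users['zip'].unique() instead. That gives you a separate array of the unique zip codes, which avoids your current error and means that you aren’t redoing work when you encounter a duplicate zip code.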
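A minimal sketch of the loop-based fix, using toy data rather than the course files. The zip codes and populations below are made-up assumptions for illustration only, and the sketch assumes 5-digit zip codes stored as strings and a zip_dict keyed by 3-digit string prefixes:

```python
import pandas as pd

# Toy data: illustrative assumptions, not the real user_dat.csv / zip_pop.csv.
df_users = pd.DataFrame({"zip": ["07840", "07840", "03601", "92093"]})
zip_dict = {"078": 50000, "036": 15000, "920": 300000}

# Why the original loop fails: slicing a full zip gives the 3-digit prefix,
# but slicing an already shortened value again gives '0'.
full_zip = "07840"
prefix = full_zip[:len(full_zip) - 2]
assert prefix == "078"
assert prefix[:len(prefix) - 2] == "0"

# Iterate over a separate array of the distinct original zip codes,
# so updates to df_users do not feed back into the loop.
for zipcode in df_users["zip"].unique():
    prefix = zipcode[:3]  # assumes 5-digit string zip codes
    if zip_dict[prefix] <= 20000:
        df_users.loc[df_users["zip"] == zipcode, "zip"] = "0"
    else:
        df_users.loc[df_users["zip"] == zipcode, "zip"] = prefix

print(df_users["zip"].tolist())
```

With this toy data, both '07840' rows become '078', the '03601' row becomes '0', and '92093' becomes '920'.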

If a loop is not required, then there are better (more idiomatic pandas) ways to go about your problem. I would suggest something like (untested):

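# Drop the last two characters of each zip code (assumes 5-digit strings)
zip_start = df_users['zip'].str[:-2]
# Keep the prefix where the mapped population exceeds 20,000; otherwise use "0"
df_users['zip'] = zip_start.where(zip_start.map(zip_dict) > 20000, other="0")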
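To check how this behaves, here is a small self-contained run on toy data. The values are assumptions made up for illustration, including a prefix ('999') that is deliberately missing from zip_dict to show what happens to unmapped zip codes:

```python
import pandas as pd

# Toy data: illustrative assumptions, not the real course files.
df_users = pd.DataFrame({"zip": ["07840", "03601", "92093", "99999"]})
zip_dict = {"078": 50000, "036": 15000, "920": 300000}

# 3-digit prefix of every zip code (assumes 5-digit string zip codes).
zip_start = df_users["zip"].str[:-2]

# Look up each prefix's population; prefixes missing from zip_dict become NaN.
population = zip_start.map(zip_dict)

# Keep the prefix where the population exceeds 20,000, otherwise use "0".
# NaN > 20000 evaluates to False, so unmapped prefixes are also masked to "0".
df_users["zip"] = zip_start.where(population > 20000, other="0")

print(df_users["zip"].tolist())
```

Note that, unlike the loop with a direct zip_dict[...] lookup, this version does not raise a KeyError for a prefix missing from zip_dict; it silently masks it to "0". Whether that is the desired behavior depends on the assignment.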

data-science

by MR

Published January 30, 2023
