Operations on specific elements of a dataframe in Python

I’m trying to convert kilometer values in one column of a dataframe to mile values. I’ve tried various things and this is what I have now:

def km_dist(column, dist):
    length = len(column)
    for dist in zip(range(length), column):
        if (column == data["dist"] and dist in data.loc[(data["dist"] > 25)]):
            return dist / 5820
        else:
            return dist
    
data = data.apply(lambda x: km_dist(data["dist"], x), axis=1)

The dataset I’m working with looks something like this:

   past_score         dist        income  lab      score  gender  race  income_bucket  plays_sports  student_id  lat  long
0    8.091553    11.586920  67111.784934    0   7.384394    male     H              3             0           1  0.0   0.0
1    8.091553    11.586920  67111.784934    0   7.384394    male     H              3             0           1  0.0   0.0
2    7.924539  7858.126614  93442.563796    1  10.219626       F     W              4             0           2  0.0   0.0
3    7.924539  7858.126614  93442.563796    1  10.219626       F     W              4             0           2  0.0   0.0
4    7.726480    11.057883  96508.386987    0   8.544586       M     W              4             0           3  0.0   0.0

With the code above, I’m trying to loop through all the "dist" values and, if a value is in the right column (data["dist"]) and greater than 25, divide it by 5820 (the number of feet in a kilometer). More generally, I’d like to find a way to operate on specific elements of dataframes. I’m sure this is at least a somewhat common question; I just haven’t been able to find an answer for it. If someone could point me towards an answer, I would be just as happy.

Solution:

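Instead of looping, select the rows with a boolean mask and divide the dist column by 5820 in place. (5820 is the divisor from the question; it is not the number of feet in a kilometer, and a true kilometer-to-mile conversion would divide by 1.609344.)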

data.loc[data["dist"] > 25, 'dist'] /= 5820

This is shorthand for:

data.loc[data["dist"] > 25, 'dist'] = data.loc[data["dist"] > 25, 'dist'] / 5820
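To make the mask explicit, here is a minimal sketch on a hypothetical three-row frame (only the "dist" values are taken from the question's sample; the names mask, a and b are illustrative). It prints the boolean Series that selects the rows and checks that the compound and the explicit assignment produce the same frame.

```python
import pandas as pd

# Hypothetical three-row frame; only the "dist" values come from the question.
data = pd.DataFrame({"dist": [11.586920, 7858.126614, 11.057883]})

# Boolean Series: True for the rows that should be changed.
mask = data["dist"] > 25
print(mask.tolist())  # [False, True, False]

# Compound assignment on the selected cells.
a = data.copy()
a.loc[mask, "dist"] /= 5820

# Explicit assignment on the same cells.
b = data.copy()
b.loc[mask, "dist"] = b.loc[mask, "dist"] / 5820

print(a.equals(b))  # True
```

Rows where the mask is False are never touched, which is what makes this a per-element operation without a Python loop.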

data.loc[data["dist"] > 25, 'dist'] /= 5820
print(data)
   past_score       dist        income  lab      score gender race  \
0    8.091553  11.586920  67111.784934    0   7.384394   male    H   
1    8.091553  11.586920  67111.784934    0   7.384394   male    H   
2    7.924539   1.350194  93442.563796    1  10.219626      F    W   
3    7.924539   1.350194  93442.563796    1  10.219626      F    W   
4    7.726480  11.057883  96508.386987    0   8.544586      M    W   

   income_bucket  plays_sports  student_id  lat  long  
0              3             0           1  0.0   0.0  
1              3             0           1  0.0   0.0  
2              4             0           2  0.0   0.0  
3              4             0           2  0.0   0.0  
4              4             0           3  0.0   0.0  
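If the goal is a real kilometer-to-mile conversion that leaves the original column intact, one possible variant is sketched below. It assumes, as the question implies, that values above 25 are in kilometers and the rest are already in miles; the column name dist_miles and the constant KM_PER_MILE are illustrative, not part of the original data.

```python
import pandas as pd

KM_PER_MILE = 1.609344  # exact by definition of the international mile

# Hypothetical frame; the threshold of 25 is taken from the question.
data = pd.DataFrame({"dist": [11.586920, 7858.126614, 11.057883]})

# Assumption: values above the threshold are kilometers, the rest are miles.
in_km = data["dist"] > 25

# Series.where keeps a value where the condition is True and takes the
# replacement elsewhere, so the condition passed in means "leave this row alone".
data["dist_miles"] = data["dist"].where(~in_km, data["dist"] / KM_PER_MILE)

print(data)
```

Writing to a new column keeps the raw values available for checking, and rerunning the cell does not divide the same rows twice, which the in-place version would do for any value still above 25.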