Pandas: How do I replace a subset of column values with the same column values in a specific series?

October 10, 2024

I’m writing a pandas script to perform data manipulation on an Excel file. First, I load two sheets into dataframes. One is the original data, df; the second, replace, is a sheet detailing the replacements that need to be made in the original data.

The script needs to do two things for each row of df.

1) Replace each instance of 'Name' in df with 'NameReplace' (working)

2) For the same rows in df, replace a slice of the columns (specified by a list) with the values in the same slice of columns in replace

Reproducible Minimal Example of my current implementation:

import pandas

df = pandas.DataFrame([["John", None, None],["Phil", None, None],["John", None, None],["Bob", None, None]], columns=["Name", "Age", "Height"])
replace = pandas.DataFrame([["John", "Dom", 25, 175],["Phil", "Kevin", 56, 145],["Bob", "Michael", 33, 180]], columns=["Name", "NameReplace", "Age", "Height"])

detailsList = ["Age", "Height"]

for i, row in replace.iterrows():
    df.loc[df['Name'] == row['Name'], 'Name'] = row['NameReplace']
    df.loc[df['Name'] == row['NameReplace'], detailsList] = row[detailsList]

print(df)

Step 1) is working with this implementation, but the detailsList columns in df do not get populated.
The current output is

      Name  Age Height
0      Dom  NaN    NaN
1    Kevin  NaN    NaN
2      Dom  NaN    NaN
3  Michael  NaN    NaN

The desired output is

      Name  Age Height
0      Dom  25    175
1    Kevin  56    145
2      Dom  25    175
3  Michael  33    180

I’ve been trying for a while now, and cannot seem to make progress. I also don’t really get why this doesn’t work, so any insight there would be extra appreciated!

Note: Using detailsList to specify the slice of columns is necessary, as in the real solution I am only operating on a specific slice of the full dataframe, unlike the example I’ve given.

Solution:

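The problem is the way pandas assigns a Series to a block of a DataFrame: it aligns the Series by its index labels instead of matching values by position. Here row[detailsList] is a Series labeled 'Age' and 'Height', and that alignment evidently finds nothing to match for the selected rows, so NaN is written. A simple fix that gives the intended behavior is to assign a NumPy array rather than a Series, since an array carries no labels and is assigned by position.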

for i, row in replace.iterrows():
    df.loc[df['Name'] == row['Name'], 'Name'] = row['NameReplace']
    df.loc[df['Name'] == row['NameReplace'], detailsList] = row[detailsList].values
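To make the difference concrete, here is a minimal sketch using the data from the question. It inspects what row[detailsList] carries and then applies the array-based assignment. `to_numpy()` is used here as an equivalent of `.values`; that substitution is my choice, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", None, None], ["Phil", None, None], ["John", None, None], ["Bob", None, None]],
    columns=["Name", "Age", "Height"],
)
replace = pd.DataFrame(
    [["John", "Dom", 25, 175], ["Phil", "Kevin", 56, 145], ["Bob", "Michael", 33, 180]],
    columns=["Name", "NameReplace", "Age", "Height"],
)
detailsList = ["Age", "Height"]

# A row slice is a Series whose index holds the column names.
details = replace.iloc[0][detailsList]
print(details.index.tolist())  # ['Age', 'Height']

for _, row in replace.iterrows():
    mask = df["Name"] == row["Name"]
    df.loc[mask, "Name"] = row["NameReplace"]
    # A plain array has no labels, so it is broadcast across the selected rows.
    df.loc[mask, detailsList] = row[detailsList].to_numpy()

print(df)
```

The array has one value per column in detailsList, so every selected row receives the same pair of values.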

Other optimizations:

Note that you can reuse the df['Name'] == row['Name'] mask. In particular, you save some work with

for i, row in replace.iterrows():
    mask = df['Name'] == row['Name']
    df.loc[mask, 'Name'] = row['NameReplace']
    df.loc[mask, detailsList] = row[detailsList].values

You can avoid iterrows entirely if you use a merge:

df = (df[['Name']].merge(replace, on = 'Name')
                  .drop(columns='Name')
                  .rename(columns={'NameReplace':'Name'}))

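The catch with this approach is that the rows might end up reordered. Also note that merge defaults to an inner join, so any row of df whose Name has no match in replace is dropped, and because only df[['Name']] is merged, any other columns of df are lost.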
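If those caveats matter, one possible variation is a left merge used only as a lookup, with the results written back into df by position. This is a sketch under assumptions, not the original answer's method: the `Notes` column and the unmatched `Alice` row are hypothetical additions for illustration, and `validate="many_to_one"` assumes each Name appears at most once in replace.

```python
import pandas as pd

# "Notes" and the "Alice" row are hypothetical, added to show that
# extra columns and unmatched rows survive.
df = pd.DataFrame(
    [["John", None, None, "a"], ["Phil", None, None, "b"], ["John", None, None, "c"],
     ["Bob", None, None, "d"], ["Alice", None, None, "e"]],
    columns=["Name", "Age", "Height", "Notes"],
)
replace = pd.DataFrame(
    [["John", "Dom", 25, 175], ["Phil", "Kevin", 56, 145], ["Bob", "Michael", 33, 180]],
    columns=["Name", "NameReplace", "Age", "Height"],
)
detailsList = ["Age", "Height"]

# A left merge keeps every row of df in its original order; validate raises
# if a Name occurs more than once in replace.
merged = df[["Name"]].merge(replace, on="Name", how="left", validate="many_to_one")

# Rows without a match have NaN in NameReplace.
matched = merged["NameReplace"].notna().to_numpy()

# Assign arrays so values are placed by position, not aligned by label.
# The unmatched row introduces NaN, so merged Age/Height are floats here.
df.loc[matched, detailsList] = merged.loc[matched, detailsList].to_numpy()
df.loc[matched, "Name"] = merged.loc[matched, "NameReplace"].to_numpy()

print(df)
```

Because merged has the same length and row order as df, the boolean array `matched` can index both frames directly.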

data-manipulation

by MR

Published October 10, 2024
