Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Dataframe check next months data and return specific column

I want to return the Bill_Status for the next month. This information should be validated against two fields: Cus_Number and Cus_Account

Here is my DataFrame:

import pandas as pd  
  
# assign data of lists.  
data = {'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'],
        'Cus_Number': ['1487', '1487', '1487', '1487', '1278', '1278', '1278', '1278', '1278', '1278', '1278', '1278'],
        'Cus_Account': ['101', '101', '101', '101', '120', '120', '120', '120', '122', '122', '122', '122'],
        'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]}  
  
# Create DataFrame  
df = pd.DataFrame(data)  
  
# Print the output.  
print(df)  

For my current attempt I used the following code:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df=df.reset_index()
df['Date']=pd.to_datetime(df['Date'],infer_datetime_format=True)
df['Next_Month']=df['Date']+pd.DateOffset(months=1)
res=df.merge(df,right_on='Date',left_on='Next_Month',how='left')

With the following output:
enter image description here

This is only a small sample however in the main data I have thousands of Cus_Number and Cus_Account where both of them should be the same and return the Bill_Status for the upcoming month. I am struggling to find a solution to validate both Cus_Number and Cus_Account and return the Bill_Status for the next month. any assistance will be appreciated.

Expected Output:

enter image description here

>Solution :

I looked at your code and unless I miss the intent I do not believe you would want to shift your Bill_Status_Next_Month based on a different customers Bill_Status. I addressed this by using a groupby() and shift() so your Bill_Status_Next_Month will only shift for the specified customer/account grouping. If this is not what your intent is please let me know and I can update the code. I also went ahead and filled what would be N/A with 0

data = {'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'],
        'Cus_Number': ['1487', '1487', '1487', '1487', '1278', '1278', '1278', '1278'],
        'Cus_Account': ['101', '101', '101', '101', '120', '120', '120', '120'],
        'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0]}  
  
# Create DataFrame  
df = pd.DataFrame(data)  
  
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
df['Bill_Status_Next_Month'] = df.sort_values(['Date', 'Cus_Number', 'Cus_Account']).groupby(['Cus_Number', 'Cus_Account'])['Bill_Status'].shift(-1).fillna(0)
df
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading