Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – loop through dataframe, check column values if empty, if empty then add entire row to new list or df

Currently my database looks like this:

import pandas as pd

output =  [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]

df = pd.DataFrame(output[1:], columns=output[0])

It looks like this:

  Owner  Database   Schema  ...          Column         Comment Status
0        VICE_DEV  AIRFLOW  ...  _LOAD_DATETIME   Load datetime   None
1        VICE_DEV  AIRFLOW  ...  _LOAD_FILENAME  load file name  ADDED

I’m now trying to loop through the Status col, and if Status is None or empty I want to create a list of SQL string queries, and go to the next row and check, etc.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’ve tried this:

query_list = []

for column in df['Status']:
    if column is None:
        sql = f"ALTER TABLE {df['Table']} ALTER {df['Column']} COMMENT {df['Comment']}; "
        # print(sql)
        query_list.append(sql)

But its not correct, its getting both rows, instead I want to only get the rows with the column df[‘Status’] None (or empty). Also, the formatting is off. I’ve tried by creating an empty list and an empty df.

>Solution :

If I understand it correctly you want to create one query if Status is None. This query uses the values from the particular row that is None. The problem then is that you insert whole columns into your query and not the values from the particular row with Status==None. I think what you want is to iterate over all columns in your dataframe and then check if the status is None. If so, create the query based on that row. This can be done with the following code for example:

import pandas as pd

output =  [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]

df = pd.DataFrame(output[1:], columns=output[0])

query_list = []

for index, row in df.iterrows():
    if row['Status'] is None:
        sql = f"ALTER TABLE {row['Table']} ALTER {row['Column']} COMMENT {row['Comment']}; "
        
        # print(sql)
        query_list.append(sql)
query_list

Output:

['ALTER TABLE TASK_INSTANCE ALTER _LOAD_DATETIME COMMENT Load datetime; ']
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading