Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

output data frame name and column names into a dataframe in pandas

I have a list of dataframes:

list_of_datasets = [master,
                    assessments,
                    vertical_list,
                    cleanedstids,
                    full_stack_eval,
                    sales]

they all have different sets of column names:

master = pd.DataFrame({'ID': [10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013],
'Type': ['Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue'],
'Year': [2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021],
'Score': [0,0,0,0,0,0,0,0,0,0,14,24,16,5,87,33,0,0,0,0,0,0,0,0,0,0,11,13,3,16,37,49]})

vertical_list= pd.DataFrame({'ID':[5863, 4506, 4514, 4543],
'Winner_ID': [13463, 20345, 21012, 20476]})

etc. etc.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I would like to get the same number of dataframes that have the name of the dataframe and the column names in 2 columns

master_names:

df_name     column_name

master      ID
master      Type
master      Year

vertical_list_names:

df_name     column_name

vertical_list      ID
vertical_list      Winner_ID

I did try:

column_names_list = list()

for i in list_of_datasets:
  column_names_list.append(i.columns.values)

but yeah this is a list and not a dataframe, and it’s not many dataframes

>Solution :

You can’t directly access an object bound variable name in python.

You should use a dictionary as initial container:

dict_of_dfs = {'master': master, 'vertical_list': vertical_list}

Then you can loop over the items and extract your data:

col_list = [pd.DataFrame({'df_name': [k]*len(d.columns), 'column_name': list(d.columns)})
            for k,d in dict_of_dfs.items()]

output:

[  df_name column_name
 0  master          ID
 1  master        Type
 2  master        Year
 3  master       Score,
          df_name column_name
 0  vertical_list          ID
 1  vertical_list   Winner_ID]

Or, as dictionary:

col_dict = {k: pd.DataFrame({'df_name': [k]*len(d.columns), 'column_name': list(d.columns)})
            for k,d in dict_of_dfs.items()}

output:

{'master':   df_name column_name
 0  master          ID
 1  master        Type
 2  master        Year
 3  master       Score,
 'vertical_list':          df_name column_name
 0  vertical_list          ID
 1  vertical_list   Winner_ID}
output as csv
for k,d in dict_of_dfs.items():
    (pd.DataFrame({'df_name': [k]*len(d.columns),
                   'column_name': list(d.columns)})
       .to_csv(f'{k}.csv')
    )
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading