I’m trying to loop through the sections of my survey in Python to calculate the total time taken to answer each section (there are three sections: Warehouse, Psychometrics and Healthcare).
My data is in CSV format, and for each section of the survey the time data is spread over 9 columns whose names start with a common prefix (i.e., ‘sectionname’.timedata). To calculate the time spent in each section, I wrote one line of code per section that sums the values in those columns:
Surveyresp['Warehouse_time'] = Surveyresp[[col for col in Surveyresp.columns if col.startswith('Warehouse.timedata')]].sum(axis=1)
Surveyresp['Psychometrics_time'] = Surveyresp[[col for col in Surveyresp.columns if col.startswith('Psychometric.timedata')]].sum(axis=1)
Surveyresp['Healthcare_time'] = Surveyresp[[col for col in Surveyresp.columns if col.startswith('Healthcare.timedata')]].sum(axis=1)
My question is, is there a way I can just loop through these 3 surveys to make this change?
I tried starting the loop:
Surveys = ['Warehouse.timedata', 'Psychometric.timedata']
for i in Surveys:
    print(i)
    Surveyresp['i_time'] = Surveyresp[[col for col in Surveyresp.columns if col.startswith('i')]].sum(axis=1)
But all this loop does is create one new variable (that is, ‘i_time’). What am I doing wrong with this loop?
Thanks!
>Solution :
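Your loop was almost correct. 🙂 The problem is that 'i_time' and 'i' inside quotes are string literals, not the loop variable, so every iteration writes to the same column named 'i_time' and only selects columns whose names start with the letter i.
If you keep the full prefixes in the list, you can derive the new column name from the part before the dot:
Surveys = ['Warehouse.timedata', 'Psychometric.timedata']
for i in Surveys:
    print(i)
    column_name = i.split(".")[0]
    Surveyresp[column_name + "_time"] = Surveyresp[[col for col in Surveyresp.columns if col.startswith(i)]].sum(axis=1)
It is simpler, though, to store only the section names and build both strings from the loop variable: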
Surveys = ['Warehouse', 'Psychometric', 'Healthcare']
for i in Surveys:
    print(i)
    Surveyresp[i+"_time"] = Surveyresp[[col for col in Surveyresp.columns if col.startswith(i+".timedata")]].sum(axis=1)
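If you want to try this without your real CSV, here is a minimal self-contained sketch of the same loop. The toy DataFrame is made up for illustration (two timing columns per section instead of nine, with hypothetical names like Warehouse.timedata1), and the variable names sections, prefix and time_cols are my own; only the column prefixes and the _time naming come from your question.

```python
import pandas as pd

# Hypothetical toy data: two timing columns per section instead of nine.
Surveyresp = pd.DataFrame({
    "Warehouse.timedata1": [1.0, 2.0],
    "Warehouse.timedata2": [3.0, 4.0],
    "Psychometric.timedata1": [5.0, 6.0],
    "Psychometric.timedata2": [7.0, 8.0],
    "Healthcare.timedata1": [9.0, 10.0],
    "Healthcare.timedata2": [11.0, 12.0],
})

sections = ["Warehouse", "Psychometric", "Healthcare"]

for section in sections:
    # Build the prefix from the loop variable (not from a quoted literal).
    prefix = f"{section}.timedata"
    # Select this section's timing columns by prefix.
    time_cols = [col for col in Surveyresp.columns if col.startswith(prefix)]
    # Row-wise sum: total time each respondent spent in the section.
    Surveyresp[f"{section}_time"] = Surveyresp[time_cols].sum(axis=1)

print(Surveyresp[[f"{s}_time" for s in sections]])
```

The f-strings do the same job as i+"_time" above; use whichever you find more readable. Note that the new column is named after whatever string is in the list, so 'Psychometric' gives Psychometric_time rather than Psychometrics_time.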