Obtaining specific column and row data from a dataframe in pandas in Python

June 6, 2022

I am newer to programming and am running into an issue I am trying to solve. I am working with a dataframe using panadas in python. I have sorted the table and created some extra columns: https://i.stack.imgur.com/Y6lkN.png

{'Part Number': ['K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2'],
 'Date': [Timestamp('2021-05-17 00:00:00'),
  Timestamp('2021-05-23 00:00:00'),
  Timestamp('2021-07-08 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-12-18 00:00:00'),
  Timestamp('2021-12-20 00:00:00'),
  Timestamp('2022-02-10 00:00:00'),
  Timestamp('2022-03-31 00:00:00'),
  Timestamp('2021-10-04 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-17 00:00:00'),
  Timestamp('2021-11-24 00:00:00'),
  Timestamp('2021-11-27 00:00:00'),
  Timestamp('2021-12-22 00:00:00'),
  Timestamp('2021-12-24 00:00:00'),
  Timestamp('2022-03-21 00:00:00')],
 'Code': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52'],
 'Fix': ['na',
  'na',
  'na',
  'Custom Status',
  'na',
  'na',
  'Remade',
  'na',
  'na',
  'na',
  'Testing Procedure Fixed',
  'na',
  'na',
  'na',
  'na',
  'na',
  'na',
  'Light Repair',
  'na',
  'na'],
 'Fixed': ['No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No'],
 'Combined': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52']}

I sorted the dataframe by date and now I would like to create a loop that goes down the table in order row by row. In the loop, if the row in the "Fixed" column is "No", I want to append the value in that row in the "Code" column to a list (which I called list_test). Then, when the row in the "Fixed" column becomes a "Yes", I want to create a new list variable that is a copy of the list_test.

Then, I want to clear the list_test to be an empty list so that it can repeat the process down the column (clearing itself every time there is a "Fix").

In my example table above, I would want the output to be something along the lines of:

Fixed_Before_3 = ["SF22", "KFS3", "3FFS"]
Fixed_Before_6 = ["LA52", "K2KA"]
Fixed_Before_10 = ["QET6", "QET6", "P0SF"]
Fixed_Before_17 = ["DP2L", "SR2F", "JKO2", "DP2L", "A2BF", "KLL2"]

This is one way I tried to approach the problem:

list_test = []
var_test = {}

for index in df.index:
    var_test[index] = "Fixed_Before_" + str(index)
    
    if df['Fixed'][index] == 'No':
        list_test.append(df['Code'])
    
    if df['Fixed'][index] == 'Yes':
            var_test[index] = list_test
            list_test = []
list_test

Although, when I run the code, the output (https://i.stack.imgur.com/RM0Pj.png) is a very large column and it looks like it includes everything in my column more than a few times rather than the output I included above. I think my problems might be with:

The way I iterate throughout the dataframe
The conditional statements about dataframes
- Maybe list_test.append(df[‘Code’]) gives me the whole column instead of the value of the column in the row in my conditional statement?
Using a dictionary to create new variables in my loop

If anyone has any helpful information about what I am doing wrong or another approach to my problem that would be greatly appreciated!

>Solution :

The exact expected output is unclear, but here is a suggestion:

mask = df['Fixed'].eq('Yes')
out = (df
 .assign(index=df.index)
 .groupby(['Part Number', mask.shift(fill_value=True).cumsum()])
 .agg(fix_before=('index', 'last'),
      list=('Code', lambda g: list(g.iloc[:-1])))
 .reset_index(drop=True)
 )

Output:

   fix_before                                  list
0           3                    [SF22, KFS3, 3FFS]
1           6                          [LA52, K2KA]
2           9                          [QET6, QET6]
3          10                                    []
4          17  [DP2L, SR2F, JKO2, DP2L, A2BF, KLL2]
5          19                                [A3SA]