Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Obtaining specific column and row data from a dataframe in pandas in Python

I am newer to programming and am running into an issue I am trying to solve. I am working with a dataframe using panadas in python. I have sorted the table and created some extra columns: https://i.stack.imgur.com/Y6lkN.png

{'Part Number': ['K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'K4SD',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2',
  'QOL2'],
 'Date': [Timestamp('2021-05-17 00:00:00'),
  Timestamp('2021-05-23 00:00:00'),
  Timestamp('2021-07-08 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-08-17 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-12-18 00:00:00'),
  Timestamp('2021-12-20 00:00:00'),
  Timestamp('2022-02-10 00:00:00'),
  Timestamp('2022-03-31 00:00:00'),
  Timestamp('2021-10-04 00:00:00'),
  Timestamp('2021-10-18 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-03 00:00:00'),
  Timestamp('2021-11-17 00:00:00'),
  Timestamp('2021-11-24 00:00:00'),
  Timestamp('2021-11-27 00:00:00'),
  Timestamp('2021-12-22 00:00:00'),
  Timestamp('2021-12-24 00:00:00'),
  Timestamp('2022-03-21 00:00:00')],
 'Code': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52'],
 'Fix': ['na',
  'na',
  'na',
  'Custom Status',
  'na',
  'na',
  'Remade',
  'na',
  'na',
  'na',
  'Testing Procedure Fixed',
  'na',
  'na',
  'na',
  'na',
  'na',
  'na',
  'Light Repair',
  'na',
  'na'],
 'Fixed': ['No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No',
  'No',
  'No',
  'No',
  'No',
  'Yes',
  'No',
  'No'],
 'Combined': ['SF22',
  'KFS3',
  '3FFS',
  'Replacement needed',
  'LA52',
  'K2KA',
  'Belt Broke',
  'QET6',
  'QET6',
  'P0SF',
  'Testing Broken',
  'DP2L',
  'SR2F',
  'JKO2',
  'DP2L',
  'A2BF',
  'KLL2',
  'Light Off',
  'A3SA',
  'LA52']}

I sorted the dataframe by date and now I would like to create a loop that goes down the table in order row by row. In the loop, if the row in the "Fixed" column is "No", I want to append the value in that row in the "Code" column to a list (which I called list_test). Then, when the row in the "Fixed" column becomes a "Yes", I want to create a new list variable that is a copy of the list_test.

Then, I want to clear the list_test to be an empty list so that it can repeat the process down the column (clearing itself every time there is a "Fix").

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

In my example table above, I would want the output to be something along the lines of:

  • Fixed_Before_3 = ["SF22", "KFS3", "3FFS"]
  • Fixed_Before_6 = ["LA52", "K2KA"]
  • Fixed_Before_10 = ["QET6", "QET6", "P0SF"]
  • Fixed_Before_17 = ["DP2L", "SR2F", "JKO2", "DP2L", "A2BF", "KLL2"]

This is one way I tried to approach the problem:

list_test = []
var_test = {}

for index in df.index:
    var_test[index] = "Fixed_Before_" + str(index)
    
    if df['Fixed'][index] == 'No':
        list_test.append(df['Code'])
    
    if df['Fixed'][index] == 'Yes':
            var_test[index] = list_test
            list_test = []
list_test

Although, when I run the code, the output (https://i.stack.imgur.com/RM0Pj.png) is a very large column and it looks like it includes everything in my column more than a few times rather than the output I included above. I think my problems might be with:

  • The way I iterate throughout the dataframe
  • The conditional statements about dataframes
    • Maybe list_test.append(df[‘Code’]) gives me the whole column instead of the value of the column in the row in my conditional statement?
  • Using a dictionary to create new variables in my loop

If anyone has any helpful information about what I am doing wrong or another approach to my problem that would be greatly appreciated!

>Solution :

The exact expected output is unclear, but here is a suggestion:

mask = df['Fixed'].eq('Yes')
out = (df
 .assign(index=df.index)
 .groupby(['Part Number', mask.shift(fill_value=True).cumsum()])
 .agg(fix_before=('index', 'last'),
      list=('Code', lambda g: list(g.iloc[:-1])))
 .reset_index(drop=True)
 )

Output:

   fix_before                                  list
0           3                    [SF22, KFS3, 3FFS]
1           6                          [LA52, K2KA]
2           9                          [QET6, QET6]
3          10                                    []
4          17  [DP2L, SR2F, JKO2, DP2L, A2BF, KLL2]
5          19                                [A3SA]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading