Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find consecutive indices of common values in a group

For the DataFrame below:

df = pd.DataFrame({'Name': ['Tesla','Tesla','Tesla','Toyota','Ford','Ford','Ford','BMW','BMW','BMW','Mercedes','Mercedes','Mercedes'],
                   'Type': ['Model X','Model X','Model X','Corolla','Bronco','Bronco','Mustang','3 Series','-','-','Alpha','Alpha','S-Class'],
                   'Year': [2015, 2015, 2015, 2017, 2018, 2018, 2020, 2015, 2015, 2017, 2018, 2018, 2020],
                   'Price': [85000, 90000, 95000, 20000, 35000, 35000, 45000, 40000, 40000, 65000, 50000, 50000, 75000],
                   'Color': ['White','White','White','Red','Blue','Blue','Yellow','Silver','Silver','Black','White','White','Black']
                  })

df

    Name    Type     Year   Price   Color
0   Tesla   Model X  2015   85000   White
1   Tesla   Model X  2015   90000   White
2   Tesla   Model X  2015   95000   White
3   Toyota  Corolla  2017   20000   Red
4   Ford    Bronco   2018   35000   Blue
5   Ford    Bronco   2018   35000   Blue
6   Ford    Mustang  2020   45000   Yellow
7   BMW     3 Series 2015   40000   Silver
8   BMW  -   2015   40000   Silver
9   BMW  -   2017   65000   Black
10  Mercedes Alpha   2018   50000   White
11  Mercedes Alpha   2018   50000   White
12  Mercedes S-Class 2020   75000   Black

For each Name column groups, I am trying to find the corresponding start index and end index of consecutive common values in other columns. if the value repeat consecutively it should be considered as common and store it to a dictionary with key as start index and value as end index.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For example Tesla has Model X common in the Type column, so in the Type column I should get the start index of Model X and end index of Model X as the first value in a dictionary with start index : `end index as the key value pair of the dictionary.

Similarly, for the Color column Blue should have start index as 4 and end index as 5 and for the Year column 2015 should have start index as 0 and end index as 2.

Expected Output:

Name :

{0: 2, 3: 3, 4: 6, 7: 9, 10: 12}

Type :

{0: 2, 3: 3, 4: 5, 6: 6, 7: 7, 8: 9, 10: 11, 12: 12}

Year :

{0: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12] }

Price:

{0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12} 

Color:

{0: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12:12}

What I tried

def find_indices(df, column):
    df1 = df.index.to_series().groupby(df[column]).agg(['first', 'last']).reset_index()
    df1 = df1.sort_values("first").reset_index()
    first_last_rows = df1.set_index('first')['last'].to_dict()
    print(column + ":")
    print(first_last_rows)

Output I got

Name:
{0: 2, 3: 3, 4: 6, 7: 9, 10: 12}
Type:
{0: 2, 3: 3, 4: 5, 6: 6, 7: 7, 8: 9, 10: 11, 12: 12}
Year:
{0: 8, 3: 9, 4: 11, 6: 12}
Price:
{0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}
Color:
{0: 11, 3: 3, 4: 5, 6: 6, 7: 8, 9: 12}

In the output I got Year and Color columns end values are from disjoint group so it is not finding the consecutive common values(or the values are for different Name)

>Solution :

It seems you want a group of consecutive values. For that a well known method is to use shift and cumsum and then groupby, and then for your problem you take first index and last index of each group and then build a dict as below:

col = "Year"
col_dict = dict(
    df.groupby(df[col].ne(df[col].shift()).cumsum())[col]
    .agg(lambda x: (x.index[0], x.index[-1]))
    .values
)
print(col_dict)

{0: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading