pandas groupby middle heading

How to group this data with pandas by heading in column sl. no
I have data like below in CSV or excel.

sl. No v1 v2 v3
Heading1
1 243 45 3244
2 3 25 33
3 43 324 54
Head2
1 3 45 54
2 24 4 42

Here is the dummy csv file

sl. No,v1,v2,v3,v4
Heading1,,,,
1,243,45,3244,
2,3,25,33,
3,43,324,54,
Head2,,,,
1,3,45,54,
2,24,4,42,

>Solution :

By pattern

To group by string/pattern match, you can use a string operation (str.startswith/str.contains/…) and cumsum to increment a group number:

group = df['sl. No'].str.startswith('Head').cumsum()

Or, to get the original names combine with where and ffill:

group = df['sl. No'].where(df['sl. No'].str.startswith('Head')).ffill()

By "type"

To identify the headings as non-integers, use a variant with pandas.to_numeric:

group = df['sl. No'].where(pd.to_numeric(df['sl. No'], errors='coerce').isna()).ffill()

To use the fact the other columns are empty/NaN, isna/isnull/eq+cumsum:

group = df['v1'].isna().cumsum()
# or for empty strings
# group = df['v1'].eq('').cumsum()

Or:

group = df['sl. No'].where(df['v1'].isna()).ffill()
# or for empty strings
# group = df['sl. No'].where(df['v1'].eq('')).ffill()

Demo:

df.assign(group1 = df['sl. No'].str.startswith('Head').cumsum(),
          group2 = df['sl. No'].where(df['sl. No'].str.startswith('Head')).ffill(),
          group3 = df['sl. No'].where(pd.to_numeric(df['sl. No'], errors='coerce').isna()).ffill(),
          group4 = df['v1'].isna().cumsum(),
          group5 = df['sl. No'].where(df['v1'].isna()).ffill(),
         )

     sl. No     v1     v2      v3  v4  group1    group2    group3  group4    group5
0  Heading1    NaN    NaN     NaN NaN       1  Heading1  Heading1       1  Heading1
1         1  243.0   45.0  3244.0 NaN       1  Heading1  Heading1       1  Heading1
2         2    3.0   25.0    33.0 NaN       1  Heading1  Heading1       1  Heading1
3         3   43.0  324.0    54.0 NaN       1  Heading1  Heading1       1  Heading1
4     Head2    NaN    NaN     NaN NaN       2     Head2     Head2       2     Head2
5         1    3.0   45.0    54.0 NaN       2     Head2     Head2       2     Head2
6         2   24.0    4.0    42.0 NaN       2     Head2     Head2       2     Head2

Leave a Reply