How to group this data with pandas by heading in column sl. no
I have data like below in CSV or excel.
sl. No | v1 | v2 | v3 |
---|---|---|---|
Heading1 |
|||
1 | 243 | 45 | 3244 |
2 | 3 | 25 | 33 |
3 | 43 | 324 | 54 |
Head2 |
|||
1 | 3 | 45 | 54 |
2 | 24 | 4 | 42 |
… | … | … | … |
Here is the dummy csv file
sl. No,v1,v2,v3,v4
Heading1,,,,
1,243,45,3244,
2,3,25,33,
3,43,324,54,
Head2,,,,
1,3,45,54,
2,24,4,42,
>Solution :
By pattern
To group by string/pattern match, you can use a string operation (str.startswith
/str.contains
/…) and cumsum
to increment a group number:
group = df['sl. No'].str.startswith('Head').cumsum()
Or, to get the original names combine with where
and ffill
:
group = df['sl. No'].where(df['sl. No'].str.startswith('Head')).ffill()
By "type"
To identify the headings as non-integers, use a variant with pandas.to_numeric
:
group = df['sl. No'].where(pd.to_numeric(df['sl. No'], errors='coerce').isna()).ffill()
To use the fact the other columns are empty/NaN, isna
/isnull
/eq
+cumsum
:
group = df['v1'].isna().cumsum()
# or for empty strings
# group = df['v1'].eq('').cumsum()
Or:
group = df['sl. No'].where(df['v1'].isna()).ffill()
# or for empty strings
# group = df['sl. No'].where(df['v1'].eq('')).ffill()
Demo:
df.assign(group1 = df['sl. No'].str.startswith('Head').cumsum(),
group2 = df['sl. No'].where(df['sl. No'].str.startswith('Head')).ffill(),
group3 = df['sl. No'].where(pd.to_numeric(df['sl. No'], errors='coerce').isna()).ffill(),
group4 = df['v1'].isna().cumsum(),
group5 = df['sl. No'].where(df['v1'].isna()).ffill(),
)
sl. No v1 v2 v3 v4 group1 group2 group3 group4 group5
0 Heading1 NaN NaN NaN NaN 1 Heading1 Heading1 1 Heading1
1 1 243.0 45.0 3244.0 NaN 1 Heading1 Heading1 1 Heading1
2 2 3.0 25.0 33.0 NaN 1 Heading1 Heading1 1 Heading1
3 3 43.0 324.0 54.0 NaN 1 Heading1 Heading1 1 Heading1
4 Head2 NaN NaN NaN NaN 2 Head2 Head2 2 Head2
5 1 3.0 45.0 54.0 NaN 2 Head2 Head2 2 Head2
6 2 24.0 4.0 42.0 NaN 2 Head2 Head2 2 Head2