Is there a Pandas function to compare and group a range of rows that satisfy a value

August 4, 2023

I have a dataframe dfA like this:

chromosome  basepair            
chrA        500      
chrA        1000      
chrA        7000      
chrA        20000      
chrA        23000     
chrA        24000    
chrA        35000         
chrB        13000      
chrB        14000     
chrB        14500

For each chrA position in dfA, I would like to scan the basepair column of the adjacent chrA rows and identify groups of positions that lie within 5000 basepairs of each other (i.e. a separation of 1-5000). I would then repeat this for chrB and write a new dataframe dfB listing all the groups identified.

The output for dfB should be:

chromosome  basepair    Group ID            
chrA        500         1     
chrA        1000        1      
chrA        20000       2      
chrA        23000       2     
chrA        24000       2
chrA        23000       3     
chrA        24000       3      
chrB        13000       4
chrB        14000       4 
chrB        14500       4

Solution:

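Assuming you want to start a new group whenever the gap from the previous row of the same chromosome is greater than 5000, or when the position goes backwards (the first row of each chromosome always starts a new group, because its difference is NaN):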

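# diff() is computed within each chromosome; a row starts a new group when
# its gap from the previous row is NaN, negative, or larger than 5000
df['Group ID'] = (~df.groupby('chromosome')['basepair']
                     .diff().between(0, 5000)
                  ).cumsum()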

Output (the rows here match those of the expected dfB above rather than dfA):

  chromosome  basepair  Group ID
0       chrA       500         1
1       chrA      1000         1
2       chrA     20000         2
3       chrA     23000         2
4       chrA     24000         2
5       chrA     23000         3
6       chrA     24000         3
7       chrB     13000         4
8       chrB     14000         4
9       chrB     14500         4
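
For anyone who wants to run this against dfA from the question, here is a minimal, self-contained sketch. The helper name `assign_groups` and its `max_gap` parameter are made up for illustration. Dropping single-row groups is an assumption, based on 7000 and 35000 being absent from the expected dfB. It does not try to reproduce the repeated 23000/24000 rows shown as group 3 in the expected output.

```python
import pandas as pd


def assign_groups(df, max_gap=5000):
    """Hypothetical helper wrapping the one-liner above."""
    out = df.copy()
    # Gap from the previous row within the same chromosome (NaN on the first row)
    step = out.groupby("chromosome")["basepair"].diff()
    # NaN, negative, or too-large gaps are not "between", so they open a new group
    new_group = ~step.between(0, max_gap)
    out["Group ID"] = new_group.cumsum()
    return out


# dfA as given in the question
dfA = pd.DataFrame({
    "chromosome": ["chrA"] * 7 + ["chrB"] * 3,
    "basepair": [500, 1000, 7000, 20000, 23000, 24000, 35000,
                 13000, 14000, 14500],
})

grouped = assign_groups(dfA)

# Assumption: a group needs at least two rows, so singletons are dropped
size = grouped.groupby("Group ID")["Group ID"].transform("size")
dfB = grouped[size > 1].copy()

# Renumber the surviving groups consecutively, starting at 1
dfB["Group ID"] = pd.factorize(dfB["Group ID"])[0] + 1

print(dfB.to_string(index=False))
```

On dfA this keeps eight rows in three groups: 500 and 1000; 20000, 23000 and 24000; and the three chrB rows. Before the filter, 7000 and 35000 each form a group of their own.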