What is the best way to return the group that has the largest streak of negative numbers in a column?

August 2, 2024

My DataFrame is:

import pandas as pd
df = pd.DataFrame(
    {
        'a': [-3, -1, -2, -5, 10, -3, -13, -3, -2, 1, 2, -100],
    }
)

Expected output:

   a
0 -3
1 -1
2 -2
3 -5

Logic:

I want to return the largest streak of negative numbers. If more than one streak shares the largest size, I want the first one. In df there are two negative streaks of size 4, so the first one should be returned.

This is my attempt. Whenever I use idxmax() I like to double-check the result, because it can be tricky in some scenarios.

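import numpy as np

# sign of each value: -1, 0 or 1
df['sign'] = np.sign(df.a)
# start a new streak id whenever the sign changes
df['sign_streak'] = df.sign.ne(df.sign.shift(1)).cumsum()
# mask of negative rows
m = df.sign.eq(-1)

# size of each negative streak (the mask keeps positive/zero streaks out)
group_sizes = df[m].groupby('sign_streak').size()
largest_group = group_sizes.idxmax()
largest_group_df = df[df['sign_streak'] == largest_group]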
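The negative mask matters here: if the group sizes are computed over all sign streaks instead of only the negative ones, a longer positive streak can win. With the sample data this happens not to show up, because no positive streak is longer than 4. A minimal sketch with hypothetical data (`df2`) where the longest streak is positive:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the longest run here is positive (3, 4, 5, 6), not negative
df2 = pd.DataFrame({'a': [-1, -2, 3, 4, 5, 6, -7]})

sign = np.sign(df2['a'])
streak = sign.ne(sign.shift()).cumsum()
is_neg = sign.eq(-1)

# Counting every run: the positive run of length 4 wins
unfiltered = df2.groupby(streak).size().idxmax()
# Counting only negative runs: the first negative run of length 2 wins
filtered = df2[is_neg].groupby(streak).size().idxmax()

print(df2[streak == unfiltered]['a'].tolist())  # [3, 4, 5, 6]
print(df2[streak == filtered]['a'].tolist())    # [-1, -2]
```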

Solution:

Your code is fine; you could simplify it a bit by avoiding the intermediate columns:

# get sign
s = np.sign(df['a'])
# form groups of successive identical sign
g = s.ne(s.shift()).cumsum()

# keep only negative, get size per group and first group with max size
out = df[g.eq(df[s.eq(-1)].groupby(g).size().idxmax())]
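To see what the one-liner does, the intermediate group sizes can be inspected on their own. This sketch uses the simpler `lt(0)` mask and hypothetical names (`neg`, `run_id`, `neg_run_sizes`); on the sample data, negative runs 1 and 3 both have length 4 and run 5 has length 1, so `idxmax` picks run 1.

```python
import pandas as pd

df = pd.DataFrame({'a': [-3, -1, -2, -5, 10, -3, -13, -3, -2, 1, 2, -100]})

# True where the value is negative
neg = df['a'].lt(0)
# A new run starts whenever the True/False state changes; cumsum numbers the runs
run_id = neg.ne(neg.shift()).cumsum()
# Length of each negative run, keyed by run id
# (the run_id Series is aligned to the filtered rows by index)
neg_run_sizes = df[neg].groupby(run_id).size()
print(neg_run_sizes)
print(neg_run_sizes.idxmax())
```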

Or, since you don't really need to distinguish zero from positive values, test for negatives directly:

# negative numbers
m = df['a'].lt(0)
# form groups
g = m.ne(m.shift()).cumsum()

out = df[g.eq(df[m].groupby(g).size().idxmax())]
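One edge case the snippets above don't cover: if the column has no negative values, `df[m]` is empty and `idxmax` raises an error. A minimal sketch that wraps the second approach in a reusable function with a guard; the name `longest_negative_streak` is hypothetical, and it assumes NaN should break a streak (since `NaN < 0` is False).

```python
import pandas as pd


def longest_negative_streak(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return the rows of the first longest run of negative values in `col`.

    Returns an empty DataFrame when `col` has no negative values,
    because idxmax would raise on the empty group-size Series.
    """
    m = df[col].lt(0)
    if not m.any():
        return df.iloc[0:0]
    # number the runs of consecutive True/False values
    g = m.ne(m.shift()).cumsum()
    # size of each negative run; idxmax picks the first run on ties
    sizes = df[m].groupby(g).size()
    return df[g.eq(sizes.idxmax())]


df = pd.DataFrame({'a': [-3, -1, -2, -5, 10, -3, -13, -3, -2, 1, 2, -100]})
print(longest_negative_streak(df, 'a'))
```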

Note: idxmax always returns the label of the first occurrence of the maximum, so it is fine whenever you want the first match.
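A small sketch of that tie-breaking behaviour, using a hand-built Series of hypothetical streak sizes that mirrors the sample data (streaks 1 and 3 tie at 4). Reversing the Series before `idxmax` is one way to get the last tied streak instead, if that were ever wanted.

```python
import pandas as pd

# Negative-streak sizes keyed by streak id: streaks 1 and 3 tie at length 4
sizes = pd.Series([4, 4, 1], index=[1, 3, 5])

first = sizes.idxmax()        # 1: the first label holding the maximum
last = sizes[::-1].idxmax()   # 3: reversing first picks the last tied label
print(first, last)
```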

Output:

   a
0 -3
1 -1
2 -2
3 -5


by MR

Published August 02, 2024

