I need to find out how many of the first N rows of a dataframe make up (just over) 50% of the sum of values for that column.

Here’s an example:

```
import pandas as pd
import numpy as np

# Ten random values in [0, 1); the numbers differ on every run.
df = pd.DataFrame(np.random.rand(10, 1), columns=list("A"))
print(df)
#           A
# 0  0.681991
# 1  0.304026
# 2  0.552589
# 3  0.716845
# 4  0.559483
# 5  0.761653
# 6  0.551218
# 7  0.267064
# 8  0.290547
# 9  0.182846
```

therefore

```
sum_of_A = df["A"].sum()
```

4.868260213425804

and with this example I need to find, starting from row 0, how many rows I need to get a sum of at least 2.43413 (approximating 50% of sum_of_A).

Of course I could iterate through the rows and sum and break when I get over 50%, but is there a more concise/Pythonic/efficient way of doing this?

### Solution:

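I would use `.cumsum()`, which gives the running total of the column. Comparing it against half of the total sum selects all the rows where the cumulative sum is still below 50%; the number of rows needed to reach at least 50% is one more than the number of rows selected: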

```
df[df["A"].cumsum() < df["A"].sum() / 2]
```
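
To get the row count directly instead of the filtered rows, a minimal sketch along the same lines might look like the following. It hard-codes the ten values printed in the question so the result is reproducible, and it assumes the column has no negative values, so the running total never decreases. The names `half`, `running`, `rows_needed`, and `rows_needed_alt` are illustrative only.

```python
import numpy as np
import pandas as pd

# Values copied from the example in the question, so the result is reproducible.
df = pd.DataFrame({"A": [0.681991, 0.304026, 0.552589, 0.716845, 0.559483,
                         0.761653, 0.551218, 0.267064, 0.290547, 0.182846]})

half = df["A"].sum() / 2      # the 50% target
running = df["A"].cumsum()    # running total, row by row

# Count the rows whose running total is still below the target;
# the next row is the one that reaches or crosses it, hence the + 1.
rows_needed = int((running < half).sum()) + 1

# Alternative: binary search on the running total. This is only valid
# because the running total is non-decreasing (assumption: no negative values).
rows_needed_alt = int(np.searchsorted(running.to_numpy(), half, side="left")) + 1

print(rows_needed, rows_needed_alt)
```

For the example data the running total passes the halfway mark at row 4, so five rows are needed. The boolean-mask version works on any column, while the `np.searchsorted` variant relies on the non-negativity assumption stated above.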