In pandas, counting the number of Trues can easily be done with the .sum() method, either per column (axis=0) or per row (axis=1).
However, in polars, this only seems to work on individual columns:
Input:
import polars as pl
s = pl.DataFrame({"a": [True, False, True], "b": [True, True, False]})
print(s)
# Number of Trues in each column (This works)
print(s.sum(axis=0))
# Number of Trues in each row (This does not work)
print(s.sum(axis=1))
Output:
shape: (3, 2)
┌───────┬───────┐
│ a     ┆ b     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ true  ┆ true  │
│ false ┆ true  │
│ true  ┆ false │
└───────┴───────┘
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 2   ┆ 2   │
└─────┴─────┘
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[125], line 5
2 print(s)
3 print(s.sum(axis=0))
----> 5 s.sum(axis=1)
File c:\Users\xxxx\.venv\lib\site-packages\polars\dataframe\frame.py:7006, in DataFrame.sum(self, axis, null_strategy)
7004 return self._from_pydf(self._df.sum())
7005 if axis == 1:
-> 7006 return wrap_s(self._df.hsum(null_strategy))
7007 raise ValueError("Axis should be 0 or 1.")
PanicException: `add` operation not supported for dtype `bool`
How can I compute this sum along axis=1?
It works for non-boolean values, but not for boolean values.
(My polars version is 0.16.18.)
Thanks.
>Solution :
Update your polars version.
On polars 0.18.4 the same code gives the expected result:
s = pl.DataFrame({"a": [True, False, True], "b": [True, True, False]})
print(s)
# Number of Trues in each column
print(s.sum(axis=0))
# Number of Trues in each row (works on 0.18.4)
print(s.sum(axis=1))
shape: (3, 2)
┌───────┬───────┐
│ a     ┆ b     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ true  ┆ true  │
│ false ┆ true  │
│ true  ┆ false │
└───────┴───────┘
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 2   ┆ 2   │
└─────┴─────┘
shape: (3,)
Series: 'a' [u32]
[
    2
    1
    1
]