Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extract all entries from a pandas df where the values are the same across all years

I have a dataframe that looks like this (with many more other countries, this is a sample):

df_dict = {'country': ['Japan','Japan','Japan','Japan','Japan','Japan','Japan', 'Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
           'year': [2016, 2017,2018,2019,2020,2021,2022,2016, 2017,2018,2019,2020,2021,2022],
           'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100,100]}

df = pd.DataFrame(df_dict)

I want to extract all the entries where the value is the same across all years. Sometimes it could be 100, sometimes it could be another value, but the example here is with 100.

I’m not really sure how to go about this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The output should look like this.

df_dict2 = {'country': ['Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
           'year': [2016, 2017,2018,2019,2020,2021,2022],
           'value': [100, 100, 100, 100, 100, 100,100]}

df2 = pd.DataFrame(df_dict2)

>Solution :

If you want to know the countries that have the same value across all years use groupby.nunique:

s = df.groupby('country')['value'].nunique()

out = list(s[s.eq(1)].index)

Output: ['Greece']

If you also want the value, go for a groupby.agg with boolean indexing through loc:

(df.groupby('country')['value'].agg(['nunique', 'first'])
   .loc[lambda d: d.pop('nunique').eq(1), 'first']
)

Output:

country
Greece    100
Name: first, dtype: int64

edit: filtering the original DataFrame:

s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]

Or directly:

df[df.groupby('country')['value'].transform('nunique').eq(1)]

Output:

   country  year  value
7   Greece  2016    100
8   Greece  2017    100
9   Greece  2018    100
10  Greece  2019    100
11  Greece  2020    100
12  Greece  2021    100
13  Greece  2022    100
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading