I have a dataframe that looks like this (with many more other countries, this is a sample):
df_dict = {'country': ['Japan','Japan','Japan','Japan','Japan','Japan','Japan', 'Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022,2016, 2017,2018,2019,2020,2021,2022],
'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100,100]}
df = pd.DataFrame(df_dict)
I want to extract all the entries where the value is the same across all years. Sometimes it could be 100, sometimes it could be another value, but the example here is with 100.
I’m not really sure how to go about this
The output should look like this.
df_dict2 = {'country': ['Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022],
'value': [100, 100, 100, 100, 100, 100,100]}
df2 = pd.DataFrame(df_dict2)
>Solution :
If you want to know the countries that have the same value across all years use groupby.nunique:
s = df.groupby('country')['value'].nunique()
out = list(s[s.eq(1)].index)
Output: ['Greece']
If you also want the value, go for a groupby.agg with boolean indexing through loc:
(df.groupby('country')['value'].agg(['nunique', 'first'])
.loc[lambda d: d.pop('nunique').eq(1), 'first']
)
Output:
country
Greece 100
Name: first, dtype: int64
edit: filtering the original DataFrame:
s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]
Or directly:
df[df.groupby('country')['value'].transform('nunique').eq(1)]
Output:
country year value
7 Greece 2016 100
8 Greece 2017 100
9 Greece 2018 100
10 Greece 2019 100
11 Greece 2020 100
12 Greece 2021 100
13 Greece 2022 100