How to set a value in a specific range of rows of a column, according to a condition, in Pandas

I am working on a dataframe that has a column named season (newly created and filled with np.nan) and a column named match_id, whose values are: match 1 has match_id 1, match 2 has match_id 2, …, match n has match_id n. It's a cricket (similar to baseball) dataset, so it is ball by ball. One match has at most 20+20 overs, and each over has 6 balls, so match_id 1 runs from approximately index 0 to 240, match_id 2 from approximately index 241 to 480, and so on. The data is organized ball by ball (1 row per ball), match by match (approx. 240 rows per match), and season by season (approx. 14160 rows per season).

My condition is: if match_id is from 1 to 59, put 2017 in the season column for those rows.

In my dataset, match_id and the other columns already existed. I created the season column filled with np.nan, and now I want to fill it.

My data looks like this:

In[]: df_raw.head(6)
out[]:
   season  match_id  inning         batting_team                 bowling_team  over  ball
0     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     1
1     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     2
2     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     3
3     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     4
4     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     5
5     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     6

I tried methods like this one, but they didn't help:

n=1
for i in ["match_id"][:59]:  
    df_raw['match_id'] = df_raw['match_id'].mask(df_raw['match_id']==[n], 2017)
    n=n+1

The problem is ["match_id"][:59], but how can I use a range as the condition? [:59] is meant to be a range of match_id values, not of index positions.

>Solution :

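In your loop, ["match_id"][:59] slices a one-element Python list containing the string "match_id", not the column, so the loop body runs only once, and mask overwrites match_id rather than season. Instead, select the rows with a boolean condition and assign through loc (df here is your dataframe, df_raw in the question):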

df.loc[(df['match_id']<=59) & (df['match_id']>=1), 'season'] = 2017
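
As a minimal sketch of the same idea on a toy frame (the data below is made up for illustration and is not the asker's dataset), Series.between can express the inclusive range in a single call:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the ball-by-ball data: hypothetical match IDs, not real rows.
df = pd.DataFrame({"match_id": [1, 1, 2, 59, 59, 60, 61]})
df["season"] = np.nan  # new column, NaN-filled as in the question

# between() is inclusive on both ends by default, so this selects 1 <= match_id <= 59.
in_2017 = df["match_id"].between(1, 59)
df.loc[in_2017, "season"] = 2017

print(df)
```

Rows whose match_id falls outside the range keep NaN in season, so the same pattern can be repeated for other ranges.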

Note that because the season column contains NaNs, it is stored as floating-point numbers. Once you have finished filling in the season values, you can convert them to integers (the conversion raises an error if any NaN remains):

df['season'] = df['season'].astype('int')
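
If there are several seasons to fill, one possible sketch is to loop over a mapping of season to match_id range and convert only once every row has a value. The season_ranges dictionary and the toy data are assumptions for illustration; only the 1 to 59 range for 2017 comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical match_id ranges per season; only 1-59 -> 2017 comes from the question.
season_ranges = {
    2017: (1, 59),
    2018: (60, 119),  # assumed range, for illustration only
}

# Toy data, not the real dataset.
df = pd.DataFrame({"match_id": [1, 30, 59, 60, 100, 119]})
df["season"] = np.nan

# Fill each season's rows using an inclusive range on match_id.
for season, (first_id, last_id) in season_ranges.items():
    df.loc[df["match_id"].between(first_id, last_id), "season"] = season

# astype("int") raises if any NaN is left, so check before converting.
if df["season"].notna().all():
    df["season"] = df["season"].astype("int")

print(df["season"].tolist())
```

If some rows legitimately have no season, pandas' nullable integer dtype (astype("Int64"), with a capital I) is an alternative that keeps missing values while still storing integers.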