How to insert a value into a specific range of rows in a column, according to a condition, in Pandas


I am working on a dataframe that has a column named season (newly created and filled with np.nan) and another column named match_id, whose values are: match 1 has match_id 1, match 2 has match_id 2, …, match n has match_id n. It’s a cricket (close to baseball) dataset, recorded ball by ball. One match has at most 20+20 overs, and each over has 6 balls, so match_id 1 runs approximately from index 0 to 240, match_id 2 approximately from index 241 to 480, and so on. In other words, there is 1 row per ball, approximately 240 rows per match, and approximately 14160 rows per season.

My condition is: if match_id is between 1 and 59, put 2017 in the season column for those rows.

In my dataset, match_id and the other columns already existed. I created the season column filled with np.nan, and now I want to fill it.

My data looks like this:

In[]: df_raw.head(6)
out[]:
    season  match_id    inning  batting_team         bowling_team                  over ball
0   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    1
1   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    2
2   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    3
3   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    4
4   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    5
5   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    6

I tried methods like the following, but they didn’t help:

n=1
for i in ["match_id"][:59]:  
    df_raw['match_id'] = df_raw['match_id'].mask(df_raw['match_id']==[n], 2017)
    n=n+1

The issue is ["match_id"][:59], but how can I use a range as a condition? [:59] is meant to be a range of match_id values, not of the index.

>Solution :

Use the loc indexer with a boolean condition on the match_id values (df here stands for your dataframe, df_raw in the question):

df.loc[(df['match_id']<=59) & (df['match_id']>=1), 'season'] = 2017
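
As a minimal, self-contained sketch of the same idea, here is the assignment on a toy frame. The match_id values below are made up for illustration, and Series.between (inclusive on both ends by default) is swapped in as an equivalent way of writing the two comparisons:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_raw: a few balls from made-up matches 1, 59 and 60.
df = pd.DataFrame({
    "season": np.nan,
    "match_id": [1, 1, 59, 59, 60, 60],
})

# Boolean mask built from the match_id values, not from the index.
in_2017 = df["match_id"].between(1, 59)

# Write 2017 into the season column only where the mask is True.
df.loc[in_2017, "season"] = 2017

print(df)
```

Rows with match_id 60 keep NaN in season, so the same pattern can be repeated with a different range and year for each remaining season.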

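Note that because the season column contains NaNs, it is stored as floating point numbers, so the assigned values appear as 2017.0. Once you have finished filling in every season value, you can convert the column to integers. The conversion raises an error if any NaN is still left in the column: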

df['season'] = df['season'].astype('int')
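
If some rows may stay unfilled, one possible alternative (an assumption about what you want, not part of the original answer) is pandas' nullable integer dtype "Int64", which keeps the gaps as missing values instead of raising. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Made-up season column: two filled rows and one row still missing.
season = pd.Series([2017.0, 2017.0, np.nan])

# Nullable integer dtype: integers where filled, <NA> where missing.
nullable = season.astype("Int64")

# Plain int only works once no NaN is left, e.g. after dropping the gaps.
filled = season.dropna().astype("int")

print(nullable)
print(filled)
```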
