# How to create a ranking variable/function for different periods in a panel data?

I have a dataset, `df`, that looks like this:

Date Code City State Population Quantity QTDPERCAPITA
2020-01 11001 Los Angeles CA 5000000 100000 0.02
2020-02 11001 Los Angeles CA 5000000 125000 0.025
2020-03 11001 Los Angeles CA 5000000 135000 0.027
2020-01 12002 Houston TX 3000000 150000 0.05
2020-02 12002 Houston TX 3000000 100000 0.033
2020-03 12002 Houston TX 3000000 200000 0.066
2021-07 11001 Los Angeles CA 5500499 340000 0.062
2021-07 12002 Houston TX 3250012 211000 0.065

Where`QTDPERCAPITA` is simply `Quantity/Population`. I have multiple cities (4149 to be more precise).

The quantities change according to every month, and so does the population.

I would like to create a new variable that serve as a ranking, ranging from `[0,1]`, where `0` is the city with the lowest `QTDPERCAPITA` in that month, and `1` is the city with the most quantity per capita in that month. Essentially, I want to create a new column that looks like this:

Date Code City State Population Quantity QTDPERCAPITA RANKING
2020-01 11001 Los Angeles CA 5000000 100000 0.02 0
2020-02 11001 Los Angeles CA 5000000 125000 0.025 0
2020-03 11001 Los Angeles CA 5000000 135000 0.027 0
2020-01 12002 Houston TX 3000000 150000 0.05 1
2020-02 12002 Houston TX 3000000 100000 0.033 1
2020-03 12002 Houston TX 3000000 200000 0.066 1
2021-07 11001 Los Angeles CA 5500499 340000 0.062 0
2021-07 12002 Houston TX 3250012 211000 0.065 1

How can I create this column such that the `RANKING` changes every month? I was thinking of a `for` loop that extracts the `QTDPERCAPITA` for every city on every unique date, and creates a new column, `df['RANKING']` with the same `date` and `city`.

### >Solution :

You can use:

``````# MinMax scaler: (rank - min) / (max - min)
ranking = lambda x: (x.rank() - 1) / (len(x) - 1)

# Rank between [0, 1] -> 0 the lowest, 1 the highest
df['RANKING'] = df.groupby('Date')['QTDPERCAPITA'].apply(ranking)

# Rank between [1, 4149] -> 1 the lowest, 4149 the highest
# df['RANKING'] = df.groupby('Date')['QTDPERCAPITA'].rank('dense')
``````

Output:

Date Code City State Population Quantity QTDPERCAPITA RANKING
2020-01 11001 Los Angeles CA 5000000 100000 0.02 0
2020-02 11001 Los Angeles CA 5000000 125000 0.025 0
2020-03 11001 Los Angeles CA 5000000 135000 0.027 0
2020-01 12002 Houston TX 3000000 150000 0.05 1
2020-02 12002 Houston TX 3000000 100000 0.033 1
2020-03 12002 Houston TX 3000000 200000 0.066 1
2021-07 11001 Los Angeles CA 5500499 340000 0.618 1
2021-07 12002 Houston TX 3250012 211000 0.065 0