Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to count the number of times a city appears in the dataframe per year?

I am working on a project for university in which I received one dataframe:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'City': ['PARIS', 'MADRI', 'RIO', 'RIO',
                                  'PARIS', 'RIO', 'LISBOA', 'RIO'],
                        'Year': [2018, 2015, 2020, 2020, 
                                 2017, 2021, 2022, 2022]})

      print(df)

           City   Year
          PARIS   2018
          MADRI   2015
           RIO    2020
           RIO    2020
          PARIS   2017
           RIO    2021
         LISBOA   2022
           RIO    2022

I need to build a dataframe that displays how many times each city has appeared over the years. The desired output is as per this example dataframe:

      df_out_intended = pd.DataFrame({'Year': [2015,2016,2017,2018,2019,2020,2021,2022],
                                      'PARIS': [0,0,1,1,0,0,0,0],
                                      'MADRI': [1,0,0,0,0,0,0,0],
                                      'RIO': [0,0,0,0,0,2,1,1],
                                      'LISBOA': [0,0,0,0,0,0,0,1]                                                         
                                      })

       print(df_out_intended)

           Year   PARIS  MADRI  RIO LISBOA
           2015     0      1     0    0
           2016     0      0     0    0
           2017     1      0     0    0
           2018     1      0     0    0
           2019     0      0     0    0
           2020     0      0     2    0
           2021     0      0     1    0
           2022     0      0     1    1

I thought about using groupby to do this task. Implementing as follows:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

      df_test = df.groupby(['City', 'Year']).sum()

But the output is not the desired one, the output is:

      print(df_test)

       City     Year
      LISBOA    2022
      MADRI     2015
      PARIS     2017
                2018
      RIO       2020
                2021
                2022

I also tried using value_counts() as follows:

      df_test2 = df(['City', 'Year']).value_counts()

But the error message appears: "TypeError: ‘DataFrame’ object is not callable"

>Solution :

Use crosstab

pd.crosstab(df.Year,df.City)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading