Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

get the number of involved singer in a phase

I have a dataset like this

import pandas as pd 
df = pd.read_csv("music.csv") 
df
name date singer language phase
1 Yes or No 02.01.20 Benjamin Smith en 1
2 Parabens 01.06.21 Rafael Galvao;Simon Murphy pt;en 2
3 Love 12.11.20 Michaela Condell en 1
4 Paz 11.07.19 Ana Perez; Eduarda Pinto es;pt 3
5 Stop 12.01.21 Michael Conway;Gabriel Lee en;en 1
6 Shalom 18.06.21 Shimon Cohen hebr 1
7 Habibi 22.12.19 Fuad Khoury ar 3
8 viva 01.08.21 Veronica Barnes en 1
9 Buznanna 23.09.20 Kurt Azzopardi mt 1
10 Frieden 21.05.21 Gabriel Meier dt 1
11 Uruguay 11.04.21 Julio Ramirez es 1
12 Beautiful 17.03.21 Cameron Armstrong en 3
13 Holiday 19.06.20 Bianca Watson en 3
14 Kiwi 21.10.20 Lachlan McNamara en 1
15 Amore 01.12.20 Vasco Grimaldi it 1
16 La vie 28.04.20 Victor Dubois fr 3
17 Yom 21.02.20 Ori Azerad; Naeem al-Hindi hebr;ar 2
18 Elefthería 15.06.19 Nikolaos Gekas gr 1

I convert it to 1NF.

import pandas as pd
import numpy as np
 
df = pd.read_csv("music.csv") 
df['language']=df['language'].str.split(';')
df['singer']=df['singer'].str.split(";")
df.explode(['language','singer'])
d= pd.DataFrame(df)
d

And I create a dataframe. Now I would like to find out which phase has the most singers involved.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I used this

df= df.group.by('singer')
df['phase']. value_counts(). idxmax()

But I could not get a solution

The dataframe has 42 observations, so some singers occur again

>Solution :

You do not need to split/explode, you can directly count the number of ; per row and add 1:

df['singer'].str.count(';').add(1).groupby(df['phase']).sum()

If you want the classical split/explode:

(df.assign(singer=df['singer'].str.split(';'))
   .explode('singer')
   .groupby('phase')['singer'].count()
)

output:

phase
1    12
2     4
3     6
Name: singer, dtype: int64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading