Pandas GroupBy values within a range

CSV File
I have a CSV file imported as a DataFrame in pandas and want to take two columns, clean the data, and produce a boxplot.
Both selected columns are quantitative, so I want to bin the values of one of them into ranges so that it can be treated as a qualitative (categorical) variable.

Here is my code so far:

import pandas as pd
import numpy as np
import seaborn as sns

bottle['Depthm'] = pd.to_numeric(bottle['Depthm'], errors='coerce')
bottle['R_TEMP'] = pd.to_numeric(bottle['R_TEMP'], errors='coerce')

df = pd.DataFrame(bottle)

df['R_TEMP']=df['R_TEMP'].replace(0, np.nan)
df['Depthm']=df['Depthm'].replace(0, np.nan) 

df1 = df.groupby(['Depthm', 'R_TEMP'])
print(df.groupby(['Depthm', 'R_TEMP']).groups)

df1 = pd.DataFrame({'Depthm':['0-10','10-20','20-30','30-40','40-50']})

bins= [0,10,20,30,40,50]
df.groupby(pd.cut(df["0-10"], np.arange(0, 10, 10))).sum()
print(df['DepthGroup'])

%matplotlib inline
sns.boxplot(x ='R_TEMP', y =['DepthGroup'], grid = False, data = df)
plt.xlabel('Reported Temperature')
plt.ylabel('Depth group')

I am a first-year uni student and am genuinely at a loss as to how to solve this after searching both the web and my notes. This is my first time posting a question, so I apologise for my formatting. Any help is appreciated.

The expected result is something like this:

Expected Result

>Solution :

Use:

import numpy as np
import pandas as pd
temp = pd.DataFrame({'depth': np.random.randint(1, 51, 100)})
pd.cut(temp['depth'], [0, 10, 20, 30, 40, 50])
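
As a minimal sketch of what this call produces, the snippet below bins a few hand-picked depths and attaches the range labels the question tried to build by hand. The sample values and the names `depth`, `edges`, `labels` and `groups` are assumptions made for illustration; only `pd.cut` and its `bins` and `labels` parameters come from pandas.

```python
import pandas as pd

# Hypothetical depths in metres, chosen to sit on and around the bin edges
depth = pd.Series([1, 10, 11, 25, 50])

edges = [0, 10, 20, 30, 40, 50]
labels = ['0-10', '10-20', '20-30', '30-40', '40-50']

# Intervals are right-closed by default, so 10 falls in '0-10' and 11 in '10-20'
groups = pd.cut(depth, bins=edges, labels=labels)
print(groups.tolist())  # ['0-10', '0-10', '10-20', '20-30', '40-50']
```

A depth of exactly 0 would fall outside the first interval and come back as NaN unless `include_lowest=True` is passed.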

pd.cut bins values into discrete intervals. You can then group by the result. Since you did not provide data, I made up a scenario:

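import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: depths from 1 to 50 and temperatures from 0 to 39
temp = pd.DataFrame({'depth': np.random.randint(1, 51, 100), 'reported temp': np.random.randint(0, 40, 100)})
# Label each row with its depth interval: (0, 10], (10, 20], ..., (40, 50]
temp['g'] = pd.cut(temp['depth'], [0, 10, 20, 30, 40, 50])
# Collect the temperatures of each interval into a list, then expand back to one row per value
temp2 = temp.groupby('g')['reported temp'].apply(list).reset_index()
temp3 = temp2.explode('reported temp')
ax = sns.boxplot(x=temp3['g'], y=temp3['reported temp'])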
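
The groupby and explode round trip above is not strictly required for the plot. A possible shortcut, sketched here with the question's column names (`Depthm`, `R_TEMP`, `DepthGroup`), is to hand the binned column straight to seaborn. The data is randomly generated as a stand-in because the real CSV was not shared, so everything except the column names is an assumption.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the question's data; only the column names come from the post
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Depthm': rng.integers(1, 51, 200),      # depths 1..50
    'R_TEMP': rng.uniform(2.0, 25.0, 200),   # made-up temperatures
})

edges = [0, 10, 20, 30, 40, 50]
labels = ['0-10', '10-20', '20-30', '30-40', '40-50']
df['DepthGroup'] = pd.cut(df['Depthm'], bins=edges, labels=labels)

# seaborn groups by the categorical column itself, so no groupby or explode is needed
ax = sns.boxplot(x='R_TEMP', y='DepthGroup', data=df)
ax.set_xlabel('Reported Temperature')
ax.set_ylabel('Depth group')
```

Compared with the question's attempt, `y` is a column name rather than a list, and the axis labels are set on the returned Axes, which avoids the missing `matplotlib.pyplot` import.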

Output: [image: boxplot of reported temp for each depth interval]
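
If summary numbers per range are wanted rather than a plot, the same binned column can be fed to groupby directly. This is only a sketch on a small hand-written frame: the values are made up, and the column names follow the scenario above.

```python
import pandas as pd

# Hypothetical readings; column names follow the made-up scenario above
temp = pd.DataFrame({
    'depth': [3, 8, 15, 18, 27, 44],
    'reported temp': [20, 18, 15, 13, 10, 6],
})
temp['g'] = pd.cut(temp['depth'], [0, 10, 20, 30, 40, 50])

# observed=False keeps the empty (30, 40] interval in the result
summary = temp.groupby('g', observed=False)['reported temp'].agg(['count', 'mean'])
print(summary)
```

The empty interval shows up with a count of 0 and a NaN mean, which makes gaps in the depth coverage easy to spot.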
