CSV File
I have a CSV file imported as a DataFrame in pandas, and I want to take two columns, clean the data, and produce a box plot.
Both selected columns are quantitative, so I want to group the values of one column into ranges, making it effectively qualitative.
Here is my code so far:
import pandas as pd
import numpy as np
import seaborn as sns
bottle['Depthm'] = pd.to_numeric(bottle['Depthm'], errors='coerce')
bottle['R_TEMP'] = pd.to_numeric(bottle['R_TEMP'], errors='coerce')
df = pd.DataFrame(bottle)
df['R_TEMP']=df['R_TEMP'].replace(0, np.nan)
df['Depthm']=df['Depthm'].replace(0, np.nan)
df1 = df.groupby(['Depthm', 'R_TEMP'])
print(df.groupby(['Depthm', 'R_TEMP']).groups)
df1 = pd.DataFrame({'Depthm':['0-10','10-20','20-30','30-40','40-50']})
bins= [0,10,20,30,40,50]
df.groupby(pd.cut(df["0-10"], np.arange(0, 10, 10))).sum()
print(df['DepthGroup'])
%matplotlib inline
sns.boxplot(x ='R_TEMP', y =['DepthGroup'], grid = False, data = df)
plt.xlabel('Reported Temperature')
plt.ylabel('Depth group')
I am a first-year uni student and am genuinely at a loss as to how to solve this after browsing both the web and my notes. This is my first time sending a question in, so I apologise for my formatting. Any help is appreciated.
The expected result is something such as this:
>Solution :
Use:
import numpy as np
import pandas as pd
temp = pd.DataFrame({'depth': np.random.randint(1, 51, 100)})
pd.cut(temp['depth'], [0, 10, 20, 30, 40, 50])
pd.cut bins values into discrete intervals, and you can then group by the result. Since you did not provide data, I made up a scenario:
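import seaborn as sns
# Made-up data: 100 random depths (1-50) and reported temperatures (0-39)
temp = pd.DataFrame({'depth': np.random.randint(1,51, 100), 'reported temp':np.random.randint(0,40, 100)})
# Assign each depth to a bin of width 10: (0, 10], (10, 20], ...
temp['g'] = pd.cut(temp['depth'], [0,10,20,30,40,50])
# Collect the temperatures per bin, then expand back to one row per reading
temp2 = temp.groupby('g')['reported temp'].apply(list).reset_index()
temp3 = temp2.explode('reported temp')
# Draw one box per depth bin
ax = sns.boxplot(x=temp3['g'], y=temp3['reported temp'])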
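For the columns in the question, the same binning could look roughly like the sketch below. It assumes the column names Depthm and R_TEMP from the question; the sample values and the helper name add_depth_group are made up for illustration. By default pd.cut uses intervals that are open on the left and closed on the right, so a depth of exactly 0 falls outside (0, 10] and ends up as NaN, as does anything above the last edge.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `bottle` data; the values are made up.
bottle = pd.DataFrame({
    "Depthm": [0, 5, 10, 15, 25, 35, 45, 60, "bad"],
    "R_TEMP": [10.5, 10.1, 9.8, 9.0, 8.2, 7.5, 7.1, 6.0, 5.0],
})

def add_depth_group(df, edges=(0, 10, 20, 30, 40, 50)):
    """Return a cleaned copy of df with a labelled DepthGroup column."""
    out = df.copy()
    # Non-numeric entries become NaN instead of raising an error.
    out["Depthm"] = pd.to_numeric(out["Depthm"], errors="coerce")
    out["R_TEMP"] = pd.to_numeric(out["R_TEMP"], errors="coerce")
    # Build labels such as "0-10", "10-20" from consecutive bin edges.
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    # Intervals are (lo, hi]: 10 lands in "0-10", while 0 and 60 become NaN.
    out["DepthGroup"] = pd.cut(out["Depthm"], bins=list(edges), labels=labels)
    # Keep only rows that have both a depth group and a temperature.
    return out.dropna(subset=["DepthGroup", "R_TEMP"])

grouped = add_depth_group(bottle)
print(grouped[["Depthm", "R_TEMP", "DepthGroup"]])
```

The resulting frame can be passed straight to seaborn, for example sns.boxplot(x='DepthGroup', y='R_TEMP', data=grouped), without the groupby and explode round trip. If a depth of exactly 0 should be kept in the first bin, include_lowest=True is the pd.cut option to look at.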
