How to calculate corr from a dataframe with non-numeric columns

I have the Pokemon dataset, available here:
https://elitedatascience.com/wp-content/uploads/2022/07/Pokemon.csv

I want to plot a correlation heatmap of its columns using the code below:

# Calculate correlations
corr = stats_df.corr()
 
# Heatmap
plt.figure(figsize=(9,8))
sns.heatmap(corr)

But this raises an error. How can I solve it?

Solution:

Computing the (Pearson) correlation requires numeric data, so restrict the dataframe to its numeric columns before calling corr().

Try:

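import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv('Pokemon.csv', encoding='latin1', index_col='#')
# Keep numeric columns only, then drop Total and Generation so the six base stats remain
corr = df.select_dtypes('number').drop(columns=['Total', 'Generation']).corr()
sns.heatmap(data=corr)
plt.tight_layout()
plt.show()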

Output:

>>> corr
               HP    Attack   Defense   Sp. Atk   Sp. Def     Speed
HP       1.000000  0.422386  0.239622  0.362380  0.378718  0.175952
Attack   0.422386  1.000000  0.438687  0.396362  0.263990  0.381240
Defense  0.239622  0.438687  1.000000  0.223549  0.510747  0.015227
Sp. Atk  0.362380  0.396362  0.223549  1.000000  0.506121  0.473018
Sp. Def  0.378718  0.263990  0.510747  0.506121  1.000000  0.259133
Speed    0.175952  0.381240  0.015227  0.473018  0.259133  1.000000
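
As a related sketch, corr() also accepts a numeric_only argument (available in pandas 1.5 and later) that skips non-numeric columns without a separate select_dtypes step. The small dataframe below is hypothetical stand-in data, not rows from Pokemon.csv, and is only meant to show that both approaches should give the same matrix when the non-numeric columns are strings:

```python
import pandas as pd

# Hypothetical toy data standing in for Pokemon.csv (values are illustrative only).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "HP": [45, 39, 44, 78],
    "Attack": [49, 52, 48, 84],
})

# Option 1: select the numeric columns explicitly, then correlate.
corr_a = df.select_dtypes("number").corr()

# Option 2: ask corr() to ignore non-numeric columns (assumes pandas >= 1.5).
corr_b = df.corr(numeric_only=True)

print(corr_a)
print(corr_b)
```

Either matrix can then be passed to sns.heatmap as in the answer above. Columns such as Total and Generation would still need to be dropped explicitly if they should not appear in the plot.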
