PCA select components graphically

I carry out a PCA on the decathlon dataset (data(decathlon)) from the package FactoMineR like this:

install.packages("FactoMineR")
library(FactoMineR)

install.packages("devtools")
library("devtools")
 
install_github("kassambara/factoextra")
library("factoextra")
 
install.packages("corrplot")
library("corrplot")
 
 
data("decathlon")
head( decathlon[c("Shot.put", "Shot.put", "Long.jump", "1500m", "Discus", "Competition", "400m", "Javeline", "100m")])
  

options(ggrepel.max.overlaps = Inf)
res.pca <- PCA( decathlon[c("Shot.put", "Shot.put", "Long.jump", "1500m", "Discus", "400m", "Javeline", "100m")], scale.unit=TRUE, ncp=15, graph=TRUE)

and I get a PCA graph of variables.

How can I select an appropriate number of components graphically?

>Solution :

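There is no single correct answer, but a common approach is to look at the cumulative percentage of variance explained by the components.

You can use factoextra::fviz_eig, or build the plot yourself from res.pca$eig: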

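library(dplyr)
library(ggplot2)  # already attached by factoextra; loaded explicitly for clarity

# res.pca$eig is a matrix with one row per component
res.pca$eig %>%
  as.data.frame() %>%
  mutate(n = row_number()) %>%  # component index for the x axis
  ggplot(aes(x = n, y = `cumulative percentage of variance`)) + 
  geom_col(fill = "steelblue") +
  geom_line() +
  theme_classic() +
  geom_hline(yintercept = 90, linetype = 2, color = "red")  # 90% cutoff line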

The cutoff (yintercept = 90, i.e. 90%) is a choice and can be changed.
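
If you want the number behind the plot, the same cumulative column can be read off directly. The following is a minimal sketch: n_components_for is a hypothetical helper name (it is not part of FactoMineR or factoextra), and it assumes the eigenvalue matrix has a column named "cumulative percentage of variance", as res.pca$eig does.

```r
# Hypothetical helper (not part of FactoMineR or factoextra):
# returns the smallest number of components whose cumulative
# percentage of variance reaches `cutoff` (given in percent).
n_components_for <- function(eig, cutoff = 90) {
  cum_var <- eig[, "cumulative percentage of variance"]
  reached <- which(cum_var >= cutoff)
  # No component count reaches the cutoff (e.g. cutoff above 100)
  if (length(reached) == 0) return(NA_integer_)
  as.integer(reached[1])
}
```

With the object from the question, the call would be n_components_for(res.pca$eig, cutoff = 90). Lowering or raising cutoff corresponds to moving the red dashed line in the plot.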

In this case, select PC1 to PC4 (or PC5), which together explain about 90% of the variance in the data.
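
For the factoextra::fviz_eig route, a minimal sketch could look like the one below. It recreates res.pca with the column selection copied verbatim from the question and with graph = FALSE (an assumption made here only to suppress the default plots). The object name scree is introduced just for illustration.

```r
library(FactoMineR)
library(factoextra)  # provides fviz_eig()

data("decathlon")

# Column selection copied verbatim from the question
res.pca <- PCA(
  decathlon[c("Shot.put", "Shot.put", "Long.jump", "1500m",
              "Discus", "400m", "Javeline", "100m")],
  scale.unit = TRUE, ncp = 15, graph = FALSE
)

# Scree plot: percentage of variance explained by each component,
# with the percentage printed on top of each bar
scree <- fviz_eig(res.pca, choice = "variance", addlabels = TRUE)
print(scree)
```

This plot shows the per-component percentages rather than the cumulative ones, so it is typically read by looking for the point where the bars level off. Setting choice = "eigenvalue" plots the eigenvalues instead.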
