Check for class of columns in dataframe

I have a toy data frame df with two columns, one of class integer and one of class factor. However, when I try to check whether each column is a factor, I get an incorrect result:

num <- c(1:5)
fac <- factor(letters[1:5])
df <- data.frame(num, fac)
df
  num fac
1   1   a
2   2   b
3   3   c
4   4   d
5   5   e
cols <- colnames(df)    
for (col in cols) {
  print(col)
  print(is.factor(df$col))
}
[1] "num"
[1] FALSE
[1] "fac"
[1] FALSE

What did I do wrong? How can I check whether a column in a data frame is a factor or binary?

>Solution :

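The problem is that df$col refers to a column literally named "col" in the data frame df, not to the column whose name is stored in the variable col. That column doesn't exist, so df$col evaluates to NULL. You might expect is.factor to raise an error in that case, but is.factor(NULL) simply returns FALSE.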
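To see the difference directly, here is a minimal sketch that rebuilds the toy data frame from the question. The variable name missing_col is only for illustration. It shows that `$` takes the name after it literally, while `[[` evaluates its argument first.

```r
# Rebuild the toy data frame from the question
num <- 1:5
fac <- factor(letters[1:5])
df <- data.frame(num, fac)

col <- "fac"

# `$` does not evaluate `col`; it looks for a column literally named "col"
missing_col <- df$col
is.null(missing_col)      # TRUE: no such column, so the result is NULL
is.factor(missing_col)    # FALSE: NULL is not a factor

# `[[` evaluates `col`, so this selects the column named "fac"
is.factor(df[[col]])      # TRUE
```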

You could refer to the column a different way:

for(col in cols) {
  print(col)
  print(is.factor(df[[col]]))  # [[ ]] looks up the column whose name is stored in col
}

[1] "num"
[1] FALSE
[1] "fac"
[1] TRUE

But most people would use str to summarise column types:

str(df)
'data.frame':   5 obs. of  2 variables:
 $ num: int  1 2 3 4 5
 $ fac: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

Or you could use sapply instead of a loop:

sapply(df, is.factor)
  num   fac 
FALSE  TRUE 

sapply(df, class)  # USE.NAMES = TRUE is already the default
      num       fac 
"integer"  "factor"
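
The question also asks about binary columns. Base R has no "binary" class, so the sketch below assumes one possible reading: a column counts as binary if it has exactly two distinct non-missing values. The helper is_binary and the extra flag column are hypothetical and only there for illustration.

```r
# Hypothetical helper: TRUE if x has exactly two distinct non-missing values
is_binary <- function(x) {
  length(unique(x[!is.na(x)])) == 2
}

df <- data.frame(
  num  = 1:5,
  fac  = factor(letters[1:5]),
  flag = c(0, 1, 1, 0, 1)   # made-up column added for illustration
)

# vapply checks that every result is a single logical value
binary_cols <- vapply(df, is_binary, logical(1))
binary_cols
#   num   fac  flag
# FALSE FALSE  TRUE
```

If binary should mean something stricter, such as only the values 0 and 1 or only logical columns, the body of is_binary would need to change accordingly.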
