I have a toy data frame df with two columns, one of class integer and one of class factor. However, when I tried to check whether each column is a factor, I got an incorrect result, as follows:
num <- c(1:5)
fac <- factor(letters[1:5])
df <- data.frame(num, fac)
df
  num fac
1   1   a
2   2   b
3   3   c
4   4   d
5   5   e
cols <- colnames(df)
for (col in cols) {
  print(col)
  print(is.factor(df$col))
}
[1] "num"
[1] FALSE
[1] "fac"
[1] FALSE
What did I do wrong? How can I check whether a column in a data frame is a factor or binary?
>Solution :
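The problem is that df$col refers to a column literally named "col" in the data frame df, not to the column whose name is stored in the variable col. That column doesn't exist, so df$col evaluates to NULL. You might expect is.factor to return an error in that case, but is.factor(NULL) simply returns FALSE.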
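To make the explanation above concrete, here is a minimal sketch that rebuilds the toy data frame from the question and inspects what df$col actually evaluates to. The variable name literal_lookup is introduced here only for illustration.

```r
# Rebuild the toy data frame from the question
num <- 1:5
fac <- factor(letters[1:5])
df <- data.frame(num, fac)

col <- "fac"

# `$` takes the name literally, so this looks for a column called "col"
# (literal_lookup is a hypothetical name used only in this sketch)
literal_lookup <- df$col

print(is.null(literal_lookup))    # TRUE: no such column, so NULL comes back
print(is.factor(literal_lookup))  # FALSE: NULL is not a factor
```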
You could refer to the column a different way:
for (col in cols) {
  print(col)
  # Index by the value of col rather than the literal name "col"
  print(is.factor(df[, col]))
}
[1] "num"
[1] FALSE
[1] "fac"
[1] TRUE
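Another common way to select a column by a name held in a variable is the [[ operator, which evaluates its argument and extracts that single column as a vector. A small sketch, assuming the same toy data frame as in the question:

```r
num <- 1:5
fac <- factor(letters[1:5])
df <- data.frame(num, fac)

for (col in colnames(df)) {
  # df[[col]] uses the value of `col`, unlike df$col
  print(col)
  print(is.factor(df[[col]]))
}
```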
But most people would use str to summarise column types:
str(df)
'data.frame': 5 obs. of 2 variables:
$ num: int 1 2 3 4 5
$ fac: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
Or you could use sapply instead of a loop:
sapply(df, is.factor)
num fac
FALSE TRUE
sapply(df, class, USE.NAMES = TRUE)
num fac
"integer" "factor"
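The question also asks about binary columns. Base R has no built-in test for that, so the sketch below assumes "binary" means "exactly two distinct non-missing values". The helper is_binary and the extra flag column are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: TRUE when a column has exactly two distinct
# non-missing values (one possible definition of "binary")
is_binary <- function(x) {
  length(unique(x[!is.na(x)])) == 2
}

df <- data.frame(
  num  = 1:5,
  fac  = factor(letters[1:5]),
  flag = c(0, 1, 1, 0, 1)  # extra column added only for this example
)

# vapply guarantees one logical value per column
binary_cols <- vapply(df, is_binary, logical(1))
print(binary_cols)
```

If "binary" should instead mean strictly 0/1 or TRUE/FALSE values, the helper would need a stricter check on the values themselves.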