How to count and group categorical data by range in R

I have data from a questionnaire that includes a column for year of birth. The range of values was too wide, and my plot became confusing. I'm now trying to group the years by decade and then chart them, but I don't know how to group them.

My data looks like this:

birth_year <- data.frame("years"=c(
  "1920","1923","1930","1940","1932","1935","1942","1944","1952","1956","1996","1961",
  "1962","1966","1978","1987","1998","1999","1967","1934","1945","1988","1976","1978",
  "1951","1986","1942","1999","1935","1920","1933","1987","1998","1999","1931","1977",
  "1920","1931","1977","1999","1967","1992","1998","1984"
))

My current plot looks like this:

However, I want the data grouped like this:

birth_year   count
(1920-1930]:  5
(1931-1940]:  8
(1941-1950]:  4
(1951-1960]:  3
(1961-1970]:  5
(1971-1980]:  5
(1981-1990]:  5
(1991-2000]:  9

and then plot the counts per range group.

>Solution :

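We can use cut() to bin the years into groups, and then plot the groups with ggplot(). Note that breaks = 8 splits the observed range into eight equal-width bins, so the bin edges do not fall exactly on decade boundaries.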

birth_year <- data.frame("years"=c(
     "1920","1923","1930","1940","1932","1935","1942","1944","1952","1956","1996","1961",
     "1962","1966","1978","1987","1998","1999","1967","1934","1945","1988","1976","1978",
     "1951","1986","1942","1999","1935","1920","1933","1987","1998","1999","1931","1977",
     "1920","1931","1977","1999","1967","1992","1998","1984"
))

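library(ggplot2)

# Convert the year strings to integers and cut them into 8 equal-width bins;
# dig.lab = 4 keeps the interval labels as four-digit years
birth_year$yearGroup <- cut(as.integer(birth_year$years), breaks = 8, dig.lab = 4,
                            include.lowest = FALSE)

# Bar chart of the number of rows in each bin
ggplot(birth_year, aes(x = yearGroup)) + geom_bar()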
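If the groups should fall exactly on decade boundaries, as in the table in the question, one option is to pass explicit break points to cut() instead of a bin count. The following is a minimal sketch, not the only way to do it: the names `years`, `decade`, and `counts` are chosen here for illustration, and include.lowest = TRUE is set so that 1920 itself lands in the first group.

```r
# Same years as in the question, converted to integers
years <- as.integer(c(
  "1920","1923","1930","1940","1932","1935","1942","1944","1952","1956","1996","1961",
  "1962","1966","1978","1987","1998","1999","1967","1934","1945","1988","1976","1978",
  "1951","1986","1942","1999","1935","1920","1933","1987","1998","1999","1931","1977",
  "1920","1931","1977","1999","1967","1992","1998","1984"
))

# Explicit breaks at 1920, 1930, ..., 2000 give right-closed decade bins;
# include.lowest = TRUE closes the first bin on the left so 1920 is kept
decade <- cut(years, breaks = seq(1920, 2000, by = 10),
              include.lowest = TRUE, dig.lab = 4)

# Count the rows per bin and store the result as a data frame
# with columns "decade" and "count"
counts <- as.data.frame(table(decade), responseName = "count")
print(counts)
# counts$count is 5 8 4 3 5 5 5 9, matching the table in the question
```

With ggplot2 loaded, the pre-counted data frame can then be plotted with ggplot(counts, aes(x = decade, y = count)) + geom_col().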
