I am following this data visualization tutorial: https://rpubs.com/brandonkopp/creating-a-treemap-in-r
Since the dataset used for this tutorial is no longer available, I learned how to obtain this dataset in the last question I asked (Finding a Dataset That Is No Longer Available?):
outlays <- read.csv('https://web.archive.org/web/20160705194932if_/https://www.whitehouse.gov/sites/default/files/omb/budget/fy2017/assets/outlays.csv')
str(outlays)
From here, I am now trying to follow the rest of the tutorial:
library(xtable)
library(dplyr)
library(treemap)
#Remove commas from numeric values in number columns
outlays$X2017 <- gsub(",","", outlays$X2017)
#Convert numeric columns to a numeric data type
outlays$X2017 <- as.numeric(outlays$X2017)
#Create new data frame for positive (spending) values
spending <- outlays %>%
  select(Agency.Name, Bureau.Name, Account.Name, X2017) %>%
  group_by(Agency.Name, Bureau.Name, Account.Name) %>%
  summarize(X2017 = sum(X2017, na.rm = FALSE)) %>%
  filter(X2017 > 0)
#Create new data frame for negative (income) values
receipts <- outlays %>%
  select(Agency.Name, Bureau.Name, Account.Name, X2017) %>%
  group_by(Agency.Name, Bureau.Name, Account.Name) %>%
  summarize(X2017 = sum(X2017, na.rm = FALSE)) %>%
  filter(X2017 < 0) %>%
  mutate(X2017 = abs(X2017))
I get an error when I try to run the next part of code:
treemap(spending, #Your data frame object
        index = c("Agency.Name", "Bureau.Name", "Account.Name"), #A list of your categorical variables
        vSize = "X2017", #This is your quantitative variable
        type = "index", #Type sets the organization and color scheme of your treemap
        palette = "Reds", #Select your color palette from the RColorBrewer presets or make your own.
        title = "Spending in President Obama's 2017 Budget", #Customize your title
        fontsize.title = 14 #Change the font size of the title
)
This is the error I get:
Error in treemap(spending, index = c("Agency.Name", "Bureau.Name", "Account.Name"), :
<index> contains invalid column names
It says there are "invalid" column names – but is there a way to figure out which of these column names are invalid? Does anyone have any ideas how I can resolve this problem and continue with the visualizations?
Thank you!
>Solution :
Running this code:
outlays <- read.csv('https://web.archive.org/web/20160705194932if_/https://www.whitehouse.gov/sites/default/files/omb/budget/fy2017/assets/outlays.csv')
str(outlays)
library(xtable)
library(dplyr)
library(treemap)
#Remove commas from numeric values in number columns
outlays$X2017 <- gsub(",","", outlays$X2017)
#Convert numeric columns to a numeric data type
outlays$X2017 <- as.numeric(outlays$X2017)
#Create new data frame for positive (spending) values
spending <- outlays %>%
  select(Agency.Name, Bureau.Name, Account.Name, X2017) %>%
  group_by(Agency.Name, Bureau.Name, Account.Name) %>%
  summarize(X2017 = sum(X2017, na.rm = FALSE)) %>%
  filter(X2017 > 0)
#Create new data frame for negative (income) values
receipts <- outlays %>%
  select(Agency.Name, Bureau.Name, Account.Name, X2017) %>%
  group_by(Agency.Name, Bureau.Name, Account.Name) %>%
  summarize(X2017 = sum(X2017, na.rm = FALSE)) %>%
  filter(X2017 < 0) %>%
  mutate(X2017 = abs(X2017))
treemap(spending, #Your data frame object
        index = c("Agency.Name", "Bureau.Name", "Account.Name"), #A list of your categorical variables
        vSize = "X2017", #This is your quantitative variable
        type = "index", #Type sets the organization and color scheme of your treemap
        palette = "Reds", #Select your color palette from the RColorBrewer presets or make your own.
        title = "Spending in President Obama's 2017 Budget", #Customize your title
        fontsize.title = 14 #Change the font size of the title
)
Produces the following output (a treemap; the image is not reproduced here):
Which seems to be the expected output?
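As for finding out which column names are invalid: the error message suggests that at least one entry of index is not among the columns of the data frame, so one way to check is to compare the two sets of names yourself. The following is only a minimal sketch in base R. The helper find_missing_columns and the toy data frame are hypothetical and not part of treemap; the first column is misnamed on purpose so that the check has something to report.

```r
# Hypothetical helper (not part of the treemap package): returns the
# requested column names that are absent from the data frame.
find_missing_columns <- function(df, cols) {
  setdiff(cols, names(df))
}

# Toy stand-in for `spending`; only the column names matter here.
toy <- data.frame(x = "a", Bureau.Name = "b", Account.Name = "c", X2017 = 1,
                  check.names = FALSE)
# Deliberately misname the first column (space instead of a dot).
names(toy)[1] <- "Agency Name"

index_cols <- c("Agency.Name", "Bureau.Name", "Account.Name")
missing_cols <- find_missing_columns(toy, index_cols)
print(missing_cols)
# [1] "Agency.Name"
```

On your own data, find_missing_columns(spending, index_cols) returning character(0) would mean all three names are present, and names(spending) shows what read.csv actually produced.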