Fail to extract the gender information from the first name in R

I attempted to extract gender information from first names using the gender package in R. I tried both ‘ssa’ and ‘genderize’ for the method argument. Here is my demo sample code:
    unique_id <- seq(0:6)
    first_name <- c("annie j", "Juan", "Richard", "Aj", "Dana", "annie j", "liyuan")
    demo1 <- as.data.frame(cbind(unique_id, first_name))
For ssa, it uses names based…

Merge rows if previous row contains a string that starts with a particular sign

I have a data frame that looks like this:
    df <- as.data.frame(rbind(">A1", "aaaa", "bbb", "cccc", ">B2", "dddd", "eeeee","ff", ">C3", "ggggggg", "hhhhh", "iiiii", "jjjjj"))
This is what I want to get:
    df1 <- as.data.frame(rbind(">A1", "aaaabbbcccc", ">B2", "ddddeeeeeff", ">C3", "ggggggghhhhhiiiiijjjjj"))
As you can see, I want to merge every row between two rows that contain a…
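One possible base-R approach to the merge described above, offered as a minimal sketch rather than the accepted answer: treat each row starting with ">" as the start of a group, then paste the remaining rows of each group together. The names `x`, `is_header`, `grp`, and `merged` are illustrative, and the sketch assumes every ">" row is followed by at least one non-header row.

```r
# Input copied from the excerpt: a one-column data frame (default column name V1).
df <- as.data.frame(rbind(">A1", "aaaa", "bbb", "cccc", ">B2", "dddd",
                          "eeeee", "ff", ">C3", "ggggggg", "hhhhh",
                          "iiiii", "jjjjj"))

x <- as.character(df$V1)          # as.character() guards against factors in older R
is_header <- startsWith(x, ">")   # TRUE on the ">..." rows
grp <- cumsum(is_header)          # group id: 1 for the >A1 block, 2 for >B2, ...

# Concatenate the non-header rows within each group.
# Assumption: every header has at least one row after it.
merged <- as.vector(tapply(x[!is_header], grp[!is_header], paste, collapse = ""))

# Interleave header, merged string, header, merged string, ...
df1 <- as.data.frame(matrix(c(rbind(x[is_header], merged)), ncol = 1))
```

Using `cumsum()` on the header flag avoids an explicit loop; the group id simply increases by one each time a new ">" row appears.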

Filter dataset by values IF column contains certain string

I have a dataset that looks like this:
    > dput(df)
    structure(list(Car = c("Mazda", "Mazda", "Mazda", "Mazda", "Mazda", "Mazda", "Lexus"), Date = c("2/20/20", "2/21/20", "2/22/20", "2/23/20", "2/24/20", "2/25/20", "9/3/20")), class = "data.frame", row.names = c(NA, -7L))
I would like to filter the Dates only where Car == "Mazda". In this scenario, I would like to remove…

Conditionally running a function on a row based on values in another column using a function

I want to write an R function that operates on each row of a data frame. The function needs to adjust a measurement column that holds the mass or volume of a food or drink. The adjustment differs depending on whether it’s a food or a drink, and if it’s a…

Conditionally replace first character (or first two) of a column with value from that row's other column (R)

I have a data frame such as the following:
    ID  country  region
    1   32       32001
    2   32       1001
    3   68       68001
    4   214      19017
    5   214      214017
All variables are stored as character, even though the values are integers. I am cleaning the data and need the region variable to be consistent across rows. However, due to a clerical…

Delete columns in R that do not match another dataframe

I have two data frames that look like this:
    > dput(df)
    structure(list(first_column = c("value_1", "value_2"), second_column = c("value_1", "value_2")), class = "data.frame", row.names = c(NA, -2L))
    > dput(df_new)
    structure(list(first_column = c("value_1", "value_2"), second_column = c("value_1", "value_2"), third_column = c("value_1", "value_2")), class = "data.frame", row.names = c(NA, -2L))
I would like to match the df_new dataframe…
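The excerpt is cut off, so the exact goal is not stated. Going by the title, a plausible reading is that `df_new` should keep only the columns whose names also exist in `df`. Under that assumption, a minimal base-R sketch could look like this (`keep` and `df_new_matched` are illustrative names, not from the original post):

```r
# Data copied from the excerpt.
df <- structure(list(first_column = c("value_1", "value_2"),
                     second_column = c("value_1", "value_2")),
                class = "data.frame", row.names = c(NA, -2L))
df_new <- structure(list(first_column = c("value_1", "value_2"),
                         second_column = c("value_1", "value_2"),
                         third_column = c("value_1", "value_2")),
                    class = "data.frame", row.names = c(NA, -2L))

# Assumption: "match" means dropping columns of df_new that df does not have.
keep <- intersect(names(df_new), names(df))

# drop = FALSE keeps the result a data frame even if only one column survives.
df_new_matched <- df_new[, keep, drop = FALSE]
```

Here `third_column` is dropped because it has no counterpart in `df`. `intersect()` preserves the column order of its first argument, so the remaining columns stay in `df_new`'s order.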