I have a text variable containing movie reviews and another variable containing ratings. I want to use the text reviews to predict the ratings.
Here are some example data:
library(tibble)

movie_reviews <- c(
  "I really loved the movie plot",
  "This movie really sucked",
  "I really found this movie thought provoking",
  "ahh what a boring movie",
  "A wonderful movie, with a wonderful end",
  "Great action movie: Very thrilling",
  "Worst movie ever, it never stopped being cheesy",
  "Enjoying, feelgood movie for the entire family",
  "I will definitely watch this movie again"
)
movie_ratings <- c(8, 2, 6, 3, 9, 8.5, 3.5, 9.5, 7.5)
movie_df <- tibble(movie_reviews, movie_ratings)
For this you can use the text package for R, which provides the textEmbed() and textTrain() functions:
library(text)

# Create word embedding representations of your text
help(textEmbed)
reviews_embeddings <- textEmbed(movie_df,
                                model = "bert-base-uncased", # Select the model you want from Hugging Face
                                layers = 11:12)              # Select which layers you want to use

# Train a ridge regression model that predicts the numeric ratings from the word embeddings
reviews_rating_model <- textTrain(reviews_embeddings$movie_reviews,
                                  movie_df$movie_ratings)

# See the results
reviews_rating_model
$results

	Pearson's product-moment correlation

data:  predy_y$predictions and predy_y$y
t = 5.621, df = 7, p-value = 0.0003991
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
 0.6785761 1.0000000
sample estimates:
      cor 
0.9047823 
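To read this output: it is a one-sided Pearson correlation test between the predicted and the actual ratings. As a rough sanity check, the t statistic and p-value can be recomputed in base R from the reported correlation and degrees of freedom alone. This is only a sketch of the standard formula for the correlation test, not the package's internal code, and the variable names (r, df, t_stat, p_value) are just illustrative.

```r
# Values copied from the printed output above
r  <- 0.9047823  # correlation between predicted and observed ratings
df <- 7          # n - 2, with n = 9 reviews

# t statistic for a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# One-sided p-value, matching "true correlation is greater than 0"
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

round(t_stat, 3)
signif(p_value, 4)
```

With only nine reviews the confidence interval is wide (the lower bound is about 0.68), so treat the correlation as a demonstration that the pipeline runs rather than as a reliable estimate of predictive accuracy.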