Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Clean content in R to desired tibble

I have downloaded a CSV file that looks rather messy named Henry and shows the following content:

Name;Number;Year;Drink;Country
Henry;252;2020;Fanta;France

Name of persen;Part of a larger test;Food drink type; Value of gear type, Phonetype;Country of fabrication

From period;Period from time;Country;Latitud;Longitud
2020-02-01 00:00:00;France;142.63;45.5232;19.5653

Date;Time;Food;M/F
2020-08-15;09:00:00;Hotdog;M
2020-08-15;10:00:00;Hamburger;M
2020-08-15;11:00:00;Hamburger;F
2020-08-15;12:00:00;Hotdog;M
2020-08-15;13:00:00;Hamburger;F
2020-08-15;14:00:00;Hotdog;M
2020-08-15;15:00:00;Hamburger;F
#With 250 more rows

Removed to make it easier to help:
Down below is a dataset ## 4

I want to clean it and extract the following tibble. So taking the firs two columns Name/Number, then remove the rest of the content until the data and then remove M/F column. Like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Name Number  Date                   Time           Food
Henry   252      2020-08-15     09:00:00     Hotdog
Henry   252      2020-08-15     10:00:00     Hamburger
Henry   252      2020-08-15     11:00:00     Hamburger
Henry   252      2020-08-15     12:00:00     Hotdog
Henry   252      2020-08-15     13:00:00     Hamburger
#With 252 more rows

I suppose it has to do with the delim function?

>Solution :

Here is a version how we could form a data frame after reading in with read_delim. This is hard coded for this example only. The problem is that we do not have the full dataset:

library(tidyverse)

test <- read_delim("test.csv", delim = ";", 
                   escape_double = FALSE, trim_ws = TRUE)

test %>% 
  slice(7:n()) %>% 
  mutate(Date = Name, 
         Time = Number,
         Food = Year,
         Name = NA_character_, 
         Number= NA_character_) %>% 
  select(-c(Year, Drink, Country)) %>% 
  {. ->>intermediateResult} %>%  # this saves intermediate 
  bind_rows(test) %>% 
  select(Name, Number, Date, Time, Food) %>% 
  filter(row_number()==9) %>% 
  bind_rows(intermediateResult) %>% 
  fill(c(Name, Number), .direction = "down") %>% 
  na.omit()
  Name  Number Date       Time     Food     
  <chr> <chr>  <chr>      <chr>    <chr>    
1 Henry 252    2020-08-15 09:00:00 Hotdog   
2 Henry 252    2020-08-15 10:00:00 Hamburger
3 Henry 252    2020-08-15 11:00:00 Hamburger
4 Henry 252    2020-08-15 12:00:00 Hotdog   
5 Henry 252    2020-08-15 13:00:00 Hamburger
6 Henry 252    2020-08-15 14:00:00 Hotdog   
7 Henry 252    2020-08-15 15:00:00 Hamburger
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading