R: Remove string characters from a range of rows in a column

I have a column in a dataset from which I want to remove the first two characters. Not every row has these characters, so the rows without them should stay unchanged, and some rows are empty.

How can I remove the characters from the rows that have them, drop the rows that are empty, and not affect the rows that don't need any modification?

Please note that the original dataset has 305 rows.

Sample Data

    Date = c("AA 1/27/2020",
             "BB 1/29/2020",
             "CC 1/30/2020",
             "DD 2/1/2020",
             "2/9/2020",
             "2/15/2020",
             " ",
             " ",
             "EE 2/16/2020",
             "VV 2/17/2020",
             "2/18/2020",
             "2/22/2020",
             "2/25/2020",
             "2/28/2020") 

    Date_Approved = c("1/28/2020",
                      "1/30/2020",
                      "1/31/2020",
                      "2/2/2020",
                      "2/10/2020",
                      "2/16/2020",
                      "2/17/2020",
                      "2/18/2020",
                      "2/17/2020",
                      "2/19/2020",
                      "2/20/2020",
                      "2/23/2020",
                      "2/26/2020",
                      "2/29/2020")

Code

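    library(tidyverse)

    df = data.frame(Date, Date_Approved)

    # Normally I would use
    # Remove Acronyms from date.received column
    # (str_sub(Date, 3, -1) drops the first two characters of every row,
    # including the rows that have no acronym)
    df = df %>%
             mutate(Date_New = str_sub(Date, 3, -1))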

Solution:

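To strip the prefix and drop the empty rows in one pipeline, one option is trimws(). By default it removes whitespace from both ends of a string (set which to "left" or "right" to trim only one side), and its whitespace argument accepts a regular expression describing what to trim. Here the pattern [A-Z]*\\s* matches zero or more uppercase letters followed by zero or more whitespace characters, so the letter prefix and the space after it are removed, rows without a prefix are left as they are, and rows that contain only a space become empty strings. filter(nzchar(Date)) then keeps only the rows where Date is not empty.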

    library(dplyr)
    df %>%
      mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>%
      filter(nzchar(Date))

Output:

            Date Date_Approved
    1  1/27/2020     1/28/2020
    2  1/29/2020     1/30/2020
    3  1/30/2020     1/31/2020
    4   2/1/2020      2/2/2020
    5   2/9/2020     2/10/2020
    6  2/15/2020     2/16/2020
    7  2/16/2020     2/17/2020
    8  2/17/2020     2/19/2020
    9  2/18/2020     2/20/2020
    10 2/22/2020     2/23/2020
    11 2/25/2020     2/26/2020
    12 2/28/2020     2/29/2020
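
If the prefix is always exactly two uppercase letters followed by a space, as in the sample data, a stricter pattern anchored at the start of the string may be safer than trimming any run of uppercase letters from both ends. The following is a minimal base R sketch under that assumption, not part of the answer above; it uses a shortened copy of the sample data, and the name cleaned is only illustrative.

```r
# Shortened copy of the question's sample data (assumption: the prefix is
# always two uppercase letters followed by whitespace).
df <- data.frame(
  Date = c("AA 1/27/2020", "2/9/2020", " ", "EE 2/16/2020"),
  Date_Approved = c("1/28/2020", "2/10/2020", "2/17/2020", "2/17/2020"),
  stringsAsFactors = FALSE
)

# Remove a leading two-letter uppercase code plus the whitespace after it.
# sub() returns strings that do not match the pattern unchanged.
df$Date <- sub("^[A-Z]{2}\\s+", "", df$Date)

# Drop rows whose Date is empty or contains only whitespace.
cleaned <- df[nzchar(trimws(df$Date)), ]
rownames(cleaned) <- NULL  # renumber the rows after filtering

print(cleaned)
```

Because the pattern starts with ^, only text at the beginning of each value can be removed, so a date that has no prefix is returned exactly as it was.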