Regex: filter files whose year is numerically larger than a value? Year encoded in the file name

I have a set of files with names like 01_Jan/193501asc.gz, 09_Sep/188209asc.gz, 01_Jan/202101asc.gz, 07_Jul/201107asc.gz, read from a folder of zipped files (layout: month/yearmonthasc.gz). The first four characters of the file name represent the year.

How can I write a regex that keeps only the files from the year 2000 onward (2000-2021)? It would be great if it could be applied to any yearly threshold: from 1850, from 1950, and so on.

I have tried:

file_ls <- list.files(paste(myPath, "data", sep = "/"),
                      #pattern = "[>2000]",
                      pattern = "20",
                      #pattern = "[2000-2021]",
                      #pattern = "*\\.gz$", # ending character
                      recursive = TRUE)

and pattern = "20" in particular seems promising. But it also matches when 20 appears somewhere else in the name, as in 09_Sep/188209asc.gz, where the year is 1882.

Thank you!

>Solution :

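If the file names all start with the year as YYYY, then pattern = "^20" should work. The ^ anchors the match to the beginning of the string, so a 20 elsewhere in the name (as in 188209asc.gz) no longer matches. list.files() applies the pattern to the file name only, not to the directory part, so the 01_Jan/ prefix does not get in the way. Note that "^20" covers only the years 2000-2099; an arbitrary threshold such as 1850 or 1950 needs either a different pattern or a numeric comparison on the extracted year.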
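For an arbitrary threshold, a numeric comparison is usually simpler than a regex. The following is a minimal sketch, assuming every file name begins with a four-digit year as described in the question; filter_by_year is a hypothetical helper name, not an existing R function.

```r
# Hypothetical helper: keep files whose leading four-digit year is >= min_year.
filter_by_year <- function(files, min_year) {
  # The year is the first four characters of the file name, not of the path,
  # so strip the directory part (e.g. "01_Jan/") with basename() first.
  years <- suppressWarnings(as.integer(substr(basename(files), 1, 4)))
  # Names that do not start with a number yield NA and are dropped.
  files[!is.na(years) & years >= min_year]
}

# File names taken from the question.
files <- c("01_Jan/193501asc.gz", "09_Sep/188209asc.gz",
           "01_Jan/202101asc.gz", "07_Jul/201107asc.gz")

filter_by_year(files, 2000)
# "01_Jan/202101asc.gz" "07_Jul/201107asc.gz"

filter_by_year(files, 1900)
# "01_Jan/193501asc.gz" "01_Jan/202101asc.gz" "07_Jul/201107asc.gz"
```

In the original setting, the result of list.files(..., recursive = TRUE) could be passed in place of files, so the threshold becomes an ordinary argument rather than something encoded in the pattern.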
