I have a set of files with names like 01_Jan/193501asc.gz, 09_Sep/188209asc.gz, 01_Jan/202101asc.gz, 07_Jul/201107asc.gz, i.e. gzipped files inside month folders (month/yearmonthasc.gz). The first four characters of the file name represent the year.
How can I write a regular expression that keeps only the files from the year 2000 onwards (2000-2021)? It would be great if it could be applied to any yearly threshold: from 1850, from 1950, and so on.
I have tried:
file_ls <- list.files(paste(myPath, "data", sep = "/"),
                      #pattern = "[>2000]",     # character class: matches any single ">", "2" or "0"
                      pattern = "20",
                      #pattern = "[2000-2021]",
                      #pattern = "*\\.gz$",     # ending characters
                      recursive = TRUE)
Of these, pattern = "20" seems the most promising, but it also matches when 20 appears somewhere else in the name, as in 09_Sep/188209asc.gz, where the year is 1882.
Thank you!
>Solution :
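If the file names all start with the year as YYYY, then using pattern = "^20" should work. The ^ anchors the match to the beginning of the string, so a 20 later in the name (as in 188209asc.gz) is ignored. list.files() applies the pattern to the file name only, not to the month folder in front of it, so recursive = TRUE does not get in the way. Note that "^20" covers the years 2000 to 2099; it is a prefix match, not a numeric comparison.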
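For an arbitrary threshold (1850, 1950, ...) a single regex gets awkward, so one option is to list the files first and then compare the year numerically. The following is only a sketch under the assumption that every file name begins with a four-digit year; files_from_year is a hypothetical helper name, not part of base R, and the file_ls vector here stands in for the result of list.files(..., recursive = TRUE).

```r
# Hypothetical helper: keep the files whose four-digit year prefix is >= min_year.
files_from_year <- function(files, min_year) {
  # basename() drops the month folder: "09_Sep/188209asc.gz" -> "188209asc.gz"
  years <- suppressWarnings(as.integer(substr(basename(files), 1, 4)))
  # Names that do not start with a number become NA and are dropped
  files[!is.na(years) & years >= min_year]
}

# Example input, taken from the file names in the question
file_ls <- c("01_Jan/193501asc.gz", "09_Sep/188209asc.gz",
             "01_Jan/202101asc.gz", "07_Jul/201107asc.gz")

files_from_year(file_ls, 2000)
# [1] "01_Jan/202101asc.gz" "07_Jul/201107asc.gz"

files_from_year(file_ls, 1850)  # keeps all four example files
```

Comparing integers instead of matching text means the same call works for any cut-off year without rewriting the pattern.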