Unix get lines between timestamps on multiple files

December 27, 2022

I keep daily log files (like logfile-2022-01-01.log, logfile-2022-01-02.log, and so on).
Every line in the files starts with a timestamp in YYYY-MM-DD HH:MM:SS.sss format, e.g. [2022-05-01 10:00:34.550] …some strings….

I need to filter all the lines between two timestamps, which could mean searching more than one file.
For instance:

logfile-2022-01-01.log
[2022-01-01 00:00:25.550] here comes some logging info
[2022-01-01 00:02:25.550] here comes some more logging info
….
[2022-01-01 23:58:29.480] here comes some more logging info

logfile-2022-01-02.log
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
….
[2022-01-02 23:59:29.480] here comes some more logging info from the next day

I wish to extract the lines between 2022-01-01 20:00:00 (this is contained in the first file) and 2022-01-02 08:00:00 (this is contained in the second file).
I’m expecting to get something like this:

[2022-01-01 23:58:29.480] here comes some more logging info
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day

Any ideas on how to achieve this?

So far I’ve tried using this:

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '/^2022-01-01 20:00/,/^2022-01-02 08:00/ {print}'

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '$1" "$2 > "2022-01-01 20:00" && $1" "$2 < "2022-01-02 08:00"'

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk -v beg='2022-01-01 20:00' -v end='2022-01-02 08:00' '{cur=$1" "$2} beg<=cur && cur<=end'

All of them run without errors but print nothing.

>Solution :

Adding some lines to both input files so we can confirm matching on specific strings; also updating file names to match the timestamp date (05 instead of 01):

$ head  logfile*
==> logfile-2022-05-01.log <==
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 00:02:25.550] here comes some more logging info
[2022-05-01 23:56:30.332] here comes more logging info
[2022-05-01 23:58:29.480] here comes some more logging info


==> logfile-2022-05-02.log <==
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:02:39.224] here comes logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day

Tweaking one of OP’s attempts:

$ cat logfile-2022-05-01.log logfile-2022-05-02.log | grep "here comes some" | awk -F'[][]' '$2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"'

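Where:

replace the first grep with cat (grep was treating the first file name as its search pattern)
set the awk field delimiter to either ] or [, so the timestamp lands in the 2nd field
modify awk to compare only the 2nd field
modify the awk tests to use inclusive ranges
update file names and datetime stamps for May (05) instead of Jan (01)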
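
To see why the delimiter change matters, here is a quick check on a single line. This is only a sketch: the sample line is made up for illustration, and LC_ALL=C is set so the string comparison is plain byte order. With default whitespace splitting the first field keeps its leading "[", which sorts after any digit, so the upper-bound test can never succeed.

```shell
line='[2022-05-01 10:00:34.550] some text'   # made-up sample line

# Default whitespace splitting: $1 still carries the leading "[".
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# Splitting on "[" or "]": $1 is empty and $2 is the bare timestamp.
stamp=$(printf '%s\n' "$line" | awk -F'[][]' '{ print $2 }')

# With the bracket attached, the upper-bound test fails in the C locale,
# because "[" compares greater than the digit "2".
verdict=$(printf '%s\n' "$line" | LC_ALL=C awk '{
    cur = $1 " " $2
    if (cur <= "2022-05-02 08:00") print "in range"; else print "out of range"
}')

printf '%s\n%s\n%s\n' "$default_first" "$stamp" "$verdict"
```

The same leading "[" is why a range pattern anchored with ^2022 never matches any line of these files.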

This generates:

[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day

While this generates OP’s desired results (per comment OP has stated duplicate lines are ok), once you decide to use awk there’s typically no need for separate cat and grep calls.

One unified awk idea that utilizes input variables while also removing duplicate (consecutive) lines:

start='2022-05-01 20:00:00'
end='2022-05-02 08:00:00'
string='here comes some'

awk -F'[][]' -v start="$start" -v end="$end" -v str="$string" '
$2 >= start { printme=1 }
printme     { if ($2 > end) {                       # disable printme flag if beyond the "end"
                 printme=0
                 next                               # replace with "exit" if sure follow-on input files will not contain any lines of interest, ie, input files are processed in timestamp order
              }

              if (! index($0,str)) next             # if current line does not contain our "string" then skip to next input line
              if ($0==last) next                    # skip duplicate (consecutive) lines
              print $0
              last=$0
              next
            }
' logfile-2022-05-??.log

This generates:

[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
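
If the files are always passed in timestamp order, the "exit" variant mentioned in the code comment can be wrapped in a small shell function. The following is a minimal sketch, not a drop-in replacement: the function name log_between and the temporary sample files are hypothetical, the string filter is left out for brevity, and it assumes every line starts with a [YYYY-MM-DD HH:MM:SS.sss] stamp.

```shell
# log_between START END FILE... (hypothetical helper)
# Prints lines whose timestamp lies between START and END.
# Assumes the files are given in chronological order.
log_between() {
    start=$1
    end=$2
    shift 2
    LC_ALL=C awk -F'[][]' -v start="$start" -v end="$end" '
        $2 > end { exit }                 # past the window: stop reading all remaining input
        $2 >= start && $0 != last {       # inside the window, skipping consecutive duplicates
            print
            last = $0
        }
    ' "$@"
}

# Throwaway sample data, made up for this illustration.
tmp=$(mktemp -d)
cat > "$tmp/logfile-2022-05-01.log" <<'EOF'
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 23:58:29.480] here comes some more logging info
EOF
cat > "$tmp/logfile-2022-05-02.log" <<'EOF'
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 09:15:00.000] outside the window
EOF

result=$(log_between '2022-05-01 20:00:00' '2022-05-02 08:00:00' "$tmp"/logfile-2022-05-??.log)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the comparison is textual, a line stamped exactly 2022-05-02 08:00:00.000 sorts after the end value '2022-05-02 08:00:00' and is left out; appending .999 to the end value is one way to include it.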