I keep daily log files (like logfile-2022-01-01.log, logfile-2022-01-02.log, and so on).
Every line in the files starts with a timestamp in the format YYYY-MM-DD HH:MM:SS.sss, e.g.: [2022-05-01 10:00:34.550] …some strings…
I need to filter all the lines between two timestamps, which could mean searching in more than one file.
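Filtering by plain string comparison depends on one property of this timestamp format. Every field is zero-padded and ordered from year down to milliseconds, so comparing two timestamps as strings gives the same result as comparing them chronologically. The quick check below assumes any POSIX awk (gawk, mawk, BusyBox awk); the helper name `check` is hypothetical.

```shell
#!/bin/sh
# Zero-padded "YYYY-MM-DD HH:MM:SS.sss" strings order lexicographically the
# same way the instants they represent order in time.
check() {
    # Values passed with -v that do not look numeric are compared as strings.
    awk -v a="$1" -v b="$2" 'BEGIN { if (a < b) print "before"; else print "not before" }'
}

order1=$(check '2022-01-01 23:58:29.480' '2022-01-02 00:01:25.550')
order2=$(check '2022-01-02 08:00:00' '2022-01-01 20:00:00')
echo "$order1"   # before
echo "$order2"   # not before
```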
For instance:
logfile-2022-01-01.log
[2022-01-01 00:00:25.550] here comes some logging info
[2022-01-01 00:02:25.550] here comes some more logging info
….
[2022-01-01 23:58:29.480] here comes some more logging info
logfile-2022-01-02.log
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
….
[2022-01-02 23:59:29.480] here comes some more logging info from the next day
I wish to extract the lines between 2022-01-01 20:00:00 (this is contained in the first file) and 2022-01-02 08:00:00 (this is contained in the second file).
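Because a range can cross midnight, one preliminary step is choosing which daily files to read at all. The sketch below assumes bash and the naming scheme logfile-YYYY-MM-DD.log from the question. It works in a scratch directory with hypothetical empty files, and keeps those whose date falls between the date parts of the two bounds.

```shell
#!/usr/bin/env bash
# Hypothetical date bounds: the date parts of the start/end timestamps.
start_day='2022-01-01'
end_day='2022-01-02'

# Scratch directory with hypothetical empty files so the sketch runs on its own.
dir=$(mktemp -d)
touch "$dir"/logfile-2021-12-31.log "$dir"/logfile-2022-01-01.log \
      "$dir"/logfile-2022-01-02.log "$dir"/logfile-2022-01-03.log

files=()
for f in "$dir"/logfile-*.log; do         # pathname expansion is sorted
    day=${f##*/logfile-}                  # drop directory and "logfile-" prefix
    day=${day%.log}                       # drop ".log", leaving YYYY-MM-DD
    # Inside [[ ]], < and > compare strings; zero-padded dates compare chronologically.
    if [[ ! $day < $start_day ]] && [[ ! $day > $end_day ]]; then
        files+=("$f")
    fi
done

# Prints logfile-2022-01-01.log and logfile-2022-01-02.log
printf '%s\n' "${files[@]##*/}"
rm -r -- "$dir"
```

The selected names can then be passed to whatever line filter is used, instead of listing the files by hand.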
I’m expecting to get something like this:
[2022-01-01 23:58:29.480] here comes some more logging info
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
Any ideas on how to achieve this?
So far I’ve tried these:

```shell
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '/^2022-01-01 20:00/,/^2022-01-02 08:00/ {print}'

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '$1" "$2 > "2022-01-01 20:00" && $1" "$2 < "2022-01-02 08:00"'

grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk -v beg='2022-01-01 20:00' -v end='2022-01-02 08:00' '{cur=$1" "$2} beg<=cur && cur<=end'
```

All of them run without errors, but none of them prints anything.
Solution:
Adding some lines to both input files so we can confirm matching on specific strings; also updating file names to match the timestamp date (05 instead of 01):
$ head logfile*
==> logfile-2022-05-01.log <==
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 00:02:25.550] here comes some more logging info
[2022-05-01 23:56:30.332] here comes more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
==> logfile-2022-05-02.log <==
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:02:39.224] here comes logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
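To reproduce the two sample files shown by `head` above, a minimal sketch using only `printf` is below. It assumes a POSIX shell and writes into a fresh scratch directory from `mktemp -d`, so it does not touch any real logs.

```shell
#!/bin/sh
# Recreate the answer's two sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1

printf '%s\n' \
    '[2022-05-01 00:00:25.550] here comes some logging info' \
    '[2022-05-01 00:02:25.550] here comes some more logging info' \
    '[2022-05-01 23:56:30.332] here comes more logging info' \
    '[2022-05-01 23:58:29.480] here comes some more logging info' \
    > logfile-2022-05-01.log

# Note the first line repeats the last line of the previous file.
printf '%s\n' \
    '[2022-05-01 23:58:29.480] here comes some more logging info' \
    '[2022-05-02 00:01:25.550] here comes some logging info from the next day' \
    '[2022-05-02 00:02:39.224] here comes logging info from the next day' \
    '[2022-05-02 00:04:25.550] here comes some more logging info from the next day' \
    > logfile-2022-05-02.log

head logfile*
```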
Tweaking one of the OP’s current attempts:
$ cat logfile-2022-05-01.log logfile-2022-05-02.log | grep "here comes some" | awk -F'[][]' '$2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"'
Where:
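- replace the first `grep` with `cat` (`grep logfile-2022-01-01.log logfile-2022-01-02.log` treats the first name as a search pattern, not as an input file)
- add an `awk` dual field delimiter of `]` and `[`
- modify `awk` to only compare the 2nd field
- modify the `awk` tests to use inclusive ranges
- update file names and datetime stamps for May (`05`) instead of Jan (`01`)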
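The field-delimiter change is what makes the comparisons work. With awk's default whitespace splitting, `$1` keeps the leading `[`, so a regex anchored as `/^2022.../` never matches and `$1" "$2` starts with `[` rather than a digit. The sketch below, assuming any POSIX awk, shows both splits on one sample line.

```shell
#!/bin/sh
line='[2022-05-01 23:58:29.480] here comes some more logging info'

# Default whitespace splitting: $1 is "[2022-05-01", $2 is "23:58:29.480]".
default_key=$(printf '%s\n' "$line" | awk '{ print $1" "$2 }')

# -F'[][]' is a bracket expression matching "]" or "[", so the line splits into
# an empty $1, the bare timestamp in $2, and the message text in $3.
bracket_key=$(printf '%s\n' "$line" | awk -F'[][]' '{ print $2 }')

# An anchor without the bracket never matches; escaping "[" does.
plain_anchor=$(printf '%s\n' "$line" | awk '/^2022-05-01/ { n++ } END { print n + 0 }')
bracket_anchor=$(printf '%s\n' "$line" | awk '/^\[2022-05-01/ { n++ } END { print n + 0 }')

echo "$default_key"      # [2022-05-01 23:58:29.480]
echo "$bracket_key"      # 2022-05-01 23:58:29.480
echo "$plain_anchor"     # 0
echo "$bracket_anchor"   # 1
```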
This generates:
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
While this generates the OP’s desired results (the OP stated in a comment that duplicate lines are OK), once you decide to use awk there’s typically no need for separate cat and grep calls.
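As a compact illustration of that point, the `cat | grep | awk` pipeline above can be folded into a single awk call. This sketch swaps in the common `!seen[$0]++` idiom for de-duplication, which drops repeated lines even when they are not adjacent, at the cost of keeping every distinct in-range line in memory. It assumes a POSIX shell and awk, and recreates the answer's sample files in a scratch directory.

```shell
#!/bin/sh
# Work in a scratch directory and recreate the answer's sample files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
    '[2022-05-01 00:00:25.550] here comes some logging info' \
    '[2022-05-01 00:02:25.550] here comes some more logging info' \
    '[2022-05-01 23:56:30.332] here comes more logging info' \
    '[2022-05-01 23:58:29.480] here comes some more logging info' \
    > logfile-2022-05-01.log
printf '%s\n' \
    '[2022-05-01 23:58:29.480] here comes some more logging info' \
    '[2022-05-02 00:01:25.550] here comes some logging info from the next day' \
    '[2022-05-02 00:02:39.224] here comes logging info from the next day' \
    '[2022-05-02 00:04:25.550] here comes some more logging info from the next day' \
    > logfile-2022-05-02.log

# One awk call replaces cat | grep | awk:
#   /here comes some/        plays the role of the grep filter
#   $2 >= ... && $2 <= ...   inclusive time window, compared as strings
#   !seen[$0]++              true only the first time a given line is seen
result=$(awk -F'[][]' '
    /here comes some/ && $2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00" && !seen[$0]++
' logfile-2022-05-01.log logfile-2022-05-02.log)
printf '%s\n' "$result"
```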
One unified awk idea that uses input variables and also removes duplicate (consecutive) lines:
```shell
start='2022-05-01 20:00:00'
end='2022-05-02 08:00:00'
string='here comes some'

awk -F'[][]' -v start="$start" -v end="$end" -v str="$string" '
$2 >= start { printme=1 }
printme     { if ($2 > end) {              # disable printme flag if beyond the "end"
                  printme=0
                  next                     # replace with "exit" if sure follow-on input files will not contain any lines of interest, i.e., input files are processed in timestamp order
              }
              if (! index($0,str)) next    # if current line does not contain our "string" then skip to next input line
              if ($0==last) next           # skip duplicate (consecutive) lines
              print $0
              last=$0
              next
            }
' logfile-2022-05-??.log
```
This generates:
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
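The comment on the `next` line suggests using `exit` when the input is processed in timestamp order. A condensed sketch of that variant is below. It assumes the lines are sorted within and across files (the glob expands in sorted order, and zero-padded dates in the names sort chronologically), and it uses hypothetical sample files, the last of which is never read because awk stops at the first line past `end`.

```shell
#!/bin/sh
# Hypothetical sample files, already in timestamp order across files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
    '[2022-05-01 19:59:59.999] here comes some logging info before the window' \
    '[2022-05-01 23:58:29.480] here comes some more logging info' > logfile-2022-05-01.log
printf '%s\n' \
    '[2022-05-02 00:01:25.550] here comes some logging info from the next day' \
    '[2022-05-02 08:00:01.000] here comes some logging info after the window' > logfile-2022-05-02.log
printf '%s\n' \
    '[2022-05-03 00:00:01.000] here comes some logging info that is never read' > logfile-2022-05-03.log

start='2022-05-01 20:00:00'
end='2022-05-02 08:00:00'
string='here comes some'

result=$(awk -F'[][]' -v start="$start" -v end="$end" -v str="$string" '
    $2 > end   { exit }                                # sorted input: nothing later can match
    $2 < start { next }                                # still before the window
    index($0, str) && $0 != last { print; last = $0 }  # string filter, skip consecutive duplicates
' logfile-2022-05-??.log)
printf '%s\n' "$result"
```

The early `exit` only pays off on large or many files; if the sort-order assumption might not hold, the answer's `next` version is the safe choice.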