grep cannot find all occurrences of a substring in a very big file

I am on Ubuntu, in a Bash shell, trying to use grep to find all occurrences of the substring engineBreakdown() in a log file with a .tra extension, say my_log_16.tra, and save the results to a file, say results_16.txt.

So I run

cat /path/to/my_log_16.tra | grep "engineBreakdown()" > results_16.txt

When I open results_16.txt with less, I see that some lines containing the substring were saved, but not all the lines I expected.

In fact, when I manually search my_log_16.tra for engineBreakdown(), I find other lines containing the substring that were not saved to results_16.txt. So it seems that my command only saves the first occurrences of the substring.

I think grep may be failing because my_log_16.tra is a very large file (about 100 MB).

If this is the cause, is there a more reliable way to find all occurrences of a substring in a very big file?

version of grep on my machine

grep --version
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

Example of lines from my_log_16.tra

lines correctly detected and saved into results_16.txt

[I 2022-10-16 07:26:35.449 Rservice:75] engineBreakdown()
[I 2022-10-16 07:26:35.846 Rservice:75] engineBreakdown()
[I 2022-10-16 07:26:35.848 Rservice:75] engineBreakdown()

a piece of the file where the substring appears, but it is not saved into results_16.txt

[I 2022-10-16 11:32:48.039 web:2064] 200 GET /static/ui-src/default/img/Customer.png?v=0.9702853857687699 (127.0.0.1) 10.49ms
[I 2022-10-16 11:32:49.778 Rservice:75] engineBreakdown()
[I 2022-10-16 11:32:50.122 websocketclient:62] Connection : url::ws://localhost:3333/ws
[I 2022-10-16 11:32:50.125 Rservice:75] engineBreakdown()
[I 2022-10-16 11:32:50.128 Rservice:75] engineBreakdown()
[I 2022-10-16 11:32:55.123 websocketclient:62] Connection : url::ws://localhost:3333/ws
[I 2022-10-16 11:32:55.128 Rservice:75] engineBreakdown()
[I 2022-10-16 11:32:55.134 Rservice:75] engineBreakdown()

another piece of the file where the substring appears, but it is not saved into results_16.txt

[I 2022-10-17 04:00:35.127 Rservice:75] engineBreakdown()
[I 2022-10-17 04:00:35.138 Rservice:75] engineBreakdown()
[I 2022-10-17 04:00:39.206 websocketclient:62] Connection : url::ws://127.0.0.1:9999/request
[I 2022-10-17 04:00:39.220 websocketclient:62] Connection : url::ws://127.0.0.1:9999/auxiliary
[I 2022-10-17 04:00:39.228 channels:75] _on_connection_error, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:39.233 channels:82] _on_connection_close, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:39.237 channels:75] _on_connection_error, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:39.243 channels:82] _on_connection_close, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:40.122 websocketclient:62] Connection : url::ws://localhost:3333/ws
[I 2022-10-17 04:00:40.128 Rservice:75] engineBreakdown()
[I 2022-10-17 04:00:40.133 Rservice:75] engineBreakdown()
[I 2022-10-17 04:00:44.206 websocketclient:62] Connection : url::ws://127.0.0.1:9999/request
[I 2022-10-17 04:00:44.221 websocketclient:62] Connection : url::ws://127.0.0.1:9999/auxiliary
[I 2022-10-17 04:00:44.227 channels:75] _on_connection_error, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:44.232 channels:82] _on_connection_close, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:44.234 channels:75] _on_connection_error, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:44.237 channels:82] _on_connection_close, host=127.0.0.1, port=9999
[I 2022-10-17 04:00:45.122 websocketclient:62] Connection : url::ws://localhost:3333/ws
[I 2022-10-17 04:00:45.126 Rservice:75] engineBreakdown()
[I 2022-10-17 04:00:45.128 Rservice:75] engineBreakdown()

update 1

I also tried

grep "engineBreakdown()" /path/to/my_log_16.tra > results_16.txt

but the result is the same.

update 2

As suggested, double quotes might not be enough to handle the parentheses properly, so I removed the parentheses from the search string and also tried single quotes instead of double quotes:

grep "engineBreakdown" /path/to/my_log_16.tra > results_16.txt

grep 'engineBreakdown' /path/to/my_log_16.tra > results_16.txt

but the result is the same.

Solution:

It seems your grep is behaving oddly, perhaps because version 2.25 is old and has a bug that was fixed in a later release.
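
One way to narrow this down is to check whether the log contains NUL bytes. This is an assumption, not something the question confirms: by default GNU grep treats a file that contains NUL bytes as binary and suppresses the matching lines after that point, which would look like grep finding only the first occurrences. A minimal sketch of such a check follows; the helper name has_nul_bytes is hypothetical. Invalid byte sequences for the current locale can trigger the same behavior, and this check does not cover them.

```shell
# has_nul_bytes FILE
# Succeeds (exit status 0) if FILE contains at least one NUL byte.
# Hypothetical helper for illustration; not part of the original thread.
has_nul_bytes() {
    local file=$1
    local total stripped
    # Size of the file in bytes.
    total=$(wc -c < "$file")
    # Size after deleting every NUL byte.
    stripped=$(tr -d '\000' < "$file" | wc -c)
    # The two sizes differ only if at least one NUL byte was removed.
    [ "$total" -ne "$stripped" ]
}

# Example usage (path taken from the question):
# if has_nul_bytes /path/to/my_log_16.tra; then
#     echo "file contains NUL bytes; grep may be treating it as binary"
# fi
```

If the check succeeds, the file size is probably not the problem, and the stray binary data is the more likely culprit.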

Here’s an alternative with sed:

sed -n '/engineBreakdown()/p' /path/to/my_log_16.tra > op.txt

I’d recommend updating your grep installation. ripgrep is another alternative.
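
If binary data in the log is the cause (an assumption; the thread does not confirm it), grep itself may still be usable: GNU grep's -a (--text) option makes it process the file as text, and -F matches the pattern as a fixed string rather than a regular expression. A minimal sketch, where find_all is a hypothetical helper name:

```shell
# find_all PATTERN FILE
# Prints every line of FILE that contains PATTERN as a literal substring.
#   -a  treat the file as text even if it contains binary data
#   -F  interpret PATTERN as a fixed string, not a regular expression
#   --  end of options, so a pattern starting with '-' is not parsed as a flag
# Hypothetical helper for illustration; not part of the original thread.
find_all() {
    grep -aF -- "$1" "$2"
}

# Example usage (paths taken from the question):
# find_all 'engineBreakdown()' /path/to/my_log_16.tra > results_16.txt
```

Comparing the line count of this output with the sed output above is a quick way to confirm that both approaches now find the same occurrences.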
