I have received a large file in which every line has the same structure. Here are a few lines as an example:
14 23456 12356 1234 15 1456 1245 123456 23456 chrysanthemums
12456 123456 34 1236 123456 1234 45 123456 whitings
14 356 124 6 12345 6 1245 malformations
12456 23456 2356 12345 12345 123456 6 furnishings
2345 16 345 126 345 126 3 12456 3 stoned
245 34 123456 123456 12 346 134 4 245 1245 146 6 gravitate
12456 34 34 356 12356 15 26 13 gastrointestinal
23456 1 234 3 5 12356 lawyer
123456 3456 123456 123456 16 123456 12356 12 46 12456 45 1346 tuba
2356 345 12345 4 4 1 6 gripped
123456 123456 123456 123456 35 12456 123456 123456 23 356 23456 25 replenishes
As you can see, each line consists of several groups of the digits 1-6, each group written in ascending order and separated by spaces. At the end of the line there is a word.
I’d like to use the grep command to display lines where consecutive ‘words’ don’t share any characters. This can apply for the entire line, because the last word won’t share any characters with the numbers anyways.
An example of a line that should be displayed:
12456 3 4 12356 4 12356 4 12356 4 156 234 cool
An example of a line that shouldn’t be displayed:
234 5 12456 13456 136 23456 2346 5 6 345 angry
(the second word is ‘5’ and the third also contains ‘5’…)
Please help me!
I don’t have a clue of what to do, but I’d like it to be done with a single grep command using regex.
>Solution :
grep -E '\b\w*(\w)\w*\b\s\b\w*\1\w*\b'
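matches any line in which two consecutive words share at least one character: the group (\w) captures a character from one word, and the backreference \1 requires that same character to appear in the word right after it.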
Use the -v option to invert the match, so that only the lines without such a pair are printed:
grep -vE '\b\w*(\w)\w*\b\s\b\w*\1\w*\b' input.txt
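As a quick sanity check, the inverted command can be run against the two example lines from the question. This is only a sketch: the function name filter_disjoint is made up for illustration, and it assumes GNU grep, because \w, \b, \s and backreferences inside -E patterns are extensions rather than part of POSIX ERE.

```shell
# Hypothetical helper (the name is not from the original answer): prints only
# the lines in which no two consecutive words share a character.
# Assumes GNU grep, since \w, \b, \s and \1 under -E are GNU extensions.
filter_disjoint() {
  grep -vE '\b\w*(\w)\w*\b\s\b\w*\1\w*\b' "$@"
}

# Feed the two example lines from the question through the filter.
printf '%s\n' \
  '12456 3 4 12356 4 12356 4 12356 4 156 234 cool' \
  '234 5 12456 13456 136 23456 2346 5 6 345 angry' |
  filter_disjoint
```

Only the line ending in cool should be printed; the line ending in angry is dropped because its second and third words both contain 5. With no arguments the helper reads standard input, and with a file name (for example filter_disjoint input.txt) it reads that file.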