Remove items from one list if they contain strings from another list

I’m looking for the most efficient way to remove items from one list if they contain strings from another list.

For example:

B list contains:

TomWentFishing
SueStayedHome
JohnGoesToSchool
JimPlaysTennis

A list contains:

GoesToSchool
SueStayed

C list should contain:

TomWentFishing
JimPlaysTennis

I’ve used this code, but it takes up a lot of time as the lists are very large:

static void Main(string[] args)
{
    string[] b = File.ReadAllLines(@"C:\b.txt");
    string[] a = File.ReadAllLines(@"C:\a.txt");

    foreach (string firststring in b)
    {
        bool contains = false;
        foreach (string secondstring in a)
        {
            if (firststring.ToLower().Contains(secondstring.ToLower()))
            {
                contains = true;
                break;
            }
        }

        if (contains == false)
        {
            File.AppendAllText(@"C:\c.txt", firststring + Environment.NewLine);
        }
    }
}

>Solution :

You can make this significantly faster if you can sort the a list into something that can support binary (or faster) lookups.

Unfortunately, the Contains() search makes this challenging. But there are still some things we can do:

  • Avoid loading all of b into RAM. Ever.
  • On the other hand, lookups into a will be faster if we preload into RAM once, and do as much work to support the lookups for this one copy as we can.
  • It will be more efficient to do all of the write operations at once, rather than re-opening the output file to append the lines as we find them.
  • As a bonus, we’ll do all this in significantly less code.
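using System.IO;
using System.Linq;

static void Main(string[] args)
{
    // ReadLines streams b lazily, so the whole file is never held in memory.
    var b = File.ReadLines(@"C:\b.txt");
    // Load a once and lower-case it up front, instead of on every comparison.
    var a = File.ReadLines(@"C:\a.txt").Select(line => line.ToLower()).ToList();

    // Keep only the lines of b that contain none of the strings in a.
    var result = b.Where(bline => {
        var lowered = bline.ToLower();
        return !a.Any(aline => lowered.Contains(aline));
    });

    // A single call opens the output file once and writes every surviving line.
    File.AppendAllLines(@"C:\c.txt", result);
}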
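
If you want to try the same filter without touching the disk, here is a minimal sketch of it as a reusable method. The names LineFilter and ExcludeContaining are made up for illustration and are not part of any library. It also makes two assumptions that go beyond the code above: it compares with StringComparison.OrdinalIgnoreCase rather than ToLower(), which avoids allocating a lowered copy of every line but can differ from culture-sensitive lowering for some non-ASCII text, and it skips empty entries in a, because every string contains the empty string and a single blank line in a would otherwise remove everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineFilter
{
    // Hypothetical helper: returns the lines of `source` that contain none of
    // the strings in `needles`, ignoring case.
    public static IEnumerable<string> ExcludeContaining(
        IEnumerable<string> source, IEnumerable<string> needles)
    {
        // Materialize the needles once. Empty needles are dropped (an assumption):
        // every string contains "", so one blank line would filter out all input.
        var needleList = needles.Where(n => n.Length > 0).ToList();

        // `source` is still enumerated lazily, one line at a time.
        return source.Where(line =>
            !needleList.Any(n => line.IndexOf(n, StringComparison.OrdinalIgnoreCase) >= 0));
    }
}
```

With the files from the question, this could be called as File.WriteAllLines(@"C:\c.txt", LineFilter.ExcludeContaining(File.ReadLines(@"C:\b.txt"), File.ReadLines(@"C:\a.txt"))). Note that this is still one Contains-style scan per pair of lines in the worst case; it only trims the per-comparison overhead.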
