Remove items from one list if they contain strings from another list

I’m looking for the most efficient way to remove items from one list if they contain strings from another list.

For example:

B list contains:

TomWentFishing
SueStayedHome
JohnGoesToSchool
JimPlaysTennis

A list contains:

GoesToSchool
SueStayed

C list should contain:

TomWentFishing
JimPlaysTennis

I’ve used this code, but it takes a long time to run because the lists are very large:

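using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string[] b = File.ReadAllLines(@"C:\b.txt");
        string[] a = File.ReadAllLines(@"C:\a.txt");

        // Every line of b is compared against every line of a.
        foreach (string firststring in b)
        {
            bool contains = false;
            foreach (string secondstring in a)
            {
                // Both strings are lowercased again on every single comparison.
                if (firststring.ToLower().Contains(secondstring.ToLower()))
                {
                    contains = true;
                    break;
                }
            }

            if (contains == false)
            {
                // The output file is reopened for every line that survives.
                File.AppendAllText(@"C:\c.txt", firststring + Environment.NewLine);
            }
        }
    }
}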

Solution:

You can make this significantly faster if you can organize the a list into a structure that supports binary-search (or even faster, hash-based) lookups.
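For comparison, this is what such a lookup structure buys when the match is exact. It is a minimal sketch under an assumption the question does not make: that a line of b should be removed only when it equals a line of a, ignoring case. `ExactLineFilter` and `Filter` are hypothetical names. Building a `HashSet<string>` with `StringComparer.OrdinalIgnoreCase` costs one pass over a, after which each lookup is O(1) on average.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper. Assumption: a line of b is removed only when it is
// exactly equal (ignoring case) to a line of a. This is NOT the substring
// test from the question; it only shows the speed-up a hash set gives.
public static class ExactLineFilter
{
    public static List<string> Filter(IEnumerable<string> b, IEnumerable<string> a)
    {
        // One pass over a builds the set; each Contains is then O(1) on average.
        var excluded = new HashSet<string>(a, StringComparer.OrdinalIgnoreCase);
        return b.Where(line => !excluded.Contains(line)).ToList();
    }
}
```

On the sample data this keeps all four lines of B, because none of them is exactly equal to GoesToSchool or SueStayed; the substring requirement is what a plain set lookup cannot handle.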

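Unfortunately, the Contains() search makes this challenging: a line of b must be removed when any line of a appears anywhere inside it, not only when the two lines are equal, so neither a sorted list nor a hash set can answer that in a single lookup. But there are still some things we can do: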

  • Avoid loading all of b into RAM. Ever. Stream it line by line instead.
  • On the other hand, lookups into a will be faster if we load it into RAM once and do as much of the per-lookup work as we can (such as lowercasing) up front, on that single copy.
  • It is more efficient to write all of the output in one operation than to reopen the output file to append each line as we find it.
  • As a bonus, we’ll do all this in significantly less code.
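using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // Streamed lazily: b is never held in memory all at once.
        var b = File.ReadLines(@"C:\b.txt");
        // Loaded and lowercased once, instead of on every comparison.
        var a = File.ReadLines(@"C:\a.txt").Select(line => line.ToLower()).ToList();

        var result = b.Where(bline =>
        {
            var lowered = bline.ToLower();
            return !a.Any(aline => lowered.Contains(aline));
        });

        // One call writes every surviving line through a single open file.
        File.AppendAllLines(@"C:\c.txt", result);
    }
}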
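The answer's code still tests each line of b against every line of a, so its cost grows with the product of the two list sizes. If that is still too slow, one technique the answer does not mention is the Aho-Corasick algorithm. It compiles all of a into a single automaton, after which scanning a line of b takes time roughly proportional to that line's length, however many entries a has. The following is a minimal sketch, not a tuned implementation. `SubstringMatcher` and `ContainsAny` are hypothetical names, not a .NET API, and it assumes per-character invariant lowercasing (`char.ToLowerInvariant`) is an acceptable case-insensitive comparison, which can differ slightly from `ToLower()` on some non-ASCII text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a .NET API): an Aho-Corasick automaton answering
// "does this line contain any of the patterns?", case-insensitively.
// Assumption: per-character invariant lowercasing is good enough.
public sealed class SubstringMatcher
{
    // Per node: outgoing edges, failure link, and whether some pattern ends
    // here either directly or through a chain of failure links.
    private readonly List<Dictionary<char, int>> _next = new List<Dictionary<char, int>>();
    private readonly List<int> _fail = new List<int>();
    private readonly List<bool> _match = new List<bool>();

    public SubstringMatcher(IEnumerable<string> patterns)
    {
        AddNode(); // node 0 is the root
        foreach (string pattern in patterns)
        {
            if (pattern.Length == 0) continue; // an empty pattern would match every line
            int node = 0;
            foreach (char raw in pattern)
            {
                char c = char.ToLowerInvariant(raw);
                if (!_next[node].TryGetValue(c, out int child))
                {
                    child = AddNode();
                    _next[node][c] = child;
                }
                node = child;
            }
            _match[node] = true;
        }
        BuildFailureLinks();
    }

    private int AddNode()
    {
        _next.Add(new Dictionary<char, int>());
        _fail.Add(0);
        _match.Add(false);
        return _next.Count - 1;
    }

    // Breadth-first pass: each node's failure link points to the node for the
    // longest proper suffix of its path that is also a path in the trie.
    private void BuildFailureLinks()
    {
        var queue = new Queue<int>(_next[0].Values); // depth-1 nodes fail to the root
        while (queue.Count > 0)
        {
            int node = queue.Dequeue();
            foreach (KeyValuePair<char, int> edge in _next[node])
            {
                int f = _fail[node];
                while (f != 0 && !_next[f].ContainsKey(edge.Key)) f = _fail[f];
                _fail[edge.Value] = _next[f].TryGetValue(edge.Key, out int target) ? target : 0;
                _match[edge.Value] |= _match[_fail[edge.Value]];
                queue.Enqueue(edge.Value);
            }
        }
    }

    public bool ContainsAny(string text)
    {
        int node = 0;
        foreach (char raw in text)
        {
            char c = char.ToLowerInvariant(raw);
            // Fall back along failure links until an edge for c exists (or we hit the root).
            while (node != 0 && !_next[node].ContainsKey(c)) node = _fail[node];
            node = _next[node].TryGetValue(c, out int child) ? child : 0;
            if (_match[node]) return true;
        }
        return false;
    }
}
```

It would replace the `Any`/`Contains` lambda, for example `File.ReadLines(@"C:\b.txt").Where(line => !matcher.ContainsAny(line))`, with `matcher` built once from `File.ReadLines(@"C:\a.txt")`. One deliberate difference: empty lines in a are skipped, whereas in the original code an empty line in a.txt removes every line of b, since `Contains("")` is always true.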