I’m looking for the most efficient way to remove items from one list if they contain strings from another list.
For example:
B list contains:
TomWentFishing
SueStayedHome
JohnGoesToSchool
JimPlaysTennis
A list contains:
GoesToSchool
SueStayed
C list should contain:
TomWentFishing
JimPlaysTennis
I’ve used this code, but it takes a long time because the lists are very large:
static void Main(string[] args)
{
    string[] b = File.ReadAllLines(@"C:\b.txt");
    string[] a = File.ReadAllLines(@"C:\a.txt");
    foreach (string firststring in b)
    {
        bool contains = false;
        foreach (string secondstring in a)
        {
            if (firststring.ToLower().Contains(secondstring.ToLower()))
            {
                contains = true;
                break;
            }
        }
        if (contains == false)
        {
            File.AppendAllText(@"C:\c.txt", firststring + Environment.NewLine);
        }
    }
}
>Solution :
This would be significantly faster if the a list could be sorted or indexed into something that supports binary (or faster) lookups. Unfortunately, the Contains() search makes this challenging, because a match can start anywhere inside a line. But there are still some things we can do:
- Avoid loading all of b into RAM. Ever.
- On the other hand, lookups into a will be faster if we preload it into RAM once, and do as much work as we can up front (such as lowercasing) to support the lookups against this one copy.
- It will be more efficient to do all of the write operations at once, rather than re-opening the output file to append each line as we find it.
- As a bonus, we’ll do all this in significantly less code.
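using System.IO;
using System.Linq;

static void Main(string[] args)
{
    // Stream b lazily so it is never held in memory all at once.
    var b = File.ReadLines(@"C:\b.txt");
    // Load a once and lowercase it up front, so that work is not repeated for every line of b.
    var a = File.ReadLines(@"C:\a.txt").Select(line => line.ToLower()).ToList();
    // Keep only the lines of b that contain none of the entries in a.
    var result = b.Where(bline => {
        var lowered = bline.ToLower();
        return !a.Any(aline => lowered.Contains(aline));
    });
    // Open the output file once and write every surviving line.
    File.AppendAllLines(@"C:\c.txt", result);
}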
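If the per-line ToLower() allocations turn out to matter, one possible variation is to let the comparison itself ignore case. The sketch below is only an illustration: LineFilter and ExcludeContaining are made-up names, and it swaps the culture-sensitive ToLower() for an ordinal case-insensitive IndexOf, which could behave differently for some non-ASCII text. It also skips empty entries in a, on the assumption that a blank line in that file should not filter out everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineFilter
{
    // Hypothetical helper (not from the answer above): returns the lines
    // that contain none of the given fragments, ignoring case.
    public static List<string> ExcludeContaining(
        IEnumerable<string> lines, IEnumerable<string> fragments)
    {
        // Materialize the fragments once. Empty fragments are skipped
        // (an assumption), because every string contains the empty string.
        var needles = fragments.Where(f => f.Length > 0).ToArray();

        return lines
            .Where(line => !needles.Any(n =>
                // Case-insensitive substring test without allocating
                // a lowered copy of each line.
                line.IndexOf(n, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```

It could be called with File.ReadLines(@"C:\b.txt") and File.ReadLines(@"C:\a.txt") as the two arguments, passing the returned list to File.WriteAllLines. Note that this version collects the result in memory before writing, so it fits best when the surviving lines are a manageable size.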