Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to search string in large text file?

I want to get the line containing a certain word that cannot be repeated like profile ID without make loop to read each of line separately, Because if the word I am looking for is in the last line of the text file, this will take a lot of time to get it, and if the search process is for more than one word and extract the line that contains it, I think it will take a lot of time.

Example for line text file
name,id,image,age,place,link

string word = "13215646";
string output = string.Empty;
    
using (var fileStream = File.OpenRead(FileName))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
{
    String line;
    while ((line = streamReader.ReadLine()) != null)
    {
        string[] strList = line.Split(',');
        if (word == strList[1]) // check if word = id
        {
            output = line;
            break;
        }
    }
}

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use this to search the file:

var output = File.ReadLines(FileName).
    Where(line => line.Split(',')[1] == word).
    FirstOrDefault();

But it won’t solve this:

if the word I am looking for is in the last line of the text file, this will take a lot of time to get it, and if the search process is for more than one word and extract the line that contains it, I think it will take a lot of time.

There’s not a practical way to avoid this for a basic file.

The only ways around actually reading through the file is either maintaining an index, which requires absolute control over everything that might write into the file, or if you can guarantee the file is already sorted by the columns that matter, in which case you can do something like a binary search.

But neither is likely for a random csv file. This is one of the reasons people use databases.

However, we also need to stop and check whether this is really a problem for you. I’d expect the code above to handle files up to a couple hundred MB in around 1 to 2 seconds on modern hardware, even if you need to look through the whole file.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading