Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to write this romove_stopwords faster python?

i have a function remove_stopwords like this, how to make it run faster?

temp.reverse()

def drop_stopwords(text):
    
    for x in temp:
        elif len(x.split()) > 1:
            text_list = text.split()  
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)

    return text

time for solve a text in my data is 14s and if i have some trick like this time for will decrease to 3s:


temp.reverse()

def drop_stopwords(text):
    
    for x in temp:
        if len(x.split()) >2:
            if x in text:
                text = text.replace(x,'')

        elif len(x.split()) > 1:
            text_list = text.split()  
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)

    return text

but i think it may get wrong some where in my language. How can i rewrite this function in python to make it faster ( in C and C++ i can solve it ez with func above :(( )

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Your function does a lot of the same thing over and over, particularly repeated split and join of the same text. Doing a single split, operating on the list, and then doing a single join at the end might be faster, and would definitely lead to simpler code. Unfortunately I don’t have any of your sample data to test the performance with, but hopefully this gives you something to experiment with:

temp = ["foo", "baz ola"]


def drop_stopwords(text):
    text_list = text.split()
    text_len = len(text_list)
    for word in temp:
        word_list = word.split()
        word_len = len(word_list)
        for i in range(text_len + 1 - word_len):
            if text_list[i:i+word_len] == word_list:
                text_list[i:i+word_len] = [None] * word_len
    return ' '.join(t for t in text_list if t)


print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog

You could also just try iteratively doing text.replace in all cases and seeing how that performs compared to your more complex split-based solution:

temp = ["foo", "baz ola"]


def drop_stopwords(text):
    for word in temp:
        text = text.replace(word, '')
    return ' '.join(text.split())


print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading