Getting an array iteration error, wondering how to fix it

I am currently building a web scraper for real estate data. I’m working in Python and I’ve come across an error I can’t seem to fix.

for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        del s[i]

    else:
        continue

Value of ‘s’ being:

display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
browser.get(realtor.format(format))
p = browser.find_element(By.XPATH, "//ul[@class='jsx-343105667 property-list list-unstyle']")
content = p.text
s = re.split('\n',content)

This is supposed to iterate through the list s and append each item to one of several separate lists (price, bed_bath, sqft_lot, address) to be used in a DataFrame. I know that the indexing itself works: I’ve printed each line consecutively using for i in range(len(s)): print s[i], and that runs fine. But as soon as I add the logic above, it breaks.

Getting error:

if '$' in s[i]:
**IndexError: list index out of range**

Any input into why this is happening would be much appreciated.

>Solution :

As @quamrana mentioned, the most likely problem is that you do del s[i]. That makes s shorter, but range(len(s)) was computed once from the original length, so the later indexes no longer exist in s. I have 2 possible fix ideas.
Fix 1:

for i in range(len(s)):
    if i >= len(s): # check if index is still in bounds
        break

    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        del s[i]
    else:
        continue
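
One caveat with Fix 1: after del s[i], the next item slides into index i, and the for loop then moves on to i + 1, so the item right after each deleted line is never examined. Below is a minimal sketch of a while-loop variant that avoids this. The function name collect_prices and the sample list are made up for illustration, and only the price branch is shown.

```python
def collect_prices(lines):
    """Delete 'advertisement' lines in place and collect lines containing '$'."""
    prices = []
    i = 0
    while i < len(lines):  # len() is re-evaluated on every pass
        if lines[i].lower() == 'advertisement':
            del lines[i]   # the next item moves into index i, so do not advance
            continue
        if '$' in lines[i]:
            prices.append(lines[i])
        i += 1
    return prices

# Hypothetical sample data standing in for the scraped list s
sample = ['$300,000', 'Advertisement', '$450,000', '3 bed 2 bath, 1,500 sqft lot']
prices = collect_prices(sample)
```

With the for loop from Fix 1, the same sample would lose '$450,000', because it sits directly after the deleted line.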

Fix 2:

indexes_to_remove = []

for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        indexes_to_remove.append(i) # remember the index, delete after the loop
    else:
        continue


for index in indexes_to_remove[::-1]: # if you iterate through it backward, you won't have that problem.
    del s[index]
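
If the only reason for deleting is to get rid of the advertisement lines, another option is to filter them out before the loop, so the list never changes while you iterate over it. Here is a rough sketch, using a made-up sample in place of the scraped s. The sample data and the i > 0 guard are my own additions, not part of the original code.

```python
# Hypothetical sample standing in for the scraped list s
s = ['$300,000', '3 bed 2 bath, 1,500 sqft lot', '123 Main St',
     'Fort Collins, CO', 'Advertisement', '$450,000']

# Build a new list without the advertisement lines
s = [line for line in s if line.lower() != 'advertisement']

price, bed_bath, sqft_lot, address = [], [], [], []

for i, line in enumerate(s):
    if '$' in line:
        price.append(line)
    elif 'bath' in line:
        # partition splits on the first comma only
        left, _, right = line.partition(',')
        bed_bath.append(left)
        sqft_lot.append(right)
    elif 'fort collins' in line.lower() and i > 0:
        # i > 0 guards against s[-1] wrapping around to the last item
        address.append(s[i - 1] + ' ' + line)
```

Because s is rebuilt once up front, the loop itself needs no bounds check and no second pass to delete anything.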