Getting an array iteration error, wondering how to fix it

February 19, 2022

I am building a web scraper for real estate data in Python, and I've run into an error I can't figure out how to fix.

for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        del s[i]

    else:
        continue

Value of ‘s’ being:

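# Imports this snippet relies on (Display is assumed to come from pyvirtualdisplay)
import re
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By

display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
browser.get(realtor.format(format))
p = browser.find_element(By.XPATH, "//ul[@class='jsx-343105667 property-list list-unstyle']")
content = p.text
s = re.split('\n',content)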

This is supposed to iterate through the list s and sort each line into one of several separate lists (price, bed_bath, sqft_lot, address) to be used in a DataFrame. I know the indexing itself works: for i in range(len(s)): print(s[i]) prints every line in order. But as soon as I add the logic above, it breaks.

Getting error:

if '$' in s[i]:
**IndexError: list index out of range**

Any input into why this is happening would be much appreciated.

>Solution :

As @quamrana mentioned, the most likely problem is the del s[i]: each deletion makes s shorter, but range(len(s)) was computed once from the original length, so the loop eventually reaches an index that no longer exists in s. Here are two possible fixes.
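
A minimal reproduction may make the failure easier to see. The sample lines below are made up for illustration and are not real scraped data:

```python
# Made-up sample lines standing in for the scraped text
s = ['$450,000', 'Advertisement', '3 bed, 2 bath', '123 Main St']

error = None
try:
    for i in range(len(s)):        # range(4) is fixed before the loop starts
        if s[i].lower() == 'advertisement':
            del s[i]               # s now has 3 items, but the loop still runs up to i == 3
except IndexError as exc:
    error = exc

print(s)      # the advertisement line is gone, so only 3 items remain
print(error)  # list index out of range
```

Note that the line right after the deleted one ('3 bed, 2 bath') is never examined either, because it slides into index 1 while the loop moves on to index 2.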
Fix 1:

for i in range(len(s)):
    if i >= len(s): # check if index is still in bounds
        break

    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        del s[i]
    else:
        continue
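
One thing to watch with Fix 1: after del s[i], the next line slides into position i, and the for loop then advances past it without checking it. If that matters for your data, a while loop that only advances when nothing was deleted is one way around it. This is a sketch, not a drop-in replacement; the function name split_listing_lines is hypothetical, and it assumes the same four output lists as in the question:

```python
def split_listing_lines(s):
    """Sort scraped lines into columns, deleting 'advertisement' lines from s in place."""
    price, bed_bath, sqft_lot, address = [], [], [], []
    i = 0
    while i < len(s):                  # len(s) is re-evaluated on every pass
        line = s[i]
        if line.lower() == 'advertisement':
            del s[i]                   # do not advance: the next line is now at index i
            continue
        if '$' in line:
            price.append(line)
        elif 'bath' in line:
            left, _, right = line.partition(',')
            bed_bath.append(left)
            sqft_lot.append(right)
        elif 'fort collins' in line.lower():
            address.append(s[i - 1] + ' ' + line)
        i += 1
    return price, bed_bath, sqft_lot, address
```

Because an advertisement line matches none of the other conditions, checking for it first does not change which list any other line ends up in.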

Fix 2:

indexes_to_remove = []

for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        indexes_to_remove.append(i)
    else:
        continue


for index in indexes_to_remove[::-1]: # delete from the end so the earlier indexes stay valid
    del s[index]
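
If the advertisement lines are not needed at all, another option (not one of the two fixes above, just a sketch) is to filter them out before the main loop, so that nothing is deleted while iterating. drop_advertisements is a hypothetical helper name, and the sample lines are made up:

```python
def drop_advertisements(lines):
    """Return a new list without 'advertisement' lines; the input list is not modified."""
    return [line for line in lines if line.lower() != 'advertisement']


# Made-up sample lines standing in for the scraped text
s = ['$450,000', 'Advertisement', '3 bed, 2 bath', 'ADVERTISEMENT']
s = drop_advertisements(s)
print(s)  # ['$450,000', '3 bed, 2 bath']
```

After this step the original loop can run without the 'advertisement' branch, since len(s) no longer changes during iteration.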