I am currently building a web scraper for real estate data. I'm working in Python and I've come across an error I can't seem to fix.
for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])
    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)
    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)
    elif s[i].lower() == 'advertisement':
        del s[i]
    else:
        continue
The value of ‘s’ comes from:
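import re

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By

# realtor and format are defined elsewhere (not shown in the question)
display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
browser.get(realtor.format(format))
p = browser.find_element(By.XPATH, "//ul[@class='jsx-343105667 property-list list-unstyle']")
content = p.text
s = re.split('\n',content)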
This is supposed to iterate through the list s and append each entry to one of several separate lists (price, bed_bath, sqft_lot, address) to be used in a DataFrame. I know the indexing itself works: I've printed each line consecutively using for i in range(len(s)): print s[i], and that runs fine. But as soon as I add the logic above, it breaks.
Getting error:
if '$' in s[i]:
**IndexError: list index out of range**
Any input into why this is happening would be much appreciated.
>Solution :
As @quamrana mentioned, the most likely problem is del s[i]. Each deletion makes s shorter, but range(len(s)) is computed only once, before the loop starts, so i eventually reaches an index that no longer exists in s. I have two possible fixes.
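To see the failure in isolation, here is a minimal sketch that reproduces it without the scraper. The sample list and the helper name run_buggy_loop are made up for illustration; only the delete-inside-the-loop pattern comes from the question.

```python
# Hypothetical sample data standing in for the scraped lines.
s = ["$450,000", "Advertisement", "3 bed, 2 bath", "Fort Collins, CO"]


def run_buggy_loop(items):
    """Delete from a list while indexing over a range computed up front."""
    items = list(items)  # work on a copy so the sample data stays intact
    try:
        for i in range(len(items)):  # the range is fixed at the original length
            if items[i].lower() == "advertisement":
                del items[i]  # the list shrinks, but the range does not
    except IndexError as exc:
        return f"IndexError: {exc}"
    return "no error"


result = run_buggy_loop(s)
print(result)
```

One deletion is enough: the loop still tries the last original index, which is now past the end of the list.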
Fix 1:
for i in range(len(s)):
    if i >= len(s): # check if index is still in bounds
        break
    if '$' in s[i]:
        price.append(s[i])
    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)
    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)
    elif s[i].lower() == 'advertisement':
        # note: the item that slides into position i is skipped by the next iteration
        del s[i]
    else:
        continue
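One caveat with the bounds check: it stops the IndexError, but the entry right after a deleted one moves into position i and is never examined. If that matters, a while loop that only advances when nothing was deleted avoids the skip. This is a rough sketch rather than part of the original fix, and the function name and sample data are invented for the example.

```python
def drop_ads_in_place(items):
    """Remove 'advertisement' entries in place without skipping neighbours."""
    i = 0
    while i < len(items):  # the length is re-checked on every pass
        if items[i].lower() == "advertisement":
            del items[i]  # do not advance: the next item now sits at index i
        else:
            i += 1
    return items


# Hypothetical sample with two consecutive ads.
sample = ["$1", "Advertisement", "advertisement", "3 bed, 2 bath"]
cleaned = drop_ads_in_place(sample)
print(cleaned)
```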
Fix 2:
indexes_to_remove = []
for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])
    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)
    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)
    elif s[i].lower() == 'advertisement':
        indexes_to_remove.append(i)
    else:
        continue
for index in indexes_to_remove[::-1]: # if you iterate through it backward, you won't have that problem.
    del s[index]
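A third option, not one of the two fixes above, is to avoid deleting altogether: filter out the advertisement lines first, then loop over the clean list. The sketch below assumes the same four output lists as the question. The function name parse_listing_lines, the sample lines, and the guard for a city line at index 0 are my own additions.

```python
def parse_listing_lines(lines):
    """Split scraped lines into price, bed/bath, sqft/lot and address lists."""
    # Drop the ads up front so nothing is deleted during the main loop.
    rows = [line for line in lines if line.lower() != "advertisement"]

    price, bed_bath, sqft_lot, address = [], [], [], []
    for i, line in enumerate(rows):
        if "$" in line:
            price.append(line)
        elif "bath" in line:
            left, _, right = line.partition(",")  # split at the first comma only
            bed_bath.append(left)
            sqft_lot.append(right)
        elif "fort collins" in line.lower():
            # Assumption: the street line directly precedes the city line.
            # The i > 0 guard keeps rows[i - 1] from wrapping to the last item.
            previous = rows[i - 1] if i > 0 else ""
            address.append((previous + " " + line).strip())
    return price, bed_bath, sqft_lot, address


# Hypothetical sample lines in the shape the question describes.
sample = [
    "$450,000",
    "3bed 2bath,1,800 sqft",
    "Advertisement",
    "123 Main St",
    "Fort Collins, CO 80521",
]
price, bed_bath, sqft_lot, address = parse_listing_lines(sample)
print(price, bed_bath, sqft_lot, address)
```

Because the original list is never modified, the index problem cannot come up, and s is still intact if you need it afterwards.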