How to extract data from an autocomplete box with Selenium (Python)

By : mrki
Source: Stackoverflow.com
Question!

I am trying to extract the suggestions from a search box's autocomplete dropdown; the search box on Wikipedia is a good example.

This is my code:

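import time

from scrapy.selector import HtmlXPathSelector
from selenium import webdriver
from selenium.webdriver.common.by import By

# `response` and TestItem come from the surrounding Scrapy spider (not shown)
driver = webdriver.Firefox()
driver.get(response.url)
city = driver.find_element(By.ID, 'searchInput')
city.click()
city.clear()
city.send_keys('a')
time.sleep(1.5)  # waiting for ajax to load
selen_html = driver.page_source
# print(selen_html.encode('utf-8'))
hxs = HtmlXPathSelector(text=selen_html)
ajaxWikiList = hxs.select('//div[@class="suggestions"]')
items = []
for city in ajaxWikiList:
    item = TestItem()
    # the leading '/' makes this XPath absolute (document root), not relative to `city`
    item['ajax'] = city.select('/div[@class="suggestions-results"]/a/@title').extract()
    items.append(item)
print(items)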

The XPath expression is fine; I checked it on a static page. If I uncomment the line that prints the scraped HTML, the markup for the suggestions box shows up at the end of the output. But for some reason I can't extract any data from it with the code above. I must be missing something: I tried two different sources, and the Wikipedia page is just another one where I can't get this data extracted. Any advice? Thanks!

Answers

Instead of passing .page_source, which in your case contains an empty suggestions div, get the innerHTML of the suggestions element and pass that to the selector:

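from scrapy.selector import HtmlXPathSelector
from selenium.webdriver.common.by import By

# take only the suggestions box's markup from the live DOM (driver as in the question)
selen_html = driver.find_element(By.CLASS_NAME, 'suggestions').get_attribute('innerHTML')

hxs = HtmlXPathSelector(text=selen_html)
suggestions = hxs.select('//div[@class="suggestions-results"]/a/@title').extract()
for suggestion in suggestions:
    print(suggestion)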

Outputs:

Animal
Association football
Arthropod
Australia
AllMusic
African American (U.S. Census)
Album
Angiosperms
Actor
American football

Note that, rather than a fixed time.sleep(), it would be better to use Selenium's Waits feature to wait for the element to become present/visible.

Also note that HtmlXPathSelector is deprecated; use Selector instead.

By : alecxe
