Iterating over links in selenium with scrapy

By: Deepak
Source: Stackoverflow.com
Question!

I am learning to scrape with Selenium and Scrapy. I have a page with a list of links. I want to click the first link, visit that page, crawl its items, come back to the main page (the previous page with the list of links), click the second link, crawl it, and repeat the process until the desired links are exhausted. All I could do was click the first link; after that my crawler stops. What can I do to crawl the second link and the remaining ones?

My spider looks like this:

import time

from scrapy.selector import Selector
from scrapy.spiders.init import InitSpider
from selenium import webdriver


class test(InitSpider):
    name = "test"
    start_urls = ["http://www.somepage.com"]

    def __init__(self):
        InitSpider.__init__(self)
        self.browser = webdriver.Firefox()

    def parse(self, response):
        self.browser.get(response.url)
        time.sleep(2)
        items = []
        sel = Selector(text=self.browser.page_source)
        links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
        for link in links:
            link.click()
            time.sleep(10)
            # do some crawling, then go back and repeat the process
            self.browser.back()

Thanks



Answers

You can take another approach: extract the href of every link first, then call browser.get() on each one in the loop. The hrefs have to be collected before navigating away, because the WebElement references go stale as soon as the browser loads another page:

# Extract every href up front: the WebElement references in "links"
# would go stale after the first browser.get() call
links = [link.get_attribute('href')
         for link in self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')]
for link in links:
    self.browser.get(link)
    # crawl
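
Putting it together inside the question's parse() method, the loop could look roughly like this (a sketch under the question's setup; the sleep-based waits are carried over from the question, and the title XPath is a hypothetical placeholder to adapt to the real detail pages):

def parse(self, response):
    self.browser.get(response.url)
    time.sleep(2)  # crude wait for the listing page to render

    # Collect hrefs up front -- the WebElements go stale after the first get().
    # (Selenium 4 spells this call find_elements(By.XPATH, ...).)
    hrefs = [a.get_attribute('href') for a in
             self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')]

    for href in hrefs:
        self.browser.get(href)
        time.sleep(2)
        sel = Selector(text=self.browser.page_source)
        # hypothetical extraction -- adapt the XPath to the real detail page
        yield {'title': sel.xpath('//h1/text()').extract_first()}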

If a link is relative, you would need to join it with the base URL, http://www.somepage.com.
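
For example, the standard library's urljoin handles this and leaves absolute URLs untouched, so it is safe to apply to every href (a minimal sketch; on Python 2 the import is urlparse.urljoin):

from urllib.parse import urljoin

base = "http://www.somepage.com"
print(urljoin(base, "/items/1"))            # http://www.somepage.com/items/1
print(urljoin(base, "http://other.com/x"))  # absolute hrefs pass through unchanged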

By: alecxe

