python scrapy: scraping dynamic information


I'm trying to scrape information from a web page. I want to do the following:

- Select "Dentist" from the dropdown at the top of the page
- Click search
- Notice that the information at the bottom of the page changes dynamically using JavaScript
- Click on the hyperlinks of practitioner names, which makes a popup show up
- Save all the information from that popup in a JSON/CSV file for each practitioner
- Also get the information on the other pages that are linked at the bottom of the page, which change the information in the same div

I'm very new to Scrapy and only just looked into Selenium, because I read somewhere that you need Selenium for dynamic information.

So I'm using Selenium inside a Scrapy app. I'm not sure whether that's right, and I have no clue what the best way of doing it is. I have the following code so far, and I'm getting this error:

line 21, in DmozSpider
    all_options = element.find_elements_by_tag_name("option")
NameError: name 'element' is not defined

from scrapy.spider import Spider
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from scrapytutorial.items import SchItem
from selenium.webdriver.support.ui import Select

class DmozSpider(Spider):
    name = "sch"

    driver = webdriver.Firefox()
    select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType'))
    all_options = element.find_elements_by_tag_name("option")

    for option in all_options:
        if option.get_attribute("value") == "4":  #Dentist


    def parse(self, response):

        all_docs = element.find_elements_by_tag_name("td")
        for name in all_docs:
            alert = driver.switch_to_alert()
            sel = Selector(response)
            ma = sel.xpath('//table')
            items = []
            for site in ma:
                item = SchItem()
                item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
                item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract()
                item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract()
                item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract()
                item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract()
                item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract()
                item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract()

            return items

from scrapy.item import Item, Field

class SchItem(Item):

    name = Field()
    profession = Field()
    scope_of_practise = Field()
    instituition = Field()
    license = Field()
    license_expiry_date = Field()
    qualification = Field()
By : James L.


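The name element is never defined anywhere, which is exactly what the NameError reports. Shouldn't you change element.find_elements_by_tag_name(...) in the code below so that it works on the select you just created?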

  select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType'))
  all_options = element.find_elements_by_tag_name("option")

Or rather, shouldn't you use select.options?
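
Something along these lines might work as a starting point. It is only a sketch: the helper names option_with_value and choose_practitioner_type are made up for illustration, and it assumes the driver has already loaded the page with driver.get(...) before the dropdown is looked up (in the question the lookup runs in the class body, before any page is open). It uses driver.find_element(By.NAME, ...) rather than find_element_by_name, since the latter is no longer available in recent Selenium releases.

```python
def option_with_value(options, value):
    """Return the first option whose "value" attribute equals value, else None.

    `options` can be any iterable of objects with a get_attribute() method,
    such as the list returned by Select.options.
    """
    for option in options:
        if option.get_attribute("value") == value:
            return option
    return None


def choose_practitioner_type(driver, dropdown_name, value="4"):
    """Select the dropdown entry with the given value and return its label.

    Hypothetical helper: `driver` is a Selenium WebDriver that has already
    loaded the search page, and `dropdown_name` is the name attribute of the
    <select> element ("4" is the value the question associates with Dentist).
    """
    # Imported here so option_with_value can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    select = Select(driver.find_element(By.NAME, dropdown_name))

    # select.options replaces the undefined element.find_elements_by_tag_name(...)
    option = option_with_value(select.options, value)
    if option is None:
        raise ValueError("no option with value %r in the dropdown" % value)

    select.select_by_value(value)
    return option.text
```

You would call choose_practitioner_type(driver, '<name of the dropdown>') after the page has loaded, and then locate and click the search button with the same driver.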

By : Biswanath
