Selenium scrapy python : No data in csv/json

Question!

I'm trying to scrape information from http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx. I want to do the following:

  • Select "Dentist" from the dropdown at the top of the page
  • Click search
  • Notice that the information at the bottom of the page changes dynamically using JavaScript
  • Click on the hyperlinked practitioner names, each of which opens a popup
  • Save all the information from the popup in a json/csv file for each practitioner

I also want the information from the other result pages linked at the bottom of the page; those links change the information shown in the same div.

I tried exporting the data to a json file, but it generates an empty file. I don't see any errors in the console.

spider.py

from scrapy.spider import Spider
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from scrapytutorial.items import SchItem
from selenium.webdriver.support.ui import Select

class DmozSpider(Spider):
    name = "sch"

    driver = webdriver.Firefox()
    driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx")

    dropdown = driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType")
    all_options = dropdown.find_elements_by_tag_name("option")

    for option in all_options:
        if option.get_attribute("value") == "4":  #Dentist
            option.click()
            break

    driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn").click()


    def parse(self, response):

        all_docs = element.find_elements_by_tag_name("td")
        for name in all_docs:
            name.click()
            alert = driver.switch_to_alert()
            sel = Selector(response)
            ma = sel.xpath('//table')
            items = []
            for site in ma:
                item = SchItem()
                item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
                item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract()
                item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract()
                item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract()
                item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract()
                item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract()
                item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract()

                items.append(item)
            return items

items.py

from scrapy.item import Item, Field

class SchItem(Item):

    name = Field()
    profession = Field()
    scope_of_practise = Field()
    instituition = Field()
    license = Field()
    license_expiry_date = Field()
    qualification = Field()
By : James L.


Answers

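Two possible ideas:

  • Indentation of 'return items': move it 4 spaces to the left, so that it sits outside the 'for name in all_docs' loop and runs once after all rows have been processed, instead of during the first iteration.
  • Instead of sel = Selector(response), try sel = Selector(text=driver.page_source): you're not working with the Scrapy response, but with the page that Selenium rendered.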
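
The first idea is easier to see in isolation. Below is a minimal sketch in plain Python, with no Scrapy or Selenium involved; the function names and the `rows` list are made up for illustration. It only shows how the position of `return` decides how many items come back.

```python
def parse_return_inside_loop(rows):
    """Mimics the question's layout: `return` sits inside the for loop."""
    items = []
    for row in rows:
        items.append({"name": row})
        return items  # reached during the first iteration, so the loop stops here


def parse_return_after_loop(rows):
    """Same logic with `return` moved one level to the left."""
    items = []
    for row in rows:
        items.append({"name": row})
    return items  # reached once, after every row has been handled


rows = ["A", "B", "C"]  # placeholder data, not taken from the site
print(len(parse_return_inside_loop(rows)))  # 1
print(len(parse_return_after_loop(rows)))   # 3
```

Note that when the loop has nothing to iterate over, the first version never reaches its `return` and hands back `None`, which may be one reason an export ends up empty.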

By : mnd

