Using xlwings with Excel, which of these two approaches is quicker/preferred?

Tags: python xlwings
By : QBG
Source: Stackoverflow.com
Question!

I have just started to learn Python and am using xlwings to write to an Excel spreadsheet. I am really new to coding (and this is my first question), so this may be a simple question, but any comments would be really appreciated.

I am reading the page source of a website (using Selenium and Beautiful Soup) to get a few pieces of information about a product, such as price and weight. I am then writing these values to cells in Excel.

I have two ways of doing this. The first runs a function and then writes its value to Excel before moving on to the next function:

(these are excerpts of the main script - both ways work ok)

    while rowNum < lastRow + 1:

        urlCellRef = (rowNum, colNum)
        url = wb.sheets[0].range(urlCellRef).value

        # Parse HTML with beautiful soup
        getPageSource()

        # Find a product price field value within HTML
        getProductPrice()
        itemPriceRef = (rowNum, colNum + 1)
        # Write price value back to Excel sheet
        wb.sheets[0].range(itemPriceRef).value = productPrice

        getProductWeight()
        itemWeightRef = (rowNum, colNum + 2)
        wb.sheets[0].range(itemWeightRef).value = productWeight

        getProductQuantity()
        itemQuantityRef = (rowNum, colNum + 4)
        wb.sheets[0].range(itemQuantityRef).value = productQuantity

        getProductCode()
        prodCodeRef = (rowNum, colNum + 6)
        wb.sheets[0].range(prodCodeRef).value = productCode

        rowNum = rowNum + 1

The second runs all of the functions and then writes each of the stored values to Excel in one go:

    while rowNum < lastRow + 1:

        urlCellRef = (rowNum, colNum)
        url = wb.sheets[0].range(urlCellRef).value

        getPageSource()
        getProductPrice()
        getProductWeight()
        getProductQuantity()
        getProductCode()

        itemPriceRef = (rowNum, colNum + 1)
        wb.sheets[0].range(itemPriceRef).value = productPrice

        itemWeightRef = (rowNum, colNum + 2)
        wb.sheets[0].range(itemWeightRef).value = productWeight

        itemQuantityRef = (rowNum, colNum + 4)
        wb.sheets[0].range(itemQuantityRef).value = productQuantity

        prodCodeRef = (rowNum, colNum + 6)
        wb.sheets[0].range(prodCodeRef).value = productCode

        rowNum = rowNum + 1

I was wondering which is the preferred method for doing this. I haven't noticed much of a speed difference, but my laptop is pretty slow, so if one approach is considered best practice I would prefer to go with that, as I will be increasing the number of URLs that will be used.

Many thanks for your help!


Answers

The overhead of the Excel call reigns supreme. When using xlwings, write to your spreadsheet as infrequently as possible.

I've found that rewriting the whole sheet (or the area of the sheet to be changed) through a single Range object is leaps and bounds faster than writing individual cells, rows, or columns. If I'm not doing any heavy data manipulation I just use nested lists. Whether it's better to treat the sublists as columns or rows (the transpose option controls this) depends on how you're handling your data. If you're working with larger datasets or doing more intensive work, you may want to use NumPy arrays or pandas.
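To make that concrete: both loops in the question make four separate writes per row, so neither should be noticeably faster than the other. A rough sketch of the batched alternative is below. It assumes the scraping step is changed to produce one dict per URL (the question's functions set global variables instead), and the names `OFFSETS`, `collect_columns`, `write_columns` and `write_rows` are made up for illustration. The column offsets are the ones used in the question. Because the target columns are not adjacent, this version writes one vertical block per field rather than one block for the whole area, so the untouched columns in between are left alone.

```python
# Column offsets relative to the URL column, as used in the question.
OFFSETS = {"price": 1, "weight": 2, "quantity": 4, "code": 6}


def collect_columns(records):
    """Group per-product dicts into one list per field, in row order.

    `records` is assumed to be a list of dicts such as
    {"price": 9.99, "weight": 0.5, "quantity": 3, "code": "AB12"}.
    """
    return {field: [rec.get(field) for rec in records] for field in OFFSETS}


def write_columns(sheet, first_row, url_col, columns):
    """Write each field as one vertical block: four Excel calls in total."""
    for field, values in columns.items():
        top_cell = (first_row, url_col + OFFSETS[field])
        # transpose=True tells xlwings to lay a flat list out down a column
        # instead of across a row.
        sheet.range(top_cell).options(transpose=True).value = values


def write_rows(sheet, top_left, rows):
    """Write a nested list (one sublist per row) as one contiguous block.

    Only suitable when every cell in the rectangle may be overwritten.
    """
    sheet.range(top_left).value = rows
```

In the main script you would scrape every URL first, appending a dict to a list each time, and then call something like `write_columns(wb.sheets[0], firstRow, colNum, collect_columns(records))` once after the loop, where `firstRow` is whatever row your URLs start on. The trade-off is that nothing reaches the sheet until the end, so if the run can fail halfway you may prefer to flush in chunks of rows.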


