I'm using Scrapy to process documents like this one:
    ...
    <div class="contents">
        some text
        <ol>
            <li>
                more text
            </li>
            ...
        </ol>
    </div>
    ...
I want to collect all the text inside the contents area into a string.
I also need the '1., 2., 3....' numbering from the <li> elements, so my result should be 'some text 1. more text...'
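To make the target output concrete, here is a minimal sketch of the numbering I'm after. It deliberately uses only the standard library's `xml.etree.ElementTree` rather than Scrapy selectors, so it illustrates the desired result and is not my Scrapy code. `SAMPLE`, its second `<li>`, and `collect_text` are hypothetical names and data made up for this illustration, and the sketch assumes the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the fragment above
# (the second <li> is invented for illustration).
SAMPLE = (
    '<div class="contents"> some text '
    "<ol> <li> more text </li> <li> last text </li> </ol> </div>"
)


def collect_text(div):
    """Return the text under `div`, prefixing each <li> of an <ol> with 'N.'."""
    parts = []

    def walk(node):
        # Text that appears before the first child element.
        if node.text and node.text.strip():
            parts.append(node.text.strip())
        item_no = 0
        for child in node:
            # Number only <li> elements that are direct children of an <ol>.
            if node.tag == "ol" and child.tag == "li":
                item_no += 1
                parts.append(f"{item_no}.")
            walk(child)
            # Text that follows a child element is stored in its .tail.
            if child.tail and child.tail.strip():
                parts.append(child.tail.strip())

    walk(div)
    return " ".join(parts)


print(collect_text(ET.fromstring(SAMPLE)))
```

For the sample above this prints `some text 1. more text 2. last text`, which is the shape of string I want to get out of Scrapy.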
So, I'm looping over <div class="contents">'s children:

    for n in response.xpath('//div[@class="contents"]/node()'):
        if n.xpath('self::ol'):
            result += process_list(n)
        else:
            result += n.extract()
If n is an ordered list, I loop over its elements and add a number to each one. If n is a text node itself, I just read its value.
However, 'some text' doesn't seem to be part of the node set, since the loop never gets inside the else part. My result is '1. more text'.
Finding text nodes relative to their parent node works: it finds all the text, but this way I can't add the list item numbers.
What am I doing wrong and is there a better way to achieve my task?