I have an HTML file that looks like this:

    ...
    <p>
      <strong>This is </strong>
      <strong>a lin</strong>
      <strong>e which I want to </strong>
      <strong>join.</strong>
    </p>
    <p>
      2. <strong>But do not </strong>
      <strong>touch this</strong>
      <em>Maybe some other tags as well.</em>
      bla bla blah...
    </p>
    ...

What I need is this: if every element inside a 'p' block is a 'strong', combine them into a single 'strong' element, i.e.

    <p> <strong>This is a line which I want to join.</strong> </p>

The other block must be left untouched, since it contains text and tags other than 'strong'.
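The condition above can be written as a standalone check. This is a minimal sketch, not a definitive answer: `is_all_strong` is a hypothetical helper name, and it assumes that whitespace-only text between tags is ignorable, that a 'p' with a single 'strong' has nothing to join (mirroring the `len(children)==1` test in the attempt), and that any real text (such as "2."), any non-'strong' child, or markup nested inside a 'strong' disqualifies the block. It prefers lxml and falls back to the standard-library ElementTree, which supports every element operation used here.

```python
# Prefer lxml (as in the question); fall back to the stdlib ElementTree,
# which supports every element operation used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def is_all_strong(p):
    """Return True if p holds two or more <strong> children and nothing else.

    Assumption: whitespace-only text is ignorable. Any other text, any
    non-<strong> child, or markup nested inside a <strong> disqualifies p.
    """
    children = list(p)
    if len(children) < 2:  # a single <strong> has nothing to join
        return False
    if (p.text or "").strip():  # real text before the first child, e.g. "2."
        return False
    for child in children:
        # non-<strong> child (this also catches comments) or nested markup
        if child.tag != "strong" or len(child) > 0:
            return False
        if (child.tail or "").strip():  # real text between or after children
            return False
    return True


first = etree.fromstring(
    "<p> <strong>This is </strong> <strong>a lin</strong> "
    "<strong>e which I want to </strong> <strong>join.</strong> </p>"
)
second = etree.fromstring(
    "<p> 2. <strong>But do not </strong> <strong>touch this</strong> "
    "<em>Maybe some other tags as well.</em> bla bla blah... </p>"
)
print(is_all_strong(first), is_all_strong(second))  # True False
```

Treating whitespace as ignorable is a judgment call: it lets the check work on pretty-printed markup, at the cost of not distinguishing `</strong> <strong>` from `</strong><strong>`.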
Any suggestions? I am using lxml.
So far I have tried:
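
    from lxml import etree

    for p in self.tree.xpath('//body/p'):
        if p.text is None:  # no text before the first child element
            children = list(p)  # list(p) replaces the deprecated p.getchildren()
            for child in children:
                # skip this <p> if it has only one child, a non-<strong> child,
                # or any text after a child
                if len(children) == 1 or child.tag != 'strong' or child.tail is not None:
                    break
            else:
                # the loop finished without break: every child is a bare <strong>
                etree.strip_tags(p, 'strong')
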
With this code I was able to strip the 'strong' tags from the desired block, giving:

    <p> This is a line which I want to join. </p>

So now I just need a way to put a single 'strong' tag back around the joined text.
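One way to put the tag back, starting from a 'p' like the stripped result above whose joined text now sits in `p.text`, is to move that text into a freshly created 'strong' child. This is a minimal sketch under stated assumptions: `wrap_text_in_strong` is a hypothetical helper, it assumes the 'p' has no child elements left (as after `strip_tags`), and it trims the surrounding whitespace. It prefers lxml and falls back to the standard-library ElementTree, which also provides `SubElement` and `tostring`.

```python
# Prefer lxml (as in the question); fall back to the stdlib ElementTree.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def wrap_text_in_strong(p):
    """Move p's stripped text into a new <strong> child and return that child.

    Assumes p has no child elements, e.g. after etree.strip_tags(p, 'strong').
    """
    if len(p) > 0:
        raise ValueError("expected a <p> with no child elements")
    text = (p.text or "").strip()
    p.text = None                         # the text now lives inside <strong>
    strong = etree.SubElement(p, "strong")  # appended as p's only child
    strong.text = text
    return strong


# Hypothetical input: the <p> as it looks after strip_tags
p = etree.fromstring("<p> This is a line which I want to join. </p>")
wrap_text_in_strong(p)
print(etree.tostring(p, encoding="unicode"))
# <p><strong>This is a line which I want to join.</strong></p>
```

In the loop from the attempt, this would be called right after `etree.strip_tags(p, 'strong')`.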