Hello, I have this XML:
<title> Something for title»</title>
<description><![CDATA[<div class="feed-description"><div class="feed-image"><img src="pictureUrl.jpg" /></div>text for desc</div>]]></description>
<pubDate>Thu, 11 Jun 2015 16:50:16 +0300</pubDate>
I am trying to get the img src with this XPath:
//description//div[@class='feed-description']//div[@class='feed-image']//img/@src
but it doesn't work. Is there a solution?
Best How To :
A CDATA section escapes its contents. In other words, CDATA prevents its contents from being parsed as markup when the rest of the document is parsed. So the <div>s in there are not seen as XML elements, only as flat text. The <description> element has no element children, only a single text child. As such, XPath can't select any <div> descendant of <description>, because none exists in the parsed XML tree.
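To see this concretely, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The <item> wrapper element is an assumption added only so the fragment from the question parses as a single document. ElementTree's path syntax cannot select attributes, so the check stops at the img element.

```python
import xml.etree.ElementTree as ET

# Assumption: <item> is a hypothetical wrapper; the question shows only its children.
xml_text = """<item>
<title> Something for title»</title>
<description><![CDATA[<div class="feed-description"><div class="feed-image"><img src="pictureUrl.jpg" /></div>text for desc</div>]]></description>
<pubDate>Thu, 11 Jun 2015 16:50:16 +0300</pubDate>
</item>"""

root = ET.fromstring(xml_text)
description = root.find("description")

# The CDATA content was never parsed as markup, so there are no child elements.
child_count = len(description)

# The markup survives only as the flat text of <description>.
description_text = description.text

# The path from the question (minus the attribute step) therefore matches nothing.
matches = root.findall(
    ".//description//div[@class='feed-description']//div[@class='feed-image']//img"
)

print(child_count, matches)
print(description_text)
```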
What to do?
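If your XPath environment supports XPath 3.0, you could use parse-xml() to turn the flat text into a tree, then use XPath to select //div[@class='feed-description']//div[@class='feed-image']//img/@src from the resulting tree. Assuming the document contains a single <description>, that could look like:
parse-xml(//description)//div[@class='feed-description']//div[@class='feed-image']//img/@src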
Otherwise, your best workaround may be to use primitive string-processing functions like substring-before(), substring-after(), or matches(). (The last of these uses regular expressions and requires XPath 2.0.) Of course, many people will tell you not to use regular expressions to analyze markup like XML and HTML, and for good reason: in the general case, it's very difficult to do it right (with regexes or with plain string searches). But for very restricted cases where the input is highly predictable, and in the absence of better tools, it can be the best tool for a less-than-ideal job.
For example, for the data shown in your question, you could use
substring-before(substring-after(//description, 'img src="'), '"')
In this case, the inner call substring-after(//description, 'img src="') returns
pictureUrl.jpg" /></div>text for desc</div>
of which the substring before " is pictureUrl.jpg.
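The two-step evaluation can be traced in Python with small stand-ins for the XPath functions. The helper names substring_after and substring_before are hypothetical and only mimic the XPath semantics (an empty string is returned when the separator is missing).

```python
def substring_after(text, sep):
    """Mimic XPath substring-after(): text following the first sep, else ''."""
    if sep == "":
        return text
    _, found, after = text.partition(sep)
    return after if found else ""


def substring_before(text, sep):
    """Mimic XPath substring-before(): text preceding the first sep, else ''."""
    if sep == "":
        return ""
    before, found, _ = text.partition(sep)
    return before if found else ""


# String value of <description> from the question.
description = (
    '<div class="feed-description"><div class="feed-image">'
    '<img src="pictureUrl.jpg" /></div>text for desc</div>'
)

inner = substring_after(description, 'img src="')
src = substring_before(inner, '"')
print(inner)
print(src)
```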
This isn't really robust: for example, it'll fail if there's a space between src and =. But if the exact formatting is predictable, you'll be OK.
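If whitespace around = is the main worry, a regular expression can absorb it. The following Python sketch is one possible pattern, not a general HTML parser. The names IMG_SRC and img_src are hypothetical, and the pattern will still be fooled by unusual input such as a > inside an earlier attribute value.

```python
import re

# Matches src inside the first <img ...> tag, allowing spaces around "="
# and either quote style. A sketch for predictable input only.
IMG_SRC = re.compile(
    r'<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def img_src(text):
    """Return the src value of the first img tag in text, or None."""
    match = IMG_SRC.search(text)
    return match.group(2) if match else None


spaced = '<div class="feed-image"><img src = "pictureUrl.jpg" /></div>'

# The plain substring search from the answer finds nothing here...
literal_found = 'img src="' in spaced
# ...while the whitespace-tolerant pattern still extracts the value.
tolerant = img_src(spaced)
print(literal_found, tolerant)
```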