I am trying to remove some special characters that appear inside XML tags. We could use regsub or string map to eliminate the XML special characters in the tagged text, but that is a lengthy, time-consuming process because our log file is very large (around 25 MB).
Is there a particular method or tip for eliminating special characters in XML tags?
Here is a sample of what it looks like:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<CompanyName>Blauer See Delikatessen</CompanyName>
<CompanyName>Split Rail Beer & Ale</CompanyName>
Answer:
If you mean the ampersand, it is not in a tag; it is in the text content that appears between the start and end tags.
The reason people choose to use XML for data interchange is that it's a standard, and there's lots of software around to handle it. That advantage disappears entirely if you try to use something that's almost XML but not quite.
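To make that concrete, here is a small illustrative sketch using Python's standard xml.etree.ElementTree parser (Python is chosen only for illustration; the helper name try_parse is hypothetical). A conforming parser rejects the sample line with the bare ampersand outright, while the escaped form parses and gives back the literal "&" as the element's text.

```python
import xml.etree.ElementTree as ET

# The sample line from the question, with a bare "&" in the text content.
broken = "<CompanyName>Split Rail Beer & Ale</CompanyName>"
# The same element written as well-formed XML: "&" becomes "&amp;".
fixed = "<CompanyName>Split Rail Beer &amp; Ale</CompanyName>"


def try_parse(text):
    """Return the element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None


print(try_parse(broken))  # None: a standard parser refuses the bare "&"
print(try_parse(fixed))   # Split Rail Beer & Ale
```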
By far the best solution is to fix the program that is generating this not-quite-XML.
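As a rough sketch of what such a fix might look like, assuming the producer builds its output by string concatenation (the original does not say how it is generated), letting an XML library serialize the element escapes reserved characters automatically. The function name company_element is hypothetical; ElementTree's serializer escapes "&", "<" and ">" in element text.

```python
import xml.etree.ElementTree as ET


def company_element(name):
    """Hypothetical producer-side helper: build the element with a real XML
    library instead of concatenating strings, so reserved characters in the
    text are escaped on output."""
    elem = ET.Element("CompanyName")
    elem.text = name
    # encoding="unicode" returns a str with no XML declaration.
    return ET.tostring(elem, encoding="unicode")


print(company_element("Split Rail Beer & Ale"))
# <CompanyName>Split Rail Beer &amp; Ale</CompanyName>
```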
If you really can't do that, you'll have to try to repair it, and how you do that depends on the nature of the damage. For example, you could use any language that supports regular expressions to replace every ampersand that is not followed by either '#' or a sequence of alphanumerics and then a semicolon (that is, every ampersand that does not already begin a character or entity reference) with "&amp;". However, if the data contains this error, it has evidently been generated carelessly, and so it could contain any number of other errors as well.
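A minimal sketch of that regular-expression repair in Python follows; the function names repair_ampersands and repair_file are hypothetical. The negative lookahead encodes exactly the rule above: leave "&" alone when it is followed by "#" or by alphanumerics and a semicolon, otherwise rewrite it as "&amp;". Because a reference can never span a line break, processing a large file (such as the 25 MB log) one line at a time gives the same result as processing it whole, in a single pass and without loading it all into memory.

```python
import re

# Matches an "&" that is NOT followed by "#" (a character reference such
# as &#38;) or by alphanumerics plus ";" (an entity reference such as
# &amp; or &lt;). Only those bare ampersands are rewritten.
BARE_AMP = re.compile(r"&(?!#|[A-Za-z0-9]+;)")


def repair_ampersands(text):
    """Replace bare ampersands with &amp;, leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)


def repair_file(src_path, dst_path):
    """Stream a large file line by line so it never has to fit in memory.

    The lookahead cannot cross a newline, so per-line processing matches
    whole-file processing. newline="" preserves the original line endings.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(repair_ampersands(line))


print(repair_ampersands("<CompanyName>Split Rail Beer & Ale</CompanyName>"))
# <CompanyName>Split Rail Beer &amp; Ale</CompanyName>
```

Note that this heuristic trusts anything shaped like a reference: text such as "AT&T;" is left untouched because "&T;" looks like an entity reference, even though no such entity is defined. That is one instance of the "other errors" a careless generator can leave behind.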