Java regex replacement of CDATA: removing CDATA tags from XML in Java

2023-08-07 13:06:46

XPath is nice for parsing XML files, but it's not working for data inside a CDATA section:

<![CDATA[ Some Text <p>more text and tags</p>... ]]>

My solution: get the content of the XML first and remove "<![CDATA[" and "]]>".

After that I would run XPath "to reach everything" in the XML file. Is there a better solution? If not, how can I do it with a regular expression?

Solution

The CDATA section is there because everything inside it is plain text, not something that should be interpreted directly as XML. You could alternatively write the document fragment from your question as

Some Text &lt;p&gt;more text and tags&lt;/p&gt;...

(with a leading and trailing space).
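To check that a parser really treats the CDATA form and the entity-escaped form as the same text, here is a minimal sketch using the JDK's built-in DOM parser. The wrapper element name `node` and the class and method names are assumptions made for illustration; the element name used in the original question is not known.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CdataEquivalence {

    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // "node" is a hypothetical wrapper element, not taken from the question.
        String withCdata =
                "<node><![CDATA[ Some Text <p>more text and tags</p>... ]]></node>";
        String escaped =
                "<node> Some Text &lt;p&gt;more text and tags&lt;/p&gt;... </node>";

        // Both spellings yield the same character data, markup characters included.
        System.out.println(rootText(withCdata).equals(rootText(escaped)));
    }
}
```

In both cases the `<p>` is just characters in a text node, which is why an XPath expression such as `/node/p` finds nothing in the outer document.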

If you really want to interpret this as XML, extract the text from your document, and submit it to an XML parser again.
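The extract-and-reparse approach might look roughly like the sketch below, which uses only the JDK's `javax.xml` APIs. The element names `node` and `wrapper`, and the class and method names, are hypothetical. The synthetic `wrapper` root is added because the extracted text is a fragment, not a document with a single root element. This only works if the CDATA content is itself well-formed XML, which embedded HTML often is not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CdataReparse {

    // Parses an XML string into a DOM document.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the text selected by outerPath, parses that text as XML inside a
    // synthetic <wrapper> root, and evaluates innerPath on the result.
    static String innerValue(String outerXml, String outerPath, String innerPath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Step 1: extract the CDATA content as a plain string.
            String text = xpath.evaluate(outerPath, parse(outerXml));
            // Step 2: submit the extracted text to the parser again.
            Document inner = parse("<wrapper>" + text + "</wrapper>");
            // Step 3: XPath can now reach the elements that were inside the CDATA.
            return xpath.evaluate(innerPath, inner);
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // "node" is a hypothetical wrapper element, not taken from the question.
        String xml =
                "<node><![CDATA[ Some Text <p>more text and tags</p>... ]]></node>";
        System.out.println(innerValue(xml, "/node", "/wrapper/p"));
    }
}
```

Compared with stripping the CDATA markers by regular expression from the raw file, this keeps the outer document intact and limits any parse failure to the one fragment being reinterpreted.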