java.lang.Object
  info.monitorenter.unicode.decoder.DecodeUtil
Easy-to-use utility functions for decoding to Unicode.
Be careful with the methods that work on String data (as opposed to streams): large
documents can cause an OutOfMemoryError.
| Method Summary | |
| static String | decodeHtmlEntities(String html, boolean recursive): Decodes HTML entities (e.g. ... |
| static void | main(String[] args): Main hook used for a short test. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
public static String decodeHtmlEntities(String html,
boolean recursive)
throws antlr.RecognitionException,
antlr.TokenStreamException,
IOException
This method should be fast, as it uses an ANTLR-generated parser.
HTML entities are described in http://www.w3.org/TR/html401/sgml/entities.html
For enterprise-grade support of arbitrarily large files, prefer the streaming approach of
HtmlEntityDecoderReader.
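Why streaming avoids the OutOfMemoryError risk can be shown with a small decoder that reads from a Reader and writes to a Writer, buffering only the entity reference currently being parsed. This is a hypothetical sketch, not the HtmlEntityDecoderReader API: the class name StreamingEntityDemo, the five-entry entity table, and the 12-character buffer limit are all assumptions, and numeric references are not handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;

/**
 * Hypothetical streaming sketch (NOT the HtmlEntityDecoderReader API):
 * only the pending "&...;" candidate is held in memory, so memory use
 * stays bounded no matter how large the input is.
 */
class StreamingEntityDemo {
    // Assumed tiny subset of the HTML 4.01 named entities.
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", 0x26, "lt", 0x3C, "gt", 0x3E, "quot", 0x22, "ouml", 0xF6);
    // Assumed upper bound on the length of a buffered candidate reference.
    private static final int MAX_REF = 12;

    static void decode(Reader in, Writer out) throws IOException {
        StringBuilder pending = null; // non-null while inside a candidate "&name"
        int ch;
        while ((ch = in.read()) != -1) {
            char c = (char) ch;
            if (pending == null) {
                if (c == '&') {
                    pending = new StringBuilder("&");
                } else {
                    out.write(c); // plain text passes straight through
                }
            } else if (c == ';') {
                Integer cp = NAMED.get(pending.substring(1));
                if (cp != null) {
                    out.write(Character.toChars(cp)); // known entity: emit its character
                } else {
                    out.append(pending).write(';');   // unknown entity: copy verbatim
                }
                pending = null;
            } else if (c == '&') {
                out.append(pending);                  // abandon candidate, start a new one
                pending = new StringBuilder("&");
            } else if (pending.length() >= MAX_REF || Character.isWhitespace(c)) {
                out.append(pending).write(c);         // not a reference: flush verbatim
                pending = null;
            } else {
                pending.append(c);
            }
        }
        if (pending != null) {
            out.append(pending); // unterminated candidate at end of input
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        decode(new StringReader("x &lt; y &amp;ouml; z"), sw);
        System.out.println(sw); // x < y &ouml; z
    }
}
```

Like a single non-recursive pass, this sketch turns &amp;ouml; into &ouml; and does not decode the result again.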
Parameters:
html - the HTML data in which to decode HTML entities.
recursive - if true, the input is processed repeatedly until it contains no more character
entity references (decoding &amp;ouml; will
produce ö rather than &ouml;).
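The difference between a single pass and recursive decoding can be illustrated with a small hand-written decoder. This is a hypothetical sketch, not DecodeUtil's ANTLR-based implementation: the class name RecursiveEntityDemo and the five-entry entity table are assumptions; besides those names it only recognises decimal (&#246;) and hexadecimal (&#xF6;) numeric references.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not DecodeUtil's ANTLR parser) of one-pass versus
 * recursive decoding of HTML character entity references.
 */
class RecursiveEntityDemo {
    // Assumed tiny subset of the HTML 4.01 entity table.
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", 0x26, "lt", 0x3C, "gt", 0x3E, "quot", 0x22, "ouml", 0xF6);

    /** One left-to-right pass; unknown or malformed references are copied unchanged. */
    static String decodeOnce(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i + 1) : -1;
            if (semi > i + 1) {
                Integer cp = lookup(s.substring(i + 1, semi));
                if (cp != null) {
                    out.appendCodePoint(cp);
                    i = semi + 1; // skip the whole "&...;" reference
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** Resolves "name", "#decimal" or "#xhex" to a code point, or null. */
    private static Integer lookup(String name) {
        Integer cp;
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                cp = Integer.parseInt(name.substring(2), 16);
            } else if (name.startsWith("#")) {
                cp = Integer.parseInt(name.substring(1));
            } else {
                cp = NAMED.get(name);
            }
        } catch (NumberFormatException e) {
            return null; // malformed numeric reference
        }
        return (cp != null && Character.isValidCodePoint(cp)) ? cp : null;
    }

    /**
     * With recursive == true, passes repeat until the text stops changing.
     * This terminates because every successful replacement shortens the string.
     */
    static String decode(String html, boolean recursive) {
        String current = decodeOnce(html);
        if (!recursive) {
            return current;
        }
        String next = decodeOnce(current);
        while (!next.equals(current)) {
            current = next;
            next = decodeOnce(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(decode("&amp;ouml;", false)); // &ouml;
        System.out.println(decode("&amp;ouml;", true));  // ö
    }
}
```

Note that recursive decoding is lossy for text that legitimately contains an escaped entity: a document meant to display the literal string &ouml; loses that meaning after a recursive pass.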
Throws:
IOException - if an I/O error occurs.
antlr.TokenStreamException - if invalid character data is found in the underlying stream.
This is unlikely, as the lexer covers all characters; should it
happen anyway (e.g. due to an ANTLR bug), this method cannot handle
the problem and does not catch the exception.
antlr.RecognitionException - if an invalid format is found in the given HTML. This is unlikely,
as the grammar accepts any token; should it happen anyway (e.g. due
to an ANTLR bug), this method cannot handle the problem and does not
catch the exception.
public static void main(String[] args)
throws antlr.RecognitionException,
antlr.TokenStreamException,
IOException,
jargs.gnu.CmdLineParser.IllegalOptionValueException,
jargs.gnu.CmdLineParser.UnknownOptionException
Parameters:
args - ignored.
Throws:
antlr.RecognitionException - if an error occurs in the parser.
antlr.TokenStreamException - if an error occurs in the lexer.
IOException - if an I/O error occurs.
UnknownOptionException - if an unknown command-line option is given.
IllegalOptionValueException - if a command-line option has an illegal value.