java.lang.Object
  info.monitorenter.cpdetector.io.AbstractCodepageDetector
    info.monitorenter.cpdetector.io.JChardetFacade
public final class JChardetFacade
A façade for jchardet codepage detection. JChardet is the Java port of Frank Yung-Fong Tang's Mozilla charset detector.
This charset detector works by guessing the codepage. "The algorithm looks into the byte sequence and based on the values of each byte uses a elimination logic to narrow down to the final charset. If there is a tie between EUC charsets, it uses the second logic to narrow down. This logic uses the frequency statistics of characters in a given language." (source of description)
It is a singleton for performance reasons (buffer allocation). Because it is
stateful (it keeps an internal buffer), the method
detectCodepage(InputStream, int), which
AbstractCodepageDetector.detectCodepage(URL) delegates to, has to be synchronized.
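The reasoning above (one shared instance, one reusable buffer, therefore a synchronized method) can be illustrated with a minimal sketch. The class SharedBufferProbe and its method countNonAscii are hypothetical stand-ins written for this illustration; they are not part of cpdetector or jchardet and do not perform real charset detection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a stateful singleton; NOT part of cpdetector. */
final class SharedBufferProbe {
    private static final SharedBufferProbe INSTANCE = new SharedBufferProbe();

    // Internal state shared by all callers: allocated once and reused.
    private byte[] buffer = new byte[4096];

    private SharedBufferProbe() { }

    static SharedBufferProbe getInstance() {
        return INSTANCE;
    }

    /**
     * Reads up to length bytes into the shared buffer and counts the bytes
     * that have the high bit set. The method is synchronized because two
     * threads filling the same buffer at once would corrupt each other's data.
     */
    synchronized int countNonAscii(InputStream in, int length) throws IOException {
        if (length > buffer.length) {
            buffer = new byte[length];
        }
        int total = 0;
        while (total < length) {
            int n = in.read(buffer, total, length - total);
            if (n < 0) {
                break; // end of stream reached before length bytes were read
            }
            total += n;
        }
        int count = 0;
        for (int i = 0; i < total; i++) {
            if ((buffer[i] & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "na\u00efve".getBytes(StandardCharsets.UTF_8);
        int n = getInstance().countNonAscii(new ByteArrayInputStream(data), data.length);
        System.out.println("non-ASCII bytes: " + n);
    }
}
```

The trade-off in this pattern is that callers on different threads are serialized: only one detection runs at a time, in exchange for avoiding a buffer allocation per call.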
| Method Summary | |
|---|---|
| Charset | detectCodepage(InputStream in, int length): Detects the charset encoding from any source (even a String, which a URL cannot wrap). |
| static JChardetFacade | getInstance() |
| boolean | isGuessing() |
| void | Notify(String charset) |
| void | Reset() |
| void | setGuessing(boolean guessing): If it was impossible to narrow down possible results to one, an internal set of possible character encodings exists. |
| Methods inherited from class info.monitorenter.cpdetector.io.AbstractCodepageDetector |
|---|
| compareTo, detectCodepage, open |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
|---|
public static JChardetFacade getInstance()
public Charset detectCodepage(InputStream in, int length) throws IOException
Description copied from interface: ICodepageDetector
Detects the charset encoding from any source (even a String, which a URL cannot wrap).
Note that you cannot reuse the given InputStream unless it supports marking (InputStream.markSupported() ==
true), you mark the initial position with a sufficient readlimit, and you invoke
reset afterwards (without getting any exception).
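The mark-and-reset requirement described above can be sketched with plain java.io classes. In this sketch the method consume is a hypothetical stand-in for a detector that reads up to length bytes; it is not cpdetector code. The class name MarkResetSketch and the method detectThenRead are likewise assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MarkResetSketch {

    /** Hypothetical stand-in for a detector: reads up to length bytes from in. */
    static int consume(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        int total = 0;
        while (total < length) {
            int n = in.read(buf, total, length - total);
            if (n < 0) {
                break; // stream ended early
            }
            total += n;
        }
        return total;
    }

    /** Lets the stand-in detector read, then rewinds and reads the whole document. */
    static String detectThenRead(byte[] document, int length) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(document));
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(length);     // readlimit must cover every byte the detector may read
        consume(in, length); // the detector advances the stream position
        in.reset();          // rewind to the marked position

        // Read the full content from the start, as a later parser would.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) >= 0) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "hello, world".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectThenRead(doc, 5));
    }
}
```

Wrapping the source in a BufferedInputStream is one way to obtain mark support when the underlying stream lacks it. If the readlimit passed to mark is smaller than the number of bytes actually read, reset may throw an IOException.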
Specified by: detectCodepage in interface ICodepageDetector
Parameters:
in - An InputStream for the document that supports mark with a readlimit of at least the argument length.
length - The number of bytes to take into account. This number should not
exceed the number of bytes retrievable from the
InputStream, but should be as large as possible to give the fallback
detection (chardet) more hints to guess.
Throws: IOException

public void Notify(String charset)
Specified by: Notify in interface org.mozilla.intl.chardet.nsICharsetDetectionObserver
See Also: nsICharsetDetectionObserver.Notify(java.lang.String)

public void Reset()
public boolean isGuessing()
public void setGuessing(boolean guessing)
If it was impossible to narrow down possible results to one, an internal
set of possible character encodings exists. By setting guessing to true,
the call to detectCodepage(InputStream, int) and
AbstractCodepageDetector.detectCodepage(URL) will return an arbitrary possible Charset.
Currently the following precedence is implemented to choose the possible Charset:
UnsupportedCharset is returned.
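Parameters:
guessing - true to have detection return one of the possible Charsets when the result could not be narrowed down to one.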