info.monitorenter.cpdetector.io
Class ASCIIDetector

java.lang.Object
  extended by info.monitorenter.cpdetector.io.AbstractCodepageDetector
      extended by info.monitorenter.cpdetector.io.ASCIIDetector
All Implemented Interfaces:
ICodepageDetector, Serializable, Comparable

public final class ASCIIDetector
extends AbstractCodepageDetector

A simple detector for plain ASCII. Never use this instance as the first strategy of the CodepageDetectorProxy: many encodings, including multi-byte ones, can produce data whose bytes all lie in the range 0x00 to 0x7F, and this instance would report such data as ASCII.
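
The following is a minimal sketch of a 7-bit range check, written to illustrate why such a check cannot tell plain ASCII apart from a 7-bit multi-byte encoding. The class AsciiRangeCheck and the method looksAscii are hypothetical names, not part of cpdetector, and the actual implementation of ASCIIDetector may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of cpdetector: illustrates a 7-bit range check.
final class AsciiRangeCheck {

    /** Returns true if the first `length` bytes (or all bytes up to EOF) lie in 0x00..0x7F. */
    static boolean looksAscii(InputStream in, int length) throws IOException {
        for (int i = 0; i < length; i++) {
            int b = in.read();
            if (b == -1) {
                break; // end of stream reached before `length` bytes
            }
            if (b > 0x7F) {
                return false; // high bit set: cannot be plain ASCII
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "Hello, world".getBytes(StandardCharsets.US_ASCII);
        // Japanese text as ISO-2022-JP: escape sequences plus 7-bit byte pairs.
        byte[] iso2022jp = {0x1B, 0x24, 0x42, 0x46, 0x7C, 0x4B, 0x5C, 0x38, 0x6C, 0x1B, 0x28, 0x42};
        // U+00E9 encodes to two bytes in UTF-8, both with the high bit set.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(looksAscii(new ByteArrayInputStream(plain), plain.length));         // true
        System.out.println(looksAscii(new ByteArrayInputStream(iso2022jp), iso2022jp.length)); // true
        System.out.println(looksAscii(new ByteArrayInputStream(utf8), utf8.length));           // false
    }
}
```

The second result is the problematic one: the bytes are not ASCII text, yet every one of them passes the range check.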

It is recommended to use this as a fall-back for when all other strategies (e.g. JChardetFacade, ParsingDetector) fail. That is most often the case for ASCII data, because guessing and exclusion based on content are especially hard for ASCII: almost all character sets include the ASCII range for compatibility.
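
The effect of strategy order can be sketched with a small stand-alone chain. Everything here is hypothetical and only illustrates the idea: the Strategy interface, the two strategies and the detect method are invented for this example and are not the cpdetector API. The sketch assumes a chain that returns the answer of the first strategy able to decide.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not cpdetector code: an ordered chain of detection strategies.
final class FallbackChainSketch {

    /** Hypothetical strategy: returns a charset name, or null when it cannot decide. */
    interface Strategy {
        String detect(byte[] data);
    }

    private static final Pattern DECLARED = Pattern.compile("charset=[\"']?([A-Za-z0-9_-]+)");

    /** Content-based strategy: trusts a charset declaration found in the data. */
    static final Strategy DECLARATION = data -> {
        Matcher m = DECLARED.matcher(new String(data, StandardCharsets.ISO_8859_1));
        return m.find() ? m.group(1) : null;
    };

    /** ASCII strategy: accepts any data whose bytes all lie in 0x00..0x7F. */
    static final Strategy ASCII = data -> {
        for (byte b : data) {
            if (b < 0) {
                return null; // high bit set, not ASCII
            }
        }
        return "US-ASCII";
    };

    /** Returns the answer of the first strategy that decides, or null if none does. */
    static String detect(List<Strategy> chain, byte[] data) {
        for (Strategy s : chain) {
            String result = s.detect(data);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] html = "<meta charset=\"UTF-8\"><p>Hello</p>".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "Hello".getBytes(StandardCharsets.US_ASCII);

        System.out.println(detect(Arrays.asList(DECLARATION, ASCII), html)); // UTF-8
        System.out.println(detect(Arrays.asList(ASCII, DECLARATION), html)); // US-ASCII
        System.out.println(detect(Arrays.asList(DECLARATION, ASCII), text)); // US-ASCII
    }
}
```

With the ASCII strategy first, the declaration in the document is never consulted. With it last, it only answers when no other strategy could decide.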

It is a singleton for performance reasons: the constructor is private. Use getInstance() or SingletonLoader.getInstance() and SingletonLoader.newInstance(String) on the result.

Author:
Achim Westermann
See Also:
Serialized Form

Method Summary
 Charset detectCodepage(InputStream in, int length)
           Allows detection of the charset encoding from any source (even a String, which a URL cannot wrap).
static ICodepageDetector getInstance()
           
 
Methods inherited from class info.monitorenter.cpdetector.io.AbstractCodepageDetector
compareTo, detectCodepage, open
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getInstance

public static ICodepageDetector getInstance()

detectCodepage

public Charset detectCodepage(InputStream in,
                              int length)
                       throws IOException
Description copied from interface: ICodepageDetector

Allows detection of the charset encoding from any source (even a String, which a URL cannot wrap).

Note that the given InputStream can be reused only if it supports marking (InputStream.markSupported() == true), you mark the initial position with a sufficient readlimit beforehand, and you invoke reset afterwards without getting an exception.

Parameters:
in - An InputStream for the document that supports mark with a readlimit of at least the argument length.
length - The number of bytes to take into account. It should not exceed the number of bytes retrievable from the InputStream, but should be as large as possible to give the fallback detection (chardet) more hints to guess.
Throws:
IOException
See Also:
ICodepageDetector.detectCodepage(java.io.InputStream, int)
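
The mark and reset requirement described for detectCodepage can be sketched with JDK classes only. The class MarkResetSketch and the method inspectAndRewind are hypothetical; the read loop merely stands in for a detection call that consumes up to length bytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of cpdetector: inspect a stream prefix, then rewind.
final class MarkResetSketch {

    /** Consumes up to `length` bytes, rewinds the stream, and returns how many bytes were seen. */
    static int inspectAndRewind(InputStream in, int length) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(length); // readlimit must cover everything read before reset()
        int seen = 0;
        while (seen < length && in.read() != -1) {
            seen++;
        }
        in.reset(); // back to the marked position
        return seen;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "plain text".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream adds mark/reset support to streams that lack it.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        int seen = inspectAndRewind(in, data.length);
        System.out.println(seen);             // 10
        System.out.println((char) in.read()); // p
    }
}
```

Wrapping the source in a BufferedInputStream is one common way to obtain mark support when the underlying stream does not provide it.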


Copyleft ㊢ 2003-2004 MPL 1.1, All Rights Footloose.