info.monitorenter.cpdetector.io
Interface ICodepageDetector

All Superinterfaces:
Comparable, Serializable
All Known Implementing Classes:
AbstractCodepageDetector, ASCIIDetector, ByteOrderMarkDetector, CodepageDetectorProxy, HTMLCodepageDetector, JChardetFacade, ParsingDetector, UnicodeDetector

public interface ICodepageDetector
extends Serializable, Comparable

Author:
Achim Westermann

Method Summary
 Charset detectCodepage(InputStream in, int length)
           Detects the charset encoding from any source, including sources a URL cannot address (such as a String).
 Charset detectCodepage(URL url)
           Low-level method that detects the codepage (charset) of the document specified by the given URL.
 Reader open(URL url)
           High-level method to open documents in the correct codepage.
 
Methods inherited from interface java.lang.Comparable
compareTo
 

Method Detail

open

Reader open(URL url)
            throws IOException

High-level method to open documents in the correct codepage. Implementations of this method should delegate to the low-level method detectCodepage(URL).

Detects the codepage of the document pointed to by the URL argument. If the codepage cannot be detected, null must be returned. If the given URL does not point to a document, or the document cannot be opened, an IOException is thrown.

Returns:
null if the codepage of the document specified by the given URL was not detected; otherwise a Reader that reads the document in the detected codepage.
Throws:
IOException - thrown to indicate that it was not possible to open the document specified by the given URL.
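
The delegation described for open(URL) can be sketched as follows. This is a minimal illustration, not the library's implementation: UrlCodepageDetector is a hypothetical stand-in that mirrors only the two URL-based signatures documented here, and opening the stream with URL.openStream() is an assumption.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical stand-in mirroring the two URL-based signatures of ICodepageDetector.
interface UrlCodepageDetector {

    // Low-level detection: returns null when the codepage was not detected.
    Charset detectCodepage(URL url) throws IOException;

    // High-level convenience: delegates to detectCodepage(URL), as the contract suggests.
    default Reader open(URL url) throws IOException {
        Charset charset = detectCodepage(url);
        if (charset == null) {
            // The contract asks for null, not an exception, when detection fails.
            return null;
        }
        // Assumption: the document is re-opened and decoded with the detected charset.
        return new InputStreamReader(url.openStream(), charset);
    }
}
```

Callers of such an implementation still have to check the returned Reader for null before using it.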

detectCodepage

Charset detectCodepage(URL url)
                       throws IOException

Low-level method that detects the codepage (charset) of the document specified by the given URL.

Returns:
null if the codepage of the document specified by the given URL was not detected; otherwise the Charset that represents the document's codepage.
Throws:
IOException - thrown to indicate that it was not possible to open the document specified by the given URL.
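
Because a null return means "not detected", callers usually need a fallback charset. The sketch below shows one way to handle that. UrlCharsetDetector and detectOr are hypothetical names introduced for illustration; only the detectCodepage(URL) signature comes from this page.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical stand-in mirroring the documented detectCodepage(URL) signature.
interface UrlCharsetDetector {
    Charset detectCodepage(URL url) throws IOException;
}

final class DetectOrFallback {
    private DetectOrFallback() {
    }

    // Returns the detected charset, or the caller-supplied fallback when detection yields null.
    static Charset detectOr(UrlCharsetDetector detector, URL url, Charset fallback)
            throws IOException {
        Charset detected = detector.detectCodepage(url);
        return detected != null ? detected : fallback;
    }
}
```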

detectCodepage

Charset detectCodepage(InputStream in,
                       int length)
                       throws IOException

Detects the charset encoding from any source, including sources a URL cannot address (such as a String).

Note that the given InputStream cannot be reused after this call unless it supports marking (InputStream.markSupported() == true), you mark the initial position with a sufficient readlimit beforehand, and you invoke reset afterwards (without getting an exception).

Parameters:
in - An InputStream for the document that supports mark with a readlimit of at least the length argument.
length - The number of bytes to take into account. This number should not exceed the number of bytes retrievable from the InputStream, but should be as large as possible to give the fallback detection (chardet) more hints to guess.
Throws:
IOException
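
The mark/reset pattern described above can be sketched as follows. StreamCodepageDetector is a hypothetical stand-in for the documented signature, and UTF8_BOM_ONLY is a toy detector written for this example (it recognises only a UTF-8 byte order mark); neither reflects the library's actual detection logic.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in mirroring the documented detectCodepage(InputStream, int) signature.
interface StreamCodepageDetector {
    Charset detectCodepage(InputStream in, int length) throws IOException;
}

final class MarkResetExample {
    private MarkResetExample() {
    }

    // Toy detector (assumption): reports UTF-8 only if the stream starts with EF BB BF.
    static final StreamCodepageDetector UTF8_BOM_ONLY = (in, length) -> {
        if (length < 3) {
            return null;
        }
        boolean bom = in.read() == 0xEF && in.read() == 0xBB && in.read() == 0xBF;
        return bom ? StandardCharsets.UTF_8 : null;
    };

    // Detects the charset, rewinds the stream, and returns a Reader over the whole content.
    static Reader openDetected(InputStream raw, int length,
                               StreamCodepageDetector detector, Charset fallback)
            throws IOException {
        // Wrap streams without mark support so that reset() is possible afterwards.
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        in.mark(length);                 // readlimit must cover everything detection may read
        Charset detected = detector.detectCodepage(in, length);
        in.reset();                      // rewind to the marked initial position
        return new InputStreamReader(in, detected != null ? detected : fallback);
    }
}
```

Since the stream is rewound to its start, any byte order mark is still part of the content handed to the Reader.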


Copyleft ㊢ 2003-2004 MPL 1.1, All Rights Footloose.