kmail

Provides encoding detection capabilities.

#include <encodingdetector.h>

Public Types

enum  EncodingChoiceSource {
  DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader,
  EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding
}
enum  AutoDetectScript {
  None, SemiautomaticDetection, Arabic, Baltic,
  CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic,
  Greek, Hebrew, Japanese, Korean,
  NorthernSaami, SouthEasternEurope, Thai, Turkish,
  Unicode, WesternEuropean
}

Public Member Functions

 EncodingDetector ()
 EncodingDetector (TQTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
bool setEncoding (const char *encoding, EncodingChoiceSource type)
const char * encoding () const
bool visuallyOrdered () const
void setAutoDetectLanguage (AutoDetectScript)
AutoDetectScript autoDetectLanguage () const
EncodingChoiceSource encodingChoiceSource () const
bool analyze (const char *data, int len)
bool analyze (const TQByteArray &data)

Static Public Member Functions

static AutoDetectScript scriptForName (const TQString &lang)
static TQString nameForScript (AutoDetectScript)
static AutoDetectScript scriptForLanguageCode (const TQString &lang)
static bool hasAutoDetectionForScript (AutoDetectScript)

Protected Member Functions

bool errorsIfUtf8 (const char *data, int length)
TQTextDecoder * decoder ()

Detailed Description

Provides encoding detection capabilities.

Searches the raw data for an encoding declaration (meta and XML tags). If no declaration is found, it falls back to heuristics for the specified language.
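The declaration search can be illustrated with a minimal, standalone sketch. It does not use TQt and is not the code in encodingdetector.cpp; the helper names `valueAfter` and `declaredEncoding` are hypothetical, and the matching is deliberately simplistic (no attribute parsing, names returned in lower case).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the token that follows `key` (for example
// "charset=" or "encoding="), skipping one optional quote character.
static std::string valueAfter(const std::string &lower, const std::string &key)
{
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;
    std::string::size_type end = pos;
    while (end < lower.size() &&
           (std::isalnum(static_cast<unsigned char>(lower[end])) ||
            lower[end] == '-' || lower[end] == '_'))
        ++end;
    return lower.substr(pos, end - pos);
}

// Hypothetical helper: returns the lower-cased encoding name declared in an
// XML header or an HTML meta tag, or an empty string if none is declared.
std::string declaredEncoding(const std::string &raw)
{
    std::string lower(raw);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // XML declaration: <?xml version="1.0" encoding="..."?>
    if (lower.compare(0, 5, "<?xml") == 0) {
        // Only look inside the declaration itself (up to the closing "?>").
        std::string decl = lower.substr(0, lower.find("?>"));
        std::string enc = valueAfter(decl, "encoding=");
        if (!enc.empty())
            return enc;
    }

    // HTML meta tag: <meta ... charset=...>
    std::string::size_type meta = lower.find("<meta");
    if (meta != std::string::npos)
        return valueAfter(lower.substr(meta), "charset=");

    return std::string();
}
```

An empty result corresponds to the case where the detector would have to fall back to its language heuristics.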

If it finds a Unicode BOM (byte order mark), it changes the encoding regardless of what the user has specified.
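As a rough illustration of BOM detection, the sketch below checks the first bytes of a buffer against the UTF-8 and UTF-16 byte order marks. The function `encodingFromBom` is a hypothetical helper, not a member of EncodingDetector, and the sketch ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the encoding name implied by a leading byte
// order mark, or an empty string if the buffer does not start with one.
std::string encodingFromBom(const char *data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);

    // UTF-8 BOM: EF BB BF
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return "UTF-8";
    // UTF-16 big-endian BOM: FE FF
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return "UTF-16BE";
    // UTF-16 little-endian BOM: FF FE
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return "UTF-16LE";

    return std::string();
}
```

A caller that follows the rule described above would apply a non-empty result even when the user has already chosen an encoding.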

Intended lifetime of the object: one instance per document.

Typical use:

 TQByteArray data;
 ...
 EncodingDetector detector;
 detector.setAutoDetectLanguage(EncodingDetector::Cyrillic);
 TQString out=detector.decode(data);

Do not mix decode() with decodeWithBuffering().

Guesses the encoding of a char array.

Definition at line 57 of file encodingdetector.h.


Constructor & Destructor Documentation

EncodingDetector::EncodingDetector (  ) 

The default codec is Latin-1 (as the HTML specification says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 796 of file encodingdetector.cpp.

EncodingDetector::EncodingDetector ( TQTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Allows setting the default codec, the EncodingChoiceSource, and the AutoDetectScript.

Definition at line 800 of file encodingdetector.cpp.


Member Function Documentation

bool EncodingDetector::analyze ( const char *  data,
int  len 
)

Analyze text data.

Returns:
true if there was enough data for accurate detection

Definition at line 906 of file encodingdetector.cpp.

bool EncodingDetector::analyze ( const TQByteArray &  data  ) 

Analyze text data.

Returns:
true if there was enough data for accurate detection

Definition at line 901 of file encodingdetector.cpp.

TQTextDecoder * EncodingDetector::decoder (  )  [protected]

Returns:
TQTextDecoder for the detected encoding

Definition at line 841 of file encodingdetector.cpp.

const char * EncodingDetector::encoding (  )  const

Convenience method.

Returns:
MIME name of the detected encoding

Definition at line 824 of file encodingdetector.cpp.

bool EncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
) [protected]

Checks whether the data is really UTF-8.

Taken from Kate.

Returns:
true if the current encoding is UTF-8 and the text cannot be in this encoding

Please somebody read http://de.wikipedia.org/wiki/UTF-8 and check this code...

Definition at line 732 of file encodingdetector.cpp.
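For readers who want to cross-check the idea behind errorsIfUtf8(), here is a minimal, standalone well-formedness check. It is a sketch under assumptions, not the code at line 732: the name `hasUtf8Errors` is hypothetical, it does not consult the detector's current encoding, and it treats a multi-byte sequence cut off at the end of the buffer as an error.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the check described above: returns true if the
// buffer contains a byte sequence that is not well-formed UTF-8.
bool hasUtf8Errors(const char *data, std::size_t length)
{
    assert(data != nullptr || length == 0);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);

    std::size_t i = 0;
    while (i < length) {
        const unsigned char c = p[i];
        if (c < 0x80) {            // plain ASCII byte
            ++i;
            continue;
        }

        std::size_t extra;         // continuation bytes expected after the lead byte
        unsigned long min;         // smallest code point valid for this length
        unsigned long cp;          // code point being assembled
        if ((c & 0xE0) == 0xC0)      { extra = 1; min = 0x80;    cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; min = 0x800;   cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; min = 0x10000; cp = c & 0x07; }
        else return true;          // stray continuation byte or invalid lead byte

        if (i + extra >= length)
            return true;           // sequence is truncated by the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k) {
            if ((p[i + k] & 0xC0) != 0x80)
                return true;       // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return true;

        i += extra + 1;
    }
    return false;
}
```

Text in a single-byte encoding such as Latin-1 usually fails this check as soon as it contains a non-ASCII character, which is what makes the test useful for ruling out UTF-8.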

EncodingDetector::AutoDetectScript EncodingDetector::scriptForName ( const TQString &  lang  )  [static]

Takes the language name _after_ it has been passed through i18n().

Definition at line 1166 of file encodingdetector.cpp.

bool EncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)

Returns:
true if the specified encoding was recognized

Definition at line 846 of file encodingdetector.cpp.


The documentation for this class was generated from the following files: