Reference for unit 'unicodedata'

Unicode data management

Overview

The fpwidestring unit relies on having the relevant Unicode collation data linked into the binary. This Unicode data is managed using the routines in the unicodedata unit. The FPC project distributes some Unicode collation data in .bco files, which can be loaded using the LoadCollation routines. LoadCollation is the main routine of this unit.

All collation data requires at least the Default Unicode Collation Element Table (DUCET) to be registered. The DUCET data is provided by the unicodeducet unit, part of the rtl-unicode package.

There are two ways to register collations:

At compile time: by including the desired collation unit. For example, to make the Russian and Japanese collations available, include the collation_ru and collation_ja units from the "rtl-unicode" package.

At runtime: by using the above mentioned LoadCollation function.

The two ways can coexist in the same application: some collations (for example, the most frequently used ones) may be included at compile time, while others are loaded at runtime.
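The following is a minimal sketch of both routes used together, not a definitive implementation. It assumes that the collation_ru unit registers its collation under the name 'ru', that the .bco files live in a hypothetical directory named 'collations', and that FindCollation, RegisterCollation and FreeCollation are available in the unicodedata unit of the FPC version in use.

```pascal
program mixedcollations;

{$mode objfpc}{$H+}

uses
  unicodedata,   // collation management routines
  collation_ru;  // compile-time route: brings in Russian (and DUCET)

var
  Book: PUCA_DataBook;
begin
  // Compile-time route: the collation unit is expected to have registered
  // its data already (the name 'ru' is an assumption).
  if FindCollation('ru') <> nil then
    WriteLn('ru: available through the collation_ru unit');

  // Runtime route: load a binary collation file, then register it.
  // 'collations' is a hypothetical directory holding the .bco files.
  Book := LoadCollation('collations', 'ja');
  if Book = nil then
    WriteLn('ja: collation file could not be loaded')
  else if RegisterCollation(Book) then
    WriteLn('ja: loaded and registered at runtime')
  else
  begin
    // Registration was refused, so release the loaded data again.
    FreeCollation(Book);
    WriteLn('ja: registration failed');
  end;
end.
```

Because collation_ru already pulls in the DUCET unit, the Japanese file loaded at runtime does not need a separate DUCET load in this sketch.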

The binary collation files are endian sensitive:

Files for little endian systems are named collation_lang_le.bco, where lang is the language code (such as collation_ru_le.bco and collation_ja_le.bco).

Files for big endian systems are named collation_lang_be.bco (such as collation_ru_be.bco and collation_ja_be.bco).
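The naming scheme above can be sketched as a small helper. CollationFileName is a hypothetical function written for illustration only, not part of the unicodedata unit; it relies on the ENDIAN_LITTLE symbol that the compiler defines on little endian targets.

```pascal
program bcofilename;

{$mode objfpc}{$H+}

// Hypothetical helper: builds the .bco file name for a language code,
// choosing the suffix that matches the endianness of the target platform.
function CollationFileName(const ALanguage: string): string;
begin
{$IFDEF ENDIAN_LITTLE}
  Result := 'collation_' + ALanguage + '_le.bco';
{$ELSE}
  Result := 'collation_' + ALanguage + '_be.bco';
{$ENDIF}
end;

begin
  WriteLn(CollationFileName('ru'));
  WriteLn(CollationFileName('ja'));
end.
```

In practice this helper should rarely be needed, since the LoadCollation overload that takes a directory and a language selects the matching file itself; the sketch only makes the naming rule explicit.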

Note that the compile time collation units (collation_lang.pas) already include the unicodeducet.pas (DUCET) unit, so it is not necessary to include it manually. This is not the case for the binary files: an application that only uses the binary collation files should at least include the unicodeducet unit, or manually load the binary collation collation_ducet_le.bco or collation_ducet_be.bco, depending on the endianness of the platform. A LoadCollation call that is given a directory and the language ducet automatically selects the correct file.
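As a sketch of the binary-only case, the program below loads DUCET first and then a language collation. It assumes a hypothetical directory named 'collations' containing the distributed .bco files, and it assumes that the unicodedata unit offers a RegisterCollation overload taking a directory and a language, which loads and registers in one step; if that overload is not present in the FPC version in use, LoadCollation followed by RegisterCollation on the returned pointer should serve the same purpose.

```pascal
program runtimeonly;

{$mode objfpc}{$H+}

uses
  unicodedata;  // no collation_xx or unicodeducet unit is included here

const
  // Hypothetical directory that holds the distributed .bco files.
  CollationDir = 'collations';

begin
  // DUCET has to be registered before any other collation is loaded.
  // The endian-specific file is picked from the directory automatically.
  if not RegisterCollation(CollationDir, 'ducet') then
  begin
    WriteLn('Could not load the DUCET collation from ', CollationDir);
    Halt(1);
  end;

  // With DUCET in place, language collations can be added.
  if RegisterCollation(CollationDir, 'ru') then
    WriteLn('Russian collation registered')
  else
    WriteLn('Russian collation not available');
end.
```

Loading DUCET before any language collation mirrors the requirement stated above that all collation data needs DUCET to be registered.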