Locale::Recode (3)
NAME
Locale::Recode - Object-Oriented Portable Charset Conversion
SYNOPSIS
  use Locale::Recode;

  $cd = Locale::Recode->new (from => 'UTF-8', to => 'ISO-8859-1');
  die $cd->getError if $cd->getError;
  $cd->recode ($text) or die $cd->getError;

  $mime_name = Locale::Recode->resolveAlias ('latin-1');
  $supported = Locale::Recode->getSupported;
  $complete = Locale::Recode->getCharsets;
DESCRIPTION
This module provides routines that convert textual data from one codeset to another in a portable way. The module was started before Encode(3) was written. Its main purpose today is to provide charset conversion even when Encode(3) is not available on the system. It should also work for older Perl versions without Unicode support.
Internally, Locale::Recode(3) will use Encode(3) whenever possible, to allow for faster conversion and a wider range of supported charsets, and will only fall back to the Perl implementation when Encode(3) is not available or does not support a particular charset that Locale::Recode(3) does.
Locale::Recode(3) is part of libintl-perl, and its main purpose is actually to implement a portable charset conversion framework for the message translation facilities described in Locale::TextDomain(3).
CONSTRUCTOR
The constructor "new()" requires two named arguments:
- from
- The encoding of the original data. Case doesn't matter, aliases are resolved.
- to
- The target encoding. Again, case doesn't matter, and aliases are resolved.
The constructor will never fail. In case of an error, the object's internal state is set to bad and it will refuse to do any conversions. You can inquire the reason for the failure with the method getError().
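A minimal sketch of the error-checking pattern described above, using only the calls shown in the SYNOPSIS. The charset name NO-SUCH-CHARSET is a made-up value that is assumed to be unknown on the system.

```perl
use strict;
use warnings;
use Locale::Recode;

# A conversion that is expected to be available.
my $cd = Locale::Recode->new (from => 'UTF-8', to => 'ISO-8859-1');
die $cd->getError if $cd->getError;

# 'NO-SUCH-CHARSET' is a hypothetical name, assumed to be unsupported.
# new() still returns an object; the failure only shows up in getError().
my $bad = Locale::Recode->new (from => 'UTF-8', to => 'NO-SUCH-CHARSET');
my $reason = $bad->getError;
print "conversion unavailable: $reason\n" if $reason;
```

Checking getError() right after new() keeps a bad object from reaching later recode() calls.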
OBJECT METHODS
The following object methods are available.
- recode (STRING)
-
Converts STRING from the source encoding into the destination encoding. On success, a true value is returned, false otherwise. You can inquire the reason for the failure with the method getError().
- getError
- Returns false if the object is not in an error state, otherwise an error message.
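The two object methods can be sketched together as follows. The SYNOPSIS suggests that recode() changes its argument in place and only returns a status, and this example assumes that behavior. The sample string is made up for illustration.

```perl
use strict;
use warnings;
use Locale::Recode;

my $cd = Locale::Recode->new (from => 'UTF-8', to => 'ISO-8859-1');
die $cd->getError if $cd->getError;

# "cafe" with an e-acute, as UTF-8 bytes (the e-acute is 0xC3 0xA9).
my $text = "caf\xc3\xa9";

# Assumption: recode() rewrites $text itself and returns a true value
# on success, so the return value is only used as a status.
$cd->recode ($text) or die $cd->getError;

printf "%d bytes after conversion\n", length $text;
```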
CLASS METHODS
The module provides some additional class methods:
- getSupported
-
Returns a reference to a list of all supported charsets. This may implicitly load additional Encode(3) conversions like Encode::HanExtra(3), which may produce considerable load on your system.
The method is therefore not intended for regular use, but rather for retrieving or displaying a list of available encodings once.
The members of the list are all converted to uppercase!
- getCharsets
- Like getSupported() but also returns all available aliases.
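A short sketch of the class methods, calling each one once and keeping the result, as the note on system load suggests. It assumes getCharsets() returns an array reference just like getSupported(). resolveAlias() is taken from the SYNOPSIS, and its result is not assumed to be defined for every name.

```perl
use strict;
use warnings;
use Locale::Recode;

# Both calls may load many Encode modules, so call them once and reuse
# the results instead of querying repeatedly.
my $supported = Locale::Recode->getSupported;
my $charsets  = Locale::Recode->getCharsets;

printf "%d charsets, %d names including aliases\n",
    scalar @$supported, scalar @$charsets;

# resolveAlias() appears in the SYNOPSIS; guard against an unknown alias.
my $name = Locale::Recode->resolveAlias ('latin-1');
print "latin-1 resolves to $name\n" if defined $name;
```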
SUPPORTED CHARSETS
The range of supported charsets is system-dependent. The following somewhat special charsets are always available:
- UTF-8
-
UTF-8 is available independently of your Perl version. For Perl 5.6 or better, or in the presence of Encode(3), conversions are not done in Perl but with the interfaces provided by these facilities, which are written in C and hence much faster.
Encoding data into UTF-8 is fast, even if it is done in Perl. Decoding it in Perl may become quite slow. If you frequently have to decode UTF-8 with Locale::Recode, you will probably want to make sure that you do that with Perl 5.6 or better, or install Encode(3) to speed things up.
- INTERNAL
-
UTF-8 is fast to write but hard to read for applications. It is therefore not the worst choice for internal string representation, but not far from it. Locale::Recode(3) stores strings internally as a reference to an array of integer values, as most programming languages do (Perl is an exception), trading memory for performance.
The integer values are the UCS-4 codes of the characters in host byte order.
The encoding INTERNAL is directly available via Locale::Recode(3), but of course you should not really use it for data exchange unless you know what you are doing.
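As a rough illustration of the INTERNAL encoding, the sketch below converts a short UTF-8 string and inspects the result. It assumes that, after a successful recode(), the variable holds the array reference described above. The sample string is made up.

```perl
use strict;
use warnings;
use Locale::Recode;

my $cd = Locale::Recode->new (from => 'UTF-8', to => 'INTERNAL');
die $cd->getError if $cd->getError;

# "A" followed by an e-acute, as UTF-8 bytes.
my $data = "A\xc3\xa9";
$cd->recode ($data) or die $cd->getError;

# Assumption: $data now holds a reference to an array of character codes.
if (ref $data eq 'ARRAY') {
    print join (' ', map { sprintf 'U+%04X', $_ } @$data), "\n";
}
```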
Locale::Recode(3) has native support for a plethora of other encodings, most of them 8-bit encodings that are fast to decode, including most encodings used on popular micros, like the ISO-8859-* series of encodings, most Windows-* encodings (also known as CP*), Macintosh, Atari, etc.
NAMES AND ALIASES
Each charset or encoding is available internally under a unique name. Whenever the information was available, the preferred
Alias handling is quite strict. The module does not make wild guesses at what you mean ("What's the meaning of the acronym
The module knows all aliases that are listed with the
CONVERSION TABLES
The conversion tables have either been taken from official sources like the
The few encodings that are affected are so simple that you will not experience any real performance penalty unless you convert large chunks of data. But the package is not really intended for such use anyway, and since Encode(3) is relatively new, I rather think that the differences are bugs in Encode which will be fixed soon.
BUGS
The module should provide fall back conversions for other Unicode encoding schemes like
The pure Perl
AUTHOR
Copyright (C) 2002-2016 Guido Flohr <www.guido-flohr.net> (<mailto:guido.flohr@cantanea.com>), all rights reserved. See the source code for details!
SEE ALSO
Encode(3), iconv(3), iconv(1), recode(1), perl(1)