Why Unicode?
Article created 2001-03-12 by Rainer Gerhards.
Unicode is a standard to encode all of the world's languages correctly on
computers. In this article, Rainer Gerhards explains what Unicode is and why
Adiscon bases all of its products on the Unicode standard.
What is Unicode?
It is an international standard. Its goal is to resolve ambiguities that
traditionally arise when displaying complex scripts like Japanese, Arabic or
Chinese on computer systems. Besides solving many internationalization issues,
Unicode-enabled programs also run faster under Windows NT, 2000 and XP (and
later versions).
So what does Unicode do?
It's really easy. Traditional character sets (like the ANSI character set) are
based on 8-bit characters, each stored in a single byte. A single byte can
represent up to 256 different values and thus characters. This is more than
enough to represent western scripts like those used for English, French or
German. However, when it comes to more complex languages like Japanese or
Korean, 256 different characters are simply insufficient.
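To illustrate the 256-value limit, here is a minimal Python sketch. It assumes
Windows code page 1252 (Python codec name "cp1252") as a stand-in for the ANSI
character set mentioned above; the sample strings are made up for illustration.

```python
# One byte per character is enough for western European text.
german = "Straße"
ansi_bytes = german.encode("cp1252")

# Japanese text has no representation in an 8-bit western code page.
try:
    "日本語".encode("cp1252")
    japanese_fits = True
except UnicodeEncodeError:
    japanese_fits = False

print(len(german), len(ansi_bytes))  # same count: one byte per character
print(japanese_fits)
```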
So users of these languages have developed so-called double-byte character
sets (DBCS). In a DBCS, each character is represented by one or more bytes. The
character encoding specifies how to interpret the byte values and whether a
byte is a single character or just part of a larger sequence of bytes
representing a multi-byte character.
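The variable width can be seen in a short Python sketch. It assumes Shift JIS
(Python codec name "shift_jis") as an example of a DBCS; the sample text is
made up for illustration.

```python
# Encode a mixed string in Shift JIS, a common Japanese DBCS.
text = "A日本"
encoded = text.encode("shift_jis")

# Byte count per character: the ASCII letter takes one byte,
# each kanji takes two.
widths = [len(ch.encode("shift_jis")) for ch in text]

print(widths)
print(len(text), "characters stored in", len(encoded), "bytes")
```

Because the widths differ, the n-th character cannot be found by simple
arithmetic; the bytes have to be scanned from the start of the string.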
Unfortunately, there are often several different DBCS encodings for a given
language. To make matters worse, different operating systems and different
programming languages tend to use different DBCS encodings. Programming is
also relatively complex, because strings must be parsed byte by byte to find
the character boundaries.
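As a rough illustration of the interoperability problem, the following Python
sketch encodes the same Japanese text with two different encodings and then
decodes one with the other's rules. The choice of Shift JIS and EUC-JP is an
assumption made for the example.

```python
text = "日本語"

# The same characters produce different byte sequences.
sjis_bytes = text.encode("shift_jis")
euc_bytes = text.encode("euc_jp")
print(sjis_bytes != euc_bytes)

# Reading Shift JIS bytes as if they were EUC-JP does not recover the text.
misread = sjis_bytes.decode("euc_jp", errors="replace")
print(misread == text)
```

A receiver therefore has to know which encoding the sender used; the bytes
alone do not say.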
Unicode's goal is to solve this issue by using more than one byte for each
character. In a typical implementation, 2 bytes are used, which can represent
65,536 different characters. This is enough to store most of the world's
characters. So with Unicode, characters from all scripts can be stored in one
string. As all characters in such an implementation have a fixed width,
programming complexity is greatly reduced.
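The arithmetic and the fixed-width property can be sketched in Python as
follows. The sketch assumes a 2-byte little-endian representation (Python
codec name "utf-16-le") and that every character in the string fits into a
single 16-bit unit; the helper char_at is hypothetical.

```python
# Two bytes give 2**16 distinct values.
print(2 ** 16)

text = "Aß日"
data = text.encode("utf-16-le")

# Every character occupies exactly two bytes.
print(len(data) == 2 * len(text))


def char_at(buffer: bytes, index: int) -> str:
    """Return the character at position index by plain offset arithmetic."""
    start = 2 * index
    return buffer[start:start + 2].decode("utf-16-le")


print(char_at(data, 2))
```

No scanning from the start of the string is needed: the position of a
character follows directly from its index.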
Why do Unicode Applications run faster under Windows NT/2000/XP?
This reduced complexity also improves application performance: complex
character mapping and multi-byte character detection are no longer needed.
Under the Windows NT based operating systems, there is an additional,
significant performance benefit. Windows NT itself is internally based on
Unicode only. So all operating system calls (APIs) expect characters encoded
in Unicode, even on e.g. US English language versions of the operating system.
However, there are also APIs available for the many applications that work
with ANSI strings (with 8-bit characters). But these APIs are so-called
"wrappers": they wrap the Unicode version of the API. All the ANSI version
does is convert the ANSI string to a Unicode string and then pass it to the
Unicode version.
These translations involve not only the actual conversion but also the
allocation and de-allocation of temporary buffers to hold the converted
strings. As one might guess, this takes additional time on every call.
So Unicode-only applications can perform a lot better if run on Windows NT.
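The wrapper pattern described above can be modeled in a few lines of Python.
This is only a conceptual sketch, not the Windows API: open_log_w and
open_log_a are hypothetical functions, and code page 1252 is an assumed ANSI
code page.

```python
def open_log_w(path: str) -> str:
    """Hypothetical 'Unicode version': works on the Unicode string directly."""
    return "opened " + path


def open_log_a(path: bytes, code_page: str = "cp1252") -> str:
    """Hypothetical 'ANSI version': convert first, then delegate."""
    # The conversion creates a temporary Unicode string on every call.
    unicode_path = path.decode(code_page)
    return open_log_w(unicode_path)


# Both calls end up in the same Unicode routine; the ANSI caller pays
# for an extra conversion and a temporary string first.
direct = open_log_w("Straße.log")
wrapped = open_log_a("Straße.log".encode("cp1252"))
print(direct == wrapped)
```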
Internationalization
An additional big plus is much easier internationalization. Applications
using Unicode internally are able to store and process all of the world's
characters. This removes many difficulties traditionally involved when
internationalizing an application.
Of course, successful internationalization is much more than Unicode-enabling
an application. It requires careful screen design (different languages need
different amounts of space to display the same sentence), translation and
cultural understanding.
Unicode, however, is a building block of successful internationalization and
already solves many of the problems developers traditionally experience.
How about Adiscon Products?
As part of our internationalization strategy, we will base all of our
products internally on Unicode. We also expect a notable performance gain from
that step.
At the time of this writing, the EventReporter product (beginning with version
5.1) is already natively based on Unicode. Work on the WinSyslog 3.3 release
is already in progress and includes native Unicode support. The other products
will follow.
Our products running under Windows 95/98/me will also be based on Unicode
internally. Unfortunately, these operating systems do not provide a Unicode
API. So Adiscon products on those platforms will support ANSI characters
externally.
Anything else in Stock?
Indeed, there is more good news. Unicode enables us to reliably store all of
the world's characters. But we honor the fact that the world is not yet
Unicode based. So supporting DBCS is extremely important when it comes to
system interoperability. However, with everything stored in Unicode, we can
easily convert to DBCS encodings when it comes to forwarding information. The
EventReporter product, for example, can forward messages in JIS, SJIS and
EUC-JP encodings. That capability will also be included in all products.
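Converting from a Unicode string to one of these encodings at forwarding time
might look like the Python sketch below. It is not Adiscon's implementation:
the table FORWARD_CODECS and the function encode_for_forwarding are
hypothetical, and mapping "JIS" to the ISO-2022-JP codec is an assumption.

```python
# Assumed mapping from the encoding names used above to Python codec names.
FORWARD_CODECS = {
    "JIS": "iso2022_jp",
    "SJIS": "shift_jis",
    "EUC-JP": "euc_jp",
}


def encode_for_forwarding(message: str, target: str) -> bytes:
    """Convert a Unicode message to the byte encoding the receiver expects."""
    return message.encode(FORWARD_CODECS[target])


message = "日本語のテスト"
for name in FORWARD_CODECS:
    payload = encode_for_forwarding(message, name)
    print(name, len(payload), "bytes")
```

The message is kept in Unicode internally and converted only at the point
where it leaves the system, so one stored string can serve receivers with
different encodings.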
Want to know more on Unicode?
We hope this gave you a first impression of the benefits of Unicode and how it
enhances Adiscon products. If you would like to find more detailed
information, please visit the Unicode Consortium.
This is the body dedicated to the advancement and promotion of Unicode. Its
website offers a wealth of information and useful resources.
I hope that this article is helpful. If you have any questions or remarks,
please do not hesitate to contact me at rgerhards@adiscon.com.