Unicode Support


 

The Common Vision Blox API now generally supports the use of Unicode strings almost anywhere strings can be passed into or retrieved from function calls.
There are currently only three exceptions to this rule:

The tools CVC Color, CVC Barcode, and GigE Vision Server do not yet support Unicode strings.

Wherever strings are being handled that by definition do not exceed the ASCII character range (0-127), no Unicode functions have been implemented.
Example: GetMagicNumber does not have a Unicode equivalent because by definition neither the tool ID nor the magic number may use characters outside the ASCII range.

 

 

Implementation

 

For all functions exported by our unmanaged DLLs/shared objects (with the exception of the aforementioned set) that accept a pointer to a zero-terminated char string, an equivalent function has been added that accepts a pointer to a zero-terminated Unicode string.
These new functions are named identically to the functions on which they are modeled, with the character "W" appended to the function name.
For example:

 

  IMPORT(cvbbool_t) LoadImageFile(const char* szFileName, IMG& Image);
  IMPORT(cvbbool_t) LoadImageFileW(const wchar_t* szFileName, IMG& Image);

 

Note that the more obvious option of simply adding a function overload was not available, because the extern "C" declaration implied by the IMPORT macro forbids overloads.
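The naming scheme can be illustrated with a minimal sketch. The functions DemoNameLength and DemoNameLengthW are hypothetical stand-ins and not part of the Common Vision Blox API; they only return the length of the file name, so the example compiles without the Common Vision Blox headers.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins (NOT Common Vision Blox exports): they only return
// the length of the file name so the sketch needs no CVB headers.
extern "C" {
  // char variant, exported under its plain name.
  std::size_t DemoNameLength(const char* szFileName) {
    return std::strlen(szFileName);
  }

  // wchar_t variant: same name with "W" appended.
  std::size_t DemoNameLengthW(const wchar_t* szFileName) {
    return std::wcslen(szFileName);
  }

  // A second DemoNameLength(const wchar_t*) could not be declared here:
  // only one function with a given name may have C linkage.
}
```

Because both functions have C linkage, each is exported under its plain name, which is why the two variants must differ in name rather than in parameter type alone.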

 

Subtle differences in character encoding exist between the Windows™ and the Linux platform:

 

           Windows                                      Linux
char       usually codepage-based character encoding    UTF8
wchar_t    UTF16                                        UTF32

 

These differences have been preserved and addressed:

Functions that accept a char pointer on Windows treat the input string as if it's been encoded using the system's current default codepage.

On Linux those same functions expect the input to follow UTF8 encoding rules.

wchar_t is treated as UTF16 input on Windows and UTF32 input on Linux.
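The following is a minimal sketch of what this difference means for application code that starts from a UTF8 string and wants to call one of the "W" functions. The helper names DecodeUtf8 and ToWide are assumptions made for this illustration and are not part of Common Vision Blox; the decoder assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF8 byte string into Unicode code points.
// Sketch only: assumes well-formed input, no validation is performed.
std::vector<char32_t> DecodeUtf8(const std::string& utf8) {
  std::vector<char32_t> out;
  for (std::size_t i = 0; i < utf8.size();) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    // Number of continuation bytes that follow the lead byte.
    const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    // Payload bits of the lead byte (7, 5, 4 or 3 bits).
    char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
    for (int k = 1; k <= extra && i + k < utf8.size(); ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += static_cast<std::size_t>(extra) + 1;
  }
  return out;
}

// Build a wchar_t string: one unit per code point where wchar_t is 32 bits
// wide (UTF32, e.g. Linux), surrogate pairs where it is 16 bits (UTF16,
// e.g. Windows).
std::wstring ToWide(const std::string& utf8) {
  std::wstring out;
  for (char32_t cp : DecodeUtf8(utf8)) {
    if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
      out.push_back(static_cast<wchar_t>(cp));
    } else {
      cp -= 0x10000;
      out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

A character outside the Basic Multilingual Plane therefore occupies two wchar_t units on Windows but only one on Linux, so code that counts or indexes wchar_t units should not assume the same string length on both platforms.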

 

 

Usage in Different Programming Languages

 

Usage of the existing char versions of the Common Vision Blox functions has not changed on either Windows or Linux, and users may continue using these functions as they did before.
Users who want to make use of the newly added Unicode functions, however, should be aware of the following:

 

C++

 

As previously described, the char and the wchar_t versions of the functions are directly accessible.
For convenience, #define statements have been added for these functions that handle mapping automatically according to the current Character Set setting of your Visual Studio project.
To stick with the previous example: When working with C++ the following variants of LoadImageFile are available:

 

Function          Input/Meaning
LoadImageFile     codepage-based character encoding
LoadImageFileW    UTF16
LoadImageFileT    maps to LoadImageFileW if the preprocessor symbol UNICODE has been defined; otherwise to LoadImageFile
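The mapping can be pictured with the following sketch. It does not reproduce the actual Common Vision Blox headers: DemoLoad, DemoLoadW, DemoLoadT, DEMO_TEXT and kUnicodeBuild are hypothetical names chosen for this illustration, and the stand-in functions merely report which variant was called.

```cpp
#include <string>

// Hypothetical stand-ins (NOT the Common Vision Blox functions): each one
// reports which character type it was called with.
inline std::string DemoLoad(const char*) { return "char"; }
inline std::string DemoLoadW(const wchar_t*) { return "wchar_t"; }

// Assumed shape of the convenience mapping: the "T" name resolves to the
// wchar_t variant when UNICODE is defined, otherwise to the char variant.
#ifdef UNICODE
  #define DemoLoadT DemoLoadW
  #define DEMO_TEXT(s) L##s
  constexpr bool kUnicodeBuild = true;
#else
  #define DemoLoadT DemoLoad
  #define DEMO_TEXT(s) s
  constexpr bool kUnicodeBuild = false;
#endif

// Application code written once against the "T" name.
inline std::string LoadViaGenericName() {
  return DemoLoadT(DEMO_TEXT("image.bmp"));
}
```

Code written against the "T" name compiles unchanged in both configurations, provided that string literals are also switched along with the function name, as the hypothetical DEMO_TEXT macro does here.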

 

C#/VB.Net

 

C# and VB.Net programmers have the easiest approach to using Unicode in Common Vision Blox.
The type System.String has been using UTF16 all along.
In the past, the marshaling code took care of the necessary conversion to codepage strings when functions were called through the Common Vision Blox managed wrappers (potentially losing information that could not be mapped to the current code page).
Now those conversions no longer happen, and no information is lost in the transition between the managed code's System.String and the Common Vision Blox function.

 

In other words: whenever .Net code calls e.g. Cvb.Image.LoadImageFile the unmanaged function that actually gets called is now LoadImageFileW and no changes need to be made to .Net programs for the sake of using Unicode - recompilation against the new managed wrappers is sufficient.

 

ActiveX Controls

 

The API of the Common Vision Blox ActiveX controls has not changed at all: they continue to use BSTR strings internally, which are usually Unicode strings. The only difference is that a UTF16 string passed by an application is now handled properly, whereas previously the unmappable characters were usually replaced with '?'.

 

 

Container Formats

 

One particular challenge in the introduction of Unicode in Common Vision Blox was the handling of Common Vision Blox's proprietary container formats that store strings (for example the Minos Training Set or Classifier files). The aim was to preserve as much backward compatibility as possible and switching those container formats to UTF16 would have broken that backward compatibility.

Therefore a different approach was taken:
The new builds of those DLLs that are affected (MinosCVC.dll, ShapeFinder.dll, Polimago.dll) now always store UTF8-encoded strings in their containers - making it generally possible to open these containers also in the older builds of those tools.
In the newly saved containers, the encoding is identified by a prepended BOM (Byte Order Mark).
So when opening containers saved with the new builds of these DLLs in an older version, you will notice three extra characters (byte sequence 0xEF 0xBB 0xBF, typically displayed as "ï»¿" in a Windows-1252 codepage) in front of each string.
These characters may be safely removed or skipped. Characters beyond the ASCII range, however, are likely to have been modified during the UTF8 conversion.
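For applications that have to read such strings with an older build, skipping the marker could look like the following sketch. StripUtf8Bom is a hypothetical helper written for this illustration; it is not part of Common Vision Blox and only removes the three BOM bytes, without converting any other characters.

```cpp
#include <string>

// Return the string without a leading UTF8 BOM (0xEF 0xBB 0xBF).
// Strings that do not start with the BOM are returned unchanged.
std::string StripUtf8Bom(const std::string& text) {
  static const std::string kBom = "\xEF\xBB\xBF";
  if (text.compare(0, kBom.size(), kBom) == 0) {
    return text.substr(kBom.size());
  }
  return text;
}
```

Strings that consist only of ASCII characters are identical to the original after this step; strings with characters beyond the ASCII range still contain UTF8 byte sequences that an older build will interpret according to the current codepage.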

The other way round, if one of the new Unicode-capable DLLs loads an older container file, the strings contained in it will automatically be converted to their proper internal representation used during runtime.
In other words: Older files may be opened just like before.