Case conversion for UNICODE and ANSI strings

Perhaps the most elegant way to write C++ code that converts the case of string characters is to use a function object with a generic STL algorithm. Function objects, also known as functors, are instances of a class that overloads operator(), so they can be called with the same syntax as an ordinary function.

The following fragment illustrates how templates are used to create code which supports both UNICODE and ANSI strings:


#include <string>
#include <locale>
#include <algorithm>
#include <cwchar>	// wprintf

using namespace std;

//
// This is a function object:
// its operator() is overloaded and templated, so
// it accepts both wchar_t and char symbols
//
class ToLowerConverter{
private:
	std::locale my_loc;
public:
	explicit ToLowerConverter(const std::locale &loc) : my_loc(loc){}

	// const: converting a character does not modify the converter
	template <typename CharType>
	CharType operator()(CharType ch) const{
		return std::tolower(ch, my_loc);
	}
};

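//
// in this example the wstring class is used to show
// how a UNICODE string can be processed;
// ANSI strings (std::string) can be used here as well
//
int main(){
	wstring str;
	str = L"SOME string WITH Cyrillic: абвгд";

	//create locale and store it in converter object
	//("Russian" is a locale name understood by Microsoft Visual C++;
	//other platforms use different names, and std::locale throws
	//std::runtime_error if the name is not recognized)
	std::locale loc("Russian");
	ToLowerConverter conv(loc);

	//use generic algorithm to process all characters
	transform(str.begin(), str.end(), str.begin(), conv);

	//print (%ls is the standard conversion for a wide string)
	wprintf(L"%ls\n", str.c_str());
	return 0;
}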

However, when I first saw this construction, it puzzled me.
I had assumed that a generic STL algorithm such as transform() requires a function pointer to process each element of a container. A pointer to a plain function or to a static member function works there, but a pointer to a non-static member function does not. A non-static member function receives an additional hidden parameter (the this pointer), so it cannot be called as f(x), and it is compiled with a special calling convention (thiscall in Microsoft Visual C++).
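
To make the difference concrete, here is a minimal sketch of the two kinds of pointers. All names in it (plain_lower, Lowerer, call_like_algorithm, call_member, demo) are made up for this illustration and do not come from the STL.

```cpp
#include <cassert>
#include <cctype>

// Plain function: a pointer to it can be called as f(ch).
char plain_lower(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Class with an ordinary (non-static) member function.
struct Lowerer {
    char lower(char ch) const { return plain_lower(ch); }
};

// Calls f(ch) the way a generic algorithm would.
template <typename Fn>
char call_like_algorithm(Fn f, char ch) {
    return f(ch);
}

// A pointer to a member function needs an object: (obj.*pmf)(ch).
char call_member(const Lowerer& obj, char (Lowerer::*pmf)(char) const, char ch) {
    return (obj.*pmf)(ch);
}

void demo() {
    assert(call_like_algorithm(&plain_lower, 'A') == 'a');

    Lowerer obj;
    assert(call_member(obj, &Lowerer::lower, 'B') == 'b');

    // call_like_algorithm(&Lowerer::lower, 'C');
    // would not compile: a member pointer cannot be called as f(ch)
}
```

Calling demo() from main() runs both variants. The commented-out line is the call that the compiler rejects.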

What confused me is that operator() is also a member function and also needs the this parameter, yet it can be used with generic algorithms.
The answer came when I looked at the STL code:

...
transform(_InIt _First, _InIt _Last, _OutIt _Dest, _Fn1 _Func){
	for (; _First != _Last; ++_First, ++_Dest)
		*_Dest = _Func(*_First);
...


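This code places no restriction on the type of _Func; it does not have to be a pointer at all.
The algorithm only evaluates _Func(*_First), so all we need to provide is something that can be called in this way.
A function object is one such thing, and a plain C function pointer is another.
For a function object, the compiler already has the object (_Func itself), so it can supply this when it calls operator().
Both cases are supported, and this is where I see the elegance.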
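
As a small illustration, the sketch below passes first a function pointer and then a function object to the same std::transform call. The names lower_fn, LowerObj, with_function_pointer and with_function_object are hypothetical, and the C locale version of tolower is used only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Plain function: passed to transform() as an ordinary function pointer.
char lower_fn(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Function object: accepted because obj(ch) is a valid expression.
struct LowerObj {
    char operator()(char ch) const { return lower_fn(ch); }
};

std::string with_function_pointer(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), &lower_fn);
    return s;
}

std::string with_function_object(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), LowerObj());
    return s;
}

void demo() {
    // Both callables give the same result for the same input.
    assert(with_function_pointer("MiXeD") == "mixed");
    assert(with_function_object("MiXeD") == "mixed");
}
```

The algorithm is instantiated once for each argument type, so neither variant needs any adapter.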

Another beautiful thing is that, thanks to the templated operator(), the same converter works with different kinds of STL strings, such as string and wstring.
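
A minimal sketch of that reuse follows. The helper to_lower_copy is a hypothetical name introduced here, and the classic "C" locale is assumed as the default so that the example does not depend on platform-specific locale names. As a consequence, only ASCII letters are converted.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Same converter as in the article: one class, any character type.
class ToLowerConverter {
    std::locale my_loc;
public:
    explicit ToLowerConverter(const std::locale& loc) : my_loc(loc) {}

    template <typename CharType>
    CharType operator()(CharType ch) const {
        return std::tolower(ch, my_loc);
    }
};

// Hypothetical helper: returns a lower-cased copy of any STL string type.
template <typename StringType>
StringType to_lower_copy(StringType s,
                         const std::locale& loc = std::locale::classic()) {
    std::transform(s.begin(), s.end(), s.begin(), ToLowerConverter(loc));
    return s;
}

void demo() {
    // operator() is instantiated for char here...
    assert(to_lower_copy(std::string("HeLLo")) == "hello");
    // ...and for wchar_t here, with no change to the converter.
    assert(to_lower_copy(std::wstring(L"HeLLo")) == L"hello");
}
```

Passing a different std::locale as the second argument switches the conversion rules without touching the converter class.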

The only problem with this code is that it does not work for ANSI strings in some languages (locales): sometimes a lowercase character fits in one string element while the same character in upper case requires two elements. Because transform() converts the string element by element and never changes its length, it cannot handle such cases.
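
One commonly cited case of this kind, which may or may not be the one I ran into, is the German letter ß: it is the single byte 0xDF in the Latin-1 and Windows-1252 code pages, while its traditional upper case form is the two letters "SS". The sketch below only demonstrates that a per-element conversion keeps the string length, so it can never produce the longer result. The names UpperClassic and to_upper_per_element are hypothetical, and the classic "C" locale is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Upper-case converter bound to the classic "C" locale (an assumption
// made to keep the example independent of installed locales).
struct UpperClassic {
    char operator()(char ch) const {
        return std::toupper(ch, std::locale::classic());
    }
};

// Converts each element separately; the length cannot change.
std::string to_upper_per_element(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), UpperClassic());
    return s;
}

void demo() {
    // "straße" with ß stored as the single Latin-1 byte 0xDF.
    // The literal is split so that 'e' is not read as part of the escape.
    std::string in = std::string("stra\xDF") + "e";
    std::string out = to_upper_per_element(in);

    // One output element per input element...
    assert(out.size() == in.size());
    // ...so the seven-letter form "STRASSE" is out of reach.
    assert(out.size() != std::string("STRASSE").size());
}
```

Handling such characters needs a conversion that works on whole strings and is allowed to change their length, which transform() over single elements cannot do.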
