ModErn Text Analysis
META Enumerates Textual Applications
meta::utf Namespace Reference

Functions for converting to and from various character sets.

Classes

class  icu_handle
 Internal class that ensures that ICU cleans up all of its "still-reachable" memory before program termination.
 
class  segmenter
 Class that encapsulates segmenting Unicode strings.
 
class  transformer
 Class that encapsulates transliteration of Unicode strings.
 

Functions

std::string to_utf8 (const std::string &str, const std::string &charset)
 Converts a string from the given charset to utf8.
 
std::u16string to_utf16 (const std::string &str, const std::string &charset)
 Converts a string from the given charset to utf16.
 
std::string to_utf8 (const std::u16string &str)
 Converts a string from utf16 to utf8.
 
std::u16string to_utf16 (const std::string &str)
 Converts a string from utf8 to utf16.
 
std::string tolower (const std::string &str)
 Lowercases a utf8 string.
 
std::string toupper (const std::string &str)
 Uppercases a utf8 string.
 
std::string foldcase (const std::string &str)
 Folds the case of a utf8 string.
 
std::string transform (const std::string &str, const std::string &id)
 Transliterates a utf8 string, using the rules defined in ICU.
 
template<class Predicate >
std::string remove_if (const std::string &str, Predicate &&pred)
 Removes UTF-32 codepoints that match the given function.
 
template<class Function >
std::string transform (const std::string &str, Function &&fun)
 Transforms a utf8 string using the provided function object applied to each codepoint in the string.
 
uint64_t length (const std::string &str)
 
bool isalpha (uint32_t codepoint)
 
bool isblank (uint32_t codepoint)
 
bool isspace (uint32_t codepoint)
 
std::u16string icu_to_u16str (const icu::UnicodeString &icu_str)
 Helper method that converts an ICU string to a std::u16string.
 
std::string icu_to_u8str (const icu::UnicodeString &icu_str)
 Helper method that converts an ICU string to a std::string in utf8.
 

Detailed Description

Functions for converting to and from various character sets.

Function Documentation

§ to_utf8() [1/2]

std::string meta::utf::to_utf8 ( const std::string &  str,
const std::string &  charset 
)

Converts a string from the given charset to utf8.

Parameters
str      The string to convert
charset  The charset of the given string
Returns
a utf8 string
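
As a rough illustration of what a charset conversion involves, the sketch below handles a single fixed charset, ISO-8859-1, whose byte values are identical to the code points U+0000 through U+00FF. The name latin1_to_utf8 is hypothetical and not part of meta::utf; to_utf8 itself accepts an arbitrary charset name, and its implementation is not shown on this page.

```cpp
#include <string>

// Hypothetical helper, not part of meta::utf: converts ISO-8859-1 to UTF-8.
// Each Latin-1 byte is the code point of the same value (U+0000..U+00FF).
inline std::string latin1_to_utf8(const std::string& str)
{
    std::string out;
    out.reserve(str.size());
    for (unsigned char byte : str)
    {
        if (byte < 0x80)
        {
            // ASCII is encoded identically in both charsets.
            out += static_cast<char>(byte);
        }
        else
        {
            // U+0080..U+00FF take two UTF-8 bytes: 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (byte >> 6));
            out += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return out;
}
```

With this sketch, the Latin-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, while ASCII text passes through unchanged.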

§ to_utf16() [1/2]

std::u16string meta::utf::to_utf16 ( const std::string &  str,
const std::string &  charset 
)

Converts a string from the given charset to utf16.

Parameters
str      The string to convert
charset  The charset of the given string
Returns
a utf16 string

§ to_utf8() [2/2]

std::string meta::utf::to_utf8 ( const std::u16string &  str)

Converts a string from utf16 to utf8.

Parameters
str  The string to convert
Returns
a utf8 string
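
To make the conversion concrete, here is a minimal standalone sketch of UTF-16 to UTF-8 transcoding. The names append_utf8 and sketch_to_utf8 are hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch; how to_utf8 treats malformed input is not stated here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of meta::utf): appends one code point as UTF-8.
inline void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical stand-in for to_utf8(const std::u16string&).
// Assumption of this sketch: unpaired surrogates become U+FFFD.
inline std::string sketch_to_utf8(const std::u16string& str)
{
    std::string out;
    for (std::size_t i = 0; i < str.size(); ++i)
    {
        char32_t cp = str[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 < str.size() && str[i + 1] >= 0xDC00
                && str[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (str[i + 1] - 0xDC00);
                ++i;
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // stray low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A surrogate pair such as 0xD83D 0xDE00 (U+1F600) is combined into one code point before encoding, so it yields a single four-byte UTF-8 sequence rather than two three-byte ones.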

§ to_utf16() [2/2]

std::u16string meta::utf::to_utf16 ( const std::string &  str)

Converts a string from utf8 to utf16.

Parameters
str  The string to convert
Returns
a utf16 string
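
The reverse direction can be sketched in the same spirit. The function sketch_to_utf16 below is hypothetical and assumes well-formed UTF-8 input; it does no validation, which production code would need.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for to_utf16(const std::string&).
// Assumes well-formed UTF-8; no validation is performed.
inline std::u16string sketch_to_utf16(const std::string& str)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < str.size())
    {
        const unsigned char lead = static_cast<unsigned char>(str[i]);
        // The lead byte determines how many bytes the sequence occupies.
        const std::size_t len
            = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep the payload bits of the lead byte, then add six bits
        // from each continuation byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < str.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(str[i + k]) & 0x3F);
        i += len;

        if (cp < 0x10000)
        {
            out += static_cast<char16_t>(cp);
        }
        else
        {
            // Code points above the BMP are split into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Code points up to U+FFFF map to a single UTF-16 code unit; anything larger produces two, so the resulting std::u16string can be longer than the number of code points in the input.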

§ tolower()

std::string meta::utf::tolower ( const std::string &  str)

Lowercases a utf8 string.

Parameters
str  The string to convert
Returns
a lowercased utf8 string

§ toupper()

std::string meta::utf::toupper ( const std::string &  str)

Uppercases a utf8 string.

Parameters
str  The string to convert
Returns
an uppercased utf8 string

§ foldcase()

std::string meta::utf::foldcase ( const std::string &  str)

Folds the case of a utf8 string.

This is like lowercasing, but more general: case folding maps a string to a form intended for case-insensitive comparison rather than for display.

Parameters
str  The string to convert
Returns
a case-folded utf8 string

§ transform() [1/2]

std::string meta::utf::transform ( const std::string &  str,
const std::string &  id 
)

Transliterates a utf8 string, using the rules defined in ICU.

See also
http://userguide.icu-project.org/transforms
Parameters
str  The string to transliterate
id   The ICU identifier for the transliteration method to use
Returns
the transliterated string, in utf8

§ remove_if()

template<class Predicate >
std::string meta::utf::remove_if ( const std::string &  str,
Predicate &&  pred 
)

Removes UTF-32 codepoints that match the given function.

Parameters
str   The string to remove characters from
pred  The predicate that returns true for codepoints that should be removed
Returns
a utf8 formatted string with all codepoints matching pred removed
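
The behavior described above can be approximated without ICU. The sketch below, with the hypothetical name sketch_remove_if, decodes each code point, asks the predicate about it, and copies the original bytes of every code point that is kept. It assumes well-formed UTF-8 and is not the meta::utf implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for remove_if; assumes well-formed UTF-8.
template <class Predicate>
std::string sketch_remove_if(const std::string& str, Predicate&& pred)
{
    std::string out;
    std::size_t i = 0;
    while (i < str.size())
    {
        const unsigned char lead = static_cast<unsigned char>(str[i]);
        // Sequence length implied by the lead byte, clamped to the input.
        std::size_t len
            = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (len > str.size() - i)
            len = str.size() - i;
        // Payload bits of the lead byte, then six bits per continuation byte.
        std::uint32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(str[i + k]) & 0x3F);
        // Copy the untouched byte sequence of every code point that stays.
        if (!pred(cp))
            out.append(str, i, len);
        i += len;
    }
    return out;
}
```

Because the predicate sees whole code points rather than bytes, a test such as cp > 0x7F removes a multi-byte character in one step and never leaves a partial sequence behind.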

§ transform() [2/2]

template<class Function >
std::string meta::utf::transform ( const std::string &  str,
Function &&  fun 
)

Transforms a utf8 string using the provided function object applied to each codepoint in the string.

Parameters
str  The string to transform
fun  The function to transform each codepoint with
Returns
the transformed string
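
A decode, map, re-encode loop is one way to picture this overload. In the sketch below the names decode_next, encode_utf8, and sketch_transform are hypothetical, the input is assumed to be well-formed UTF-8, and the function object is assumed to map one code point to exactly one code point.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes the UTF-8 sequence starting at str[i] and
// advances i past it. Assumes well-formed input.
inline std::uint32_t decode_next(const std::string& str, std::size_t& i)
{
    const unsigned char lead = static_cast<unsigned char>(str[i++]);
    if (lead < 0x80)
        return lead;
    int extra = lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    std::uint32_t cp = lead & (0x3F >> extra);
    while (extra-- > 0 && i < str.size())
        cp = (cp << 6) | (static_cast<unsigned char>(str[i++]) & 0x3F);
    return cp;
}

// Hypothetical helper: appends one code point to out as UTF-8.
inline void encode_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical stand-in for transform(str, fun): applies fun to every
// code point and re-encodes the results as UTF-8.
template <class Function>
std::string sketch_transform(const std::string& str, Function&& fun)
{
    std::string out;
    std::size_t i = 0;
    while (i < str.size())
        encode_utf8(out, fun(decode_next(str, i)));
    return out;
}
```

Since the mapped code point may need a different number of UTF-8 bytes than the original, the output can be shorter or longer in bytes than the input even though the number of code points is unchanged.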

§ length()

uint64_t meta::utf::length ( const std::string &  str)
Returns
the number of code points in a utf8 string.
Parameters
str  The string to find the length of
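
The distinction between code points and bytes is easy to miss, so a small standalone sketch may help. The function sketch_length is hypothetical; it relies on the fact that in well-formed UTF-8 every code point has exactly one byte that is not a continuation byte (10xxxxxx).

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for length(); assumes well-formed UTF-8.
// Counts every byte that is not a continuation byte (10xxxxxx).
inline std::uint64_t sketch_length(const std::string& str)
{
    std::uint64_t count = 0;
    for (unsigned char byte : str)
    {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 bytes of "café" this counts four code points, whereas std::string::size() reports five bytes.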

§ isalpha()

bool meta::utf::isalpha ( uint32_t  codepoint)
Returns
whether a code point is a letter character
Parameters
codepoint  The codepoint in question

§ isblank()

bool meta::utf::isblank ( uint32_t  codepoint)
Returns
whether a code point is a blank character
Parameters
codepoint  The codepoint in question

§ isspace()

bool meta::utf::isspace ( uint32_t  codepoint)
Returns
whether a code point is a space character
Parameters
codepoint  The codepoint in question
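
The difference between a blank and a space character can be seen in the ASCII range, where the C standard library draws the same line: blanks are horizontal whitespace only, while spaces also include line breaks. The functions ascii_isblank and ascii_isspace below are hypothetical, ASCII-only approximations; a Unicode-aware classifier also has to cover characters outside ASCII, which this sketch simply rejects.

```cpp
#include <cctype>
#include <cstdint>

// Hypothetical ASCII-only approximations, not the meta::utf implementations.
// Anything at or above U+0080 is reported as neither blank nor space.
inline bool ascii_isblank(std::uint32_t codepoint)
{
    return codepoint < 0x80
           && std::isblank(static_cast<int>(codepoint)) != 0;
}

inline bool ascii_isspace(std::uint32_t codepoint)
{
    return codepoint < 0x80
           && std::isspace(static_cast<int>(codepoint)) != 0;
}
```

In the default "C" locale, a tab is both blank and space, while a newline is a space but not a blank.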

§ icu_to_u16str()

std::u16string meta::utf::icu_to_u16str ( const icu::UnicodeString &  icu_str)
inline

Helper method that converts an ICU string to a std::u16string.

Parameters
icu_str  The ICU string to be converted
Returns
a std::u16string from the given ICU string

§ icu_to_u8str()

std::string meta::utf::icu_to_u8str ( const icu::UnicodeString &  icu_str)
inline

Helper method that converts an ICU string to a std::string in utf8.

Parameters
icu_str  The ICU string to be converted
Returns
a std::string in utf8 from the given ICU string