ModErn Text Analysis
META Enumerates Textual Applications
meta::analyzers::tokenizers Namespace Reference

Contains tokenizers that start off a filter chain.

Classes

class  character_tokenizer
 Converts documents into streams of characters.
 
class  icu_tokenizer
 Converts documents into streams of tokens by following the Unicode standards for sentence and word segmentation.
 
class  whitespace_tokenizer
 Converts documents into streams of whitespace-delimited tokens.
 

Functions

template<class Tokenizer >
std::unique_ptr< token_stream > make_tokenizer (const cpptoml::table &)
 Factory method for creating a tokenizer.
 
template<>
std::unique_ptr< token_stream > make_tokenizer< icu_tokenizer > (const cpptoml::table &config)
 Specialization of the factory method used to create icu_tokenizers.
 
template<>
std::unique_ptr< token_stream > make_tokenizer< whitespace_tokenizer > (const cpptoml::table &config)
 Specialization of the factory method used to create whitespace_tokenizers.
 

Detailed Description

Contains tokenizers that start off a filter chain.

Function Documentation

make_tokenizer()

template<class Tokenizer >
std::unique_ptr<token_stream> meta::analyzers::tokenizers::make_tokenizer ( const cpptoml::table &  )

Factory method for creating a tokenizer.

Specialize this template if your tokenizer requires special construction behavior.
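
The factory-plus-specialization pattern described above can be sketched in isolation. The following is a minimal illustration, not the library's implementation: everything in the `sketch` namespace is a hypothetical stand-in (`config_table` replaces `cpptoml::table`, and `delimiter_tokenizer` is an invented tokenizer that needs a configuration value). It also assumes that the generic factory simply default-constructs the tokenizer, which this page does not state.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT the real
// meta::analyzers or cpptoml types.
namespace sketch
{
// Stands in for cpptoml::table: a flat key/value configuration.
using config_table = std::map<std::string, std::string>;

// Minimal base class playing the role of token_stream.
class token_stream
{
  public:
    virtual ~token_stream() = default;
    virtual std::string name() const = 0;
};

// Default-constructible tokenizer: the generic factory is sufficient.
class character_tokenizer : public token_stream
{
  public:
    std::string name() const override { return "character"; }
};

// Invented tokenizer that cannot be default-constructed, so it needs
// its own factory specialization.
class delimiter_tokenizer : public token_stream
{
  public:
    explicit delimiter_tokenizer(std::string delim)
        : delim_{std::move(delim)}
    {
        assert(!delim_.empty() && "delimiter must not be empty");
    }

    std::string name() const override { return "delimiter(" + delim_ + ")"; }

  private:
    std::string delim_;
};

// Generic factory: ignores the configuration and default-constructs
// (an assumption about the default behavior).
template <class Tokenizer>
std::unique_ptr<token_stream> make_tokenizer(const config_table&)
{
    return std::make_unique<Tokenizer>();
}

// Specialization: reads a "delimiter" key from the configuration,
// falling back to "," when the key is absent.
template <>
std::unique_ptr<token_stream>
make_tokenizer<delimiter_tokenizer>(const config_table& config)
{
    auto it = config.find("delimiter");
    std::string delim = (it != config.end()) ? it->second : ",";
    return std::make_unique<delimiter_tokenizer>(std::move(delim));
}
} // namespace sketch
```

With these stand-ins, `sketch::make_tokenizer<sketch::character_tokenizer>(cfg)` goes through the generic template, while `sketch::make_tokenizer<sketch::delimiter_tokenizer>(cfg)` selects the specialization and reads its delimiter from `cfg`. Callers use the same call syntax in both cases, which is why the construction logic can vary per tokenizer without changing the calling code.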