ModErn Text Analysis
META Enumerates Textual Applications
meta::analyzers::analyzer Class Reference (abstract)

A class that provides a framework to produce token counts from documents.

#include <analyzer.h>

Inheritance diagram for meta::analyzers::analyzer:
meta::analyzers::ngram_analyzer

Public Member Functions

virtual ~analyzer ()=default
 A default virtual destructor.
 
template<class T >
feature_map< T > analyze (const corpus::document &doc)
 Tokenizes a document.
 
virtual std::unique_ptr< analyzer > clone () const =0
 Clones this analyzer.
 

Public Attributes

friend multi_analyzer
 

Private Member Functions

virtual void tokenize (const corpus::document &doc, featurizer &counts)=0
 The tokenization function that actually does the heavy lifting.
 

Detailed Description

A class that provides a framework to produce token counts from documents.

All analyzers inherit from this class and (possibly) implement tokenize().

The template argument for an analyzer indicates the feature value type it supports: uint64_t for inverted_index or double for forward_index.

When defining your own subclass of analyzer, make sure to derive from the appropriate type.
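As a rough illustration of the subclassing pattern described above, the sketch below defines a derived analyzer that overrides tokenize() and clone(). It does not include the MeTA headers: everything in namespace sketch (document, feature_map, featurizer, the analyzer base, and the hypothetical whitespace_analyzer) is a simplified stand-in written for this example, and the real library's definitions may differ, in particular how featurizer stores values and how analyze() invokes tokenize().

```cpp
#include <cstdint>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>

// Simplified stand-ins for MeTA types. These are assumptions made for the
// example, not the library's actual definitions.
namespace sketch {

struct document {  // stands in for corpus::document
    std::string content;
};

template <class T>
using feature_map = std::unordered_map<std::string, T>;

// Stand-in featurizer: accumulates a value per feature name.
class featurizer {
  public:
    explicit featurizer(feature_map<double>& out) : out_(out) {}
    void operator()(const std::string& feature, double value) {
        out_[feature] += value;
    }

  private:
    feature_map<double>& out_;
};

// Mirrors the interface listed on this page: a public analyze<T>() and
// clone(), with the real work delegated to a private pure virtual tokenize().
class analyzer {
  public:
    virtual ~analyzer() = default;

    template <class T>
    feature_map<T> analyze(const document& doc) {
        feature_map<double> raw;
        featurizer counts{raw};
        tokenize(doc, counts);  // dispatches to the derived class
        feature_map<T> result;
        for (const auto& kv : raw)
            result[kv.first] = static_cast<T>(kv.second);
        return result;
    }

    virtual std::unique_ptr<analyzer> clone() const = 0;

  private:
    virtual void tokenize(const document& doc, featurizer& counts) = 0;
};

// Hypothetical subclass: counts whitespace-separated tokens.
class whitespace_analyzer : public analyzer {
  public:
    std::unique_ptr<analyzer> clone() const override {
        return std::unique_ptr<analyzer>(new whitespace_analyzer(*this));
    }

  private:
    void tokenize(const document& doc, featurizer& counts) override {
        std::istringstream in(doc.content);
        std::string token;
        while (in >> token)
            counts(token, 1.0);  // one occurrence of this token
    }
};

}  // namespace sketch
```

With these stand-ins, calling analyze<std::uint64_t>() on a whitespace_analyzer yields integer counts, while analyze<double>() on the same object (or on a copy obtained from clone()) yields the same features with floating-point values.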

Member Function Documentation

§ analyze()

template<class T >
feature_map<T> meta::analyzers::analyzer::analyze ( const corpus::document & doc )
inline

Tokenizes a document.

Parameters
doc  The document to be tokenized
Returns
a feature_map that maps the observed features to their counts in the document

§ tokenize()

virtual void meta::analyzers::analyzer::tokenize ( const corpus::document & doc,
featurizer & counts
)
private, pure virtual

The tokenization function that actually does the heavy lifting.

This should be overridden in derived classes.

Parameters
doc  The document to be tokenized
counts  The featurizer to record feature values with
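To make the role of the counts parameter more concrete, here is a minimal sketch of what the body of a tokenize() override might look like when it records word bigrams. It is written as a free function so it can stand alone. The document and featurizer types in namespace bigram_sketch are simplified stand-ins assumed for this example (the real featurizer interface may differ), and tokenize_bigrams is a hypothetical name, not part of MeTA.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins, assumed for illustration only.
namespace bigram_sketch {

struct document {  // stands in for corpus::document
    std::string content;
};

// Stand-in featurizer: adds a value to the running total for a feature.
class featurizer {
  public:
    explicit featurizer(std::unordered_map<std::string, double>& out)
        : out_(out) {}
    void operator()(const std::string& feature, double value) {
        out_[feature] += value;
    }

  private:
    std::unordered_map<std::string, double>& out_;
};

// Hypothetical tokenize() body: splits the document on whitespace and
// records each pair of adjacent words as one feature occurrence.
inline void tokenize_bigrams(const document& doc, featurizer& counts) {
    std::istringstream in(doc.content);
    std::vector<std::string> words;
    for (std::string w; in >> w;)
        words.push_back(w);

    for (std::size_t i = 0; i + 1 < words.size(); ++i)
        counts(words[i] + "_" + words[i + 1], 1.0);
}

}  // namespace bigram_sketch
```

The function never returns a map itself. It only reports features through the featurizer it is handed, which matches the division of labour described on this page: analyze() owns the resulting feature_map and tokenize() fills it in.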

The documentation for this class was generated from the following file: