ModErn Text Analysis
META Enumerates Textual Applications
meta::classify Namespace Reference

Algorithms for multi-class and binary classification.

Namespaces

 kernel
 Kernel functions for linear classifiers.
 
 loss
 Loss functions for sgd.
 

Classes

class  binary_classifier
 A classifier which classifies documents as "positive" or "negative".
 
class  binary_classifier_factory
 Factory that is responsible for creating binary classifiers from configuration files.
 
class  binary_classifier_loader
 Factory that is responsible for loading binary_classifiers from input streams.
 
class  binary_dataset_view
 A non-owning view of a dataset with binary class labels.
 
class  classifier
 A classifier uses a document's feature space to identify which group it belongs to.
 
class  classifier_exception
 Exception thrown from classifier operations.
 
class  classifier_factory
 Factory that is responsible for creating classifiers from configuration files.
 
class  classifier_loader
 Factory that is responsible for loading classifiers from input streams.
 
class  confusion_matrix
 Allows interpretation of classification errors.
 
class  dual_perceptron
 Implements a perceptron classifier, but using the dual formulation of the problem.
 
class  knn
 Implements the k-Nearest Neighbor lazy learning classification algorithm.
 
class  knn_exception
 Basic exception for knn interactions.
 
class  linear_model
 A storage class for multiclass linear classifier models.
 
class  linear_model_exception
 Exception thrown during interactions with linear_models.
 
class  logistic_regression
 Multinomial logistic regression.
 
class  multiclass_dataset
 
class  multiclass_dataset_view
 A non-owning view of a dataset with categorical class labels.
 
class  naive_bayes
 Implements the Naive Bayes classifier, a simplistic probabilistic classifier that uses Bayes' theorem with strong feature independence assumptions.
 
class  naive_bayes_exception
 
class  nearest_centroid
 Implements the nearest centroid classification algorithm.
 
class  nearest_centroid_exception
 Basic exception for nearest_centroid interactions.
 
class  one_vs_all
 Generalizes binary classifiers to operate over multiclass types using the one vs all method.
 
class  one_vs_one
 Ensemble method adaptor for extending binary_classifiers to the multi-class classification case by using a one-vs-one strategy.
 
class  online_binary_classifier
 A binary classifier that can be updated in-place with new documents, either in batches or one-at-a-time.
 
class  online_classifier
 A multi-class classifier that can be updated in-place with new documents either in batches or one-at-a-time.
 
class  sgd
 Implements stochastic gradient descent for learning binary linear classifiers.
 
class  svm_wrapper
 Wrapper class for the liblinear (http://www.csie.ntu.edu.tw/~cjlin/liblinear/) and libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) implementations of support vector machine classification.
 
class  winnow
 Implements the Winnow classifier, a simplistic linear classifier for linearly-separable data.
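The one_vs_all entry above describes a reduction from multi-class to binary classification. The C++ sketch below illustrates that general technique on its own and is not meta::classify::one_vs_all's implementation. The names one_vs_all_sketch, scorer and binary_trainer are hypothetical, and single double features stand in for document feature vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <set>
#include <vector>

// Hypothetical: a binary scorer returns its confidence that an input is "positive".
using scorer = std::function<double(double)>;
// Hypothetical: a binary trainer builds a scorer from features and boolean labels.
using binary_trainer =
    std::function<scorer(const std::vector<double>&, const std::vector<bool>&)>;

// One-vs-all reduction: train one binary scorer per class, treating that
// class as positive and all others as negative. Predict the class whose
// scorer is most confident.
class one_vs_all_sketch
{
  public:
    one_vs_all_sketch(const std::vector<double>& x, const std::vector<int>& y,
                      const binary_trainer& train)
    {
        assert(x.size() == y.size() && !y.empty());
        std::set<int> classes(y.begin(), y.end());
        for (int c : classes)
        {
            std::vector<bool> is_c;
            is_c.reserve(y.size());
            for (int label : y)
                is_c.push_back(label == c); // relabel: c vs. the rest
            scorers_[c] = train(x, is_c);
        }
    }

    int predict(double x) const
    {
        int best = scorers_.begin()->first;
        double best_score = -std::numeric_limits<double>::infinity();
        for (const auto& kv : scorers_)
        {
            double s = kv.second(x);
            if (s > best_score)
            {
                best_score = s;
                best = kv.first;
            }
        }
        return best;
    }

  private:
    std::map<int, scorer> scorers_; // one binary model per class label
};
```

Because the reduction trains one binary model per class, its training cost grows linearly with the number of classes. A one-vs-one strategy instead trains one model per pair of classes, which is k(k-1)/2 models for k classes.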
 

Typedefs

using binary_dataset = learn::labeled_dataset< bool >
 

Functions

template<class Index , class Classifier >
void batch_train (Index &idx, Classifier &cls, const std::vector< doc_id > &training_set, uint64_t batch_size)
 This trains a classifier in an online fashion, using batches of size batch_size from the training_set.
 
std::unique_ptr< binary_classifier > make_binary_classifier (const cpptoml::table &config, binary_dataset_view training)
 (Non-template): Convenience method for creating a binary classifier using the factory.
 
std::unique_ptr< binary_classifier > load_binary_classifier (std::istream &stream)
 Convenience method for loading a binary_classifier using the factory.
 
template<class Classifier >
void register_binary_classifier ()
 Registration method for binary classifiers.
 
template<class Creator >
confusion_matrix cross_validate (Creator &&creator, classifier::dataset_view_type docs, size_t k, bool even_split=false)
 Performs k-fold cross-validation on a set of documents.
 
confusion_matrix cross_validate (const cpptoml::table &config, classifier::dataset_view_type docs, size_t k, bool even_split=false)
 Performs k-fold cross-validation on a set of documents.
 
template<>
std::unique_ptr< classifier > make_classifier< dual_perceptron > (const cpptoml::table &, multiclass_dataset_view training)
 Specialization of the factory function used to create dual_perceptrons.
 
template<>
std::unique_ptr< classifier > make_multi_index_classifier< knn > (const cpptoml::table &config, multiclass_dataset_view training, std::shared_ptr< index::inverted_index > inv_idx)
 Specialization of the factory method used to create knn classifiers.
 
template<>
std::unique_ptr< classifier > make_classifier< logistic_regression > (const cpptoml::table &, multiclass_dataset_view training)
 Specialization of the factory method used for creating logistic_regression classifiers.
 
template<>
std::unique_ptr< classifier > make_classifier< naive_bayes > (const cpptoml::table &config, multiclass_dataset_view training)
 Specialization of the factory method used for creating naive bayes classifiers.
 
template<>
std::unique_ptr< classifier > make_multi_index_classifier< nearest_centroid > (const cpptoml::table &, multiclass_dataset_view training, std::shared_ptr< index::inverted_index >)
 Specialization of the factory method used to create nearest_centroid classifiers.
 
template<>
std::unique_ptr< classifier > make_classifier< one_vs_all > (const cpptoml::table &, multiclass_dataset_view training)
 Specialization of the factory method used to create one_vs_all classifiers.
 
template<>
std::unique_ptr< classifier > make_classifier< one_vs_one > (const cpptoml::table &, multiclass_dataset_view training)
 Specialization of the factory method used to create one_vs_one classifiers.
 
template<>
std::unique_ptr< binary_classifier > make_binary_classifier< sgd > (const cpptoml::table &config, binary_dataset_view training)
 Specialization of the factory method used to create sgd classifiers.
 
template<>
std::unique_ptr< classifier > make_classifier< svm_wrapper > (const cpptoml::table &, multiclass_dataset_view training)
 Specialization of the factory method used for creating svm_wrapper classifiers.
 
template<>
std::unique_ptr< classifier > make_classifier< winnow > (const cpptoml::table &config, multiclass_dataset_view training)
 Specialization of the factory method used for creating winnow classifiers.
 
std::unique_ptr< classifier > make_classifier (const cpptoml::table &config, multiclass_dataset_view training, std::shared_ptr< index::inverted_index > inv_idx=nullptr)
 Convenience method for creating a classifier using the factory.
 
template<class Classifier >
std::unique_ptr< classifier > make_classifier (const cpptoml::table &, multiclass_dataset_view training)
 Factory method for creating a classifier.
 
template<class Classifier >
std::unique_ptr< classifier > make_multi_index_classifier (const cpptoml::table &, multiclass_dataset_view training, std::shared_ptr< index::inverted_index > inv_idx)
 Factory method for creating a classifier that takes both index types.
 
std::unique_ptr< classifier > load_classifier (std::istream &stream)
 Convenience method for loading a classifier using the factory.
 
template<class Classifier >
void register_classifier ()
 Registration method for classifiers.
 
template<class Classifier >
void register_multi_index_classifier ()
 Registration method for multi-index classifiers.
 

Detailed Description

Algorithms for multi-class and binary classification.

Function Documentation

§ batch_train()

template<class Index , class Classifier >
void meta::classify::batch_train ( Index &  idx,
Classifier &  cls,
const std::vector< doc_id > &  training_set,
uint64_t  batch_size 
)

This trains a classifier in an online fashion, using batches of size batch_size from the training_set.

Parameters
idx: The index the classifier is using (to load in new data chunks for each batch)
cls: The classifier to train. This must be a classifier supporting online learning (e.g., sgd or an ensemble of sgd)
training_set: The list of document ids that comprise the training data
batch_size: The size of the batches to use for the minibatch training
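The following C++ sketch illustrates only the batching loop described above. It assumes a hypothetical classifier type with a train(batch) member that accepts a vector of ids. It also omits the idx parameter, which the real batch_train uses to load each batch's documents, and it aliases doc_id to std::uint64_t purely for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for meta's doc_id type (assumption for this sketch only).
using doc_id = std::uint64_t;

// Hypothetical online learner that just records each batch it receives.
// A real online classifier would update its model inside train().
struct recording_classifier
{
    std::vector<std::vector<doc_id>> batches_seen;

    void train(const std::vector<doc_id>& batch)
    {
        batches_seen.push_back(batch);
    }
};

// Minibatch loop: hand the classifier consecutive slices of training_set,
// each holding at most batch_size ids (the last slice may be shorter).
template <class Classifier>
void batch_train_sketch(Classifier& cls,
                        const std::vector<doc_id>& training_set,
                        std::uint64_t batch_size)
{
    assert(batch_size > 0);
    for (std::size_t start = 0; start < training_set.size();
         start += batch_size)
    {
        std::size_t end = std::min<std::size_t>(start + batch_size,
                                                training_set.size());
        auto first = training_set.begin() + static_cast<std::ptrdiff_t>(start);
        auto last = training_set.begin() + static_cast<std::ptrdiff_t>(end);
        cls.train(std::vector<doc_id>(first, last));
    }
}
```

For example, with 10 ids and a batch_size of 4, the classifier receives batches of 4, 4 and 2 ids.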

§ make_binary_classifier()

std::unique_ptr< binary_classifier > meta::classify::make_binary_classifier ( const cpptoml::table &  config,
binary_dataset_view  training 
)

(Non-template): Convenience method for creating a binary classifier using the factory.

Parameters
config: The table that specifies the binary classifier's configuration
training: The training data
Returns
a unique_ptr to a binary_classifier constructed from the given configuration, trained on the given training data

(Template): Factory method for creating a binary classifier; this should be specialized if your given binary classifier requires special construction behavior (e.g., reading parameters).

Parameters
config: The table that specifies the binary classifier's configuration
training: The training data
Returns
a unique_ptr to a binary_classifier (of derived type Classifier) that has been constructed from the given configuration
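The (Template) overload relies on ordinary C++ function-template specialization. The primary template constructs the classifier directly, and a classifier that needs parameters gets a full specialization that reads them first. The sketch below shows that pattern using hypothetical stand-ins:

- a std::map replaces cpptoml::table;
- a vector of (feature, label) pairs replaces binary_dataset_view;
- the "threshold" key and its 0.5 default are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for cpptoml::table and binary_dataset_view.
using config_table = std::map<std::string, double>;
using toy_binary_data = std::vector<std::pair<double, bool>>;

struct binary_classifier_base
{
    virtual ~binary_classifier_base() = default;
    virtual bool predict(double x) const = 0;
};

// Needs no parameters: predicts the majority label of its training data.
struct majority_class : binary_classifier_base
{
    bool answer;
    explicit majority_class(const toy_binary_data& training)
    {
        std::size_t positives = 0;
        for (const auto& example : training)
            positives += example.second ? 1 : 0;
        answer = 2 * positives >= training.size();
    }
    bool predict(double) const override { return answer; }
};

// Needs a parameter that must be read from the configuration.
struct threshold_classifier : binary_classifier_base
{
    double threshold;
    explicit threshold_classifier(double t) : threshold{t} {}
    bool predict(double x) const override { return x >= threshold; }
};

// Primary template: default construction behavior; the config is ignored.
template <class Classifier>
std::unique_ptr<binary_classifier_base>
make_binary_classifier_sketch(const config_table&,
                              const toy_binary_data& training)
{
    return std::make_unique<Classifier>(training);
}

// Full specialization: read the (hypothetical) "threshold" key first.
template <>
std::unique_ptr<binary_classifier_base>
make_binary_classifier_sketch<threshold_classifier>(
    const config_table& config, const toy_binary_data&)
{
    auto it = config.find("threshold");
    double t = it != config.end() ? it->second : 0.5; // invented default
    return std::make_unique<threshold_classifier>(t);
}
```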

§ load_binary_classifier()

std::unique_ptr< binary_classifier > meta::classify::load_binary_classifier ( std::istream &  input)

(Non-template): Convenience method for loading a binary_classifier using the factory.

Parameters
stream: The stream to load the model from
Returns
a unique_ptr to the classifier created from the given stream

(Template): Factory method for loading a classifier. This should be specialized if your given classifier requires special construction behavior (e.g., reading parameters).

Parameters
stream: The stream to load the model from
Returns
a unique_ptr to the classifier (of derived type Classifier) created from the given stream
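The (Template) loading overload suggests that, by default, a classifier rebuilds itself from the stream. Here is a minimal sketch of that round trip. It assumes a hypothetical threshold_model whose entire state is one number written as text; the library's actual model serialization format is not described on this page.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>

struct binary_model
{
    virtual ~binary_model() = default;
    virtual bool predict(double x) const = 0;
};

// Hypothetical model whose entire state is a single threshold.
struct threshold_model : binary_model
{
    double threshold = 0.0;

    explicit threshold_model(double t) : threshold{t} {}

    // Stream constructor: rebuilds the model from what save() wrote.
    explicit threshold_model(std::istream& in)
    {
        if (!(in >> threshold))
            throw std::runtime_error{"corrupt threshold_model stream"};
    }

    void save(std::ostream& out) const { out << threshold << '\n'; }
    bool predict(double x) const override { return x >= threshold; }
};

// Primary template: by default, let the classifier read itself from the
// stream. A type needing different behavior would specialize this.
template <class Classifier>
std::unique_ptr<binary_model> load_binary_classifier_sketch(std::istream& stream)
{
    return std::make_unique<Classifier>(stream);
}
```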

§ register_binary_classifier()

template<class Classifier >
void meta::classify::register_binary_classifier ( )

Registration method for binary classifiers.

Clients should use this method to register any new binary classifiers they write.
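Registration methods like this one commonly insert a creator function into a string-keyed registry, which the factory consults later. The sketch below illustrates that general pattern. Note that registry(), create() and the static id member are assumptions made for the example, not the library's actual interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct classifier_base
{
    virtual ~classifier_base() = default;
    virtual std::string name() const = 0;
};

// Hypothetical registry mapping an identifier to a creator function.
using creator_fn = std::function<std::unique_ptr<classifier_base>()>;

inline std::map<std::string, creator_fn>& registry()
{
    static std::map<std::string, creator_fn> instance; // built on first use
    return instance;
}

// Registration sketch: assumes each classifier exposes a static `id`.
template <class Classifier>
void register_classifier_sketch()
{
    registry()[Classifier::id] = [] { return std::make_unique<Classifier>(); };
}

// Factory lookup: unknown ids are reported as errors.
inline std::unique_ptr<classifier_base> create(const std::string& id)
{
    auto it = registry().find(id);
    if (it == registry().end())
        throw std::invalid_argument{"unknown classifier id: " + id};
    return it->second();
}

// A client-written classifier.
struct my_classifier : classifier_base
{
    static constexpr const char* id = "my-classifier";
    std::string name() const override { return id; }
};
```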

§ cross_validate() [1/2]

template<class Creator >
confusion_matrix meta::classify::cross_validate ( Creator &&  creator,
classifier::dataset_view_type  docs,
size_t  k,
bool  even_split = false 
)

Performs k-fold cross-validation on a set of documents.

Parameters
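creator: A function that creates a classifier given a multiclass_dataset_view
docs: The documents to cross-validate over; each fold is held out once for testing while the remaining folds are used for training
k: The number of folds
even_split: Whether to evenly split the data by class for a fair baseline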
Returns
a confusion_matrix containing the results over all the folds
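The following C++ sketch shows the k-fold procedure itself using simplified, hypothetical types:

- documents are (feature, label) pairs;
- a classifier is any callable that maps a feature to a label;
- a std::map keyed by (actual, predicted) stands in for confusion_matrix.

Folds are assigned round-robin (document i goes to fold i % k) so that the result is deterministic. A real implementation may shuffle or partition differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for documents, classifiers and confusion_matrix.
struct labeled_doc
{
    double feature;
    int label;
};
using predictor = std::function<int(double)>;
using creator_fn = std::function<predictor(const std::vector<labeled_doc>&)>;
using confusion = std::map<std::pair<int, int>, std::size_t>; // {actual, predicted}

// Each fold is held out once as the test set while the creator trains a
// classifier on the remaining folds; results are pooled across folds.
inline confusion cross_validate_sketch(const creator_fn& creator,
                                       const std::vector<labeled_doc>& docs,
                                       std::size_t k)
{
    assert(k >= 2 && k <= docs.size());
    confusion results;
    for (std::size_t fold = 0; fold < k; ++fold)
    {
        std::vector<labeled_doc> train, test;
        for (std::size_t i = 0; i < docs.size(); ++i)
            (i % k == fold ? test : train).push_back(docs[i]);
        predictor model = creator(train);
        for (const auto& d : test)
            ++results[{d.label, model(d.feature)}];
    }
    return results;
}
```

Because every document is held out exactly once, the counts in the returned matrix sum to the number of documents.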

§ cross_validate() [2/2]

confusion_matrix meta::classify::cross_validate ( const cpptoml::table &  config,
classifier::dataset_view_type  docs,
size_t  k,
bool  even_split = false 
)

Performs k-fold cross-validation on a set of documents.

Parameters
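config: The configuration to use to create the classifier
docs: The documents to cross-validate over; each fold is held out once for testing while the remaining folds are used for training
k: The number of folds
even_split: Whether to evenly split the data by class for a fair baseline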
Returns
a confusion_matrix containing the results over all the folds
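This page does not spell out how even_split balances the data. One plausible reading, sketched below as an assumption, is to keep the same number of documents from every class, namely the size of the rarest class. With C balanced classes, always predicting a single class then yields an accuracy of exactly 1/C. The labeled_doc type and even_split_sketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct labeled_doc
{
    int id;
    int label;
};

// Sketch (assumption): keep, for every label, only as many documents as the
// rarest label has, preserving the original document order.
inline std::vector<labeled_doc> even_split_sketch(const std::vector<labeled_doc>& docs)
{
    std::map<int, std::size_t> counts;
    for (const auto& d : docs)
        ++counts[d.label];
    if (counts.empty())
        return {};

    std::size_t smallest = docs.size();
    for (const auto& kv : counts)
        smallest = std::min(smallest, kv.second);

    std::map<int, std::size_t> kept;
    std::vector<labeled_doc> balanced;
    for (const auto& d : docs)
    {
        if (kept[d.label] < smallest)
        {
            ++kept[d.label];
            balanced.push_back(d);
        }
    }
    assert(balanced.size() == smallest * counts.size()); // every class equal
    return balanced;
}
```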

§ make_classifier() [1/2]

std::unique_ptr< classifier > meta::classify::make_classifier ( const cpptoml::table &  config,
multiclass_dataset_view  training,
std::shared_ptr< index::inverted_index > inv_idx = nullptr 
)

Convenience method for creating a classifier using the factory.

Parameters
config: The configuration group that specifies the configuration for the classifier to be created
training: The training data
inv_idx: The inverted_index to be passed to the classifier being created (if needed)
Returns
a unique_ptr to the classifier created from the given configuration
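Because inv_idx defaults to nullptr, the convenience method has to decide which classifiers actually need an index. The following is a hedged sketch of such a dispatch. The "method" key, the method names "plain" and "indexed", and the stub index type are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for index::inverted_index and cpptoml::table.
struct inverted_index_stub
{
    std::size_t num_docs;
};
using config_table = std::map<std::string, std::string>;

struct classifier_base
{
    virtual ~classifier_base() = default;
    virtual std::string name() const = 0;
};

struct needs_nothing : classifier_base
{
    std::string name() const override { return "plain"; }
};

struct needs_index : classifier_base
{
    std::shared_ptr<inverted_index_stub> idx; // shared, not copied
    explicit needs_index(std::shared_ptr<inverted_index_stub> i) : idx{std::move(i)} {}
    std::string name() const override { return "indexed"; }
};

// Dispatch on the hypothetical "method" key; only classifiers that need an
// inverted index receive inv_idx, and a missing index is reported as an error.
inline std::unique_ptr<classifier_base>
make_classifier_sketch(const config_table& config,
                       std::shared_ptr<inverted_index_stub> inv_idx = nullptr)
{
    auto it = config.find("method");
    if (it == config.end())
        throw std::invalid_argument{"missing 'method' in configuration"};
    if (it->second == "plain")
        return std::make_unique<needs_nothing>();
    if (it->second == "indexed")
    {
        if (!inv_idx)
            throw std::invalid_argument{"'indexed' requires an inverted index"};
        return std::make_unique<needs_index>(std::move(inv_idx));
    }
    throw std::invalid_argument{"unknown method: " + it->second};
}
```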

§ make_classifier() [2/2]

template<class Classifier >
std::unique_ptr<classifier> meta::classify::make_classifier ( const cpptoml::table &  ,
multiclass_dataset_view  training 
)

Factory method for creating a classifier.

This should be specialized if your given classifier requires special construction behavior (e.g., reading parameters).

Parameters
config: The configuration group that specifies the configuration for the classifier to be created
training: The training data
Returns
a unique_ptr to the classifier (of derived type Classifier) created from the given configuration

§ make_multi_index_classifier()

template<class Classifier >
std::unique_ptr<classifier> meta::classify::make_multi_index_classifier ( const cpptoml::table &  ,
multiclass_dataset_view  training,
std::shared_ptr< index::inverted_index > inv_idx 
)

Factory method for creating a classifier that takes both index types.

This should be specialized if your given classifier requires special construction behavior.

Parameters
config: The configuration group that specifies the configuration for the classifier to be created
training: The training data
inv_idx: The inverted_index to be passed to the classifier being created
Returns
a unique_ptr to the classifier (of derived type Classifier) created from the given configuration

§ load_classifier()

std::unique_ptr< classifier > meta::classify::load_classifier ( std::istream &  input)

(Non-template): Convenience method for loading a classifier using the factory.

Parameters
stream: The stream to load the model from
Returns
a unique_ptr to the classifier created from the given stream

(Template): Factory method for loading a classifier. This should be specialized if your given classifier requires special construction behavior (e.g., reading parameters).

Parameters
stream: The stream to load the model from
Returns
a unique_ptr to the classifier (of derived type Classifier) created from the given stream

§ register_classifier()

template<class Classifier >
void meta::classify::register_classifier ( )

Registration method for classifiers.

Clients should use this method to register any new classifiers they write that operate on just a forward_index (this should be the case for most classifiers).

§ register_multi_index_classifier()

template<class Classifier >
void meta::classify::register_multi_index_classifier ( )

Registration method for multi-index classifiers.

Clients should use this method to register any new classifiers they write that operate on both a forward_index and an inverted_index (this is rare).