ModErn Text Analysis
META Enumerates Textual Applications
unigram_mixture.h File Reference
#include <cassert>
#include <cstdint>
#include <limits>
#include "meta/config.h"
#include "meta/learn/dataset_view.h"
#include "meta/stats/multinomial.h"

Classes

struct  meta::index::feedback::training_options
 

Namespaces

 meta
 The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
 
 meta::index
 Indexes to create efficient representations of data.
 

Functions

stats::multinomial< term_id > meta::index::feedback::maximum_likelihood (const learn::dataset_view &dset)
 
template<class BackgroundModel >
stats::multinomial< term_id > meta::index::feedback::unigram_mixture (BackgroundModel &&background, const learn::dataset_view &dset, const training_options &options={})
 Learns the feedback model component of a two-component unigram mixture model.
 

Detailed Description

Author
Chase Geigle

All files in META are dual-licensed under the MIT and NCSA licenses. For more details, consult the files LICENSE.mit and LICENSE.ncsa in the root of the project.

Function Documentation

maximum_likelihood()

stats::multinomial<term_id> meta::index::feedback::maximum_likelihood (const learn::dataset_view &dset)
Parameters
dset  A collection of documents to fit a language model to
Returns
the maximum likelihood estimate for the language model
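
For orientation, the maximum likelihood estimate of a unigram language model is the relative frequency of each term, p(w) = c(w) / sum of all counts. The sketch below illustrates that computation on plain standard-library containers. It is not META's implementation: `sparse_doc` and `mle_sketch` are hypothetical names, and a document is assumed to be a map from term id to count rather than a `learn::dataset_view`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not META types.
using term_id = std::uint64_t;
using sparse_doc = std::unordered_map<term_id, double>; // term -> count

// Maximum likelihood unigram model: p(w) = c(w) / sum over w' of c(w').
std::unordered_map<term_id, double>
mle_sketch(const std::vector<sparse_doc>& docs)
{
    std::unordered_map<term_id, double> model;
    double total = 0.0;

    // Accumulate term counts over every document in the collection.
    for (const auto& doc : docs)
    {
        for (const auto& kv : doc)
        {
            assert(kv.second >= 0.0); // counts must be non-negative
            model[kv.first] += kv.second;
            total += kv.second;
        }
    }

    // Normalize counts into probabilities (skipped for an empty collection).
    if (total > 0.0)
    {
        for (auto& kv : model)
            kv.second /= total;
    }
    return model;
}
```

Terms that never occur in the collection are simply absent from the returned map, which corresponds to a probability of zero under the maximum likelihood estimate.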

unigram_mixture()

template<class BackgroundModel >
stats::multinomial<term_id> meta::index::feedback::unigram_mixture (BackgroundModel &&background, const learn::dataset_view &dset, const training_options &options={})

Learns the feedback model component of a two-component unigram mixture model.

The BackgroundModel is a unary function that maps a term to its probability. It serves as the first component of the mixture model and is selected with fixed probability options.lambda. This function uses the EM algorithm to fit the second component, the feedback language model, and returns it.

Parameters
background  The background language model
dset  The feedback documents to fit the feedback model to
options  The training options for the EM algorithm
Returns
the feedback model
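
The EM procedure described above can be sketched on plain standard-library containers. This is an illustration under stated assumptions, not META's implementation: `em_options_sketch` and `mixture_feedback_sketch` are hypothetical names, the feedback documents are assumed to be pre-aggregated into a single term-to-count map, and the `max_iter` and `delta` fields (with their defaults) are assumptions; only `lambda` is named in the documentation above. The E-step computes, for each term, the posterior probability that it was generated by the feedback component; the M-step renormalizes the expected counts.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not META types.
using term_id = std::uint64_t;
using term_map = std::unordered_map<term_id, double>;

// Assumed option set: only `lambda` appears in the documentation above.
struct em_options_sketch
{
    double lambda = 0.5;         // fixed probability of the background component
    std::uint64_t max_iter = 50; // assumed iteration cap
    double delta = 1e-5;         // assumed convergence threshold (L1 change)
};

// Fits the feedback component of p(w) = lambda * p_bg(w) + (1 - lambda) * p_fb(w).
template <class BackgroundModel>
term_map mixture_feedback_sketch(BackgroundModel&& background,
                                 const term_map& counts,
                                 const em_options_sketch& options = {})
{
    assert(options.lambda >= 0.0 && options.lambda < 1.0);

    double total = 0.0;
    for (const auto& kv : counts)
        total += kv.second;

    term_map feedback;
    if (total <= 0.0)
        return feedback;

    // Initialize the feedback model with relative frequencies.
    for (const auto& kv : counts)
        feedback[kv.first] = kv.second / total;

    for (std::uint64_t iter = 0; iter < options.max_iter; ++iter)
    {
        term_map next;
        double norm = 0.0;
        for (const auto& kv : counts)
        {
            // E-step: posterior that this term came from the feedback component.
            double p_bg = options.lambda * background(kv.first);
            double p_fb = (1.0 - options.lambda) * feedback[kv.first];
            double denom = p_bg + p_fb;
            double resp = denom > 0.0 ? p_fb / denom : 0.0;

            // M-step numerator: expected count attributed to the feedback model.
            next[kv.first] = kv.second * resp;
            norm += next[kv.first];
        }
        if (norm <= 0.0)
            break;

        // M-step: normalize, and measure how far the model moved.
        double change = 0.0;
        for (auto& kv : next)
        {
            kv.second /= norm;
            change += std::abs(kv.second - feedback[kv.first]);
        }
        feedback = std::move(next);
        if (change < options.delta)
            break;
    }
    return feedback;
}
```

Any callable taking a term id and returning a probability can serve as the background, for example a lambda wrapping a collection language model. Terms that the background already explains well receive less mass in the returned feedback model, which is the purpose of fixing the background component.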