ModErn Text Analysis
META Enumerates Textual Applications
unigram_mixture.h File Reference
#include <cassert>
#include <cstdint>
#include <limits>
#include "meta/config.h"
#include "meta/learn/dataset_view.h"
#include "meta/stats/multinomial.h"



Classes
struct meta::index::feedback::training_options


Namespaces
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Indexes to create efficient representations of data.


Functions
stats::multinomial< term_id > meta::index::feedback::maximum_likelihood (const learn::dataset_view &dset)
template<class BackgroundModel >
stats::multinomial< term_id > meta::index::feedback::unigram_mixture (BackgroundModel &&background, const learn::dataset_view &dset, const training_options &options={})
 Learns the feedback model component of a two-component unigram mixture model.

Detailed Description

Author
Chase Geigle

All files in META are dual-licensed under the MIT and NCSA licenses. For more details, consult the license files (including LICENSE.ncsa) in the root of the project.

Function Documentation

§ maximum_likelihood()

stats::multinomial<term_id> meta::index::feedback::maximum_likelihood (const learn::dataset_view &dset)

Parameters
dset    A collection of documents to fit a language model to

Returns
the maximum likelihood estimate for the language model
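
As a rough illustration of what a maximum likelihood estimate for a unigram language model computes, the sketch below divides each term's total count by the total number of term occurrences in the collection. It does not use META's types: `document`, `term_probs`, and `mle_language_model` are hypothetical stand-ins for `learn::dataset_view` and `stats::multinomial<term_id>`, and the real implementation may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins (assumptions, not META's API): a term id, a
// bag-of-words document mapping term -> count, and a term -> probability map.
using term_id = std::uint64_t;
using document = std::unordered_map<term_id, double>;
using term_probs = std::unordered_map<term_id, double>;

// Maximum likelihood estimate of a unigram language model:
//   p(w) = c(w) / sum over w' of c(w')
// where c(w) is the count of term w summed over all documents.
term_probs mle_language_model(const std::vector<document>& docs)
{
    term_probs model;
    double total = 0.0;
    for (const auto& doc : docs)
    {
        for (const auto& entry : doc)
        {
            assert(entry.second >= 0.0); // counts must be non-negative
            model[entry.first] += entry.second;
            total += entry.second;
        }
    }
    // Normalize the accumulated counts; an empty collection yields an
    // empty model.
    if (total > 0.0)
    {
        for (auto& entry : model)
            entry.second /= total;
    }
    return model;
}
```

In this sketch, terms that never occur in the collection are simply absent from the result, which corresponds to a probability of zero under the unsmoothed estimate.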

§ unigram_mixture()

template<class BackgroundModel >
stats::multinomial<term_id> meta::index::feedback::unigram_mixture (BackgroundModel &&background,
const learn::dataset_view &dset,
const training_options &options = {})

Learns the feedback model component of a two-component unigram mixture model.

The BackgroundModel is a unary function that takes a term and returns its probability under the background language model. It serves as the first component of the mixture model and is selected with fixed probability options.lambda. This function uses the EM algorithm to fit the second component, the feedback language model, and returns it.

Parameters
background    The background language model
dset          The feedback documents to fit the feedback model to
options       The training options for the EM algorithm

Returns
the feedback model
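
The EM procedure described above can be sketched in a self-contained form. This is only an illustration under stated assumptions, not META's implementation: `term_weights`, `em_options`, and `fit_feedback_model` are hypothetical names, the feedback documents are represented as a single map of aggregated term counts rather than a `learn::dataset_view`, and the `tolerance` and `max_iters` fields are assumed stopping criteria that are not taken from `training_options`. Only `lambda`, the fixed probability of selecting the background component, comes from the description.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>

using term_id = std::uint64_t;
// Hypothetical stand-in for both the aggregated feedback counts and the
// returned distribution (term -> count or term -> probability).
using term_weights = std::unordered_map<term_id, double>;

// Hypothetical options struct. lambda mirrors the documented meaning;
// tolerance and max_iters are assumptions made for this sketch.
struct em_options
{
    double lambda = 0.5;     // fixed probability of the background component
    double tolerance = 1e-6; // stop when the L1 change falls below this
    int max_iters = 100;     // hard cap on EM iterations
};

// Fits the feedback component theta of the mixture
//   p(w) = lambda * background(w) + (1 - lambda) * theta(w)
// to the given term counts with EM, keeping the background fixed.
template <class BackgroundModel>
term_weights fit_feedback_model(BackgroundModel&& background,
                                const term_weights& counts,
                                const em_options& options = {})
{
    assert(options.lambda > 0.0 && options.lambda < 1.0);

    double total = 0.0;
    for (const auto& entry : counts)
        total += entry.second;

    term_weights feedback;
    if (total <= 0.0)
        return feedback;

    // Initialize theta with the maximum likelihood estimate.
    for (const auto& entry : counts)
        feedback[entry.first] = entry.second / total;

    for (int iter = 0; iter < options.max_iters; ++iter)
    {
        term_weights next;
        double norm = 0.0;
        for (const auto& entry : counts)
        {
            // E-step: posterior probability that an occurrence of this
            // term was generated by the feedback component.
            const double bg = options.lambda * background(entry.first);
            const double fb = (1.0 - options.lambda) * feedback[entry.first];
            const double denom = bg + fb;
            const double posterior = denom > 0.0 ? fb / denom : 0.0;
            // M-step numerator: expected count assigned to the feedback model.
            next[entry.first] = entry.second * posterior;
            norm += next[entry.first];
        }
        if (norm <= 0.0)
            break;

        // M-step: renormalize and measure how far theta moved.
        double change = 0.0;
        for (auto& entry : next)
        {
            entry.second /= norm;
            change += std::fabs(entry.second - feedback[entry.first]);
        }
        feedback = std::move(next);
        if (change < options.tolerance)
            break;
    }
    return feedback;
}
```

Because the background component is fixed, terms that the background already explains well receive a low posterior in the E-step, so the fitted feedback model shifts probability mass toward terms that are distinctive of the feedback documents. A larger `lambda` strengthens this discounting effect.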