ModErn Text Analysis META Enumerates Textual Applications
unigram_mixture.h File Reference
#include <cassert>
#include <cstdint>
#include <limits>
#include "meta/config.h"
#include "meta/learn/dataset_view.h"
#include "meta/stats/multinomial.h"

## Classes

struct  meta::index::feedback::training_options

## Namespaces

meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.

meta::index
Indexes to create efficient representations of data.

## Functions

stats::multinomial< term_id > meta::index::feedback::maximum_likelihood (const learn::dataset_view &dset)

template<class BackgroundModel >
stats::multinomial< term_id > meta::index::feedback::unigram_mixture (BackgroundModel &&background, const learn::dataset_view &dset, const training_options &options={})
Learns the feedback model component of a two-component unigram mixture model.

## Detailed Description

All files in META are dual-licensed under the MIT and NCSA licenses. For more details, consult the files LICENSE.mit and LICENSE.ncsa in the root of the project.

## maximum_likelihood()

 stats::multinomial< term_id > meta::index::feedback::maximum_likelihood ( const learn::dataset_view & dset )
Parameters
 dset A collection of documents to fit a language model to
Returns
the maximum likelihood estimate for the language model
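
As an illustration of what a maximum likelihood unigram estimate computes, the following is a minimal standalone sketch, not META's implementation. It assumes documents are given as plain term-count maps; the names `document`, `language_model`, and `mle_sketch` are hypothetical stand-ins for `learn::dataset_view`, `stats::multinomial< term_id >`, and `maximum_likelihood`. Each term's probability is its total count across the documents divided by the total number of term occurrences.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not META types.
using term_id = std::uint64_t;
using document = std::unordered_map<term_id, double>;       // term -> count
using language_model = std::unordered_map<term_id, double>; // term -> probability

// Maximum likelihood estimate of a unigram language model:
//   p(w) = c(w) / sum over w' of c(w')
// where c(w) is the count of term w summed over all documents.
language_model mle_sketch(const std::vector<document>& docs)
{
    language_model model;
    double total = 0.0;

    // Accumulate per-term counts and the overall number of occurrences.
    for (const auto& doc : docs)
    {
        for (const auto& kv : doc)
        {
            model[kv.first] += kv.second;
            total += kv.second;
        }
    }
    assert(total > 0.0 && "need at least one term occurrence");

    // Normalize the counts so the probabilities sum to one.
    for (auto& kv : model)
        kv.second /= total;

    return model;
}
```

In this sketch, terms that never occur in the documents are simply absent from the result, which corresponds to a probability of zero.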

## unigram_mixture()

template<class BackgroundModel >
 stats::multinomial< term_id > meta::index::feedback::unigram_mixture ( BackgroundModel && background, const learn::dataset_view & dset, const training_options & options = {} )

Learns the feedback model component of a two-component unigram mixture model.

The BackgroundModel is a unary function that returns the probability of a given term. It serves as the first component of the mixture model, which is selected with fixed probability options.lambda. This function uses the EM algorithm to fit the second component, the feedback language model, and returns it.

Parameters
 background The background language model
 dset The feedback documents to fit the feedback model to
 options The training options for the EM algorithm
Returns
the feedback model
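
To make the EM procedure described above concrete, here is a minimal standalone sketch under stated assumptions; it is not META's implementation. The feedback documents are assumed to be pre-aggregated into a single term-count map, and the names `term_weights`, `em_options_sketch`, and `unigram_mixture_sketch` are hypothetical. Only `lambda` comes from the description above; the `max_iter` and `delta` fields are assumed stopping criteria. The E-step computes, for each term, the posterior probability that it was generated by the feedback component; the M-step re-estimates the feedback model from the counts weighted by that posterior.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not META types.
using term_id = std::uint64_t;
using term_weights = std::unordered_map<term_id, double>;

struct em_options_sketch
{
    double lambda = 0.5;         // fixed probability of selecting the background
    std::uint64_t max_iter = 50; // assumed iteration cap
    double delta = 1e-6;         // assumed convergence threshold (L1 change)
};

// Fits the feedback component p_F of the mixture
//   p(w) = lambda * p_B(w) + (1 - lambda) * p_F(w)
// to the aggregated term counts, keeping p_B and lambda fixed.
template <class BackgroundModel>
term_weights unigram_mixture_sketch(BackgroundModel&& background,
                                    const term_weights& counts,
                                    const em_options_sketch& options = {})
{
    assert(options.lambda >= 0.0 && options.lambda < 1.0);

    double total = 0.0;
    for (const auto& kv : counts)
        total += kv.second;
    assert(total > 0.0);

    // Initialize the feedback model with the maximum likelihood estimate.
    term_weights feedback;
    for (const auto& kv : counts)
        feedback[kv.first] = kv.second / total;

    for (std::uint64_t iter = 0; iter < options.max_iter; ++iter)
    {
        term_weights expected;
        double norm = 0.0;
        for (const auto& kv : counts)
        {
            // E-step: posterior that this term came from the feedback model.
            double fb = (1.0 - options.lambda) * feedback[kv.first];
            double bg = options.lambda * background(kv.first);
            double z = (fb + bg > 0.0) ? fb / (fb + bg) : 0.0;
            expected[kv.first] = kv.second * z;
            norm += kv.second * z;
        }
        if (norm <= 0.0)
            break;

        // M-step: renormalize expected counts and measure the change.
        double change = 0.0;
        for (auto& kv : expected)
        {
            kv.second /= norm;
            change += std::abs(kv.second - feedback[kv.first]);
        }
        feedback = std::move(expected);
        if (change < options.delta)
            break;
    }
    return feedback;
}
```

With `lambda` set to zero the background never contributes, so the sketch returns the maximum likelihood estimate unchanged. Larger values of `lambda` let the background absorb more of the probability mass of common terms, which shifts the feedback model toward terms that the background explains poorly.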