ModErn Text Analysis
META Enumerates Textual Applications
meta::lm Namespace Reference

Contains implementations of statistical language models.

Classes

class  diff
 Uses a language model to transform sentences given a reference text collection.
 
class  diff_exception
 Exception class for diff operations.
 
class  language_model
 A very simple language model class that reads existing language model data from a .arpa file.
 
class  language_model_exception
 
struct  lm_node
 Simple struct that tracks a probability and a backoff value, packed into a uint64_t for storage.
 
struct  lm_state
 
class  mph_language_model
 An ngram language model class based on a collection of minimal perfect hash functions and dense value arrays.
 
struct  prob_backoff
 
class  sentence
 A sequence of tokens that represents a sentence.
 
class  sentence_exception
 Exception for sentence operations.
 
class  static_probe_map
 Represents language model probabilities as string -> (prob, backoff) values.
 
class  static_probe_map_exception
 Basic exception for static_probe_map interactions.
 
class  token_list
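The lm_node entry above describes a probability and a backoff value packed into a single uint64_t. The following is a minimal sketch of one way to do that, assuming both values are 32-bit IEEE floats stored in the high and low halves of the word. The name `packed_node` is hypothetical, and MeTA's actual lm_node layout may differ.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit floats");

// Hypothetical stand-in for lm_node: prob in the high 32 bits,
// backoff in the low 32 bits of one uint64_t.
struct packed_node {
    std::uint64_t data = 0;

    packed_node(float p, float b) {
        std::uint32_t p_bits;
        std::uint32_t b_bits;
        // memcpy copies the raw bit patterns without strict-aliasing UB
        std::memcpy(&p_bits, &p, sizeof p_bits);
        std::memcpy(&b_bits, &b, sizeof b_bits);
        data = (static_cast<std::uint64_t>(p_bits) << 32) | b_bits;
    }

    float prob() const { return unpack(static_cast<std::uint32_t>(data >> 32)); }
    float backoff() const { return unpack(static_cast<std::uint32_t>(data & 0xFFFFFFFFu)); }

  private:
    // Reinterpret 32 stored bits as a float.
    static float unpack(std::uint32_t bits) {
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }
};
```

Because only bit patterns are copied, the round trip is exact. For example, `packed_node{-1.25f, -0.5f}.backoff()` gives back `-0.5f`, and each node occupies 8 bytes.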
 

Typedefs

template<class KeyType , class ValueType = prob_backoff<>, class FingerPrint = uint32_t>
using ngram_map_builder = hashing::perfect_hash_map_builder< KeyType, ValueType, FingerPrint >
 
template<class KeyType , class ValueType = prob_backoff<>, class FingerPrint = uint32_t>
using ngram_map = hashing::perfect_hash_map< KeyType, ValueType, FingerPrint >
 

Functions

template<class OutputStream , class Prob , class Backoff >
uint64_t packed_write (OutputStream &os, const lm::prob_backoff< Prob, Backoff > &pb)
 
template<class InputStream , class Prob , class Backoff >
uint64_t packed_read (InputStream &is, lm::prob_backoff< Prob, Backoff > &pb)
 
template<class CountHandler , class NGramHandler >
void read_arpa (std::istream &infile, CountHandler &&count_handler, NGramHandler &&ngram_handler)
 Parses an ARPA formatted language model file.
 
bool operator== (const sentence &lhs, const sentence &rhs)
 
bool operator!= (const sentence &lhs, const sentence &rhs)
 
template<class HashAlgorithm >
void hash_append (HashAlgorithm &h, const sentence &s)
 
bool operator== (const token_list &lhs, const token_list &rhs)
 
bool operator!= (const token_list &lhs, const token_list &rhs)
 
template<class HashAlgorithm >
void hash_append (HashAlgorithm &h, const token_list &list)
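The hash_append overloads above take a generic HashAlgorithm, which resembles the "hash_append" style in which a type feeds its bytes to a hasher instead of computing a hash itself. The following is a hedged sketch of how such overloads can compose. It assumes a hasher is a callable taking `(const void*, std::size_t)`. The names `fnv1a`, `tokens`, and `hash_value` are hypothetical and do not belong to MeTA's hashing API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical FNV-1a hasher: consumes raw bytes, converts to the digest.
struct fnv1a {
    std::uint64_t state = 14695981039346656037ULL;
    void operator()(const void* key, std::size_t len) {
        auto p = static_cast<const unsigned char*>(key);
        for (std::size_t i = 0; i < len; ++i) {
            state ^= p[i];
            state *= 1099511628211ULL;
        }
    }
    explicit operator std::uint64_t() const { return state; }
};

template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, std::size_t n) {
    h(&n, sizeof(n));
}

template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, const std::string& s) {
    h(s.data(), s.size());
    // Appending the length keeps {"ab", "c"} distinct from {"a", "bc"}.
    hash_append(h, s.size());
}

// Hypothetical stand-in for token_list.
struct tokens {
    std::vector<std::string> toks;
};

template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, const tokens& list) {
    for (const auto& t : list.toks)
        hash_append(h, t);
    hash_append(h, list.toks.size());
}

// Hash any type with an overload above using the FNV-1a hasher.
template <class T>
std::uint64_t hash_value(const T& t) {
    fnv1a h;
    hash_append(h, t);
    return static_cast<std::uint64_t>(h);
}
```

With this design, a type only describes which bytes identify it. The hashing algorithm can then be swapped without touching sentence or token_list.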
 

Detailed Description

Contains implementations of statistical language models.
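The models here store a (prob, backoff) pair per n-gram, as in the ARPA format, where both values are log10. The following is a minimal sketch of the standard backoff lookup over such values. If the full n-gram is stored, its probability is used. Otherwise the context's backoff weight is added and the oldest context word is dropped. This sketch uses a std::map instead of MeTA's perfect-hash tables. The names `pb`, `toy_backoff_lm`, and `unk_prob` (an assumed floor for unseen unigrams) are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for prob_backoff: log10 values as in ARPA files.
struct pb {
    float prob;
    float backoff;
};

using ngram = std::vector<std::string>;

// Hypothetical toy model mapping every stored n-gram to its (prob, backoff).
struct toy_backoff_lm {
    std::map<ngram, pb> table;
    float unk_prob = -100.0f; // assumed log10 prob for unseen unigrams

    // log10 P(word | context) using ARPA-style backoff.
    float log_prob(ngram context, const std::string& word) const {
        float penalty = 0.0f;
        while (true) {
            ngram full = context;
            full.push_back(word);
            auto it = table.find(full);
            if (it != table.end())
                return penalty + it->second.prob;
            if (context.empty())
                return penalty + unk_prob;
            // Add the context's backoff weight (0 if the context is absent).
            auto ctx = table.find(context);
            if (ctx != table.end())
                penalty += ctx->second.backoff;
            context.erase(context.begin()); // drop the oldest word
        }
    }
};
```

For example, if "cat the" is not stored, its score is backoff("cat") plus log10 P("the").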

Function Documentation

§ read_arpa()

template <class CountHandler, class NGramHandler>
void meta::lm::read_arpa(std::istream& infile,
                         CountHandler&& count_handler,
                         NGramHandler&& ngram_handler)

Parses an ARPA formatted language model file.

Parameters
count_handler: A callback function to be invoked when reading the ngram count information from the file, of the form (order, count), where order is 0-indexed.
ngram_handler: A callback function to be invoked when reading the ngram statistics themselves, of the form (order, ngram, prob, backoff), where order is 0-indexed.
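To show how the two callbacks relate to the file layout, the following is a hedged sketch of an ARPA reader with the same parameter shapes. It reads the "ngram N=count" lines of the \data\ section and calls count_handler with N - 1. For each "\N-grams:" section, it calls ngram_handler with N - 1, the words, the prob, and the backoff. A missing backoff is treated as 0. The name `read_arpa_sketch` is hypothetical, and passing the ngram as a space-joined std::string is an assumption. MeTA's read_arpa may represent it differently.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical ARPA reader using the callback shapes documented above.
template <class CountHandler, class NGramHandler>
void read_arpa_sketch(std::istream& infile, CountHandler&& count_handler,
                      NGramHandler&& ngram_handler) {
    std::string line;
    bool in_ngrams = false;  // false while still in the \data\ count section
    std::uint64_t order = 0; // 0-indexed order of the current section
    while (std::getline(infile, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate CRLF line endings
        if (line.empty() || line == "\\data\\")
            continue;
        if (line == "\\end\\")
            break;
        if (line[0] == '\\') {
            // Section header such as "\2-grams:"; stoull stops at the '-'.
            order = std::stoull(line.substr(1)) - 1;
            in_ngrams = true;
            continue;
        }
        if (!in_ngrams) {
            // Count line "ngram N=count", where N is 1-based in the file.
            if (line.rfind("ngram ", 0) == 0) {
                auto eq = line.find('=');
                if (eq == std::string::npos)
                    throw std::runtime_error{"malformed count line: " + line};
                auto n = std::stoull(line.substr(6, eq - 6));
                count_handler(n - 1, std::stoull(line.substr(eq + 1)));
            }
            continue;
        }
        // Entry line: prob, then order + 1 words, then an optional backoff.
        std::istringstream fields{line};
        float prob;
        if (!(fields >> prob))
            throw std::runtime_error{"malformed ngram line: " + line};
        std::string ngram, word;
        for (std::uint64_t i = 0; i <= order; ++i) {
            if (!(fields >> word))
                throw std::runtime_error{"too few words: " + line};
            ngram += (i == 0 ? "" : " ") + word;
        }
        float backoff;
        if (!(fields >> backoff))
            backoff = 0.0f; // a missing backoff means log10(1) = 0
        ngram_handler(order, ngram, prob, backoff);
    }
}
```

Both handlers are forwarding references, so lambdas that capture by reference work directly. For example, count_handler could be a lambda that pre-sizes per-order tables before ngram_handler fills them.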