ModErn Text Analysis
META Enumerates Textual Applications

An LDA topic model implemented using the Approximate Distributed LDA (ADLDA) algorithm.
#include <parallel_lda_gibbs.h>
Public Member Functions  
virtual  ~parallel_lda_gibbs ()=default 
Destructor: virtual for potential subclassing.  
Public Member Functions inherited from meta::topics::lda_gibbs  
lda_gibbs (std::shared_ptr< index::forward_index > idx, std::size_t num_topics, double alpha, double beta)  
Constructs the lda model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\phi\) (topic distributions) and \(\theta\) (topic proportions), respectively.
virtual  ~lda_gibbs ()=default 
Destructor: virtual for potential subclassing.  
virtual void  run (uint64_t num_iters, double convergence=1e-6) override 
Runs the sampler for a maximum number of iterations, or until the given convergence criterion is met.
virtual double  compute_term_topic_probability (term_id term, topic_id topic) const override 
virtual double  compute_doc_topic_probability (doc_id doc, topic_id topic) const override 
Public Member Functions inherited from meta::topics::lda_model  
lda_model (std::shared_ptr< index::forward_index > idx, std::size_t num_topics)  
Constructs an lda_model over the given set of documents and with a fixed number of topics.
virtual  ~lda_model ()=default 
Destructor.
void  save_doc_topic_distributions (const std::string &filename) const 
Saves the topic proportions \(\theta_d\) for each document to the given file.
void  save_topic_term_distributions (const std::string &filename) const 
Saves the term distributions \(\phi_j\) for each topic to the given file.
void  save (const std::string &prefix) const 
Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
uint64_t  num_topics () const 
Protected Member Functions  
virtual void  initialize () override 
Initializes the first set of topic assignments for inference.
virtual void  perform_iteration (uint64_t iter, bool init=false) override 
Performs a sampling iteration of the ADLDA algorithm.
virtual void  decrease_counts (topic_id topic, term_id term, doc_id doc) override 
Decreases all counts associated with the given topic, term, and document by one.
virtual void  increase_counts (topic_id topic, term_id term, doc_id doc) override 
Increases all counts associated with the given topic, term, and document by one.
virtual double  compute_sampling_weight (term_id term, doc_id doc, topic_id topic) const override 
Computes a weight proportional to \(P(z_i = j \mid w, \boldsymbol{z})\).
Protected Member Functions inherited from meta::topics::lda_gibbs  
topic_id  sample_topic (term_id term, doc_id doc) 
Samples a topic from the full conditional distribution \(P(z_i = j \mid w, \boldsymbol{z})\).
double  corpus_log_likelihood () const 
lda_gibbs &  operator= (const lda_gibbs &)=delete 
lda_gibbs cannot be copy assigned.  
lda_gibbs (const lda_gibbs &other)=delete  
lda_gibbs cannot be copy constructed.  
Protected Member Functions inherited from meta::topics::lda_model  
lda_model &  operator= (const lda_model &)=delete 
lda_models cannot be copy assigned.  
lda_model (const lda_model &)=delete  
lda_models cannot be copy constructed.  
Protected Attributes  
parallel::thread_pool  pool_ 
The thread pool used for parallelization.  
std::unordered_map< std::thread::id, std::vector< stats::multinomial< term_id > > >  phi_diffs_ 
Stores the difference in topic_term counts on a per-thread basis for use in the reduction step.
Protected Attributes inherited from meta::topics::lda_gibbs  
std::vector< std::vector< topic_id > >  doc_word_topic_ 
The topic assignment for every word in every document.
std::vector< stats::multinomial< term_id > >  phi_ 
The word distributions for each topic, \(\phi_t\).  
std::vector< stats::multinomial< topic_id > >  theta_ 
The topic distributions for each document, \(\theta_d\).  
std::mt19937_64  rng_ 
The random number generator for the sampler.  
Protected Attributes inherited from meta::topics::lda_model  
std::shared_ptr< index::forward_index >  idx_ 
The index containing the documents for the model.  
std::size_t  num_topics_ 
The number of topics.  
std::size_t  num_words_ 
The total number of unique words.
An LDA topic model implemented using the Approximate Distributed LDA algorithm.
Based on the algorithm detailed by David Newman et al.

virtual void  initialize () override  [protected]
Initializes the first set of topic assignments for inference.
Employs an online application of the sampler where counts are only considered for the words observed so far through the loop.
Reimplemented from meta::topics::lda_gibbs.
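The online initialization described above can be sketched as follows. This is a minimal single-threaded illustration under stated assumptions, not MeTA's implementation: `toy_counts`, `online_initialize`, `phi_prior`, and `theta_prior` are hypothetical names, documents are plain vectors of term ids, and the weight is the usual collapsed Gibbs form. Each word draws its first topic using only the counts accumulated from the words seen before it.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the sampler's count tables (not MeTA's types).
struct toy_counts
{
    std::size_t num_topics;
    std::size_t vocab_size;
    std::vector<std::vector<double>> topic_term; // [topic][term]
    std::vector<std::vector<double>> doc_topic;  // [doc][topic]
    std::vector<double> topic_total;             // [topic]

    toy_counts(std::size_t topics, std::size_t vocab, std::size_t docs)
        : num_topics{topics},
          vocab_size{vocab},
          topic_term(topics, std::vector<double>(vocab, 0.0)),
          doc_topic(docs, std::vector<double>(topics, 0.0)),
          topic_total(topics, 0.0)
    {
    }
};

// Assigns a first topic to every word. Because a word has no previous
// assignment, nothing is decremented: the weights reflect only the words
// observed so far in the loop.
std::vector<std::vector<std::size_t>>
online_initialize(const std::vector<std::vector<std::size_t>>& docs,
                  toy_counts& counts, double phi_prior, double theta_prior,
                  std::mt19937_64& rng)
{
    std::vector<std::vector<std::size_t>> assignments(docs.size());
    std::vector<double> weights(counts.num_topics);
    for (std::size_t d = 0; d < docs.size(); ++d)
    {
        for (std::size_t term : docs[d])
        {
            for (std::size_t t = 0; t < counts.num_topics; ++t)
            {
                const double term_part
                    = (counts.topic_term[t][term] + phi_prior)
                      / (counts.topic_total[t]
                         + counts.vocab_size * phi_prior);
                weights[t]
                    = term_part * (counts.doc_topic[d][t] + theta_prior);
            }
            // Draw a topic in proportion to the unnormalized weights.
            std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                         weights.end());
            const std::size_t topic = dist(rng);
            counts.topic_term[topic][term] += 1;
            counts.doc_topic[d][topic] += 1;
            counts.topic_total[topic] += 1;
            assignments[d].push_back(topic);
        }
    }
    return assignments;
}
```

With all counts at zero, the first word sees equal weights for every topic, so its draw is uniform; later words are increasingly steered by the assignments already made.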

virtual void  perform_iteration (uint64_t iter, bool init=false) override  [protected]
Performs a sampling iteration of the ADLDA algorithm.
This consists of splitting up the sampling of (document, word) topic assignments across threads, keeping for each thread a difference in counts for the potentially shared topic counts. Once the sampling has finished, the counts are reduced down (serially) before the iteration is completed.
iter  The current iteration number 
init  Whether or not this iteration should use the online method for initializing the sampler 
Reimplemented from meta::topics::lda_gibbs.
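The split, sample, and reduce pattern described above can be sketched with plain `std::thread`. This is an illustration under stated assumptions rather than the class's actual code: `topic_counts`, `toy_state`, `sample_block`, and `adlda_iteration` are hypothetical names, the diffs are indexed by worker number instead of by `std::thread::id`, and raw threads stand in for the thread pool. Each worker reads the shared topic counts, writes only to its own diff and to the rows of the documents it owns, and the diffs are folded into the shared counts serially once all workers have joined.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical count tables for illustration (not MeTA's types).
struct topic_counts
{
    std::vector<std::vector<long>> term; // [topic][term]
    std::vector<long> total;             // [topic]
    topic_counts(std::size_t topics, std::size_t vocab)
        : term(topics, std::vector<long>(vocab, 0)), total(topics, 0)
    {
    }
};

struct toy_state
{
    std::vector<std::vector<std::size_t>> words;  // term ids per document
    std::vector<std::vector<std::size_t>> topics; // topic per word
    std::vector<std::vector<long>> doc_topic;     // [doc][topic]
    topic_counts global;                          // shared topic counts
};

// Resamples documents [first, last) against the shared counts plus this
// worker's own diff. The shared counts are never written here.
void sample_block(toy_state& s, std::size_t first, std::size_t last,
                  topic_counts& diff, double phi_prior, double theta_prior,
                  std::uint64_t seed)
{
    std::mt19937_64 rng{seed};
    const std::size_t num_topics = s.global.total.size();
    const double vocab = static_cast<double>(s.global.term[0].size());
    std::vector<double> weights(num_topics);
    for (std::size_t d = first; d < last; ++d)
    {
        for (std::size_t i = 0; i < s.words[d].size(); ++i)
        {
            const std::size_t w = s.words[d][i];
            const std::size_t old_topic = s.topics[d][i];
            // Remove the current assignment (cf. decrease_counts).
            --diff.term[old_topic][w];
            --diff.total[old_topic];
            --s.doc_topic[d][old_topic];
            for (std::size_t t = 0; t < num_topics; ++t)
            {
                const double n_tw = s.global.term[t][w] + diff.term[t][w];
                const double n_t = s.global.total[t] + diff.total[t];
                weights[t] = (n_tw + phi_prior) / (n_t + vocab * phi_prior)
                             * (s.doc_topic[d][t] + theta_prior);
            }
            std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                         weights.end());
            const std::size_t new_topic = dist(rng);
            // Record the new assignment (cf. increase_counts).
            ++diff.term[new_topic][w];
            ++diff.total[new_topic];
            ++s.doc_topic[d][new_topic];
            s.topics[d][i] = new_topic;
        }
    }
}

void adlda_iteration(toy_state& s, std::size_t num_threads, double phi_prior,
                     double theta_prior, std::uint64_t seed)
{
    const std::size_t num_topics = s.global.total.size();
    const std::size_t vocab = s.global.term[0].size();
    const std::size_t num_docs = s.words.size();
    std::vector<topic_counts> diffs(num_threads,
                                    topic_counts{num_topics, vocab});
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < num_threads; ++k)
    {
        // Contiguous, non-overlapping document ranges per worker.
        const std::size_t first = num_docs * k / num_threads;
        const std::size_t last = num_docs * (k + 1) / num_threads;
        workers.emplace_back(sample_block, std::ref(s), first, last,
                             std::ref(diffs[k]), phi_prior, theta_prior,
                             seed + k);
    }
    for (auto& worker : workers)
        worker.join();
    // Serial reduction: fold every worker's diff into the shared counts.
    for (const auto& diff : diffs)
    {
        for (std::size_t t = 0; t < num_topics; ++t)
        {
            s.global.total[t] += diff.total[t];
            for (std::size_t w = 0; w < vocab; ++w)
                s.global.term[t][w] += diff.term[t][w];
        }
    }
}
```

The approximation in this scheme is that, within an iteration, a worker never sees the reassignments made by the other workers; it only sees them after the reduction. Keeping the shared counts read-only during sampling is what lets the workers run without locks.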

virtual void  decrease_counts (topic_id topic, term_id term, doc_id doc) override  [protected]
Decreases all counts associated with the given topic, term, and document by one.
topic  The topic in question 
term  The term in question 
doc  The document in question 
Reimplemented from meta::topics::lda_gibbs.

virtual void  increase_counts (topic_id topic, term_id term, doc_id doc) override  [protected]
Increases all counts associated with the given topic, term, and document by one.
topic  The topic in question 
term  The term in question 
doc  The document in question 
Reimplemented from meta::topics::lda_gibbs.

virtual double  compute_sampling_weight (term_id term, doc_id doc, topic_id topic) const override  [protected]
Computes a weight proportional to \(P(z_i = j \mid w, \boldsymbol{z})\).
term  The current word we are sampling for 
doc  The document in which the term resides 
topic  The topic \(j\) we want to compute the probability for 
Reimplemented from meta::topics::lda_gibbs.
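For reference, the standard collapsed Gibbs conditional for LDA has the form below. This page does not state the formula, so treat it as an assumption about what the weight computes, not a transcription of the implementation. Here \(n_{j,w_i}\), \(n_{j}\), and \(n_{d_i,j}\) are hypothetical names for the topic-term, topic-total, and document-topic counts with the current token's assignment removed, \(V\) is the number of unique words, and \(a\) and \(b\) are stand-ins for the prior hyperparameters on \(\phi\) and \(\theta\).

\[
P(z_i = j \mid w, \boldsymbol{z}) \;\propto\; \frac{n_{j,w_i} + a}{n_{j} + V a}\,\left(n_{d_i,j} + b\right)
\]

Only proportionality matters, so the denominator of the document factor, which is the same for every topic \(j\), is dropped. In the parallel setting, a thread would presumably add its own per-thread differences to the shared topic-term counts before evaluating the first factor.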

std::unordered_map< std::thread::id, std::vector< stats::multinomial< term_id > > >  phi_diffs_  [protected]
Stores the difference in topic_term counts on a per-thread basis for use in the reduction step.
Indexed as [thread_id][topic]