ModErn Text Analysis
META Enumerates Textual Applications
meta::index::inverted_index::impl Class Reference

Implementation of an inverted_index.

Public Member Functions

 impl (inverted_index *parent, const cpptoml::table &config)
 Constructs an inverted_index impl.
 
void tokenize_docs (corpus::corpus &docs, postings_inverter< inverted_index > &inverter, metadata_writer &mdata_writer, uint64_t ram_budget, std::size_t num_threads)
 
void compress (const std::string &filename, uint64_t num_unique_terms)
 Compresses the large postings file.
 
void load_postings ()
 Loads the postings file.
 

Public Attributes

std::unique_ptr< analyzers::analyzer > analyzer_
 The analyzer used to tokenize documents.
 
util::optional< postings_file< inverted_index::primary_key_type, inverted_index::secondary_key_type > > postings_
 
uint64_t total_corpus_terms_
 The total number of term occurrences in the entire corpus.
 

Private Attributes

inverted_index * idx_
 Pointer to the inverted_index this is an implementation of.
 

Detailed Description

Implementation of an inverted_index.
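
The relationship described here, an impl object that keeps a pointer back to the index it implements, is the usual pointer-to-implementation (pimpl) arrangement. The following is a minimal sketch of that idiom only, not MeTA's actual code: toy_index, add_occurrences, total_terms and name are hypothetical names invented for illustration, while idx_ and total_corpus_terms_ mirror the members listed on this page.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an index class; NOT MeTA's inverted_index.
class toy_index
{
  public:
    explicit toy_index(std::string name);
    ~toy_index(); // defined below, once impl is a complete type

    void add_occurrences(std::uint64_t count);
    std::uint64_t total_terms() const;
    const std::string& name() const { return name_; }

  private:
    class impl; // only forward-declared in the public interface
    std::string name_;
    std::unique_ptr<impl> impl_;
};

// The implementation class: holds state plus a back-pointer to its parent.
class toy_index::impl
{
  public:
    explicit impl(toy_index* parent) : idx_{parent} {}

    // Running total of term occurrences (mirrors total_corpus_terms_).
    std::uint64_t total_corpus_terms_ = 0;

    // The parent this object is an implementation of.
    toy_index* parent() const { return idx_; }

  private:
    toy_index* idx_;
};

toy_index::toy_index(std::string name)
    : name_{std::move(name)}, impl_{std::make_unique<impl>(this)}
{
}

toy_index::~toy_index() = default;

void toy_index::add_occurrences(std::uint64_t count)
{
    impl_->total_corpus_terms_ += count;
}

std::uint64_t toy_index::total_terms() const
{
    return impl_->total_corpus_terms_;
}
```

Keeping the state behind a std::unique_ptr to a forward-declared class means the public header does not need to expose members such as the analyzer or the postings file; the destructor is defined out of line so that impl is complete where it is destroyed.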

Constructor & Destructor Documentation

§ impl()

meta::index::inverted_index::impl::impl ( inverted_index *  parent,
const cpptoml::table &  config 
)

Constructs an inverted_index impl.

Parameters
parent    The parent of this impl
config    The config group

Member Function Documentation

§ tokenize_docs()

void meta::index::inverted_index::impl::tokenize_docs ( corpus::corpus &  docs,
postings_inverter< inverted_index > &  inverter,
metadata_writer &  mdata_writer,
uint64_t  ram_budget,
std::size_t  num_threads 
)
Parameters
docs            The documents to be tokenized
inverter        The postings inverter for this index
mdata_parser    The parser for reading metadata
mdata_writer    The writer for metadata
ram_budget      The total estimated RAM budget
num_threads     The number of threads to tokenize and index docs with
Returns
the number of chunks created
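
To make the roles of ram_budget and num_threads concrete, here is a rough sketch of one way a threaded tokenizer could split a RAM budget and count the chunks it flushes. It is an illustration under stated assumptions, not the behaviour of tokenize_docs itself: toy_tokenize_docs is a hypothetical function, documents are plain strings split on whitespace, the budget is assumed to be divided evenly across threads, and memory use is estimated crudely as the number of bytes in the distinct tokens held.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper, NOT part of MeTA. Returns how many in-memory chunks
// were "flushed" while tokenizing docs under the given budget.
std::size_t toy_tokenize_docs(const std::vector<std::string>& docs,
                              std::uint64_t ram_budget,
                              std::size_t num_threads)
{
    if (num_threads == 0)
        num_threads = 1;

    // Assumption: each thread gets an equal share of the total budget.
    const std::uint64_t per_thread = ram_budget / num_threads;

    std::atomic<std::size_t> next_doc{0}; // shared work queue position
    std::atomic<std::size_t> chunks{0};   // total chunks flushed

    auto worker = [&]() {
        std::unordered_map<std::string, std::uint64_t> counts;
        std::uint64_t used = 0; // crude estimate of bytes held in counts

        // A real indexer would write the chunk to disk; here we only count.
        auto flush = [&]() {
            if (counts.empty())
                return;
            counts.clear();
            used = 0;
            ++chunks;
        };

        while (true)
        {
            const std::size_t i = next_doc.fetch_add(1);
            if (i >= docs.size())
                break;

            std::istringstream in{docs[i]};
            std::string tok;
            while (in >> tok)
            {
                auto& c = counts[tok];
                if (c == 0)
                    used += tok.size(); // first time this token is seen
                ++c;
            }

            // Flush after a document once this thread's share is used up.
            if (used >= per_thread)
                flush();
        }
        flush(); // write out whatever is left
    };

    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < num_threads; ++t)
        pool.emplace_back(worker);
    for (auto& th : pool)
        th.join();

    return chunks.load();
}
```

With a single thread the chunk count is deterministic; with several threads it depends on how documents happen to be distributed, since each thread flushes its own chunks independently.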

The documentation for this class was generated from the following file: