ModErn Text Analysis
META Enumerates Textual Applications
meta::index::forward_index::impl Class Reference

Implementation of a forward_index.

Public Member Functions

 impl (forward_index *idx, const cpptoml::table &config)
 Constructs an implementation based on a forward_index.
 
void tokenize_docs (corpus::corpus &corpus, metadata_writer &mdata_writer, uint64_t ram_budget, std::size_t num_threads)
 Tokenizes the documents in the corpus in parallel, yielding num_threads forward_index chunks that must then be merged.
 
void merge_chunks (size_t num_chunks, uint64_t num_docs, hashing::probe_map< std::string, term_id > vocab)
 Merges num_chunks intermediate chunks, using the given vocabulary to renumber the term ids.
 
void create_libsvm_postings (corpus::corpus &docs)
 
void uninvert (const inverted_index &inv_idx, uint64_t ram_budget)
 
void create_uninverted_metadata (const std::string &name)
 
bool is_libsvm_analyzer (const cpptoml::table &config) const
 
void compress (const std::string &filename, uint64_t num_docs)
 Compresses the postings file created by uninverting.
 
void load_postings ()
 Loads the postings file.
 

Public Attributes

std::unique_ptr< analyzers::analyzer > analyzer_
 The analyzer used to tokenize documents (nullptr if libsvm).
 
uint64_t total_unique_terms_
 The total number of unique terms if term_id_mapping_ is unused.
 
util::optional< postings_file< forward_index::primary_key_type, forward_index::secondary_key_type, double > > postings_
 The postings file.
 

Private Attributes

forward_index * idx_
 Pointer to the forward_index this is an implementation of.
 

Detailed Description

Implementation of a forward_index.

Member Function Documentation

§ merge_chunks()

void meta::index::forward_index::impl::merge_chunks ( size_t  num_chunks,
uint64_t  num_docs,
hashing::probe_map< std::string, term_id >  vocab 
)

Merges num_chunks intermediate chunks, using the given vocabulary to renumber the term ids.

The vocabulary mapping assigns ids in insertion order, but vocabulary_map requires ids in lexicographic order of the terms. This function therefore sorts the vocabulary and renumbers the old ids accordingly.
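
The renumbering step can be illustrated with a minimal sketch. It is not the actual implementation: the function name lexicographic_remap is hypothetical, term_id is replaced by a plain integer alias, std::unordered_map stands in for hashing::probe_map, and the old ids are assumed to be exactly 0 .. N-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for MeTA's term_id type.
using term_id = std::uint64_t;

// Given a vocabulary whose ids were assigned in insertion order, returns a
// table where remap[old_id] is the id the same term receives once the
// vocabulary is sorted lexicographically.
// Assumption: the old ids are exactly 0 .. vocab.size() - 1.
std::vector<term_id>
lexicographic_remap(const std::unordered_map<std::string, term_id>& vocab)
{
    // Copy the (term, old id) pairs so they can be sorted by term.
    std::vector<std::pair<std::string, term_id>> terms(vocab.begin(),
                                                       vocab.end());
    std::sort(terms.begin(), terms.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });

    // The position of a term in sorted order becomes its new id.
    std::vector<term_id> remap(terms.size());
    for (term_id new_id = 0; new_id < terms.size(); ++new_id)
    {
        term_id old_id = terms[new_id].second;
        assert(old_id < remap.size()); // old ids must be dense
        remap[old_id] = new_id;
    }
    return remap;
}
```

With such a table, every old id stored in an intermediate chunk can be rewritten as remap[old_id] while the chunks are merged.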

§ create_libsvm_postings()

void meta::index::forward_index::impl::create_libsvm_postings ( corpus::corpus &  docs)
Parameters
docs    The documents to index (in libsvm format)

§ uninvert()

void meta::index::forward_index::impl::uninvert ( const inverted_index &  inv_idx,
uint64_t  ram_budget 
)
Parameters
inv_idx    The inverted index to uninvert
ram_budget    The estimated allowed size of an in-memory chunk
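
As a rough illustration of what uninverting means, the sketch below turns term-to-document postings into document-to-term postings entirely in memory. It is not MeTA's implementation: the type aliases and the function uninvert_in_memory are hypothetical, and the sketch ignores ram_budget, which in the real function bounds the size of each in-memory chunk.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for MeTA's id types.
using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// Inverted postings: term -> list of (document, weight).
using inverted_postings
    = std::map<term_id, std::vector<std::pair<doc_id, double>>>;
// Forward postings: document -> list of (term, weight).
using forward_postings
    = std::map<doc_id, std::vector<std::pair<term_id, double>>>;

// Swaps the roles of the primary and secondary keys. Because std::map
// iterates terms in ascending order, each document's term list comes out
// sorted by term id.
forward_postings uninvert_in_memory(const inverted_postings& inv)
{
    forward_postings fwd;
    for (const auto& entry : inv)
        for (const auto& doc_weight : entry.second)
            fwd[doc_weight.first].emplace_back(entry.first,
                                               doc_weight.second);
    return fwd;
}
```

A budget-aware version would flush the forward map to disk whenever its estimated size exceeds the limit and merge the resulting chunks afterwards.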

§ create_uninverted_metadata()

void meta::index::forward_index::impl::create_uninverted_metadata ( const std::string &  name)
Parameters
name    The name of the inverted index to copy data from

§ is_libsvm_analyzer()

bool meta::index::forward_index::impl::is_libsvm_analyzer ( const cpptoml::table &  config) const
Parameters
config    The configuration settings for this index
Returns
Whether this index will be based on a single libsvm-formatted corpus file

§ compress()

void meta::index::forward_index::impl::compress ( const std::string &  filename,
uint64_t  num_docs 
)

Compresses the postings file created by uninverting.

Parameters
filename    The file to compress
num_docs    The number of documents in that file

§ load_postings()

void meta::index::forward_index::impl::load_postings ( )

Loads the postings file.

Parameters
filename    The path to the postings file to load

The documentation for this class was generated from the following file: