ModErn Text Analysis
META Enumerates Textual Applications
meta::index Namespace Reference

Indexes to create efficient representations of data.

Namespaces

 detail
 Implementation details for indexing and ranking implementations.
 

Classes

class  absolute_discount
 Implements the absolute discounting smoothing method.
 
class  cached_index
 Decorator class for wrapping indexes with a cache.
 
class  chunk
 Represents a portion of a disk_index's postings file.
 
class  chunk_handler
 
class  dirichlet_prior
 Implements Bayesian smoothing with a Dirichlet prior.
 
class  disk_index
 Holds generic data structures and functions that inverted_index and forward_index both use.
 
class  forward_index
 The forward_index stores information on a corpus by doc_ids.
 
class  forward_index_exception
 Basic exception for forward_index interactions.
 
class  inverted_index
 The inverted_index class stores information on a corpus indexed by term_ids.
 
class  inverted_index_exception
 Basic exception for inverted_index interactions.
 
class  ir_eval
 Evaluates lists of ranked documents returned from a search engine; can give per-query statistics.
 
class  ir_eval_exception
 Basic exception for ir_eval interactions.
 
class  jelinek_mercer
 Implements the Jelinek-Mercer smoothed ranking model.
 
class  kl_divergence_prf
 Implements the two-component mixture model for pseudo-relevance feedback in the KL-divergence retrieval model.
 
class  language_model_ranker
 Scores documents according to one of three different smoothed language model scoring methods described in "A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval" by Zhai and Lafferty, 2001.
 
class  metadata_file
 Used for reading document-level metadata for an index.
 
class  metadata_writer
 Writes document metadata into the packed format for the index.
 
class  okapi_bm25
 The Okapi BM25 scoring function.
 
class  pivoted_length
 The pivoted document length normalization ranking function.
 
class  postings_buffer
 Represents the postings list for an in-memory chunk associated with a specific PrimaryKey (usually a std::string).
 
class  postings_data
 A class to represent the per-PrimaryKey data in an index's postings file.
 
class  postings_file
 File that stores the postings list for an index on disk.
 
class  postings_file_writer
 
class  postings_inverter
 An interface for writing and merging inverted chunks of postings_data for a disk_index.
 
class  postings_inverter_exception
 Simple exception class for postings_inverter interactions.
 
class  postings_record
 Simple wrapper class to adapt PostingsData to the Record concept for multiway_merge.
 
class  postings_stream
 A stream for extracting the postings list for a specific key in a postings file.
 
class  rank_correlation
 Evaluates two different lists of ranks for correlation with various measures.
 
class  rank_correlation_exception
 Exception thrown when interacting with rank_correlation instances.
 
class  ranker
 A ranker scores a query against all the documents in an inverted index, returning a list of documents sorted by relevance.
 
struct  ranker_context
 Stores a list of postings_stream and other relevant information for performing document-at-a-time ranking.
 
class  ranker_exception
 Exception class for ranker interactions.
 
class  ranker_factory
 Factory that is responsible for creating rankers from configuration files.
 
class  ranker_loader
 Factory that is responsible for loading rankers from streams.
 
class  ranking_function
 
class  rocchio
 Implements the Rocchio algorithm for pseudo-relevance feedback.
 
struct  score_data
 A score_data object contains information needed to evaluate a ranking function.
 
struct  search_result
 A simple struct to hold scored document data.
 
class  string_list
 A class designed for reading large lists of strings that have been persisted to disk.
 
class  string_list_writer
 A class for writing large lists of strings to disk with an associated index file for fast random access.
 
class  vocabulary_map
 A read-only view of a B+-tree-like structure that stores the vocabulary for an index.
 
class  vocabulary_map_writer
 A class that writes the B+-tree-like data structure used for storing the term id mapping in an index.
 
class  vocabulary_map_writer_exception
 An exception that can be thrown during the building of the tree.
 

Typedefs

template<class PostingsData >
using chunk_reader = util::destructive_chunk_iterator< postings_record< PostingsData >>
 Represents an on-disk chunk to be merged with multi-way merge sort.
 
using dblru_inverted_index = cached_index< inverted_index, caching::default_dblru_cache >
 Inverted index using default DBLRU cache.
 
using splay_inverted_index = cached_index< inverted_index, caching::splay_cache >
 Inverted index using splay cache.
 
using memory_forward_index = cached_index< forward_index, caching::no_evict_cache >
 In-memory forward index.
 
using dblru_forward_index = cached_index< forward_index, caching::default_dblru_cache >
 Forward index using default DBLRU cache.
 
using splay_forward_index = cached_index< forward_index, caching::splay_cache >
 Forward index using splay cache.
 

Enumerations

enum  index_file {
  DOC_LABELS, LABEL_IDS_MAPPING, POSTINGS, POSTINGS_INDEX,
  TERM_IDS_MAPPING, TERM_IDS_MAPPING_INVERSE, METADATA_DB, METADATA_INDEX
}
 Collection of all the files that comprise a disk_index.
 

Functions

template<class PostingsData , class ForwardIterator >
uint64_t multiway_merge (std::ostream &outstream, ForwardIterator begin, ForwardIterator end)
 Performs a multi-way merge sort of all of the provided chunks, writing to the provided output stream.
 
template<class Index , class... Args>
std::shared_ptr< Index > make_index (const cpptoml::table &config, corpus::corpus &docs, Args &&... args)
 Factory method for creating indexes.
 
template<class Index , class... Args>
std::shared_ptr< Index > make_index (const cpptoml::table &config, Args &&... args)
 Helper for make_index that creates a corpus from the global config file.
 
template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr< cached_index< Index, Cache > > make_index (const cpptoml::table &config, Args &&... args)
 Factory method for creating indexes that are cached.
 
template<class HashAlgorithm , class PrimaryKey , class SecondaryKey >
void hash_append (HashAlgorithm &h, const postings_buffer< PrimaryKey, SecondaryKey > &pb)
 
template<class PrimaryKey , class SecondaryKey , class FeatureValue >
bool operator== (const postings_data< PrimaryKey, SecondaryKey, FeatureValue > &lhs, const postings_data< PrimaryKey, SecondaryKey, FeatureValue > &rhs)
 
template<>
std::unique_ptr< ranker > make_ranker< absolute_discount > (const cpptoml::table &)
 Specialization of the factory method used to create absolute_discount rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< dirichlet_prior > (const cpptoml::table &)
 Specialization of the factory method used to create dirichlet_prior rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< jelinek_mercer > (const cpptoml::table &)
 Specialization of the factory method used to create jelinek_mercer rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< kl_divergence_prf > (const cpptoml::table &global, const cpptoml::table &local)
 Specialization of the factory method used to create kl_divergence_prf rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< okapi_bm25 > (const cpptoml::table &)
 Specialization of the factory method used to create okapi_bm25 rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< pivoted_length > (const cpptoml::table &)
 Specialization of the factory method used to create pivoted_length rankers.
 
std::unique_ptr< ranker > make_ranker (const cpptoml::table &)
 Convenience method for creating a ranker using the factory.
 
std::unique_ptr< ranker > make_ranker (const cpptoml::table &global, const cpptoml::table &local)
 Convenience method for creating a ranker using the factory.
 
std::unique_ptr< language_model_ranker > make_lm_ranker (const cpptoml::table &)
 Convenience method for creating a language_model_ranker using the factory.
 
std::unique_ptr< language_model_ranker > make_lm_ranker (const cpptoml::table &global, const cpptoml::table &local)
 Convenience method for creating a language_model_ranker using the factory.
 
std::unique_ptr< ranker > load_ranker (std::istream &)
 Convenience method for loading a ranker using the factory.
 
std::unique_ptr< language_model_ranker > load_lm_ranker (std::istream &)
 Convenience method for loading a language_model_ranker using the factory.
 
template<class Ranker >
void register_ranker ()
 Registration method for rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< rocchio > (const cpptoml::table &global, const cpptoml::table &local)
 Specialization of the factory method used to create rocchio rankers.
 

Detailed Description

Indexes to create efficient representations of data.

Typedef Documentation

§ chunk_reader

template<class PostingsData >
using meta::index::chunk_reader = util::destructive_chunk_iterator<postings_record<PostingsData>>

Represents an on-disk chunk to be merged with multi-way merge sort.

Each chunk_reader stores the file it's reading from, the total bytes needed to be read, and the current number of bytes read, as well as buffers in one postings_record. When it reaches the end of its file, the file will be destroyed.
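
The "destroy the file once it is exhausted" behavior described above can be illustrated with a small stand-alone sketch. Everything here is hypothetical (toy_chunk_reader, its members, and the one-record-per-line file layout are assumptions for illustration only); the real chunk_reader buffers a postings_record and tracks the total byte count, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical reader: buffers one record (here, one text line) at a time
// and deletes the backing file as soon as the end of the file is reached.
class toy_chunk_reader
{
  public:
    explicit toy_chunk_reader(std::string path)
        : path_(std::move(path)), in_(path_)
    {
        advance(); // buffer the first record, if any
    }

    bool done() const { return done_; }
    const std::string& record() const { return record_; }
    std::size_t bytes_read() const { return bytes_read_; }

    // Moves to the next record; on end of file, closes and removes the file.
    void advance()
    {
        if (done_)
            return;
        if (std::getline(in_, record_))
        {
            bytes_read_ += record_.size() + 1; // +1 for the newline
        }
        else
        {
            done_ = true;
            in_.close();
            std::remove(path_.c_str()); // the "destructive" part
        }
    }

  private:
    std::string path_;
    std::ifstream in_; // declared after path_, so path_ is ready first
    std::string record_;
    std::size_t bytes_read_ = 0;
    bool done_ = false;
};

// Usage sketch: counts the records in a temporary file the function creates.
inline int toy_chunk_reader_example()
{
    const std::string path = "toy_chunk_example.tmp";
    {
        std::ofstream out(path);
        out << "apple 1\nbat 2\n";
    }
    int records = 0;
    for (toy_chunk_reader reader(path); !reader.done(); reader.advance())
        ++records;
    assert(records == 2);
    return records;
}
```

Deleting the file inside the reader keeps the cleanup next to the code that knows the chunk is fully consumed, so a merge loop does not need a separate cleanup pass.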

Function Documentation

§ multiway_merge()

template<class PostingsData , class ForwardIterator >
uint64_t meta::index::multiway_merge ( std::ostream &  outstream,
ForwardIterator  begin,
ForwardIterator  end 
)

Performs a multi-way merge sort of all of the provided chunks, writing to the provided output stream.

Currently, this function will attempt to open std::distance(begin, end) files and merge them all simultaneously, but this could change in future implementations.

Parameters
outstream  Where the merged chunks should be written
begin  An iterator to the beginning of the sequence containing the chunk paths
end  An iterator to the end of the sequence containing the chunk paths
Returns
the total number of unique primary keys found during the merging
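
As a rough illustration of the merge described above, the following sketch performs a k-way merge over in-memory "chunks" using a min-heap and returns the number of unique keys. It is not the library's implementation: toy_chunk, toy_multiway_merge, and the "key count" output format are hypothetical, and the chunks are sorted vectors rather than files on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one sorted chunk: (key, count) pairs ordered by key.
using toy_chunk = std::vector<std::pair<std::string, std::uint64_t>>;

// Merges every chunk in [begin, end), summing the counts of equal keys, and
// writes one "key count" line per unique key. Returns the unique key count.
template <class ForwardIterator>
std::uint64_t toy_multiway_merge(std::ostream& out, ForwardIterator begin,
                                 ForwardIterator end)
{
    struct cursor
    {
        const toy_chunk* chunk;
        std::size_t pos;
    };
    // Comparator that turns std::priority_queue into a min-heap on the key
    // each cursor currently points at.
    auto by_key_desc = [](const cursor& a, const cursor& b) {
        return (*a.chunk)[a.pos].first > (*b.chunk)[b.pos].first;
    };
    std::priority_queue<cursor, std::vector<cursor>, decltype(by_key_desc)>
        heap(by_key_desc);

    // All chunks are "opened" at once, mirroring the behavior described above.
    for (auto it = begin; it != end; ++it)
        if (!it->empty())
            heap.push(cursor{&*it, 0});

    std::uint64_t unique_keys = 0;
    while (!heap.empty())
    {
        const std::string key = (*heap.top().chunk)[heap.top().pos].first;
        std::uint64_t total = 0;
        // Drain every cursor currently positioned on this key.
        while (!heap.empty()
               && (*heap.top().chunk)[heap.top().pos].first == key)
        {
            cursor c = heap.top();
            heap.pop();
            total += (*c.chunk)[c.pos].second;
            if (++c.pos < c.chunk->size())
                heap.push(c);
        }
        out << key << ' ' << total << '\n';
        ++unique_keys;
    }
    return unique_keys;
}

// Usage sketch: two chunks that share the key "cat".
inline std::string toy_merge_example()
{
    std::vector<toy_chunk> chunks = {
        toy_chunk{{"apple", 1}, {"cat", 2}},
        toy_chunk{{"bat", 5}, {"cat", 3}},
    };
    std::ostringstream out;
    std::uint64_t unique = toy_multiway_merge(out, chunks.begin(), chunks.end());
    assert(unique == 3); // apple, bat, cat
    return out.str();
}
```

Because every chunk contributes one cursor to the heap, the number of simultaneously open inputs equals the number of chunks, which is the same cost the note about std::distance(begin, end) points out.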

§ make_index() [1/3]

template<class Index , class... Args>
std::shared_ptr<Index> meta::index::make_index ( const cpptoml::table &  config,
corpus::corpus &  docs,
Args &&...  args 
)

Factory method for creating indexes.

forward_index is a friend of the factory method used to create it.

Usage:

auto idx = index::make_index<derived_index_type>(config);
Parameters
config  The configuration to be used to build the index
docs  The collection of documents to index
args  Any additional arguments to forward to the constructor for the chosen index type (usually none)
Returns
A properly initialized index

§ make_index() [2/3]

template<class Index , class... Args>
std::shared_ptr<Index> meta::index::make_index ( const cpptoml::table &  config,
Args &&...  args 
)

Helper for make_index that creates a corpus from the global config file.

inverted_index is a friend of the factory method used to create cached versions of it.

forward_index is a friend of the factory method used to create cached versions of it.

§ make_index() [3/3]

template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr<cached_index<Index, Cache> > meta::index::make_index ( const cpptoml::table &  config,
Args &&...  args 
)

Factory method for creating indexes that are cached.

inverted_index is a friend of the factory method used to create cached versions of it.

forward_index is a friend of the factory method used to create cached versions of it.

Usage:

auto idx =
index::make_index<derived_index_type,
cache_type>(config, other, options);

Other options will be forwarded to the constructor for the chosen cache class.

Parameters
config  The configuration to be used to build the index
args  Any additional arguments to forward to the constructor for the cache class chosen
Returns
A properly initialized, and automatically cached, index.
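
The shape of this overload (an index type plus a two-parameter cache template, with extra arguments forwarded to the cache) can be sketched without the library. All names below (toy_index, toy_no_evict_cache, toy_cached_index, toy_make_index) are hypothetical, and the factory takes a plain string instead of a cpptoml::table; the point is only to show how a `template <class, class> class Cache` parameter and argument forwarding fit together.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache policy with the two-parameter shape required by
// `template <class, class> class Cache`. It never evicts entries.
template <class Key, class Value>
class toy_no_evict_cache
{
  public:
    explicit toy_no_evict_cache(std::size_t reserve = 0) { map_.reserve(reserve); }

    const Value* find(const Key& key) const
    {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    void insert(const Key& key, Value value) { map_[key] = std::move(value); }

  private:
    std::unordered_map<Key, Value> map_;
};

// Hypothetical uncached index whose lookups are treated as expensive.
class toy_index
{
  public:
    explicit toy_index(std::string name) : name_(std::move(name)) {}

    std::string lookup(int id)
    {
        ++disk_reads;
        return name_ + "#" + std::to_string(id);
    }

    int disk_reads = 0; // counts uncached lookups

  private:
    std::string name_;
};

// Decorator: derives from the index and consults the cache first.
template <class Index, template <class, class> class Cache>
class toy_cached_index : public Index
{
  public:
    template <class... CacheArgs>
    explicit toy_cached_index(std::string name, CacheArgs&&... args)
        : Index(std::move(name)), cache_(std::forward<CacheArgs>(args)...)
    {
    }

    std::string lookup(int id)
    {
        if (const std::string* hit = cache_.find(id))
            return *hit;
        std::string value = Index::lookup(id);
        cache_.insert(id, value);
        return value;
    }

  private:
    Cache<int, std::string> cache_;
};

// Factory with the same template parameter layout as the overload above;
// trailing arguments are forwarded to the cache constructor.
template <class Index, template <class, class> class Cache, class... Args>
std::shared_ptr<toy_cached_index<Index, Cache>>
toy_make_index(const std::string& name, Args&&... args)
{
    return std::make_shared<toy_cached_index<Index, Cache>>(
        name, std::forward<Args>(args)...);
}

// Usage sketch: the second lookup is served from the cache.
inline int toy_cached_example()
{
    auto idx = toy_make_index<toy_index, toy_no_evict_cache>("docs", 16);
    std::string first = idx->lookup(7);
    std::string second = idx->lookup(7);
    assert(first == second);
    return idx->disk_reads; // only the first lookup reached the index
}
```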

§ operator==()

template<class PrimaryKey , class SecondaryKey , class FeatureValue >
bool meta::index::operator== ( const postings_data< PrimaryKey, SecondaryKey, FeatureValue > &  lhs,
const postings_data< PrimaryKey, SecondaryKey, FeatureValue > &  rhs 
)
Parameters
lhs  The first postings_data
rhs  The postings_data to compare with
Returns
whether lhs has the same PrimaryKey as rhs
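
Since equality here is decided by the PrimaryKey alone, two objects with different counts can still compare equal. A minimal sketch of that behavior, using a hypothetical toy_postings_data struct rather than the real class:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified postings container: a primary key plus a list of
// (secondary key, value) pairs.
template <class PrimaryKey, class SecondaryKey, class FeatureValue>
struct toy_postings_data
{
    PrimaryKey primary_key;
    std::vector<std::pair<SecondaryKey, FeatureValue>> counts;
};

// Equality looks only at the primary key, as the description above states.
template <class PrimaryKey, class SecondaryKey, class FeatureValue>
bool operator==(
    const toy_postings_data<PrimaryKey, SecondaryKey, FeatureValue>& lhs,
    const toy_postings_data<PrimaryKey, SecondaryKey, FeatureValue>& rhs)
{
    return lhs.primary_key == rhs.primary_key;
}

// Usage sketch: same key, different counts, still equal.
inline bool toy_postings_equal_example()
{
    using postings = toy_postings_data<std::string, std::uint64_t, double>;
    postings a;
    a.primary_key = "cat";
    a.counts.push_back({1, 2.0});
    postings b;
    b.primary_key = "cat";
    b.counts.push_back({9, 0.5});
    return a == b;
}
```

Key-only equality is convenient when merging, because records for the same key from different chunks can be recognized as the same entry before their counts are combined.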

§ make_ranker() [1/2]

std::unique_ptr< ranker > meta::index::make_ranker ( const cpptoml::table &  config)

Convenience method for creating a ranker using the factory.

Factory method for creating a ranker.

This should be specialized if your given ranker requires special construction behavior (e.g., reading parameters) that requires only the ranker-specific configuration (this will be the case almost all of the time).

§ make_ranker() [2/2]

std::unique_ptr< ranker > meta::index::make_ranker ( const cpptoml::table &  global,
const cpptoml::table &  local 
)

Convenience method for creating a ranker using the factory.

Factory method for creating a ranker.

Parameters
global  The global configuration group (containing the index path)
local  The ranker configuration group itself

This should be specialized if your given ranker requires special construction behavior that includes reading parameter values from the global configuration as well as the ranker-specific configuration.

§ make_lm_ranker()

std::unique_ptr< language_model_ranker > meta::index::make_lm_ranker ( const cpptoml::table &  global,
const cpptoml::table &  local 
)

Convenience method for creating a language_model_ranker using the factory.

Parameters
global  The global configuration group (containing the index path)
local  The ranker configuration group itself

§ load_ranker()

std::unique_ptr< ranker > meta::index::load_ranker ( std::istream &  in)

Convenience method for loading a ranker using the factory.

Factory method for loading a ranker.

This should be specialized if your given ranker requires special construction behavior. Otherwise, it is assumed that the ranker has a constructor from a std::istream&.

§ register_ranker()

template<class Ranker >
void meta::index::register_ranker ( )

Registration method for rankers.

Clients should use this method to register any new rankers they write.
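
The registration and loading functions above follow a common registry pattern: each ranker type registers a creator under an identifier, and loading reads the identifier from the stream and dispatches to that creator. The sketch below illustrates the pattern only. All names (toy_ranker, toy_ranker_loader, toy_register_ranker, toy_load_ranker, toy_scaled_ranker) and the text-based stream format are assumptions, not the library's API; the one detail taken from the text is that a ranker is assumed to be constructible from a std::istream&.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical ranker interface.
struct toy_ranker
{
    virtual ~toy_ranker() = default;
    virtual double score(double term_count) const = 0;
};

// Hypothetical registry mapping an identifier to a stream-based creator.
class toy_ranker_loader
{
  public:
    using creator = std::function<std::unique_ptr<toy_ranker>(std::istream&)>;

    static toy_ranker_loader& get()
    {
        static toy_ranker_loader instance;
        return instance;
    }

    void add(const std::string& id, creator fn) { creators_[id] = std::move(fn); }

    std::unique_ptr<toy_ranker> create(const std::string& id,
                                       std::istream& in) const
    {
        auto it = creators_.find(id);
        if (it == creators_.end())
            throw std::runtime_error("unregistered ranker: " + id);
        return it->second(in);
    }

  private:
    std::map<std::string, creator> creators_;
};

// Registers Ranker under Ranker::id(), using its std::istream& constructor.
template <class Ranker>
void toy_register_ranker()
{
    toy_ranker_loader::get().add(Ranker::id(), [](std::istream& in) {
        return std::unique_ptr<toy_ranker>(new Ranker(in));
    });
}

// Reads the identifier first, then lets the matching creator read the rest.
inline std::unique_ptr<toy_ranker> toy_load_ranker(std::istream& in)
{
    std::string id;
    in >> id;
    return toy_ranker_loader::get().create(id, in);
}

// A client-written ranker: its only parameter is a scale factor.
struct toy_scaled_ranker : toy_ranker
{
    static std::string id() { return "toy-scaled"; }

    explicit toy_scaled_ranker(std::istream& in) { in >> scale_; }

    double score(double term_count) const override
    {
        return scale_ * term_count;
    }

  private:
    double scale_ = 1.0;
};

// Usage sketch: register once, then load from a serialized form.
inline double toy_ranker_example()
{
    toy_register_ranker<toy_scaled_ranker>();
    std::istringstream in("toy-scaled 2.5");
    std::unique_ptr<toy_ranker> r = toy_load_ranker(in);
    assert(r != nullptr);
    return r->score(2.0);
}
```

With this layout, adding a new ranker does not require touching the loader: the new type only needs an identifier, a stream constructor, and one registration call.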