ModErn Text Analysis
META Enumerates Textual Applications
meta::index::inverted_index Class Reference

The inverted_index class stores information on a corpus indexed by term_ids.

#include <inverted_index.h>

Inheritance diagram for meta::index::inverted_index:
meta::index::disk_index

Classes

class  impl
 Implementation of an inverted_index.
 

Public Types

using primary_key_type = term_id
 
using secondary_key_type = doc_id
 
using postings_data_type = postings_data< term_id, doc_id, uint64_t >
 
using index_pdata_type = postings_data< std::string, doc_id, uint64_t >
 
using exception = inverted_index_exception
 

Public Member Functions

 inverted_index (inverted_index &&)
 Move constructs an inverted_index.
 
inverted_index & operator= (inverted_index &&)
 Move assigns an inverted_index.
 
 inverted_index (const inverted_index &)=delete
 inverted_index may not be copy-constructed.
 
inverted_index & operator= (const inverted_index &)=delete
 inverted_index may not be copy-assigned.
 
virtual ~inverted_index ()
 Default destructor.
 
analyzers::feature_map< uint64_t > tokenize (const corpus::document &doc)
 
virtual std::shared_ptr< postings_data_type > search_primary (term_id t_id) const
 
util::optional< postings_stream< doc_id > > stream_for (term_id t_id) const
 
uint64_t doc_freq (term_id t_id) const
 
uint64_t term_freq (term_id t_id, doc_id d_id) const
 
uint64_t total_corpus_terms ()
 
uint64_t total_num_occurences (term_id t_id) const
 
float avg_doc_length ()
 
Public Member Functions inherited from meta::index::disk_index
virtual ~disk_index ()=default
 Default destructor.
 
std::string index_name () const
 
uint64_t num_docs () const
 
std::string doc_name (doc_id d_id) const
 
std::string doc_path (doc_id d_id) const
 
std::vector< doc_id > docs () const
 
uint64_t doc_size (doc_id d_id) const
 
class_label label (doc_id d_id) const
 
label_id lbl_id (doc_id d_id) const
 
label_id id (class_label label) const
 
class_label class_label_from_id (label_id l_id) const
 
uint64_t num_labels () const
 
std::vector< class_label > class_labels () const
 
corpus::metadata metadata (doc_id d_id) const
 
template<class T >
util::optional< T > metadata (doc_id d_id, const std::string &name) const
 
virtual uint64_t unique_terms (doc_id d_id) const
 
virtual uint64_t unique_terms () const
 
term_id get_term_id (const std::string &term)
 
std::string term_text (term_id t_id) const
 
 disk_index (disk_index &&)=default
 Move constructs a disk_index.
 
disk_index & operator= (disk_index &&)=default
 Move assigns a disk_index.
 

Protected Member Functions

 inverted_index (const cpptoml::table &config)
 
Protected Member Functions inherited from meta::index::disk_index
 disk_index (const cpptoml::table &config, const std::string &name)
 Constructor.
 
 disk_index (const disk_index &)=delete
 disk_index may not be copy-constructed.
 
disk_index & operator= (const disk_index &)=delete
 disk_index may not be copy-assigned.
 

Private Member Functions

void load_index ()
 Loads an inverted index from its filesystem representation.
 
void create_index (const cpptoml::table &config, corpus::corpus &docs)
 Initializes the inverted index; it is called by the make_index factory function.
 
bool valid () const
 

Private Attributes

util::pimpl< impl > inv_impl_
 Implementation of this index.
 

Friends

template<class Index , class... Args>
std::shared_ptr< Index > make_index (const cpptoml::table &, Args &&...)
 inverted_index is a friend of the factory method used to create it.
 
template<class Index , class... Args>
std::shared_ptr< Index > make_index (const cpptoml::table &, corpus::corpus &docs, Args &&...)
 inverted_index is a friend of the factory method used to create it.
 
template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr< cached_index< Index, Cache > > make_index (const cpptoml::table &config, Args &&... args)
 inverted_index is a friend of the factory method used to create cached versions of it.
 

Additional Inherited Members

Protected Attributes inherited from meta::index::disk_index
util::pimpl< disk_index_impl > impl_
 Implementation of this disk_index.
 

Detailed Description

The inverted_index class stores information on a corpus indexed by term_ids.

Each term_id key is associated with a per-document frequency (by doc_id).

It is assumed all this information will not fit in memory, so a large postings file containing the (term_id -> each doc_id) information is saved on disk. A lexicon (or "dictionary") contains pointers into the large postings file. It is assumed that the lexicon will fit in memory.
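
The lexicon and postings layout described above can be sketched in a few lines. This is a toy model under stated assumptions, not MeTA's implementation: every `toy_` name is hypothetical, a `std::vector` stands in for the on-disk postings file, and plain integers stand in for the real `term_id` and `doc_id` types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; the real term_id and doc_id are MeTA types.
using toy_term_id = std::uint64_t;
using toy_doc_id = std::uint64_t;

// One (doc_id, count) entry in a term's postings list.
struct toy_posting
{
    toy_doc_id doc;
    std::uint64_t count;
};

// Lexicon entry: where a term's postings live in the flat postings store.
struct toy_lexicon_entry
{
    std::size_t offset; // index of the first posting for this term
    std::size_t length; // number of postings (documents) for this term
};

class toy_inverted_index
{
  public:
    // Builds the index from tokenized documents; docs[i] gets doc id i.
    explicit toy_inverted_index(
        const std::vector<std::vector<toy_term_id>>& docs)
    {
        // term -> (doc -> count); ordered maps keep postings sorted by doc.
        std::map<toy_term_id, std::map<toy_doc_id, std::uint64_t>> counts;
        for (toy_doc_id d = 0; d < docs.size(); ++d)
            for (toy_term_id t : docs[d])
                ++counts[t][d];

        // Flatten into one contiguous store and record offsets in the lexicon.
        for (const auto& term : counts)
        {
            lexicon_[term.first] = {postings_.size(), term.second.size()};
            for (const auto& dc : term.second)
                postings_.push_back({dc.first, dc.second});
        }
    }

    // Returns the postings for a term, or an empty list if it is unknown.
    std::vector<toy_posting> postings_for(toy_term_id t) const
    {
        auto it = lexicon_.find(t);
        if (it == lexicon_.end())
            return {};
        assert(it->second.offset + it->second.length <= postings_.size());
        auto first = postings_.begin()
                     + static_cast<std::ptrdiff_t>(it->second.offset);
        auto last = first + static_cast<std::ptrdiff_t>(it->second.length);
        return std::vector<toy_posting>(first, last);
    }

  private:
    std::unordered_map<toy_term_id, toy_lexicon_entry> lexicon_; // in memory
    std::vector<toy_posting> postings_; // stands in for the postings file
};
```

Looking up a term touches only the small lexicon plus one contiguous slice of the postings store, which is why only the lexicon needs to fit in memory.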

Constructor & Destructor Documentation

§ inverted_index()

meta::index::inverted_index::inverted_index ( const cpptoml::table &  config)
protected
Parameters
config    The table that specifies how to create the index.

Member Function Documentation

§ tokenize()

analyzers::feature_map< uint64_t > meta::index::inverted_index::tokenize ( const corpus::document &  doc)
Parameters
doc    The document to tokenize
Returns
the analyzed version of the document

§ search_primary()

auto meta::index::inverted_index::search_primary ( term_id  t_id) const
virtual
Parameters
t_id    The term_id to search for
Returns
the postings data for a given term_id

§ stream_for()

util::optional< postings_stream< doc_id > > meta::index::inverted_index::stream_for ( term_id  t_id) const
Parameters
t_id    The term_id to search for
Returns
the postings stream for a given term_id

§ doc_freq()

uint64_t meta::index::inverted_index::doc_freq ( term_id  t_id) const
Parameters
t_id    The term to search for
Returns
the document frequency of a term (number of documents it appears in)

§ term_freq()

uint64_t meta::index::inverted_index::term_freq ( term_id  t_id,
doc_id  d_id 
) const
Parameters
t_id    The term_id to search for
d_id    The doc_id to search for

§ total_corpus_terms()

uint64_t meta::index::inverted_index::total_corpus_terms ( )
Returns
the total number of terms in this index

§ total_num_occurences()

uint64_t meta::index::inverted_index::total_num_occurences ( term_id  t_id) const
Parameters
t_id    The specified term
Returns
the number of times the given term appears in the corpus
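
To make the relationship between the three per-term statistics concrete, here is a minimal sketch over a single postings list. The `sketch_` names are hypothetical and do not appear in MeTA. It also assumes that `term_freq` returns the number of times the term occurs in the given document, which the reference above does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical (doc id, count) pair for one term; not a MeTA type.
struct sketch_posting
{
    std::uint64_t doc;
    std::uint64_t count;
};
using sketch_postings = std::vector<sketch_posting>;

// Document frequency: the number of documents the term appears in.
std::uint64_t sketch_doc_freq(const sketch_postings& p)
{
    return p.size();
}

// Term frequency in one document (assumed meaning); 0 if the term is absent.
// Precondition: postings are sorted by doc id, so a binary search works.
std::uint64_t sketch_term_freq(const sketch_postings& p, std::uint64_t doc)
{
    assert(std::is_sorted(p.begin(), p.end(),
                          [](const sketch_posting& a, const sketch_posting& b)
                          { return a.doc < b.doc; }));
    auto it = std::lower_bound(p.begin(), p.end(), doc,
                               [](const sketch_posting& e, std::uint64_t d)
                               { return e.doc < d; });
    return (it != p.end() && it->doc == doc) ? it->count : 0;
}

// Total occurrences in the corpus: the sum of the per-document counts.
std::uint64_t sketch_total_occurrences(const sketch_postings& p)
{
    return std::accumulate(p.begin(), p.end(), std::uint64_t{0},
                           [](std::uint64_t sum, const sketch_posting& e)
                           { return sum + e.count; });
}
```

For a term with postings (0, 2), (3, 1), (7, 4), the document frequency is 3, the term frequency in document 3 is 1, and the total number of occurrences is 7.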

§ avg_doc_length()

float meta::index::inverted_index::avg_doc_length ( )
Returns
the average document length in this index
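
As a worked example, the average can be read as the total number of term occurrences divided by the number of documents. That reading is an assumption, since the reference only states what is returned, and `sketch_avg_doc_length` is a hypothetical helper rather than part of the class.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Average document length from per-document sizes (terms per document).
// Assumes at least one document; the result is a float, like the member above.
float sketch_avg_doc_length(const std::vector<std::uint64_t>& doc_sizes)
{
    assert(!doc_sizes.empty());
    const std::uint64_t total = std::accumulate(
        doc_sizes.begin(), doc_sizes.end(), std::uint64_t{0});
    return static_cast<float>(total) / static_cast<float>(doc_sizes.size());
}
```

Documents of sizes 3, 2, and 7 contain 12 terms in total, so the average length is 12 / 3 = 4.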

§ create_index()

void meta::index::inverted_index::create_index ( const cpptoml::table &  config,
corpus::corpus &  docs 
)
private

Initializes the inverted index; it is called by the make_index factory function.

Parameters
config    The configuration to be used
docs    A corpus object of documents to index

§ valid()

bool meta::index::inverted_index::valid ( ) const
private
Returns
whether this index contains all necessary files

Friends And Related Function Documentation

§ make_index

template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr<cached_index<Index, Cache> > make_index ( const cpptoml::table &  config,
Args &&...  args 
)
friend

inverted_index is a friend of the factory method used to create cached versions of it.

Usage:

auto idx =
index::make_index<derived_index_type,
cache_type>(config_path, other, options);

Other options will be forwarded to the constructor for the chosen cache class.

Parameters
config_file    the path to the configuration file to be used to build the index.
args    any additional arguments to forward to the constructor for the cache class chosen
Returns
A properly initialized, and automatically cached, index.
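
The friend declarations exist because the index constructor is protected, so the factory is the only way to obtain an instance. The pattern can be sketched independently of MeTA as follows. All `sketch_` names are hypothetical, `sketch_config` stands in for `cpptoml::table`, and the `load()` step is only a placeholder for whatever initialization the real factory performs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the parsed configuration table.
struct sketch_config
{
    std::string index_path;
};

// Factory: the only way to obtain an index in this sketch.
template <class Index, class... Args>
std::shared_ptr<Index> sketch_make_index(const sketch_config& config,
                                         Args&&... args)
{
    // std::make_shared is not a friend of Index and could not reach the
    // protected constructor, so the object is created with new here.
    std::shared_ptr<Index> idx{new Index(config, std::forward<Args>(args)...)};
    idx->load(); // private initialization step, reachable through friendship
    return idx;
}

class sketch_index
{
  public:
    sketch_index(const sketch_index&) = delete;
    sketch_index& operator=(const sketch_index&) = delete;

    const std::string& path() const { return path_; }
    bool loaded() const { return loaded_; }

  protected:
    // Not callable by client code; only the factory (a friend) can use it.
    explicit sketch_index(const sketch_config& config)
        : path_{config.index_path}
    {
    }

  private:
    void load()
    {
        assert(!loaded_);
        loaded_ = true;
    }

    // Grants every specialization of the factory access to this class.
    template <class Index, class... Args>
    friend std::shared_ptr<Index> sketch_make_index(const sketch_config&,
                                                    Args&&...);

    std::string path_;
    bool loaded_ = false;
};

static_assert(!std::is_copy_constructible<sketch_index>::value,
              "the index is handed out by shared_ptr and is never copied");
```

A caller would write `auto idx = sketch_make_index<sketch_index>(sketch_config{"toy-idx"});`. Constructing a `sketch_index` directly does not compile, which keeps the initialization step from being skipped.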

The documentation for this class was generated from the following files: