ModErn Text Analysis
META Enumerates Textual Applications
meta::index::forward_index Class Reference

The forward_index stores information on a corpus by doc_ids.

#include <forward_index.h>

Inheritance diagram for meta::index::forward_index:
meta::index::disk_index

Classes

class  impl
 Implementation of a forward_index.
 

Public Types

using primary_key_type = doc_id
 
using secondary_key_type = term_id
 
using postings_data_type = postings_data< doc_id, term_id, double >
 
using inverted_pdata_type = postings_data< term_id, doc_id, uint64_t >
 
using index_pdata_type = postings_data< doc_id, term_id, uint64_t >
 
using exception = forward_index_exception
 

Public Member Functions

 forward_index (forward_index &&)
 Move constructs a forward_index.
 
forward_index & operator= (forward_index &&)
 Move assigns a forward_index.
 
 forward_index (const forward_index &)=delete
 forward_index may not be copy-constructed.
 
forward_index & operator= (const forward_index &)=delete
 forward_index may not be copy-assigned.
 
virtual ~forward_index ()
 Default destructor.
 
virtual std::shared_ptr< postings_data_type > search_primary (doc_id d_id) const
 
util::optional< postings_stream< term_id, double > > stream_for (doc_id d_id) const
 
std::string liblinear_data (doc_id d_id) const
 
virtual uint64_t unique_terms () const override
 
learn::feature_vector tokenize (const corpus::document &doc)
 
Public Member Functions inherited from meta::index::disk_index
virtual ~disk_index ()=default
 Default destructor.
 
std::string index_name () const
 
uint64_t num_docs () const
 
std::string doc_name (doc_id d_id) const
 
std::string doc_path (doc_id d_id) const
 
std::vector< doc_id > docs () const
 
uint64_t doc_size (doc_id d_id) const
 
class_label label (doc_id d_id) const
 
label_id lbl_id (doc_id d_id) const
 
label_id id (class_label label) const
 
class_label class_label_from_id (label_id l_id) const
 
uint64_t num_labels () const
 
std::vector< class_label > class_labels () const
 
corpus::metadata metadata (doc_id d_id) const
 
template<class T >
util::optional< T > metadata (doc_id d_id, const std::string &name) const
 
virtual uint64_t unique_terms (doc_id d_id) const
 
term_id get_term_id (const std::string &term)
 
std::string term_text (term_id t_id) const
 
 disk_index (disk_index &&)=default
 Move constructs a disk_index.
 
disk_index & operator= (disk_index &&)=default
 Move assigns a disk_index.
 

Protected Member Functions

 forward_index (const cpptoml::table &config)
 
Protected Member Functions inherited from meta::index::disk_index
 disk_index (const cpptoml::table &config, const std::string &name)
 Constructor.
 
 disk_index (const disk_index &)=delete
 disk_index may not be copy-constructed.
 
disk_index & operator= (const disk_index &)=delete
 disk_index may not be copy-assigned.
 

Private Member Functions

void load_index ()
 Loads a forward index from its filesystem representation.
 
void create_index (const cpptoml::table &config, corpus::corpus &docs)
 Initializes the forward index; it is called by the make_index factory function.
 
bool valid () const
 

Private Attributes

util::pimpl< impl > fwd_impl_
 Implementation of this index.
 

Friends

template<class Index , class... Args>
std::shared_ptr< Index > make_index (const cpptoml::table &config, Args &&... args)
 forward_index is a friend of the factory method used to create it.
 
template<class Index , class... Args>
std::shared_ptr< Index > make_index (const cpptoml::table &config, corpus::corpus &docs, Args &&... args)
 forward_index is a friend of the factory method used to create it.
 
template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr< cached_index< Index, Cache > > make_index (const cpptoml::table &config_file, Args &&... args)
 forward_index is a friend of the factory method used to create cached versions of it.
 

Additional Inherited Members

Protected Attributes inherited from meta::index::disk_index
util::pimpl< disk_index_impl > impl_
 Implementation of this disk_index.
 

Detailed Description

The forward_index stores information on a corpus by doc_ids.

Each doc_id key maps to the distribution of term_ids (the term "counts") that occur in that document.
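
As a rough illustration of this layout, the sketch below models a forward index as an in-memory map from document id to per-term counts. The names `toy_forward_index`, `add`, and `counts_for` are hypothetical, and the integer aliases only stand in for `doc_id` and `term_id`; the real class is disk-backed and is not implemented this way.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>

// Stand-ins for doc_id and term_id (assumption: plain integers).
using toy_doc_id = std::uint64_t;
using toy_term_id = std::uint64_t;

// Hypothetical in-memory model: doc id -> (term id -> count).
class toy_forward_index
{
  public:
    // Record `count` more occurrences of `term` in `doc`.
    void add(toy_doc_id doc, toy_term_id term, std::uint64_t count = 1)
    {
        assert(count > 0);
        postings_[doc][term] += count;
    }

    // Term counts for one document; empty if the document is unknown.
    std::map<toy_term_id, std::uint64_t> counts_for(toy_doc_id doc) const
    {
        auto it = postings_.find(doc);
        if (it == postings_.end())
            return {};
        return it->second;
    }

    // Number of distinct terms occurring in one document.
    std::uint64_t unique_terms(toy_doc_id doc) const
    {
        return counts_for(doc).size();
    }

  private:
    std::unordered_map<toy_doc_id, std::map<toy_term_id, std::uint64_t>> postings_;
};
```

The key point is the direction of the lookup: the primary key is the document and the secondary key is the term, which is the reverse of an inverted index.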

Constructor & Destructor Documentation

§ forward_index()

meta::index::forward_index::forward_index ( const cpptoml::table &  config)
protected
Parameters
config: The table that specifies how to create the index.

Member Function Documentation

§ operator=()

forward_index & meta::index::forward_index::operator= ( forward_index &&  )
default

Move assigns a forward_index.

Parameters
other: The forward_index to move into this one.
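
The combination of deleted copy operations and available move operations listed for this class can be sketched with a small stand-alone type. `move_only_index` and its `payload` are hypothetical and exist only to show what the declarations imply for callers; they are not part of the library.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical move-only handle with the same copy/move declarations
// as those listed for forward_index.
class move_only_index
{
  public:
    explicit move_only_index(int payload)
        : payload_(std::make_unique<int>(payload))
    {
    }

    move_only_index(move_only_index&&) = default;
    move_only_index& operator=(move_only_index&&) = default;
    move_only_index(const move_only_index&) = delete;
    move_only_index& operator=(const move_only_index&) = delete;

    // True while this object still owns its payload.
    bool owns_payload() const { return payload_ != nullptr; }

    int payload() const
    {
        assert(owns_payload());
        return *payload_;
    }

  private:
    std::unique_ptr<int> payload_;
};

// Compile-time checks of what the declarations above allow.
static_assert(!std::is_copy_constructible<move_only_index>::value, "no copy construction");
static_assert(!std::is_copy_assignable<move_only_index>::value, "no copy assignment");
static_assert(std::is_move_constructible<move_only_index>::value, "move construction allowed");
static_assert(std::is_move_assignable<move_only_index>::value, "move assignment allowed");
```

After `b = std::move(a);` the state that `a` held belongs to `b`, and `a` should not be used for lookups.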

§ search_primary()

auto meta::index::forward_index::search_primary ( doc_id  d_id) const
virtual
Parameters
d_id: The doc_id to search for
Returns
the postings data for a given doc_id

§ stream_for()

util::optional< postings_stream< term_id, double > > meta::index::forward_index::stream_for ( doc_id  d_id) const
Parameters
d_id: The doc_id to search for
Returns
the postings stream for a given doc_id
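
Because the return type is an optional, callers should check it before iterating. The sketch below shows that calling pattern, with C++17 `std::optional` standing in for `util::optional` and a vector of pairs standing in for `postings_stream`. `toy_stream`, `toy_stream_for`, and `toy_total_weight` are hypothetical names, and treating an unknown document as the empty case is an assumption that this page does not confirm.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a postings stream: (term id, weight) pairs.
using toy_stream = std::vector<std::pair<std::uint64_t, double>>;

// Returns the stream for `doc`, or an empty optional if `doc` is unknown.
// Assumption: "unknown document" is the reason for an empty result.
std::optional<toy_stream>
toy_stream_for(const std::unordered_map<std::uint64_t, toy_stream>& index,
               std::uint64_t doc)
{
    auto it = index.find(doc);
    if (it == index.end())
        return std::nullopt;
    return it->second;
}

// Sums the weights in a document's stream, treating a missing stream as 0.
double toy_total_weight(const std::unordered_map<std::uint64_t, toy_stream>& index,
                        std::uint64_t doc)
{
    double total = 0.0;
    if (auto stream = toy_stream_for(index, doc))
    {
        for (const auto& entry : *stream)
            total += entry.second;
    }
    return total;
}
```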

§ liblinear_data()

std::string meta::index::forward_index::liblinear_data ( doc_id  d_id) const
Parameters
d_id: The document id of the doc to convert to liblinear format
Returns
the string representation of the document in liblinear format
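
For readers unfamiliar with the format, a LIBLINEAR/LIBSVM-style line is a label followed by ascending `index:value` pairs. The helper below is a minimal sketch of producing such a line from sparse counts. `toy_liblinear_line` is a hypothetical function, and the one-based index shift is an assumption about how zero-based term ids would be mapped, not a statement about what this member function emits.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Formats one document as a LIBLINEAR/LIBSVM-style line:
//   <label> <index>:<value> <index>:<value> ...
// Assumptions: `label` is already numeric, term ids are zero-based, and
// feature indices are written one-based in ascending order.
std::string toy_liblinear_line(int label,
                               const std::map<std::uint64_t, double>& counts)
{
    std::ostringstream out;
    out << label;
    // std::map iterates in ascending key order, as the format expects.
    for (const auto& entry : counts)
        out << ' ' << (entry.first + 1) << ':' << entry.second;
    return out.str();
}
```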

§ unique_terms()

uint64_t meta::index::forward_index::unique_terms ( ) const
override virtual
Returns
the number of unique terms in the index

Reimplemented from meta::index::disk_index.

§ tokenize()

learn::feature_vector meta::index::forward_index::tokenize ( const corpus::document &  doc)
Parameters
doc: The document to tokenize
Returns
the analyzed version of the document as a feature vector
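
Conceptually, this turns raw text into sparse (term id, weight) features. The sketch below shows the idea with plain whitespace splitting and raw counts. `toy_vocabulary` and `toy_tokenize` are hypothetical, and whitespace tokenization is an assumption standing in for whatever analyzers the index was configured with.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical vocabulary that hands out term ids in first-seen order.
class toy_vocabulary
{
  public:
    // Returns the existing id for `term`, or assigns the next free one.
    std::uint64_t id_for(const std::string& term)
    {
        auto result = ids_.emplace(term, ids_.size());
        return result.first->second;
    }

  private:
    std::unordered_map<std::string, std::uint64_t> ids_;
};

// Splits `text` on whitespace and counts each token as a sparse feature.
// Assumption: whitespace splitting replaces the configured analyzers.
std::map<std::uint64_t, double> toy_tokenize(const std::string& text,
                                             toy_vocabulary& vocab)
{
    std::map<std::uint64_t, double> features;
    std::istringstream in(text);
    std::string token;
    while (in >> token)
        features[vocab.id_for(token)] += 1.0;
    return features;
}
```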

§ create_index()

void meta::index::forward_index::create_index ( const cpptoml::table &  config,
corpus::corpus &  docs 
)
private

Initializes the forward index; it is called by the make_index factory function.

Parameters
config: The configuration to be used
docs: A corpus object of documents to index

§ valid()

bool meta::index::forward_index::valid ( ) const
private
Returns
whether this index contains all necessary files

Friends And Related Function Documentation

§ make_index [1/3]

template<class Index , class... Args>
std::shared_ptr<Index> make_index ( const cpptoml::table &  config,
Args &&...  args 
)
friend

forward_index is a friend of the factory method used to create it.

Usage:

auto idx =
    index::make_index<derived_index_type,
                      cache_type>(config_path, other, options);

Other options will be forwarded to the constructor for the chosen cache class.

Parameters
config: the path to the configuration file to be used to build the index.
args: any additional arguments to forward to the constructor for the cache class chosen
Returns
A properly initialized, and automatically cached, index.

§ make_index [2/3]

template<class Index , class... Args>
std::shared_ptr<Index> make_index ( const cpptoml::table &  config,
corpus::corpus &  docs,
Args &&...  args 
)
friend

forward_index is a friend of the factory method used to create it.

Usage:

auto idx = index::make_index<derived_index_type>(config, docs);
Parameters
config: The configuration to be used to build the index
docs: The collection of documents to index
args: any additional arguments to forward to the constructor for the chosen index type (usually none)
Returns
A properly initialized index
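
The reason for the friend declaration can be sketched in isolation: the index constructor is protected and its setup routines are private, so only a befriended factory can build and initialize an instance. Everything below (`toy_index`, `toy_make_index`, `load`) is hypothetical and uses a string in place of `cpptoml::table`; it illustrates the pattern only and does not reproduce the library's factory.

```cpp
#include <memory>
#include <string>
#include <utility>

// Forward declaration so the class can name the factory as a friend.
template <class Index, class... Args>
std::shared_ptr<Index> toy_make_index(const std::string& config, Args&&... args);

// Hypothetical index whose constructor is reachable only via the factory.
class toy_index
{
  public:
    const std::string& name() const { return name_; }
    bool loaded() const { return loaded_; }

  protected:
    explicit toy_index(const std::string& config) : name_(config) {}

  private:
    // Private setup step, analogous in spirit to load_index/create_index.
    void load() { loaded_ = true; }

    std::string name_;
    bool loaded_ = false;

    template <class Index, class... Args>
    friend std::shared_ptr<Index> toy_make_index(const std::string& config,
                                                 Args&&... args);
};

template <class Index, class... Args>
std::shared_ptr<Index> toy_make_index(const std::string& config, Args&&... args)
{
    // std::make_shared cannot reach a protected constructor, so use new.
    std::shared_ptr<Index> idx{new Index(config, std::forward<Args>(args)...)};
    idx->load(); // allowed only because the factory is a friend
    return idx;
}
```

A call such as `toy_make_index<toy_index>("fwd")` then returns a fully set-up shared pointer, while direct construction of `toy_index` outside the factory fails to compile.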

§ make_index [3/3]

template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr<cached_index<Index, Cache> > make_index ( const cpptoml::table &  config_file,
Args &&...  args 
)
friend

forward_index is a friend of the factory method used to create cached versions of it.

Usage:

auto idx =
    index::make_index<derived_index_type,
                      cache_type>(config_path, other, options);

Other options will be forwarded to the constructor for the chosen cache class.

Parameters
config_file: the path to the configuration file to be used to build the index.
args: any additional arguments to forward to the constructor for the cache class chosen
Returns
A properly initialized, and automatically cached, index.

The documentation for this class was generated from the following files: