ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus Namespace Reference

Various ways to convert corpus formats into META-readable documents.

Classes

class  corpus
 Provides an interface to multiple corpus input formats.
 
class  corpus_exception
 Basic exception for corpus interactions.
 
class  corpus_factory
 Factory that is responsible for creating corpus instances from configuration files.
 
class  document
 Represents an indexable document.
 
class  file_corpus
 Creates document objects from individual files, each representing a single document.
 
class  gz_corpus
 Fills document objects with content line-by-line from gzip-compressed input files.
 
class  libsvm_corpus
 Fills document objects with content line-by-line from a libsvm-formatted input file.
 
class  line_corpus
 Fills document objects with content line-by-line from an input file.
 
class  metadata
 Represents the collection of metadata for a document.
 
class  metadata_exception
 Exception class for metadata operations.
 
class  metadata_parser
 Reads metadata from the metadata file of a corpus according to a schema.
 

Functions

template<class LocalStorage , class ConsumeFunction >
void parallel_consume (corpus &docs, parallel::thread_pool &pool, LocalStorage &&ls_fn, ConsumeFunction &&consume_fn)
 Consumes each document in a corpus using a pool of threads.
 
std::unique_ptr< corpus > make_corpus (const cpptoml::table &config)
 Convenience method for creating a corpus using the factory.
 
template<class Corpus >
std::unique_ptr< corpus > make_corpus (util::string_view prefix, util::string_view dataset, const cpptoml::table &config)
 Factory method for creating a corpus.
 
template<>
std::unique_ptr< corpus > make_corpus< file_corpus > (util::string_view prefix, util::string_view dataset, const cpptoml::table &config)
 Specialization of the factory method used to create file_corpus instances.
 
template<>
std::unique_ptr< corpus > make_corpus< gz_corpus > (util::string_view prefix, util::string_view dataset, const cpptoml::table &config)
 Specialization of the factory method used to create gz_corpus instances.
 
template<>
std::unique_ptr< corpus > make_corpus< libsvm_corpus > (util::string_view prefix, util::string_view dataset, const cpptoml::table &config)
 Specialization of the factory method used to create libsvm_corpus instances.
 
template<>
std::unique_ptr< corpus > make_corpus< line_corpus > (util::string_view prefix, util::string_view dataset, const cpptoml::table &config)
 Specialization of the factory method used to create line_corpus instances.
 
metadata::schema_type metadata_schema (const cpptoml::table &config)
 Extracts a metadata schema from a configuration file.
 

Detailed Description

Various ways to convert corpus formats into META-readable documents.

Function Documentation

§ parallel_consume()

template<class LocalStorage , class ConsumeFunction >
void meta::corpus::parallel_consume ( corpus &  docs,
parallel::thread_pool &  pool,
LocalStorage &&  ls_fn,
ConsumeFunction &&  consume_fn 
)

Consumes each document in a corpus using a pool of threads.

Parameters
docs        The corpus to consume
pool        The thread pool to use
ls_fn       A function to create thread-specific storage
consume_fn  A function to consume a document
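
The following is a minimal, standalone sketch of the consumption pattern the parameters above suggest. It does not use the MeTA headers. The names toy_corpus, toy_parallel_consume, local_count, and count_characters are hypothetical stand-ins, plain std::thread workers replace parallel::thread_pool, and the assumption that consume_fn receives the thread-local storage together with the document is an illustration, not something this page states.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a corpus: hands out documents one at a time.
struct toy_corpus
{
    explicit toy_corpus(std::vector<std::string> d) : docs(std::move(d)) {}
    bool has_next() const { return pos < docs.size(); }
    std::string next() { return docs[pos++]; }

    std::vector<std::string> docs;
    std::size_t pos = 0;
};

// Each worker creates its own storage with ls_fn, then repeatedly takes the
// next document under a lock and passes it to consume_fn with that storage.
template <class LocalStorage, class ConsumeFunction>
void toy_parallel_consume(toy_corpus& docs, std::size_t num_threads,
                          LocalStorage&& ls_fn, ConsumeFunction&& consume_fn)
{
    std::mutex mutex;
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_threads; ++i)
    {
        workers.emplace_back([&]() {
            auto storage = ls_fn();
            while (true)
            {
                std::string doc;
                {
                    // Only the hand-off of the next document is serialized.
                    std::lock_guard<std::mutex> lock{mutex};
                    if (!docs.has_next())
                        return;
                    doc = docs.next();
                }
                consume_fn(storage, doc);
            }
        });
    }
    for (auto& worker : workers)
        worker.join();
}

// Thread-specific storage: counts locally, merges into the total on destruction.
struct local_count
{
    explicit local_count(std::atomic<std::size_t>& t) : total(t) {}
    ~local_count() { total += count; }

    std::atomic<std::size_t>& total;
    std::size_t count = 0;
};

// Usage: total number of characters across all documents.
std::size_t count_characters(std::vector<std::string> texts,
                             std::size_t num_threads)
{
    toy_corpus docs{std::move(texts)};
    std::atomic<std::size_t> total{0};
    toy_parallel_consume(
        docs, num_threads, [&]() { return local_count{total}; },
        [](local_count& local, const std::string& doc) {
            local.count += doc.size();
        });
    return total.load();
}
```

In this sketch the consume function never touches shared state directly, so the only synchronization points are the document hand-off and the final merge in the storage destructor.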

§ make_corpus() [1/2]

std::unique_ptr< corpus > meta::corpus::make_corpus ( const cpptoml::table &  config)

Convenience method for creating a corpus using the factory.

The configuration object passed here should be the "global" configuration (as in, the one that contains the "prefix", "dataset", and "corpus" keys).

§ make_corpus() [2/2]

template<class Corpus >
std::unique_ptr<corpus> meta::corpus::make_corpus ( util::string_view  prefix,
util::string_view  dataset,
const cpptoml::table &  config 
)

Factory method for creating a corpus.

This should be specialized if your given corpus requires special construction behavior (e.g., reading additional parameters).
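
As an illustration of the specialization mechanism described above, here is a standalone sketch that does not use the MeTA headers. All names in it (toy_corpus_base, plain_corpus, sized_corpus, toy_make_corpus, toy_config, and the "num-docs" key) are hypothetical, and std::string and std::map stand in for util::string_view and cpptoml::table.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the corpus base class and configuration table.
struct toy_corpus_base
{
    virtual ~toy_corpus_base() = default;
    virtual std::string describe() const = 0;
};
using toy_config = std::map<std::string, std::string>;

// A corpus type that needs nothing beyond prefix and dataset.
struct plain_corpus : toy_corpus_base
{
    plain_corpus(const std::string& prefix, const std::string& dataset)
        : path(prefix + "/" + dataset)
    {
    }
    std::string describe() const override { return "plain:" + path; }
    std::string path;
};

// A corpus type that needs one additional parameter from the configuration.
struct sized_corpus : toy_corpus_base
{
    sized_corpus(const std::string& p, unsigned long n) : path(p), num_docs(n) {}
    std::string describe() const override
    {
        return "sized:" + path + ":" + std::to_string(num_docs);
    }
    std::string path;
    unsigned long num_docs;
};

// Primary template: default construction from prefix and dataset only.
template <class Corpus>
std::unique_ptr<toy_corpus_base> toy_make_corpus(const std::string& prefix,
                                                 const std::string& dataset,
                                                 const toy_config&)
{
    return std::unique_ptr<toy_corpus_base>(new Corpus(prefix, dataset));
}

// Explicit specialization: reads the extra parameter before constructing.
template <>
std::unique_ptr<toy_corpus_base>
toy_make_corpus<sized_corpus>(const std::string& prefix,
                              const std::string& dataset,
                              const toy_config& config)
{
    auto it = config.find("num-docs");
    if (it == config.end())
        throw std::runtime_error{"num-docs missing from configuration"};
    return std::unique_ptr<toy_corpus_base>(
        new sized_corpus(prefix + "/" + dataset, std::stoul(it->second)));
}
```

Callers select the corpus type through the template argument, for example toy_make_corpus<sized_corpus>("data", "news", cfg), and the specialization is picked up without any change at the call site.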

§ metadata_schema()

metadata::schema_type meta::corpus::metadata_schema ( const cpptoml::table &  config)

Extracts a metadata schema from a configuration file.

Parameters
config  The configuration group that specifies the metadata
Returns
The corresponding metadata::schema_type object.