ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::file_corpus Class Reference

Creates document objects from individual files, each representing a single document.

#include <file_corpus.h>

Inherits meta::corpus::corpus.

Public Member Functions

 file_corpus (const std::string &prefix, const std::string &doc_list, std::string encoding)
 
bool has_next () const override
 
document next () override
 
uint64_t size () const override
 
metadata::schema_type schema () const override
 
Public Member Functions inherited from meta::corpus::corpus
 corpus (std::string encoding)
 Constructs a new corpus with the given encoding.
 
virtual ~corpus ()=default
 Destructor.
 
const std::string & encoding () const
 
bool store_full_text () const
 
void set_store_full_text (bool store_full_text)
 

Static Public Attributes

static const util::string_view id = "file-corpus"
 The identifier for this corpus.
 

Private Attributes

uint64_t cur_
 The index of the current document
 
std::string prefix_
 The path prefix under which all the documents are located
 
std::vector< std::pair< std::string, class_label > > docs_
 The document paths paired with their class labels
 

Additional Inherited Members

Protected Member Functions inherited from meta::corpus::corpus
std::vector< metadata::field > next_metadata ()
 Helper function to be used by deriving classes in implementing next() to set the metadata for the current document.
 

Detailed Description

Creates document objects from individual files, each representing a single document.

Constructor & Destructor Documentation

§ file_corpus()

meta::corpus::file_corpus::file_corpus ( const std::string &  prefix,
const std::string &  doc_list,
std::string  encoding 
)
Parameters
prefix    Path to where the files are located
doc_list  A file containing the path to each document in the corpus, preceded by a class label (or "[none]")
encoding  The encoding of the corpus
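
The doc_list format described above can be illustrated with a small, self-contained sketch. It does not use MeTA itself: parse_doc_list is a hypothetical helper, std::string stands in for class_label, and the single-space separator between label and path is an assumption rather than something this page confirms. The result mirrors the (path, label) pair layout of docs_.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of MeTA. Reads doc_list-style text in which
// each line holds a class label (or "[none]") followed by a document path.
// Assumption: label and path are separated by the first space on the line.
// Returns (path, label) pairs, the same ordering as the docs_ member.
std::vector<std::pair<std::string, std::string>>
parse_doc_list(std::istream& input)
{
    std::vector<std::pair<std::string, std::string>> docs;
    std::string line;
    while (std::getline(input, line))
    {
        if (line.empty())
            continue; // assumption: blank lines are ignored

        const auto space = line.find(' ');
        // Reject lines with no separator, no label, or no path.
        if (space == std::string::npos || space == 0
            || space + 1 == line.size())
            throw std::runtime_error{"malformed doc_list line: " + line};

        docs.emplace_back(line.substr(space + 1), line.substr(0, space));
    }
    return docs;
}
```

Feeding the helper a std::istringstream that holds the lines "spam docs/a.txt" and "[none] docs/b.txt" yields two entries, the second carrying the "[none]" label for an unlabeled document.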

Member Function Documentation

§ has_next()

bool meta::corpus::file_corpus::has_next ( ) const
override virtual
Returns
whether there is another document in this corpus

Implements meta::corpus::corpus.

§ next()

document meta::corpus::file_corpus::next ( )
override virtual
Returns
the next document from this corpus

Implements meta::corpus::corpus.

§ size()

uint64_t meta::corpus::file_corpus::size ( ) const
override virtual
Returns
the number of documents in this corpus

Implements meta::corpus::corpus.
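
The has_next(), next() and size() members together form a forward-only iteration interface. The sketch below shows that pattern with a toy stand-in rather than the real class: toy_file_corpus, toy_document and count_documents are hypothetical names, and joining prefix_ directly onto each path is an assumption about how the two are combined.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a document; the real document type is richer.
struct toy_document
{
    std::uint64_t id;
    std::string path;
    std::string label;
};

// Hypothetical stand-in that mirrors the documented has_next()/next()/size()
// signatures and the cur_/prefix_/docs_ members. Not the MeTA implementation.
class toy_file_corpus
{
  public:
    toy_file_corpus(std::string prefix,
                    std::vector<std::pair<std::string, std::string>> docs)
        : cur_{0}, prefix_{std::move(prefix)}, docs_{std::move(docs)}
    {
    }

    // True while at least one document has not been returned yet.
    bool has_next() const
    {
        return cur_ < docs_.size();
    }

    // Returns the current document and advances the cursor.
    toy_document next()
    {
        if (!has_next())
            throw std::out_of_range{"no more documents in corpus"};
        const auto& entry = docs_[cur_];
        // Assumption: the full path is the prefix followed by the listed path.
        toy_document doc{cur_, prefix_ + entry.first, entry.second};
        ++cur_;
        return doc;
    }

    // Total number of documents, independent of the cursor position.
    std::uint64_t size() const
    {
        return docs_.size();
    }

  private:
    std::uint64_t cur_;
    std::string prefix_;
    std::vector<std::pair<std::string, std::string>> docs_;
};

// Consumes every remaining document and returns how many were visited.
std::uint64_t count_documents(toy_file_corpus& corpus)
{
    std::uint64_t seen = 0;
    while (corpus.has_next())
    {
        corpus.next();
        ++seen;
    }
    return seen;
}
```

In this sketch size() keeps reporting the total while has_next() tracks the cursor, so a caller can size a buffer up front and then drain the corpus in a single while loop.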

§ schema()

metadata::schema_type meta::corpus::file_corpus::schema ( ) const
override virtual
Returns
the metadata schema for this corpus

Reimplemented from meta::corpus::corpus.

