ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::gz_corpus Class Reference

Fills document objects with content line-by-line from gzip-compressed input files.

#include <gz_corpus.h>

Inheritance diagram for meta::corpus::gz_corpus:
meta::corpus::corpus

Public Member Functions

 gz_corpus (const std::string &file, std::string encoding, uint64_t num_docs)
 
bool has_next () const override
 
document next () override
 
uint64_t size () const override
 
- Public Member Functions inherited from meta::corpus::corpus
 corpus (std::string encoding)
 Constructs a new corpus with the given encoding.
 
virtual metadata::schema_type schema () const
 
virtual ~corpus ()=default
 Destructor.
 
const std::string & encoding () const
 
bool store_full_text () const
 
void set_store_full_text (bool store_full_text)
 

Static Public Attributes

static const util::string_view id = "gz-corpus"
 The identifier for this corpus.
 

Private Attributes

doc_id cur_id_
 The current document we are on.
 
uint64_t num_lines_
 The number of lines in the file.
 
io::gzifstream corpus_stream_
 The stream for reading the corpus.
 
io::gzifstream class_stream_
 The stream to read the class labels.
 

Additional Inherited Members

- Protected Member Functions inherited from meta::corpus::corpus
std::vector< metadata::field > next_metadata ()
 Helper function to be used by deriving classes in implementing next() to set the metadata for the current document.
 

Detailed Description

Fills document objects with content line-by-line from gzip-compressed input files.

Constructor & Destructor Documentation

§ gz_corpus()

meta::corpus::gz_corpus::gz_corpus ( const std::string &  file,
std::string  encoding,
uint64_t  num_docs 
)
Parameters
file: The path to the compressed corpus file, where each line represents a document
encoding: The encoding for the file
num_docs: The number of documents in this corpus
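
Because the constructor takes num_docs explicitly and each line of the file is one document, a caller typically needs a line count before constructing the corpus. The following is a minimal sketch of such a count, not part of MeTA: count_documents is a hypothetical helper, and it assumes the input has already been decompressed into an ordinary std::istream.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of MeTA): counts the lines available on an
// already-decompressed input stream, treating each line as one document.
inline std::uint64_t count_documents(std::istream& in)
{
    std::uint64_t count = 0;
    std::string line;
    while (std::getline(in, line))
        ++count;
    return count;
}
```

The returned value could then be passed as num_docs. A final line without a trailing newline is still counted as a document in this sketch.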

Member Function Documentation

§ has_next()

bool meta::corpus::gz_corpus::has_next ( ) const
override virtual
Returns
whether there is another document in this corpus

Implements meta::corpus::corpus.

§ next()

document meta::corpus::gz_corpus::next ( )
override virtual
Returns
the next document from this corpus

Implements meta::corpus::corpus.

§ size()

uint64_t meta::corpus::gz_corpus::size ( ) const
override virtual
Returns
the number of documents in this corpus

Implements meta::corpus::corpus.
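
The three member functions above form a simple iteration protocol: call next() while has_next() is true, with size() giving the total up front. The sketch below illustrates that protocol with a stand-in class. line_corpus_sketch and read_all are hypothetical names, the class reads from an in-memory string rather than a gzip stream, and next() returns a plain std::string instead of a document. It also assumes that has_next() is driven by comparing the current document id to the supplied document count, which this page does not confirm.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a line-per-document corpus. It mirrors the
// has_next()/next()/size() shape documented above but is not MeTA code.
class line_corpus_sketch
{
  public:
    line_corpus_sketch(std::string text, std::uint64_t num_docs)
        : cur_id_(0), num_lines_(num_docs), stream_(std::move(text))
    {
    }

    // Assumption: iteration ends once num_docs lines have been handed out.
    bool has_next() const { return cur_id_ < num_lines_; }

    // Reads one line and advances the current document id.
    std::string next()
    {
        std::string line;
        std::getline(stream_, line);
        ++cur_id_;
        return line;
    }

    std::uint64_t size() const { return num_lines_; }

  private:
    std::uint64_t cur_id_;      // the current document we are on
    std::uint64_t num_lines_;   // the number of lines (documents)
    std::istringstream stream_; // stands in for the gzip corpus stream
};

// Drains the corpus using the has_next()/next() loop.
inline std::vector<std::string> read_all(line_corpus_sketch& corpus)
{
    std::vector<std::string> docs;
    docs.reserve(corpus.size());
    while (corpus.has_next())
        docs.push_back(corpus.next());
    return docs;
}
```

With this design, supplying a num_docs larger than the real line count would make next() return empty strings for the missing lines, which is one reason the count needs to match the file.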


The documentation for this class was generated from the following files: