ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::libsvm_corpus Class Reference

Fills document objects with content line-by-line from a libsvm-formatted input file.

#include <libsvm_corpus.h>

Inheritance diagram for meta::corpus::libsvm_corpus:
meta::corpus::corpus

Public Types

enum  label_type { CLASSIFICATION, REGRESSION }
 The label type for the corpus.
 

Public Member Functions

 libsvm_corpus (const std::string &file, label_type type=label_type::CLASSIFICATION, uint64_t num_docs=0)
 
bool has_next () const override
 
document next () override
 
uint64_t size () const override
 
metadata::schema_type schema () const override
 
Public Member Functions inherited from meta::corpus::corpus
 corpus (std::string encoding)
 Constructs a new corpus with the given encoding.
 
virtual ~corpus ()=default
 Destructor.
 
const std::string & encoding () const
 
bool store_full_text () const
 
void set_store_full_text (bool store_full_text)
 

Static Public Attributes

static const util::string_view id = "libsvm-corpus"
 The identifier for this corpus.
 

Private Attributes

doc_id cur_id_
 The current document we are on.
 
label_type lbl_type_
 The label type.
 
uint64_t num_lines_
 The number of lines in the file.
 
std::string next_content_
 The content of the next document.
 
std::ifstream input_
 The stream being read from.
 

Additional Inherited Members

Protected Member Functions inherited from meta::corpus::corpus
std::vector< metadata::field > next_metadata ()
 Helper function to be used by deriving classes in implementing next() to set the metadata for the current document.
 

Detailed Description

Fills document objects with content line-by-line from a libsvm-formatted input file.

This should only be used with a libsvm_analyzer.
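For orientation, a libsvm-formatted line conventionally has the shape `label index:value index:value ...`, one document per line. The following is a minimal sketch of splitting such a line into its parts. It is not MeTA's parser: `parsed_line` and `parse_libsvm_line` are hypothetical names introduced here for illustration, and the sketch silently skips malformed tokens.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed line; not a MeTA type.
struct parsed_line
{
    std::string label;
    std::vector<std::pair<std::uint64_t, double>> features;
};

// Splits "label idx:value idx:value ..." into a label and (index, value)
// pairs. Tokens without a ':' are skipped in this sketch.
parsed_line parse_libsvm_line(const std::string& line)
{
    parsed_line result;
    std::istringstream stream(line);
    stream >> result.label;

    std::string token;
    while (stream >> token)
    {
        const auto colon = token.find(':');
        if (colon == std::string::npos)
            continue;
        const std::uint64_t index = std::stoull(token.substr(0, colon));
        const double value = std::stod(token.substr(colon + 1));
        result.features.emplace_back(index, value);
    }
    return result;
}
```

The label is kept as a string here so that the same sketch covers both label types; a caller working with REGRESSION labels would presumably convert it to a number afterwards.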

Constructor & Destructor Documentation

§ libsvm_corpus()

meta::corpus::libsvm_corpus::libsvm_corpus ( const std::string &  file,
label_type  type = label_type::CLASSIFICATION,
uint64_t  num_docs = 0 
)
Parameters
file      The path to the corpus file.
type      The label type for the data (classification or regression).
num_docs  The number of documents (i.e., lines) in the corpus file, if known beforehand. If unknown, omit this parameter and the value will be calculated in the constructor.

Member Function Documentation

§ has_next()

bool meta::corpus::libsvm_corpus::has_next ( ) const
override virtual
Returns
whether there is another document in this corpus

Implements meta::corpus::corpus.

§ next()

document meta::corpus::libsvm_corpus::next ( )
override virtual
Returns
the next document from this corpus

Implements meta::corpus::corpus.
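The has_next() and next() members suggest the usual pull-style loop: test for another document, then fetch it. The sketch below shows that loop against a hypothetical stand-in class, `toy_corpus`, which mimics only the call shapes documented here. Its next() returns a std::string rather than a document, and `drain` is likewise an illustrative helper, not a MeTA function.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in exposing has_next(), next() and size() with the
// same shapes as the documented interface; not a MeTA class.
class toy_corpus
{
  public:
    explicit toy_corpus(std::vector<std::string> lines)
        : lines_(std::move(lines))
    {
    }

    bool has_next() const { return cur_ < lines_.size(); }
    std::string next() { return lines_[cur_++]; }
    std::uint64_t size() const { return lines_.size(); }

  private:
    std::vector<std::string> lines_;
    std::uint64_t cur_ = 0;
};

// Pulls every remaining document out of any corpus-like object.
template <class Corpus>
std::vector<std::string> drain(Corpus& corpus)
{
    std::vector<std::string> docs;
    docs.reserve(corpus.size());
    while (corpus.has_next())
        docs.push_back(corpus.next());
    return docs;
}
```

Calling next() only after has_next() returns true keeps the loop from reading past the end of the input.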

§ size()

uint64_t meta::corpus::libsvm_corpus::size ( ) const
override virtual
Returns
the number of documents in this corpus

Implements meta::corpus::corpus.

§ schema()

metadata::schema_type meta::corpus::libsvm_corpus::schema ( ) const
override virtual
Returns
the corpus' metadata schema

Reimplemented from meta::corpus::corpus.


The documentation for this class was generated from the following files: