ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::document Class Reference

Represents an indexable document.

#include <document.h>

Public Member Functions

 document (doc_id d_id=doc_id{0}, const class_label &label=class_label{"[NONE]"})
 Constructor.
 
const class_label & label () const
 
void content (const std::string &content, const std::string &encoding="utf-8")
 Sets the content of the document to be the parameter.
 
void encoding (const std::string &encoding)
 Sets the encoding for the document to be the parameter.
 
const std::string & content () const
 
const std::string & encoding () const
 
doc_id id () const
 
bool contains_content () const
 
void label (class_label label)
 Sets the label for this document.
 
const std::vector< metadata::field > & mdata () const
 
void mdata (std::vector< metadata::field > &&metadata)
 Sets the extra metadata fields for this document.
 

Private Attributes

doc_id d_id_
 The document id for this document.
 
class_label label_
 Which category this document would be classified into.
 
std::vector< metadata::field > mdata_
 Other metadata fields for this document.
 
util::optional< std::string > content_
 What the document contains.
 
std::string encoding_
 The encoding for the content.
 

Detailed Description

Represents an indexable document.

Internally, a document may contain either string content or a path to a file it represents on disk.

Once tokenized, a document contains a mapping of term -> frequency. This mapping is empty upon creation.

Constructor & Destructor Documentation

§ document()

meta::corpus::document::document ( doc_id  d_id = doc_id{0},
const class_label &  label = class_label{"[NONE]"} 
)

Constructor.

Parameters
d_id   The doc id to assign to this document
label  The optional class label to assign to this document
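
As a rough illustration of the default arguments above, the following sketch uses simplified stand-ins rather than the real MeTA types. The struct definitions of doc_id and class_label, including the 64-bit integer behind doc_id, and the helper constructor_defaults_hold() are assumptions made so the example compiles on its own. Only the constructor signature and the two getters come from this page.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins: in MeTA, doc_id and class_label are library-defined
// types; plain structs are used here so the sketch compiles on its own.
struct doc_id
{
    std::uint64_t value;
};

struct class_label
{
    std::string value;
};

// Simplified stand-in mirroring only the constructor and the two getters.
class document
{
  public:
    document(doc_id d_id = doc_id{0},
             const class_label& label = class_label{"[NONE]"})
        : d_id_(d_id), label_(label)
    {
    }

    doc_id id() const { return d_id_; }
    const class_label& label() const { return label_; }

  private:
    doc_id d_id_;
    class_label label_;
};

// Returns true when the defaults and the explicit arguments are both honored.
bool constructor_defaults_hold()
{
    document unlabeled;                               // falls back to both defaults
    document labeled{doc_id{7}, class_label{"spam"}}; // explicit id and label
    return unlabeled.id().value == 0
           && unlabeled.label().value == "[NONE]"
           && labeled.id().value == 7
           && labeled.label().value == "spam";
}
```

A document built without arguments therefore carries the placeholder label "[NONE]" until label(class_label) is called with a real category.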

Member Function Documentation

§ label() [1/2]

const class_label & meta::corpus::document::label ( ) const
Returns
the classification category this document is in

§ content() [1/2]

void meta::corpus::document::content ( const std::string &  content,
const std::string &  encoding = "utf-8" 
)

Sets the content of the document to be the parameter.

Parameters
content   The string content to assign into this document
encoding  The encoding of content, which defaults to utf-8

§ encoding() [1/2]

void meta::corpus::document::encoding ( const std::string &  encoding)

Sets the encoding for the document to be the parameter.

Parameters
encoding  The string label for the encoding

§ content() [2/2]

const std::string & meta::corpus::document::content ( ) const
Returns
the contents of this document

§ encoding() [2/2]

const std::string & meta::corpus::document::encoding ( ) const
Returns
the encoding for this document

§ id()

doc_id meta::corpus::document::id ( ) const
Returns
the doc_id for this document

§ contains_content()

bool meta::corpus::document::contains_content ( ) const
Returns
whether this document contains its content internally
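
One way to read contains_content() together with the two content() overloads is sketched below, assuming a simplified stand-in class rather than the MeTA implementation. A bool flag plus a string play the role of the util::optional< std::string > content_ attribute. The exception thrown when content is absent, the initial "utf-8" encoding, and the helper content_or() are assumptions of this sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Simplified stand-in, not the MeTA class: a bool flag plus a string play the
// role of the util::optional< std::string > content_ attribute.
class document
{
  public:
    // Stores the content and records its encoding (utf-8 unless stated).
    void content(const std::string& content,
                 const std::string& encoding = "utf-8")
    {
        content_ = content;
        encoding_ = encoding;
        has_content_ = true;
    }

    // Assumption of this sketch: reading absent content is reported by throwing.
    const std::string& content() const
    {
        if (!has_content_)
            throw std::runtime_error{"document has no content"};
        return content_;
    }

    const std::string& encoding() const { return encoding_; }
    bool contains_content() const { return has_content_; }

  private:
    bool has_content_ = false;
    std::string content_;
    std::string encoding_ = "utf-8";
};

// Reads the content only when it is present, otherwise returns a fallback.
std::string content_or(const document& doc, const std::string& fallback)
{
    return doc.contains_content() ? doc.content() : fallback;
}
```

Checking contains_content() before calling the getter keeps the caller independent of how a missing value is reported.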

§ label() [2/2]

void meta::corpus::document::label ( class_label  label)

Sets the label for this document.

Parameters
label  The new label for this document

§ mdata() [1/2]

const std::vector< metadata::field > & meta::corpus::document::mdata ( ) const
Returns
the set of extra metadata fields for this document

§ mdata() [2/2]

void meta::corpus::document::mdata ( std::vector< metadata::field > &&  metadata)

Sets the extra metadata fields for this document.

Parameters
metadata  The new metadata for this document
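
Because the setter takes an rvalue reference, callers hand over a vector they no longer need, typically with std::move. The sketch below illustrates that calling pattern with stand-ins. The one-member metadata::field struct and the helper make_document_with_metadata() are hypothetical. The real field type is defined by MeTA and is not described in this section.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for metadata::field; the real type is defined by MeTA.
namespace metadata
{
struct field
{
    std::string value;
};
}

// Simplified stand-in mirroring only the two mdata() overloads.
class document
{
  public:
    const std::vector<metadata::field>& mdata() const { return mdata_; }

    // Takes over the caller's vector instead of copying it.
    void mdata(std::vector<metadata::field>&& metadata)
    {
        mdata_ = std::move(metadata);
    }

  private:
    std::vector<metadata::field> mdata_;
};

// Builds a field list and hands it over to the document.
document make_document_with_metadata()
{
    std::vector<metadata::field> fields;
    fields.push_back(metadata::field{"2015-06-01"});
    fields.push_back(metadata::field{"newswire"});

    document doc;
    // doc.mdata(fields) would not compile: an lvalue cannot bind to &&.
    doc.mdata(std::move(fields));
    return doc;
}
```

After the call, the local vector is in a moved-from state and should not be read again without being reassigned.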
