ModErn Text Analysis
META Enumerates Textual Applications
meta::learn::dataset Class Reference

Represents an in-memory view of a set of documents for running learning algorithms over.

#include <dataset.h>

Inheritance diagram for meta::learn::dataset:
meta::learn::labeled_dataset< class_label >, meta::learn::labeled_dataset< LabelType >, meta::classify::multiclass_dataset

Public Types

using instance_type = instance
 
using const_iterator = std::vector< instance_type >::const_iterator
 
using iterator = std::vector< instance_type >::iterator
 
using size_type = std::vector< instance_type >::size_type
 

Public Member Functions

template<class ForwardIterator , class ProgressTrait = printing::default_progress_trait>
 dataset (std::shared_ptr< index::forward_index > idx, ForwardIterator begin, ForwardIterator end, ProgressTrait=ProgressTrait{})
 Creates an in-memory dataset from a forward_index and a range of doc_ids, represented as iterators.
 
template<class ForwardIterator , class ProgressTrait = printing::default_progress_trait>
 dataset (std::shared_ptr< index::inverted_index > idx, ForwardIterator begin, ForwardIterator end, ProgressTrait=ProgressTrait{})
 Creates an in-memory listing of documents from an inverted_index and a range of doc_ids, represented as iterators.
 
template<class ForwardIterator >
 dataset (ForwardIterator begin, ForwardIterator end, size_type total_features)
 Creates an in-memory dataset from a pair of iterators.
 
template<class ForwardIterator , class FeatureVectorFunction >
 dataset (ForwardIterator begin, ForwardIterator end, size_type total_features, FeatureVectorFunction &&featurizer)
 Creates an in-memory dataset from a pair of iterators and a function to convert to a feature_vector.
 
const_iterator begin () const
 
iterator begin ()
 
const_iterator end () const
 
iterator end ()
 
size_type size () const
 
size_type total_features () const
 
const instance_type & operator() (size_type index) const
 

Private Attributes

std::vector< instance_type > instances_
 the instances themselves
 
size_type total_features_
 the total number of unique features in the dataset
 

Detailed Description

Represents an in-memory view of a set of documents for running learning algorithms over.

Constructor & Destructor Documentation

§ dataset() [1/2]

template<class ForwardIterator , class ProgressTrait = printing::default_progress_trait>
meta::learn::dataset::dataset ( std::shared_ptr< index::inverted_index >  idx,
ForwardIterator  begin,
ForwardIterator  end,
ProgressTrait  = ProgressTrait{} 
)
inline

Creates an in-memory listing of documents from an inverted_index and a range of doc_ids, represented as iterators.

Note that this constructor does not load any feature_vectors, because they cannot be recovered from an inverted index. It is intended mainly for use with the knn classifier. The id field of each instance_type stored in the dataset is a document_id.

§ dataset() [2/2]

template<class ForwardIterator >
meta::learn::dataset::dataset ( ForwardIterator  begin,
ForwardIterator  end,
size_type  total_features 
)
inline

Creates an in-memory dataset from a pair of iterators.

The dereferenced type must have a conversion operator to a feature_vector.
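
The conversion-operator requirement can be illustrated with a self-contained sketch. Everything below is a hypothetical stand-in rather than the MeTA implementation: feature_vector is modeled as a vector of (feature id, weight) pairs, toy_dataset mimics only the iterator-pair constructor, and term_counts is an invented user type. The id given to each instance here is simply its position, which is an arbitrary choice for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the MeTA types: a sparse feature vector as
// (feature id, weight) pairs and a minimal instance holding one.
using feature_vector = std::vector<std::pair<std::uint64_t, double>>;

struct instance
{
    std::size_t id;
    feature_vector weights;
};

// Hypothetical user type: raw term counts that know how to convert
// themselves into a feature_vector.
struct term_counts
{
    std::vector<std::pair<std::uint64_t, unsigned>> counts;

    // The conversion operator the constructor relies on.
    operator feature_vector() const
    {
        feature_vector fv;
        fv.reserve(counts.size());
        for (const auto& c : counts)
            fv.emplace_back(c.first, static_cast<double>(c.second));
        return fv;
    }
};

// Simplified stand-in for the iterator-pair constructor of dataset.
class toy_dataset
{
  public:
    using size_type = std::vector<instance>::size_type;

    template <class ForwardIterator>
    toy_dataset(ForwardIterator begin, ForwardIterator end,
                size_type total_features)
        : total_features_{total_features}
    {
        for (; begin != end; ++begin)
        {
            // Implicit conversion through term_counts::operator feature_vector.
            feature_vector fv = *begin;
            instances_.push_back(instance{instances_.size(), std::move(fv)});
        }
    }

    size_type size() const { return instances_.size(); }
    size_type total_features() const { return total_features_; }
    const instance& operator()(size_type index) const
    {
        return instances_.at(index);
    }

  private:
    std::vector<instance> instances_;
    size_type total_features_;
};

// Builds a two-document toy dataset over a feature space of size 4.
toy_dataset make_example()
{
    term_counts a;
    a.counts = {{0, 2u}, {3, 1u}};
    term_counts b;
    b.counts = {{1, 5u}};
    std::vector<term_counts> docs{a, b};
    return toy_dataset{docs.begin(), docs.end(), 4};
}
```

Calling make_example() yields a dataset of two instances whose weights were produced entirely by the conversion operator. The overload that takes a featurizer serves the case where the element type cannot be given such an operator.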

Member Function Documentation

§ begin() [1/2]

const_iterator meta::learn::dataset::begin ( ) const
inline
Returns
an iterator to the first instance

§ begin() [2/2]

iterator meta::learn::dataset::begin ( )
inline
Returns
an iterator to the first instance

§ end() [1/2]

const_iterator meta::learn::dataset::end ( ) const
inline
Returns
an iterator to one past the end of the dataset

§ end() [2/2]

iterator meta::learn::dataset::end ( )
inline
Returns
an iterator to one past the end of the dataset

§ size()

size_type meta::learn::dataset::size ( ) const
inline
Returns
the number of instances in the dataset

§ total_features()

size_type meta::learn::dataset::total_features ( ) const
inline
Returns
the total number of unique features in the dataset
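
As a rough illustration of how begin(), end(), and total_features() fit together, the sketch below sums each feature's weight across all instances. It is written against hypothetical stand-in types, not the MeTA headers: it assumes each instance exposes a weights member that iterates as (feature id, weight) pairs, and that every feature id is smaller than total_features().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the MeTA types.
using feature_vector = std::vector<std::pair<std::uint64_t, double>>;

struct instance
{
    std::size_t id;
    feature_vector weights;
};

// Minimal container exposing the same accessor names as dataset.
struct toy_dataset
{
    using const_iterator = std::vector<instance>::const_iterator;
    using size_type = std::vector<instance>::size_type;

    std::vector<instance> instances_;
    size_type total_features_;

    const_iterator begin() const { return instances_.begin(); }
    const_iterator end() const { return instances_.end(); }
    size_type size() const { return instances_.size(); }
    size_type total_features() const { return total_features_; }
};

// Sums each feature's weight over all instances. total_features() sizes
// the dense result; begin()/end() drive the range-based for loop.
template <class Dataset>
std::vector<double> feature_sums(const Dataset& ds)
{
    std::vector<double> sums(ds.total_features(), 0.0);
    for (const auto& inst : ds)
        for (const auto& fw : inst.weights)
            sums[fw.first] += fw.second;
    return sums;
}

// Two instances over a feature space of size 3.
toy_dataset make_toy()
{
    toy_dataset ds{};
    ds.instances_ = {instance{0, {{0, 1.0}, {2, 0.5}}},
                     instance{1, {{2, 1.5}}}};
    ds.total_features_ = 3;
    return ds;
}
```

For the toy data, feature_sums(make_toy()) has three entries, one per feature, with feature 2 accumulating weight from both instances.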

§ operator()()

const instance_type& meta::learn::dataset::operator() ( size_type  index) const
inline
Parameters
index  The index of the item you want in the dataset. Note that the index is not a doc_id!

Index == 0 does not imply doc_id == 0.

Returns
the instance at that index in the dataset
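
The distinction between a position and a doc_id can be shown with a small stand-in. The types below are hypothetical simplifications, not the MeTA classes: toy_dataset stores only an id per instance, copied from the doc_id range it was built from, mirroring the id behavior described for the inverted_index constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, NOT the MeTA type: only the id field is modeled.
struct instance
{
    std::uint64_t id;
};

// Simplified stand-in that records the doc_id of each instance in the
// order the range supplies them.
class toy_dataset
{
  public:
    using size_type = std::vector<instance>::size_type;

    template <class ForwardIterator>
    toy_dataset(ForwardIterator begin, ForwardIterator end)
    {
        for (; begin != end; ++begin)
            instances_.push_back(instance{*begin});
    }

    // index is a position in the dataset, not a doc_id.
    const instance& operator()(size_type index) const
    {
        return instances_.at(index);
    }

    size_type size() const { return instances_.size(); }

  private:
    std::vector<instance> instances_;
};

// Builds a dataset from an arbitrary subset of documents, e.g. one fold
// of a train/test split.
toy_dataset make_split()
{
    const std::vector<std::uint64_t> doc_ids{7, 3, 9};
    return toy_dataset{doc_ids.begin(), doc_ids.end()};
}
```

With this split, make_split()(0).id is 7: position 0 holds whichever document came first in the range. To look up a particular document, compare against the id field instead of indexing by its doc_id.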
