ModErn Text Analysis
META Enumerates Textual Applications
meta::index::rocchio Class Reference

Implements the Rocchio algorithm for pseudo-relevance feedback.

#include <rocchio.h>

Inherits meta::index::ranker.

Public Member Functions

 rocchio (std::shared_ptr< forward_index > fwd)
 rocchio (std::shared_ptr< forward_index > fwd, std::unique_ptr< ranker > &&initial_ranker, float alpha=default_alpha, float beta=default_beta, uint64_t k=default_k, uint64_t max_terms=default_max_terms)
 rocchio (std::istream &in)
void save (std::ostream &out) const override
 Saves the ranker to a stream.
std::vector< search_result > rank (ranker_context &ctx, uint64_t num_results, const filter_function_type &filter) override
 Scores a query using a document-at-a-time strategy.
- Public Member Functions inherited from meta::index::ranker
template<class ForwardIterator , class Function = bool (*)(doc_id)>
std::vector< search_result > score (inverted_index &idx, ForwardIterator begin, ForwardIterator end, uint64_t num_results=10, Function &&filter=passthrough)
std::vector< search_result > score (inverted_index &idx, const corpus::document &query, uint64_t num_results=10, const filter_function_type &filter=[](doc_id) { return true;})
virtual ~ranker ()=default
 Default destructor.

Static Public Attributes

static const util::string_view id = "rocchio"
 Identifier for this ranker.
static const constexpr float default_alpha = 1.0f
 Default value of alpha, the original query weight parameter.
static const constexpr float default_beta = 0.8f
 Default value of beta, the positive document weight parameter.
static const constexpr uint64_t default_k = 10
 Default value for k, the number of feedback documents to retrieve.
static const constexpr uint64_t default_max_terms = 50
 Default value for max_terms, the number of new terms to add to the new query.

Private Attributes

std::shared_ptr< forward_index > fwd_
std::unique_ptr< ranker > initial_ranker_
const float alpha_
const float beta_
const uint64_t k_
const uint64_t max_terms_

Additional Inherited Members

- Public Types inherited from meta::index::ranker
using filter_function_type = std::function< bool(doc_id did)>
- Static Public Member Functions inherited from meta::index::ranker
static bool passthrough (doc_id)

Detailed Description

Implements the Rocchio algorithm for pseudo-relevance feedback.

This implementation considers only positive documents for feedback. The top max_terms terms from the centroid of the feedback set are selected according to the weights assigned by the wrapped ranker's score_one function. They are then interpolated into the query in count space, and the results of running the wrapped ranker on the new query are returned.
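
The update described above can be sketched in a few lines of standalone C++. This is a minimal illustration, not the code in rocchio.h: the term_vector alias and the rocchio_expand function are hypothetical names, and terms are ranked by their centroid weight as a simplification, whereas the class itself ranks them using the wrapped ranker's score_one function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bag-of-words vector: term -> count or weight.
using term_vector = std::unordered_map<std::string, double>;

// Positive-only Rocchio update:
//   new_query = alpha * query + beta * centroid(feedback_docs),
// where only the max_terms heaviest centroid terms are interpolated.
term_vector rocchio_expand(const term_vector& query,
                           const std::vector<term_vector>& feedback_docs,
                           double alpha, double beta, std::size_t max_terms)
{
    // Centroid of the feedback set: the average weight of each term.
    term_vector centroid;
    for (const auto& doc : feedback_docs)
        for (const auto& tw : doc)
            centroid[tw.first] += tw.second / feedback_docs.size();

    // Rank centroid terms by descending weight (ties broken by term).
    // Assumption: the real class ranks by score_one weights instead.
    std::vector<std::pair<std::string, double>> ranked(centroid.begin(),
                                                       centroid.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<std::string, double>& a,
                 const std::pair<std::string, double>& b) {
                  return a.second != b.second ? a.second > b.second
                                              : a.first < b.first;
              });
    if (ranked.size() > max_terms)
        ranked.resize(max_terms);

    // Interpolate the selected feedback terms into the scaled query.
    term_vector expanded;
    for (const auto& tw : query)
        expanded[tw.first] = alpha * tw.second;
    for (const auto& tw : ranked)
        expanded[tw.first] += beta * tw.second;
    return expanded;
}
```

In this sketch, feedback_docs plays the role of the top k documents returned by the initial ranking pass, and the expanded vector is what would be handed back to the wrapped ranker as the new query.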

Required config parameters:

method = "rocchio"

Optional config parameters:

alpha = 1.0 # original query weight parameter
beta = 0.8 # feedback document weight parameter
k = 10 # number of feedback documents to retrieve
max-terms = 50 # maximum number of feedback terms to use
method = # whatever ranker method you want to wrap
# other parameters for that ranker

Member Function Documentation

save()

void meta::index::rocchio::save ( std::ostream &  out) const

Saves the ranker to a stream.

This should save the ranker's id, followed by any parameters needed for reconstruction.

Implements meta::index::ranker.

rank()

std::vector< search_result > meta::index::rocchio::rank ( ranker_context &  ctx, uint64_t  num_results, const filter_function_type &  filter ) override

Scores a query using a document-at-a-time strategy.

You should not override this unless you desire a completely different ranking strategy than document-at-a-time, which might be the case if you are implementing a new pseudo-relevance feedback method.

Parameters
    ctx          The ranker_context holding the postings lists
    num_results  The number of search results to return
    filter       The filter function to be used

Implements meta::index::ranker.
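
To show how num_results and filter shape the returned list, here is a minimal standalone sketch. It does not traverse postings lists document-at-a-time and is not the library's implementation: scored_doc, filter_fn, and top_results are hypothetical names, and doc_id is assumed to be a plain 64-bit integer for the purposes of the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for MeTA's doc_id and search_result types.
using doc_id = std::uint64_t;

struct scored_doc
{
    doc_id d_id;
    float score;
};

// Same shape as filter_function_type: returns true to keep a document.
using filter_fn = std::function<bool(doc_id)>;

// Keeps the num_results best-scoring documents that the filter accepts.
std::vector<scored_doc> top_results(std::vector<scored_doc> candidates,
                                    std::uint64_t num_results,
                                    const filter_fn& filter)
{
    // Drop every candidate the filter rejects.
    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [&](const scored_doc& c) { return !filter(c.d_id); }),
        candidates.end());

    // Order by descending score, breaking ties by document id.
    std::sort(candidates.begin(), candidates.end(),
              [](const scored_doc& a, const scored_doc& b) {
                  return a.score != b.score ? a.score > b.score
                                            : a.d_id < b.d_id;
              });

    // Truncate to the requested number of results.
    if (candidates.size() > num_results)
        candidates.resize(num_results);
    return candidates;
}
```

A filter such as `[](doc_id d) { return d != 2; }` would exclude document 2 from the results regardless of its score.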
