ModErn Text Analysis
META Enumerates Textual Applications
meta::index::rocchio Class Reference

Implements the Rocchio algorithm for pseudo-relevance feedback.

#include <rocchio.h>

Inheritance diagram for meta::index::rocchio:
meta::index::ranker

Public Member Functions

 rocchio (std::shared_ptr< forward_index > fwd)
 
 rocchio (std::shared_ptr< forward_index > fwd, std::unique_ptr< ranker > &&initial_ranker, float alpha=default_alpha, float beta=default_beta, uint64_t k=default_k, uint64_t max_terms=default_max_terms)
 
 rocchio (std::istream &in)
 
void save (std::ostream &out) const override
 Saves the ranker to a stream.
 
std::vector< search_result > rank (ranker_context &ctx, uint64_t num_results, const filter_function_type &filter) override
 Scores a query using a document-at-a-time strategy.
 
- Public Member Functions inherited from meta::index::ranker
template<class ForwardIterator , class Function = bool (*)(doc_id)>
std::vector< search_result > score (inverted_index &idx, ForwardIterator begin, ForwardIterator end, uint64_t num_results=10, Function &&filter=passthrough)
 
std::vector< search_result > score (inverted_index &idx, const corpus::document &query, uint64_t num_results=10, const filter_function_type &filter=[](doc_id) { return true;})
 
virtual ~ranker ()=default
 Default destructor.
 

Static Public Attributes

static const util::string_view id = "rocchio"
 Identifier for this ranker.
 
static const constexpr float default_alpha = 1.0f
 Default value of alpha, the original query weight parameter.
 
static const constexpr float default_beta = 0.8f
 Default value of beta, the positive document weight parameter.
 
static const constexpr uint64_t default_k = 10
 Default value for k, the number of feedback documents to retrieve.
 
static const constexpr uint64_t default_max_terms = 50
 Default value for max_terms, the number of new terms to add to the new query.
 

Private Attributes

std::shared_ptr< forward_index > fwd_
 
std::unique_ptr< ranker > initial_ranker_
 
const float alpha_
 
const float beta_
 
const uint64_t k_
 
const uint64_t max_terms_
 

Additional Inherited Members

- Public Types inherited from meta::index::ranker
using filter_function_type = std::function< bool(doc_id did)>
 
- Static Public Member Functions inherited from meta::index::ranker
static bool passthrough (doc_id)
 

Detailed Description

Implements the Rocchio algorithm for pseudo-relevance feedback.

This implementation considers only positive documents for feedback. The top max_terms terms from the centroid of the feedback set are selected by the weights that the wrapped ranker's score_one function assigns them. These terms are then interpolated into the query in count space, and the results of running the wrapped ranker on the new query are returned.
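The steps above can be sketched in a few lines of standalone C++. This is a minimal illustration under stated assumptions, not the class's actual code: `term_counts`, `centroid`, and `expand_query` are hypothetical names, terms are plain strings, and feedback terms are ranked by their centroid count rather than by the wrapped ranker's score_one.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bag-of-words vector: term -> count.
using term_counts = std::map<std::string, double>;

// Averages the term counts of the feedback documents.
term_counts centroid(const std::vector<term_counts>& feedback)
{
    term_counts result;
    if (feedback.empty())
        return result;
    for (const auto& doc : feedback)
        for (const auto& entry : doc)
            result[entry.first] += entry.second;
    for (auto& entry : result)
        entry.second /= static_cast<double>(feedback.size());
    return result;
}

// Positive-only Rocchio update in count space:
//   new_query = alpha * query + beta * (top max_terms centroid terms).
// Assumption: terms are ranked by centroid count, not by score_one.
term_counts expand_query(const term_counts& query,
                         const std::vector<term_counts>& feedback,
                         double alpha, double beta, std::size_t max_terms)
{
    const term_counts center = centroid(feedback);
    std::vector<std::pair<std::string, double>> ranked(center.begin(),
                                                       center.end());
    // stable_sort keeps alphabetical order among equally weighted terms.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::string, double>& a,
                        const std::pair<std::string, double>& b) {
                         return a.second > b.second;
                     });
    if (ranked.size() > max_terms)
        ranked.resize(max_terms);

    term_counts expanded;
    for (const auto& entry : query)
        expanded[entry.first] = alpha * entry.second;
    for (const auto& entry : ranked)
        expanded[entry.first] += beta * entry.second;
    return expanded;
}
```

In this sketch the expanded query would then be handed back to the wrapped ranker for the final retrieval pass; that second pass is omitted here.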

Required config parameters:

[ranker]
method = "rocchio"

Optional config parameters:

alpha = 1.0 # original query weight parameter
beta = 0.8 # feedback document weight parameter
k = 10 # number of feedback documents to retrieve
max-terms = 50 # maximum number of feedback terms to use
[ranker.feedback]
method = # whatever ranker method you want to wrap
# other parameters for that ranker
See also
https://en.wikipedia.org/wiki/Rocchio_algorithm

Member Function Documentation

§ save()

void meta::index::rocchio::save ( std::ostream &  out) const
override virtual

Saves the ranker to a stream.

This should save the ranker's id, followed by any parameters needed for reconstruction.

Implements meta::index::ranker.

§ rank()

std::vector< search_result > meta::index::rocchio::rank ( ranker_context &  ctx,
uint64_t  num_results,
const filter_function_type &  filter 
)
override virtual

Scores a query using a document-at-a-time strategy.

You should not override this unless you desire a completely different ranking strategy than document-at-a-time, which might be the case if you are implementing a new pseudo-relevance feedback method.

Parameters
ctx          The ranker_context holding the postings lists
num_results  The number of search results to return
filter       The filter function to be used

Implements meta::index::ranker.
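To show how num_results and filter interact, here is a minimal standalone sketch. It does not perform document-at-a-time scoring over postings lists; it only assumes that candidate scores are already available. The `scored_doc`, `filter_fn`, and `top_results` names are hypothetical stand-ins, and `doc_id` is simplified to a plain integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the library's doc_id and search_result types.
using doc_id = std::uint64_t;

struct scored_doc
{
    doc_id d_id;
    float score;
};

using filter_fn = std::function<bool(doc_id)>;

// Keeps the documents accepted by the filter, then returns the num_results
// highest-scoring ones in descending score order.
std::vector<scored_doc> top_results(const std::vector<scored_doc>& candidates,
                                    std::uint64_t num_results,
                                    const filter_fn& filter)
{
    std::vector<scored_doc> kept;
    for (const auto& candidate : candidates)
        if (filter(candidate.d_id))
            kept.push_back(candidate);

    // Never ask for more results than survived the filter.
    const auto count = static_cast<std::size_t>(
        std::min<std::uint64_t>(num_results, kept.size()));
    std::partial_sort(kept.begin(),
                      kept.begin() + static_cast<std::ptrdiff_t>(count),
                      kept.end(),
                      [](const scored_doc& a, const scored_doc& b) {
                          return a.score > b.score;
                      });
    kept.resize(count);
    return kept;
}
```

A filter that always returns true corresponds to the passthrough behaviour listed among the inherited members.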


The documentation for this class was generated from the following files: