ModErn Text Analysis
META Enumerates Textual Applications
score_data.h
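#ifndef META_SCORE_DATA_H_
#define META_SCORE_DATA_H_

#include <cstdint> // uint64_t

#include "meta/config.h"
#include "meta/meta.h"

namespace meta
{

namespace corpus
{
class document;
}

namespace index
{
class inverted_index;
}
}

namespace meta
{
namespace index
{

/**
 * A score_data object contains information needed to evaluate a ranking
 * function.
 */
struct score_data
{
    // general info

    /// index queries are running on
    inverted_index& idx;
    /// average document length
    float avg_dl;
    /// total number of documents
    uint64_t num_docs;
    /// total number of terms in the index
    uint64_t total_terms;
    /// the total length of the query (sum of all term weights)
    float query_length;

    // term-based info

    /// doc term id
    term_id t_id;
    /// query term count (or weight in case of feedback)
    float query_term_weight;
    /// number of docs that t_id appears in
    uint64_t doc_count;
    /// number of times t_id appears in corpus
    uint64_t corpus_term_count;

    // document-based info

    /// document id
    doc_id d_id;
    /// number of times the term appears in the current doc
    uint64_t doc_term_count;
    /// total number of terms in the doc
    uint64_t doc_size;
    /// number of unique terms in the doc
    uint64_t doc_unique_terms;

    /**
     * Constructor to initialize most elements. The term-based and
     * document-based members are left for the caller to fill in.
     */
    score_data(inverted_index& p_idx, float p_avg_dl, uint64_t p_num_docs,
               uint64_t p_total_terms, float p_query_length)
        : idx(p_idx), // gcc: no non-const ref init from a brace-init list,
                      // so use parentheses here
          avg_dl{p_avg_dl},
          num_docs{p_num_docs},
          total_terms{p_total_terms},
          query_length{p_query_length}
    {
        /* nothing */
    }
};
}
}

#endif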
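The constructor initializes `idx` with parentheses and every other member with braces. The inline comment attributes this to GCC not accepting brace initialization of a non-const reference member. The pattern can be sketched in isolation as follows; `fake_index` and `holder` are hypothetical stand-ins invented for this example and are not part of MeTA.

```cpp
#include <cstdint>

// Hypothetical stand-in for inverted_index (not a MeTA type).
struct fake_index
{
    std::uint64_t num_docs;
};

// Hypothetical stand-in for score_data, reduced to two members.
struct holder
{
    fake_index& idx; // reference member: must be bound in the constructor
    float avg_dl;

    holder(fake_index& p_idx, float p_avg_dl)
        : idx(p_idx),      // parentheses: binds the reference to p_idx
          avg_dl{p_avg_dl} // braces: also rejects narrowing conversions
    {
        /* nothing */
    }
};
```

Because the struct stores a reference, the referenced index must outlive the object, and the compiler-generated copy assignment operator is deleted.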
struct score_data
A score_data object contains information needed to evaluate a ranking function.
Definition: score_data.h:40

inverted_index & idx
index queries are running on
Definition: score_data.h:45

float avg_dl
average document length
Definition: score_data.h:47

uint64_t num_docs
total number of documents
Definition: score_data.h:49

uint64_t total_terms
total number of terms in the index
Definition: score_data.h:51

float query_length
the total length of the query (sum of all term weights)
Definition: score_data.h:53

term_id t_id
doc term id
Definition: score_data.h:58

float query_term_weight
query term count (or weight in case of feedback)
Definition: score_data.h:60

uint64_t doc_count
number of docs that t_id appears in
Definition: score_data.h:62

uint64_t corpus_term_count
number of times t_id appears in corpus
Definition: score_data.h:64

doc_id d_id
document id
Definition: score_data.h:69

uint64_t doc_term_count
number of times the term appears in the current doc
Definition: score_data.h:71

uint64_t doc_size
total number of terms in the doc
Definition: score_data.h:73

uint64_t doc_unique_terms
number of unique terms in the doc
Definition: score_data.h:75

score_data(inverted_index &p_idx, float p_avg_dl, uint64_t p_num_docs, uint64_t p_total_terms, float p_query_length)
Constructor to initialize most elements.
Definition: score_data.h:86

meta/meta.h
Contains top-level namespace documentation for the META toolkit.

class inverted_index
The inverted_index class stores information on a corpus indexed by term_ids.
Definition: inverted_index.h:65

namespace meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:25
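To illustrate how these fields could feed a ranking function, here is a minimal sketch of a BM25-style score for one (term, document) pair. It is not taken from MeTA's own rankers. `score_data_sketch` is a hypothetical stand-in that mirrors only the numeric members of score_data, so the example compiles without the index. `bm25_term_score` and the defaults `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in mirroring the numeric fields of score_data.
// The real struct also holds inverted_index&, term_id and doc_id members.
struct score_data_sketch
{
    float avg_dl;                 // average document length
    std::uint64_t num_docs;       // total number of documents
    float query_term_weight;      // query term count or weight
    std::uint64_t doc_count;      // number of docs the term appears in
    std::uint64_t doc_term_count; // occurrences of the term in this doc
    std::uint64_t doc_size;       // total number of terms in this doc
};

// BM25-style score for a single (term, document) pair.
// k1 and b are conventional defaults, assumed here for illustration.
inline double bm25_term_score(const score_data_sketch& sd, double k1 = 1.2,
                              double b = 0.75)
{
    const double n = static_cast<double>(sd.num_docs);
    const double df = static_cast<double>(sd.doc_count);
    const double tf = static_cast<double>(sd.doc_term_count);
    const double dl = static_cast<double>(sd.doc_size);

    // smoothed inverse document frequency; the 1.0 keeps it non-negative
    const double idf = std::log(1.0 + (n - df + 0.5) / (df + 0.5));

    // term frequency, saturated by k1 and normalized by relative length
    const double norm = 1.0 - b + b * dl / sd.avg_dl;
    const double tf_part = tf * (k1 + 1.0) / (tf + k1 * norm);

    return sd.query_term_weight * idf * tf_part;
}
```

A full document score would sum this value over the query terms, refilling the term-based and document-based fields before each call.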