ModErn Text Analysis
META Enumerates Textual Applications
vocabulary_map.h
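#ifndef META_VOCABULARY_MAP_H_
#define META_VOCABULARY_MAP_H_

#include <cstdint>
#include <string>

#include "meta/config.h"
#include "meta/io/mmap_file.h"
#include "meta/util/disk_vector.h"
#include "meta/util/optional.h"

namespace meta
{
namespace index
{

/// A read-only view of a B+-tree-like structure that stores the
/// vocabulary for an index.
class vocabulary_map
{
  private:
    /// The file containing the tree.
    io::mmap_file file_;

    /// Byte positions for each term in the leaves, used for reverse lookup.
    util::disk_vector<uint64_t> inverse_;

    /// The size of the nodes in the tree.
    uint64_t block_size_;

    /// The ending position of the leaf nodes.
    uint64_t leaf_end_pos_;

    /// The position of the first internal node that is not the root.
    uint64_t initial_seek_pos_;

    /// Convenience wrapper for comparing the term with strings in the tree.
    int compare(const std::string& term, const char* other) const;

  public:
    /// Creates a vocabulary map reading the file in the given path with
    /// the given block size.
    vocabulary_map(const std::string& path, uint16_t block_size = 4096);

    /// Move constructs a vocabulary_map.
    vocabulary_map(vocabulary_map&&) = default;

    /// Move assigns a vocabulary_map.
    vocabulary_map& operator=(vocabulary_map&&) = default;

    /// Finds the given term in the tree, if it exists.
    util::optional<term_id> find(const std::string& term) const;

    /// Finds the term associated with the given id.
    std::string find_term(term_id t_id) const;

    /// The number of terms in the map.
    uint64_t size() const;
};
}
}

#endif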
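The public interface above pairs a forward lookup (find, which may return nothing) with a reverse lookup (find_term) and a size query. The following is a minimal sketch of how a caller might use that shape of interface. Because the real class reads an on-disk tree through the MeTA library, the sketch uses a hypothetical in-memory stand-in named toy_vocabulary_map. It assumes std::optional in place of util::optional, a plain 64-bit integer for term_id, and ids assigned by sorted rank; none of these are confirmed by the header, and the helper describe is likewise invented for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: the real header uses meta's term_id and
// util::optional; here we assume a plain integer id and std::optional.
using term_id = std::uint64_t;

// Toy in-memory analogue of the vocabulary_map interface. It is NOT the
// on-disk tree; it only mirrors find / find_term / size.
class toy_vocabulary_map
{
  public:
    // Terms are sorted and de-duplicated; a term's id is its sorted rank
    // (an assumption made for this sketch).
    explicit toy_vocabulary_map(std::vector<std::string> terms)
        : terms_(std::move(terms))
    {
        std::sort(terms_.begin(), terms_.end());
        terms_.erase(std::unique(terms_.begin(), terms_.end()), terms_.end());
    }

    // Forward lookup: binary search over the sorted terms.
    std::optional<term_id> find(const std::string& term) const
    {
        auto it = std::lower_bound(terms_.begin(), terms_.end(), term);
        if (it == terms_.end() || *it != term)
            return std::nullopt;
        return static_cast<term_id>(it - terms_.begin());
    }

    // Reverse lookup: id back to string; throws on an unknown id.
    std::string find_term(term_id t_id) const
    {
        if (t_id >= terms_.size())
            throw std::out_of_range{"invalid term id"};
        return terms_[t_id];
    }

    // The number of distinct terms held.
    std::uint64_t size() const
    {
        return terms_.size();
    }

  private:
    std::vector<std::string> terms_;
};

// Caller-side pattern: check the optional before dereferencing it.
std::string describe(const toy_vocabulary_map& vocab, const std::string& term)
{
    if (auto id = vocab.find(term))
        return term + " -> " + std::to_string(*id);
    return term + " (not in vocabulary)";
}
```

The point of the optional return type is that a missing term is an ordinary outcome rather than an error, so the caller branches on the result instead of catching an exception.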
std::string find_term(term_id t_id) const
Finds the term associated with the given id.
Definition: vocabulary_map.cpp:83
class util::optional
A class for representing optional values.
Definition: optional.h:115
uint64_t initial_seek_pos_
The position of the first internal node that is not the root.
Definition: vocabulary_map.h:58
class io::mmap_file
Memory-maps a text file read-only.
Definition: mmap_file.h:27
util::disk_vector< uint64_t > inverse_
Byte positions for each term in the leaves to allow for reverse lookup of the string associated wit...
Definition: vocabulary_map.h:41
class vocabulary_map
A read-only view of a B+-tree-like structure that stores the vocabulary for an index.
Definition: vocabulary_map.h:29
vocabulary_map & operator=(vocabulary_map &&)=default
Move assigns a vocabulary_map.
namespace meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:25
int compare(const std::string &term, const char *other) const
Convenience wrapper for comparing the term with strings in the tree.
Definition: vocabulary_map.cpp:78
io::mmap_file file_
The file containing the tree.
Definition: vocabulary_map.h:35
vocabulary_map(const std::string &path, uint16_t block_size=4096)
Creates a vocabulary map reading the file in the given path with the given block size.
Definition: vocabulary_map.cpp:15
uint64_t size() const
The number of terms in the map.
Definition: vocabulary_map.cpp:88
uint64_t block_size_
The size of the nodes in the tree.
Definition: vocabulary_map.h:46
uint64_t leaf_end_pos_
The ending position of the leaf nodes.
Definition: vocabulary_map.h:52
util::optional< term_id > find(const std::string &term) const
Finds the given term in the tree, if it exists.
Definition: vocabulary_map.cpp:31
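The member descriptions mention byte positions recorded per term for reverse lookup, and a wrapper that compares a std::string against a C string stored in the tree. The sketch below illustrates both ideas on a small in-memory buffer. The layout is an assumption, not taken from the source: terms are stored back to back as null-terminated strings, a std::string stands in for the memory-mapped file, and a std::vector stands in for util::disk_vector. The names flat_terms, add, term_at, and compare_term are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed layout (not taken from the source): terms stored back to back
// as null-terminated strings in one byte buffer.
struct flat_terms
{
    std::string bytes;                  // stand-in for the mapped file
    std::vector<std::uint64_t> inverse; // byte position of each term

    // Appends a term, records where it starts, and returns its id.
    std::uint64_t add(const std::string& term)
    {
        inverse.push_back(bytes.size());
        bytes.append(term);
        bytes.push_back('\0');
        return inverse.size() - 1;
    }

    // Reverse lookup: jump straight to the recorded byte position and
    // read up to the terminating null. Throws on an unknown id.
    std::string term_at(std::uint64_t id) const
    {
        return std::string(bytes.data() + inverse.at(id));
    }
};

// Three-way comparison of a query term against a C string in the buffer:
// negative, zero, or positive, following std::string::compare.
int compare_term(const std::string& term, const char* other)
{
    return term.compare(other);
}
```

Storing one byte position per id makes the reverse lookup a single indexed read, at the cost of keeping a second structure alongside the tree. Comparing against a const char* avoids building a temporary std::string for every candidate visited during a search.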