ModErn Text Analysis
META Enumerates Textual Applications
vocabulary_map_writer.h
1 
10 #ifndef META_VOCABULARY_MAP_WRITER_H_
11 #define META_VOCABULARY_MAP_WRITER_H_
12 
13 #include <cstdint>
14 #include <fstream>
15 #include <stdexcept>
16 #include <string>
17 
18 #include "meta/config.h"
19 
20 namespace meta
21 {
22 namespace index
23 {
24 
59 class vocabulary_map_writer
60 {
61  public:
71  vocabulary_map_writer(const std::string& path, uint16_t block_size = 4096);
72 
79  ~vocabulary_map_writer();
80 
89  void insert(const std::string& term);
90 
91  private:
95  void write_padding();
96 
100  void flush();
101 
103  std::ofstream file_;
104 
110  uint64_t file_write_pos_;
111 
113  std::ofstream inverse_file_;
114 
116  std::string path_;
117 
119  uint16_t block_size_;
120 
122  uint64_t num_terms_;
123 
125  uint16_t remaining_block_space_;
126 
128  uint64_t written_nodes_;
129 };
130 
134 class vocabulary_map_writer_exception : public std::runtime_error
135 {
136  using std::runtime_error::runtime_error;
137 };
138 }
139 }
140 #endif
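The exception type at the end of the listing declares no constructors of its own; its using-declaration inherits those of std::runtime_error. The following is a minimal sketch of that pattern. The names tree_build_exception, checked_block_size, and describe_failure are hypothetical and are not part of MeTA; the zero-size check is an assumption made only to have something to throw.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical type, not part of MeTA: same shape as the exception in the
// listing above.
class tree_build_exception : public std::runtime_error
{
    // Inherit the constructors of std::runtime_error (from std::string and
    // from const char*). Their accessibility follows the base class, so the
    // private section here does not hide them.
    using std::runtime_error::runtime_error;
};

// Hypothetical validation helper, for illustration only.
uint16_t checked_block_size(uint16_t block_size)
{
    if (block_size == 0)
        throw tree_build_exception("block size must be non-zero");
    return block_size;
}

// Returns the error message, or an empty string when nothing was thrown.
std::string describe_failure(uint16_t block_size)
{
    try
    {
        checked_block_size(block_size);
        return "";
    }
    catch (const std::runtime_error& e) // also catches the derived type
    {
        return e.what();
    }
}
```

Catching by reference to std::runtime_error works because the derived type adds no state; callers that care about the specific failure can catch the derived type instead.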
void insert(const std::string &term)
Inserts this term into the map.
Definition: vocabulary_map_writer.cpp:33
std::ofstream inverse_file_
The file containing the reverse mapping.
Definition: vocabulary_map_writer.h:113
uint64_t written_nodes_
Number of written nodes to be "merged" when writing the next level.
Definition: vocabulary_map_writer.h:128
std::string path_
The path to the tree file.
Definition: vocabulary_map_writer.h:116
void flush()
Flushes a node to disk after writing the padding bytes.
Definition: vocabulary_map_writer.cpp:73
~vocabulary_map_writer()
The destructor for a vocabulary_map_writer flushes the last leaf node and builds the internal structu...
Definition: vocabulary_map_writer.cpp:80
vocabulary_map_writer(const std::string &path, uint16_t block_size=4096)
Creates a writer for a tree at the given path and block_size.
Definition: vocabulary_map_writer.cpp:17
void write_padding()
Writes null bytes to fill up the current block.
Definition: vocabulary_map_writer.cpp:61
uint16_t block_size_
The block size of every node in the tree, in bytes.
Definition: vocabulary_map_writer.h:119
uint64_t file_write_pos_
The current write position in the forward mapping tree file.
Definition: vocabulary_map_writer.h:110
namespace meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:25
std::ofstream file_
The file containing the forward mapping tree.
Definition: vocabulary_map_writer.h:103
class vocabulary_map_writer_exception
An exception that can be thrown during the building of the tree.
Definition: vocabulary_map_writer.h:134
uint16_t remaining_block_space_
The remaining space in the block currently being written.
Definition: vocabulary_map_writer.h:125
uint64_t num_terms_
The total number of terms inserted so far.
Definition: vocabulary_map_writer.h:122
class vocabulary_map_writer
A class that writes the B+-tree-like data structure used for storing the term id mapping in an index...
Definition: vocabulary_map_writer.h:59
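The descriptions of write_padding(), block_size_, and remaining_block_space_ suggest fixed-size blocks whose unused tail is filled with null bytes. The sketch below illustrates that bookkeeping under stated assumptions; it is not MeTA's implementation. The class padded_block_writer, its methods, the null-terminated record layout, and the helper demo_bytes are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in, not MeTA's class: it only mimics the roles that
// block_size_ and remaining_block_space_ appear to play.
class padded_block_writer
{
  public:
    padded_block_writer(std::ostream& out, uint16_t block_size)
        : out_(out), block_size_{block_size}, remaining_{block_size}
    {
        if (block_size == 0)
            throw std::invalid_argument{"block size must be non-zero"};
    }

    // Appends a null-terminated record (an assumed layout). If the record
    // does not fit in the space left, the current block is padded first.
    void append(const std::string& record)
    {
        const std::size_t needed = record.size() + 1; // +1 for '\0'
        if (needed > block_size_)
            throw std::length_error{"record does not fit in one block"};
        if (needed > remaining_)
            pad();
        out_.write(record.c_str(), static_cast<std::streamsize>(needed));
        remaining_ = static_cast<uint16_t>(remaining_ - needed);
    }

    // Fills the rest of the current block with null bytes. A block that
    // has not been written to yet is left alone.
    void pad()
    {
        assert(remaining_ <= block_size_);
        if (remaining_ == block_size_)
            return;
        const std::string zeros(remaining_, '\0');
        out_.write(zeros.data(), static_cast<std::streamsize>(zeros.size()));
        remaining_ = block_size_;
    }

  private:
    std::ostream& out_;
    uint16_t block_size_;
    uint16_t remaining_; // bytes still free in the current block
};

// Writes two records into 8-byte blocks and returns the raw bytes.
std::string demo_bytes()
{
    std::ostringstream out;
    padded_block_writer writer{out, 8};
    writer.append("abc");  // 4 bytes used, 4 left in block 0
    writer.append("defg"); // needs 5 bytes: block 0 is padded, block 1 begins
    writer.pad();          // fill the tail of block 1
    return out.str();
}
```

Because every block has the same size, a block's byte offset is its index multiplied by the block size, which is the usual reason for padding on-disk tree nodes this way.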