
Custom C++ I/O streams using std::streambuf

This article shows, by example, how to add support for standard library stream I/O (<iostream>) to your own classes.

The word "stream" appears often in this article. It always means an input/output (I/O) stream, never a thread of execution. Threads are not covered here.

Introduction


Standard library streams are a powerful tool. A function can take a stream as an argument, which makes it versatile: it can work with ordinary files (fstream), with the console (cin/cout), and even with sockets and COM ports if you find a suitable library.
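As a small illustration of that versatility, here is a minimal sketch of a function that only knows about std::ostream. The names greet and greet_to_string are hypothetical and exist only for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a greeting to whatever output stream it is given.
void greet(std::ostream &out, const std::string &name) {
    out << "Hello, " << name << "!\n";
}

// The same function works unchanged with an in-memory stream.
// Passing std::cout or a std::ofstream would work the same way.
std::string greet_to_string(const std::string &name) {
    std::ostringstream oss;
    greet(oss, name);
    return oss.str();
}
```

Calling greet(std::cout, "world") prints to the console, while greet_to_string("world") collects the same text in a string.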
However, a ready-made library with the functionality you need is not always available, and you may even be developing your own library with your own classes. In that case you have to implement the stream interface yourself.

Environment Used


The examples in this article were tested with the g++ compiler (Ubuntu 5.4.0-6ubuntu1~16.04.4) using the C++11 standard. For clarity, I used the override keyword to mark overridden methods of the base class; if you remove it (and replace nullptr with NULL), the code should also compile under older standards.

All examples are also available on GitHub: streambuf_examples.


How are the streams arranged?


Every class that supports stream I/O derives from std::istream (input), std::ostream (output), or std::iostream (input and output). These classes provide the overloaded '<<' and '>>' operators, output formatting, conversion between numbers and strings, and so on.

However, the actual reading and writing of data does not happen in these classes but in a class derived from std::streambuf. streambuf itself is essentially an interface with a set of virtual functions; you override them in the derived class and implement your own read/write logic there (this is exactly what std::filebuf and std::stringbuf do for fstream and stringstream respectively).

In addition, streambuf implements part of the buffering logic. The programmer only needs to specify the beginning and end of the buffer and implement handlers for events such as overflow, underflow, and synchronization.

When developing your own streams, most of the work goes into implementing std::streambuf. In simple cases you may not need classes derived from istream, ostream, or iostream at all.
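To make this division of labor concrete, here is a minimal sketch that uses plain std::ostream and std::istream objects on top of a std::stringbuf. The function names are made up for this illustration; only standard library classes are used.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// The ostream only formats; the characters are stored by the stringbuf.
std::string format_via_stringbuf(int value) {
    std::stringbuf buf;      // holds the data
    std::ostream out(&buf);  // formatting layer, owns no data itself
    out << "value=" << value;
    return buf.str();
}

// The istream only parses; the characters come from the stringbuf.
int parse_via_stringbuf(const std::string &text) {
    std::stringbuf buf(text);
    std::istream in(&buf);
    int value = 0;
    in >> value;
    return value;
}
```

Replacing std::stringbuf with your own streambuf subclass is all it takes to redirect the same formatting and parsing code to a different data source or sink.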

Simple cases - unbuffered


In the simplest case, or when performance does not matter, you can do without buffers. Then it is enough to override only three virtual functions: overflow() for output, and underflow() and uflow() for input.


Here and below, the descriptions of functions are taken from cppreference.com.

Example 1 - we filter numbers


That is probably enough text for now. As an example, let's look at a filtering stream that passes only digits and spaces (so that the numbers stay separated from each other). The data itself comes from another stream.

Code
    #include <iostream>
    #include <sstream>
    #include <string>

    using namespace std;

    class numfilterbuf : public streambuf {
    private:
        istream *in;
        ostream *out;

        int cur; // last character read, cached for underflow()
    protected:
        /* output functions: */

        virtual int overflow(int c) override {
            if (c == traits_type::eof()){
                return traits_type::eof();
            }

            char_type ch = static_cast<char_type>(c);

            if (ch == ' ' || (ch >= '0' && ch <= '9')){ // digits and spaces pass through
                out->put(ch);
                // if something went wrong, return EOF
                return out->good() ? ch : traits_type::eof();
            }

            // other characters are silently dropped
            return ch;
        }

        /* input functions: */

        // without this override the default uflow() would advance a
        // buffer pointer that we never set, which can end in a segmentation fault
        virtual int uflow() override {
            int c = underflow();
            cur = traits_type::eof(); // make the next underflow() read a new character
            return c;
        }

        virtual int underflow() override {
            if (cur != traits_type::eof()){
                return cur;
            }

            // no cached character, read until one passes the filter
            while (in->good()){
                cur = in->get();

                if (cur == traits_type::eof()){
                    return traits_type::eof();
                }

                char_type ch = static_cast<char_type>(cur);

                if (ch == ' ' || (ch >= '0' && ch <= '9')){ // digits and spaces pass through
                    return ch;
                }
            }

            return traits_type::eof();
        }
    public:
        numfilterbuf(istream &_in, ostream &_out)
            : in(&_in), out(&_out), cur(traits_type::eof())
        {}
    };

    int main(int argc, char **argv){
        const char str1[] = "In 4 bytes contains 32 bits";
        const char str2[] = "Unix time starts from Jan 1, 1970";

        istringstream str(str1);

        numfilterbuf buf(str, cout); // read from the stringstream, write to the console
        iostream numfilter(&buf);    // from here on it is used like any other iostream

        string val;
        getline(numfilter, val);
        numfilter.clear(); // reset the EOF state set after reading to the end of the stringstream

        cout << "Original: '" << str1 << "'" << endl;
        cout << "Read from numfilter: '" << val << "'" << endl;

        cout << "Original: '" << str2 << "'" << endl;
        cout << "Written to numfilter: '";
        numfilter << str2;
        cout << "'" << endl;

        return 0;
    }


The program prints:

    Original: 'In 4 bytes contains 32 bits'
    Read from numfilter: ' 4   32 '
    Original: 'Unix time starts from Jan 1, 1970'
    Written to numfilter: '     1 1970'

The main points are already commented in the code. It is worth noting, however, that for reading you must override both uflow and underflow, because underflow can be called before uflow, even several times in a row. If you add debugging output at the beginning of these functions, you can see this clearly, for example when reading from the stream into an integer variable.

You may also have noticed the char_type type in the code. It is defined in the streambuf class and in our case is an alias for char, that is, a single-byte character. This is discussed in more detail at the end of the article.
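The call pattern can be observed with a small counting buffer. This is only a sketch: countingbuf and read_int are hypothetical names, and the exact number of underflow() calls depends on the standard library implementation.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Unbuffered input streambuf over a string that counts virtual calls.
class countingbuf : public std::streambuf {
public:
    int underflow_calls = 0;
    int uflow_calls = 0;

    explicit countingbuf(const std::string &s) : data(s), pos(0) {}

protected:
    // Peek at the current character without consuming it.
    int_type underflow() override {
        ++underflow_calls;
        return pos < data.size() ? traits_type::to_int_type(data[pos])
                                 : traits_type::eof();
    }

    // Return the current character and advance past it.
    int_type uflow() override {
        ++uflow_calls;
        return pos < data.size() ? traits_type::to_int_type(data[pos++])
                                 : traits_type::eof();
    }

private:
    std::string data;
    std::size_t pos;
};

// Reads one integer through an istream attached to the counting buffer.
int read_int(countingbuf &buf) {
    std::istream in(&buf);
    int value = 0;
    in >> value;
    return value;
}
```

After read_int() on the text "123 rest", uflow_calls equals the number of consumed digits, while underflow_calls is typically larger because the stream peeks at each character before consuming it.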

We use buffers


As mentioned earlier, streambuf already implements part of the buffer handling and gives access to six pointers, three each for the input and output buffers. However, streambuf does not allocate memory for the buffers. That task, along with initializing the buffer pointers, is left to the programmer.

For the input buffer, the pointers are as follows: eback() points to the beginning of the buffer, gptr() to the next character to be read, and egptr() to the position just past the last available character.



Visual illustration from mr-edd.co.uk

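The following functions are also used to control the input buffer pointers: setg(eback, gptr, egptr) sets all three pointers at once, and gbump(n) moves gptr() by n characters.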
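A minimal sketch of the input pointers in action, assuming the data already sits in memory. The class name membuf and its helper methods remaining() and skip() are hypothetical and exist only to expose the pointers for this illustration.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Read-only streambuf over an existing character array (no copying).
class membuf : public std::streambuf {
public:
    membuf(char *begin, std::size_t size) {
        // eback = begin, gptr = begin, egptr = begin + size
        setg(begin, begin, begin + size);
    }

    // Number of characters still available between gptr() and egptr().
    std::ptrdiff_t remaining() const { return egptr() - gptr(); }

    // Move the read pointer forward by n characters.
    void skip(int n) { gbump(n); }
};

// Reads the first whitespace-delimited word from a character array.
std::string first_word(char *text, std::size_t size) {
    membuf buf(text, size);
    std::istream in(&buf);
    std::string word;
    in >> word;
    return word;
}
```

No virtual function is overridden here: while gptr() is below egptr(), streambuf serves characters straight from the buffer, and the default underflow() reports EOF once the buffer is exhausted.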


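The output buffer pointers have similar names and purposes: pbase() points to the beginning of the buffer, pptr() to the next position to be written, and epptr() to the end of the buffer.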



Another graphic illustration from mr-edd.co.uk

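The control functions for the output buffer are similar as well: setp(pbase, epptr) sets the beginning and end of the buffer and resets pptr() to the beginning, and pbump(n) moves pptr() by n characters.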
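Here is a minimal sketch of the output pointers, assuming a small fixed-size array as the buffer. The class name fixedbuf and the size of 16 are arbitrary choices for this illustration.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Output streambuf that writes into a fixed-size internal array.
// overflow() is not overridden, so once the array is full the default
// implementation returns EOF and the stream enters a failed state.
class fixedbuf : public std::streambuf {
public:
    fixedbuf() {
        // pbase = pptr = storage, epptr = storage + sizeof(storage)
        setp(storage, storage + sizeof(storage));
    }

    // Everything written so far lies between pbase() and pptr().
    std::string written() const { return std::string(pbase(), pptr()); }

    // setp() also moves pptr() back to pbase(), which empties the buffer.
    void reset() { setp(pbase(), epptr()); }

private:
    char storage[16];
};
```

Attaching a std::ostream to a fixedbuf and writing "abc" followed by 123 leaves "abc123" in written(); writing more than 16 characters in total makes the stream fail.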


That concludes the theory; let's turn to practice.

Example 2 - block output


In one project I needed to transparently split a stream into small parts, each accompanied by a header. I implemented this with a new class derived from streambuf. That class seemed to me a simple and clear demonstration of working with an output buffer. So in the following example we split the output into parts and frame each one with <start> and <end> tags:

Code
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    using namespace std;

    class blockoutputbuf : public streambuf {
    private:
        ostream *out;
        vector<char_type> buffer;
        string startb, endb;
    protected:
        virtual int overflow(int c) override {
            if (out->good() && c != traits_type::eof()){
                *pptr() = c; // there is always 1 spare element at the end for this character
                pbump(1);    // advance the write pointer past it
                return sync() == 0 ? c : traits_type::eof();
            }

            return traits_type::eof();
        }

        virtual int sync() override {
            if (pptr() == pbase()) // the buffer is empty, nothing to send
                return 0;

            ptrdiff_t sz = pptr() - pbase(); // number of characters accumulated in the buffer

            // write the block framed by the start and end markers
            *out << startb;
            out->write(pbase(), sz);
            *out << endb;

            if (out->good()){
                pbump(static_cast<int>(-sz)); // move the write pointer back to the start of the buffer
                return 0;
            }

            return -1;
        }
    public:
        blockoutputbuf(ostream &_out, size_t _bufsize, string _startb, string _endb)
            : out(&_out), buffer(_bufsize), startb(_startb), endb(_endb)
        {
            char_type *buf = buffer.data();
            setp(buf, buf + (buffer.size() - 1)); // -1 keeps one spare element for overflow()
        }
    };

    int main(int argc, char **argv){
        const char str1[] = "In 4 bytes contains 32 bits";
        const char str2[] = "Unix time starts from Jan 1, 1970";

        blockoutputbuf buf(cout, 10, "<start>", "<end>\n");
        ostream blockoutput(&buf);

        cout << "Original: '" << str1 << "'" << endl;
        cout << "Written to blockoutputbuf: '";
        blockoutput << str1;
        blockoutput.flush(); // flush the buffer so that the tail of str1 is written out
        cout << "'" << endl;

        cout << "Original: '" << str2 << "'" << endl;
        cout << "Written to blockoutputbuf: '";
        blockoutput << str2;
        blockoutput.flush();
        cout << "'" << endl;

        return 0;
    }


The attentive reader has probably already wondered: a buffer is fine, but it has to be flushed somehow, not only when it overflows but also at the programmer's request (just as happens when writing to a file).

This is what one more virtual function, int sync(), is for. It is usually called at the programmer's request, but in the example above we also call it ourselves when the buffer overflows. Its return value signals whether synchronization succeeded (0) or failed (-1); on failure the stream enters an invalid state. The default implementation does nothing and simply returns 0 (success).

Speaking of buffer overflow: the example uses a small trick to simplify overflow(). The real buffer is always one element larger than streambuf "thinks". This leaves room for the character that "did not fit" and was passed to overflow(), so the code needs no special handling for it.
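The link between flush() and sync() can be checked with a small sketch. The class name flushcountbuf and its fail_sync switch are hypothetical; they only serve to count calls and to simulate a failing device.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Unbuffered output streambuf that records how often sync() is called.
class flushcountbuf : public std::streambuf {
public:
    int sync_calls = 0;
    bool fail_sync = false;  // hypothetical switch to simulate a failing device
    std::string sink;        // receives every written character

protected:
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);  // nothing to write, report success
        sink += traits_type::to_char_type(c);
        return c;
    }

    int sync() override {
        ++sync_calls;
        return fail_sync ? -1 : 0;  // -1 tells the stream that flushing failed
    }
};
```

Each call to flush() on an ostream attached to this buffer increments sync_calls, and once sync() returns -1 the stream's bad() state is set.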

The output of the program for blocks of 10 characters is as follows:

Output

    Original: 'In 4 bytes contains 32 bits'
    Written to blockoutputbuf: '<start>In 4 bytes<end>
    <start> contains <end>
    <start>32 bits<end>
    '
    Original: 'Unix time starts from Jan 1, 1970'
    Written to blockoutputbuf: '<start>Unix time <end>
    <start>starts fro<end>
    <start>m Jan 1, 1<end>
    <start>970<end>
    '


Example 3 - buffered input from file


Reading is a bit more complicated, so let's start with something simple. The example below implements plain sequential reading of a file through a stream. The data is fetched from the file using the C standard library.

Code
    #include <iostream>
    #include <string>
    #include <vector>
    #include <cstdio>
    #include <cstdlib>

    using namespace std;

    class cfilebuf : public streambuf {
    private:
        vector<char_type> buffer;
        FILE *file;
    protected:
        virtual int underflow() override {
            if (!file)
                return traits_type::eof();

            if (gptr() < egptr()) // the buffer is not exhausted yet, return the current character
                return traits_type::to_int_type(*gptr());

            char_type *start = eback();
            // read as much as fits into the buffer
            size_t rd = fread(start, sizeof(char_type), buffer.size(), file);
            // tell streambuf how much was actually read
            setg(start, start, start + rd);

            return rd > 0 ? traits_type::to_int_type(*gptr()) : traits_type::eof();
        }
    public:
        cfilebuf(size_t _bufsize) : buffer(_bufsize), file(nullptr) {
            char_type *start = buffer.data();
            char_type *end = start + buffer.size();
            setg(start, end, end); // eback = start, gptr = end, egptr = end
            // since gptr == egptr, the buffer counts as empty and the first read calls underflow()
        }

        ~cfilebuf(){
            close();
        }

        bool open(string fn){
            close();
            file = fopen(fn.c_str(), "r");
            return file != nullptr;
        }

        void close(){
            if (file){
                fclose(file);
                file = nullptr;
            }
        }
    };

    int main(int argc, char **argv){
        cfilebuf buf(10);
        istream in(&buf);
        string line;

        buf.open("file.txt");

        while (getline(in, line)){
            cout << line << endl;
        }

        return 0;
    }


Because the example is simple, it has a number of drawbacks. The main ones are discussed below.

Extended capabilities


Although the streams from the previous sections are already usable, their implementation is incomplete. In practice, more complex situations arise that require additional functionality, which is discussed next.

seekoff and seekpos to move around the file


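When working with a file, you may need to move the current position to an arbitrary place. As you may have guessed, the example above does not support this: the file is read in one direction only, and the only way to go back is to reopen the file. To fix this major flaw, we need to override the following methods of the streambuf class: seekoff() for moving by a relative offset and seekpos() for moving to an absolute position.


Besides the first argument (position or offset), these functions take further arguments: way (seekoff only) tells where the offset is counted from (ios_base::beg, ios_base::cur, or ios_base::end), and which tells whether the input sequence (ios_base::in), the output sequence (ios_base::out), or both are affected.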
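To see how these arguments arrive from the stream side, here is a minimal sketch that uses only std::istringstream. The function names tail and char_at are hypothetical.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Returns the last n characters of text.
// seekg(off, way) ends up in seekoff(off, way, ios_base::in).
std::string tail(const std::string &text, std::streamoff n) {
    std::istringstream in(text);
    in.seekg(-n, std::ios_base::end);
    std::string rest;
    std::getline(in, rest);
    return rest;
}

// Returns the character at an absolute position.
// seekg(pos) ends up in seekpos(pos, ios_base::in).
char char_at(const std::string &text, std::streampos pos) {
    std::istringstream in(text);
    in.seekg(pos);
    return static_cast<char>(in.get());
}
```

The public wrappers pubseekoff() and pubseekpos() on the buffer itself forward to the same virtual functions, so they can be used to test a custom streambuf directly.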


Armed with this knowledge, let's see what navigating through the file from Example 3 might look like:

    virtual streampos seekpos(streampos sp, ios_base::openmode which) override {
        if (!(which & ios_base::in))
            return streampos(-1);
        return fill_buffer_from(sp);
    }

    virtual streampos seekoff(streamoff off, ios_base::seekdir way, ios_base::openmode which) override {
        if (!(which & ios_base::in))
            return streampos(-1);

        switch (way){
        default:
        case ios_base::beg: return fill_buffer_from(off, SEEK_SET);
        // current position = position of the buffer start + offset of gptr inside the buffer
        case ios_base::cur: return fill_buffer_from(pos_base + (gptr() - eback()) + off, SEEK_SET);
        case ios_base::end: return fill_buffer_from(off, SEEK_END);
        }
    }

Explanation: the pos_base field stores the position in the file from which the data currently in the buffer was loaded.

It looks quite simple, but in fact the fill_buffer_from function does all the hard work. Its implementation is as follows:

    streampos fill_buffer_from(streampos newpos, int dir = SEEK_SET){
        if (!file || fseek(file, newpos, dir) == -1)
            return -1;

        long pos = ftell(file);
        if (pos < 0)
            return -1;
        pos_base = pos;

        char_type *start = eback();
        size_t rd = fread(start, sizeof(char_type), buffer.size(), file);
        setg(start, start, start + rd);

        return rd > 0 && pos_base >= 0 ? pos_base : streampos(-1);
    }

The function tries to move the file pointer to the specified position and then fills our whole buffer from beginning to end. Refilling the buffer from the file on every operation is not very efficient, but the example does so to keep the implementation simple. When you implement your own class derived from streambuf, you will probably know your data well enough to write more efficient positioning functions.

Let's move on.
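As one illustration of a cheaper positioning strategy, here is a minimal sketch for the case where all data is already in memory, so a seek only has to move gptr(). The class name seekable_membuf is hypothetical, and this is not the approach used by cfilebuf in this article.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <streambuf>

// Read-only streambuf over a character array that seeks by moving gptr().
class seekable_membuf : public std::streambuf {
public:
    seekable_membuf(char *begin, std::size_t size) {
        setg(begin, begin, begin + size);
    }

protected:
    pos_type seekoff(off_type off, std::ios_base::seekdir way,
                     std::ios_base::openmode which) override {
        if (!(which & std::ios_base::in))
            return pos_type(off_type(-1));  // only reading is supported

        off_type base = 0;                  // ios_base::beg
        if (way == std::ios_base::cur)
            base = gptr() - eback();
        else if (way == std::ios_base::end)
            base = egptr() - eback();

        off_type target = base + off;
        if (target < 0 || target > egptr() - eback())
            return pos_type(off_type(-1));  // outside the buffer

        setg(eback(), eback() + target, egptr());  // no data is copied or re-read
        return pos_type(target);
    }

    pos_type seekpos(pos_type sp, std::ios_base::openmode which) override {
        return seekoff(off_type(sp), std::ios_base::beg, which);
    }
};
```

A file-backed buffer could apply the same idea as an optimization: if the target position falls inside the range already held in the buffer, move gptr() instead of calling fseek() and fread() again.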

pbackfail - return read characters back


Some algorithms do not need to jump to arbitrary positions in the stream, but while reading and processing data they may need to return a few characters (usually 1 to 3) to the stream. For this, istream has the unget() and putback(character) methods. In streambuf, if the character being returned matches the preceding one in the buffer, the read pointer is simply moved back and no virtual function is called. However, if the characters do not match or the buffer pointer is already at the very beginning, the virtual function pbackfail(c) is called to handle the situation; it returns EOF on failure and any other value on success.
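The visible behavior of putback() can be tried out on a standard stream first. This is a minimal sketch using std::istringstream; the function name can_put_back is hypothetical.

```cpp
#include <sstream>
#include <string>

// Reads one character from text, then tries to return c to the stream.
// Reports whether the stream is still in a good state afterwards.
bool can_put_back(const std::string &text, char c) {
    std::istringstream in(text);
    in.get();       // consume the first character
    in.putback(c);  // try to return c in its place
    return in.good();
}
```

For the text "abc", putting back 'a' succeeds because it matches the character just read. Putting back 'z' fails, because the read-only std::stringbuf behind istringstream refuses to replace a character, and the stream's bad() state is set.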


Now let's implement our own pbackfail:

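    virtual int pbackfail(int c) override {
        // if gptr > eback we were called because the character did not match;
        // replacing characters is not supported, so report failure
        if (pos_base <= 0 || gptr() > eback())
            return traits_type::eof();

        // reload the buffer starting one character earlier
        if (fill_buffer_from(pos_base - 1L) == -1)
            return traits_type::eof();

        // c == EOF (as passed by unget()) means "just step back"; otherwise c must match
        if (c != traits_type::eof() && traits_type::to_int_type(*gptr()) != c){
            gbump(1); // restore the previous position
            return traits_type::eof();
        }

        return traits_type::to_int_type(*gptr());
    }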

As mentioned earlier, performance in this example will be terrible, because almost every call to pbackfail re-reads data from the file into the buffer for the sake of a single character, the previous one. But the goal of this article is to understand how things work, not to compete on implementation performance.

Example 4 - reading a file with positioning and returning characters


Below is the complete code with the changes from the previous sections applied, together with commented examples of using the new functionality:

Code
    #include <iostream>
    #include <string>
    #include <vector>
    #include <cstdio>
    #include <cstdlib>

    using namespace std;

    class cfilebuf : public streambuf {
    private:
        vector<char_type> buffer;
        FILE *file;
        streampos pos_base; // position in the file that corresponds to eback

        streampos fill_buffer_from(streampos newpos, int dir = SEEK_SET) {
            if (!file || fseek(file, newpos, dir) == -1)
                return -1;

            // remember the file position that will correspond to eback
            long pos = ftell(file);
            if (pos < 0)
                return -1;
            pos_base = pos;

            char_type *start = eback();
            // read as much as fits into the buffer
            size_t rd = fread(start, sizeof(char_type), buffer.size(), file);
            // tell streambuf how much was actually read
            setg(start, start, start + rd);

            return rd > 0 && pos_base >= 0 ? pos_base : streampos(-1);
        }
    protected:
        virtual int underflow() override {
            if (!file)
                return traits_type::eof();

            if (gptr() < egptr()) // the buffer is not exhausted yet, return the current character
                return traits_type::to_int_type(*gptr());

            streampos pos;
            if (pos_base < 0) { // nothing has been read yet, start from the beginning
                pos = fill_buffer_from(0);
            } else { // continue right after the data currently in the buffer
                pos = fill_buffer_from(pos_base + (egptr() - eback()));
            }

            return pos != streampos(-1) ? traits_type::to_int_type(*gptr()) : traits_type::eof();
        }

        // only reading is supported, so which must contain ios_base::in;
        // ios_base::out would be handled the same way for a writable stream (not implemented here)
        virtual streampos seekpos(streampos sp, ios_base::openmode which) override {
            if (!(which & ios_base::in))
                return streampos(-1);
            return fill_buffer_from(sp);
        }

        // relative positioning: from the beginning, from the current position, or from the end
        virtual streampos seekoff(streamoff off, ios_base::seekdir way, ios_base::openmode which) override {
            if (!(which & ios_base::in))
                return streampos(-1);

            switch (way) {
            default:
            case ios_base::beg: return fill_buffer_from(off, SEEK_SET);
            // current position = position of the buffer start + offset of gptr inside the buffer
            case ios_base::cur: return fill_buffer_from(pos_base + (gptr() - eback()) + off);
            case ios_base::end: return fill_buffer_from(off, SEEK_END);
            }
        }

        virtual int pbackfail(int c) override {
            // if gptr > eback we were called because the character did not match;
            // replacing characters is not supported, so report failure
            if (pos_base <= 0 || gptr() > eback())
                return traits_type::eof();

            // reload the buffer starting one character earlier
            if (fill_buffer_from(pos_base - streampos(1L)) == streampos(-1))
                return traits_type::eof();

            // c == EOF (as passed by unget()) means "just step back"; otherwise c must match
            if (c != traits_type::eof() && traits_type::to_int_type(*gptr()) != c) {
                gbump(1); // restore the previous position
                return traits_type::eof();
            }

            return traits_type::to_int_type(*gptr());
        }
    public:
        cfilebuf(size_t _bufsize) : buffer(_bufsize), file(nullptr), pos_base(-1) {
            char_type *start = buffer.data();
            char_type *end = start + buffer.size();
            setg(start, end, end); // eback = start, gptr = end, egptr = end
        }

        ~cfilebuf() {
            close();
        }

        bool open(string fn) {
            close();
            file = fopen(fn.c_str(), "r");
            return file != nullptr;
        }

        void close() {
            if (file) {
                fclose(file);
                file = nullptr;
            }
        }
    };

    void read_to_end(istream &in) {
        string line;
        while (getline(in, line)) {
            cout << line << endl;
        }
    }

    int main(int argc, char **argv) {
        cfilebuf buf(10);
        istream in(&buf);

        buf.open("file.txt");

        read_to_end(in);
        in.clear(); // reset the EOF state

        cout << endl << endl << "Read last 6 symbols:" << endl;
        in.seekg(-5, ios_base::end); // go to the end and step back 5 characters
        in.seekg(-1, ios_base::cur); // now 6: one more step back from the current position :)
        read_to_end(in);
        in.clear();

        cout << endl << endl << "Read all again:" << endl;
        in.seekg(0);
        read_to_end(in);
        in.clear();

        in.seekg(2); // move to the 3rd character (index 2)
        in.get();
        in.putback('b');
        in.putback('a'); // the start of the buffer has been reached, so pbackfail() is called here
        in.putback('H');

        string word;
        in >> word;
        cout << endl << endl << "Read word after putback(): " << word << endl;

        return 0;
    }


Other features


Besides the features discussed in this article, there are others. Some are trivial to implement and others are needed only in specific cases, so they are not covered in detail. Below is a list of such functions with a brief description of why they are needed. A more detailed description can be found in the official documentation (the link is at the end of the article).

Other overridable methods:


Your projects may also need characters that are larger than 1 byte. In that case, derive from the basic_streambuf class template and use the character type you need. Type aliases such as char_type, int_type, pos_type, and so on will help with the implementation. It is preferable to use them, since they always match the types that the library implementation of streambuf works with.
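A minimal sketch of such a template-based buffer is shown below. The class name collectbuf is hypothetical; it is an unbuffered output streambuf that gathers everything written to it into a string of the matching character type.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Unbuffered output streambuf for any character type.
template <typename CharT>
class collectbuf : public std::basic_streambuf<CharT> {
public:
    // Names from a dependent base class must be brought in explicitly.
    typedef typename std::basic_streambuf<CharT>::traits_type traits_type;
    typedef typename std::basic_streambuf<CharT>::int_type int_type;

    std::basic_string<CharT> collected;

protected:
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);        // nothing to write
        collected += traits_type::to_char_type(c); // store one character
        return c;
    }
};
```

The same class then serves both narrow and wide streams: collectbuf<char> can be attached to a std::ostream and collectbuf<wchar_t> to a std::wostream.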

Conclusion


The standard library offers a wide range of facilities for implementing your own streams flexibly and efficiently. Remember, however, that actual performance always depends on how you implement the overridden methods for your particular case.

Links


Source: https://habr.com/ru/post/326578/

