User-Defined Output Stream Buffers

You are probably already familiar with using iostream to perform I/O operations in C++. But are you familiar with the backbone of the streams, the stream buffers? The streambuf class is definitely one of the most sophisticated classes in the C++ standard library, if not the most sophisticated one. As Stroustrup puts it, the complexity of the interfaces reflects tradition, the need for I/O performance, and the variety of human expectations.[2:§38.4] Nevertheless, a solid understanding of the stream buffer’s operations is required for implementing your own stream buffers. In today’s post, I am going to slice streambuf into bite-size pieces and show you how to implement a user-defined output stream buffer.

This post is largely inspired by a similarly titled section in The C++ Standard Library by Nicolai Josuttis. Great book. Highly recommended.

Stream buffer basics

template<class CharT, class Traits = std::char_traits<CharT>>
class basic_streambuf;

Found in <streambuf>, the class template basic_streambuf<> defines the interface for stream buffers. A stream buffer is an abstraction layer between an I/O stream and the final data source or destination. Different streambuf subclasses implement different buffering strategies. Typically, an output stream buffer stores characters gathered from an output stream in a buffer until it flushes those characters to their real destination. An input stream buffer is similar, except that the characters flow the other way.[2:§38.6] The buffer used to write characters is also called the put area; the buffer for input is also called the get area. The key to understanding a stream buffer’s operations is knowing how its functions manipulate the get area or the put area.
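To make the "layer between a stream and its destination" idea a bit more tangible, here is a minimal sketch that swaps the stream buffer of an existing stream. Capture() and CaptureDemo() are names I made up for this illustration; only rdbuf() and std::ostringstream come from the standard library.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: temporarily routes everything written to `stream`
// into a string buffer and returns the collected text.
template <typename Writer>
std::string Capture(std::ostream &stream, Writer write) {
    std::ostringstream sink;
    // rdbuf(new_buffer) installs the new buffer and returns the old one.
    std::streambuf *const original = stream.rdbuf(sink.rdbuf());
    write(stream);
    stream.rdbuf(original);  // restore the previous destination
    return sink.str();
}

inline void CaptureDemo() {
    const std::string text =
        Capture(std::cout, [](std::ostream &out) { out << "hello " << 42; });
    // Nothing reached the terminal; the characters ended up in the string.
    assert(text == "hello 42");
}
```

The stream itself is unchanged; only the buffer behind it decides where the characters go. Note that this sketch does not restore the original buffer if the writer throws.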

Unbuffered output stream buffer

Buffer management in a streambuf is fairly intricate, so let’s start with a simple stream buffer that has no buffer to manage.

// hex-out-stream-nobuf.hpp

#pragma once

#include <unistd.h>

#include <streambuf>

#include "str-utils.hpp"

class HexOutBuf : public std::streambuf {
public:
    using char_type = std::streambuf::char_type;
    using int_type = std::streambuf::int_type;
    using traits_type = std::streambuf::traits_type;

protected:
    static constexpr int WIDTH = sizeof(char_type) * 2;

    virtual int_type overflow(int_type c) override {
        if (not traits_type::eq_int_type(c, traits_type::eof())) {
            const auto hex_str = ToHex(c, WIDTH);

            if (write(STDOUT_FILENO, hex_str.c_str(), hex_str.size()) == -1) {
                return traits_type::eof();
            }
        }

        return traits_type::not_eof(c);
    }
};

Basically, the HexOutBuf class is a stream buffer that converts each character to its two-character hex representation before writing it to the standard output channel (file descriptor 1) using the POSIX API write().

ToHex() is the function that converts the given character to its Base16 encoding, and it may be implemented like this:

// str-utils.hpp

#pragma once

#include <iomanip>
#include <sstream>

inline auto ToHex(const unsigned c, const int width) {
    std::ostringstream oss;
    oss << std::setw(width) << std::setfill('0') << std::hex << c;
    return oss.str();
}
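
A few sample calls may help to see what ToHex() produces, including one pitfall worth knowing about. The sketch below repeats the function so that it compiles on its own; ToHexDemo() is a made-up name, and the remark about the wider result assumes that unsigned is wider than 8 bits.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Same body as ToHex() in str-utils.hpp, repeated so the snippet is self-contained.
inline std::string ToHex(const unsigned c, const int width) {
    std::ostringstream oss;
    oss << std::setw(width) << std::setfill('0') << std::hex << c;
    return oss.str();
}

inline void ToHexDemo() {
    assert(ToHex('A', 2) == "41");
    assert(ToHex('\n', 2) == "0a");  // padded with '0' up to the requested width

    // A negative value is converted to a large unsigned number first,
    // so the result has more than `width` digits.
    const signed char negative = -23;  // bit pattern 0xe9
    assert(ToHex(negative, 2).size() > 2);

    // Going through unsigned char keeps the value in the range 0..255.
    assert(ToHex(static_cast<unsigned char>(negative), 2) == "e9");
}
```

In other words, width is a minimum, not a maximum, so callers should pass values that are already in the unsigned char range.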

You can try this HexOutBuf with the following example program:

// test-utils.hpp

#pragma once

#include <ostream>

inline auto &TestHelper(std::ostream &out) {
    out << "1234" << '\0';
    out << "IJK" << '\0';
    out << "ab" << '\0';
    return out << '\n' << '\0';
}

// hex-out-stream-nobuf.cpp

#include "hex-out-stream-nobuf.hpp"
#include "test-utils.hpp"

int main() {
    HexOutBuf buffer;
    std::ostream out(&buffer);

    TestHelper(out);
}

The output may look like this:

$ ./hex-out-stream-nobuf
3132333400494a4b006162000a00

As you can see, the key to implementing an output stream buffer is overriding the overflow() virtual function. overflow(c) is responsible for sending the characters currently in the buffer, if any, plus the given character c to their real destination. It gets called when there is no room left in the associated put area. In our case, the default constructor assigns no space to the put area, thus overflow() is called for each character as soon as it is received.
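The "one overflow() call per character" behaviour is easy to observe with a stream buffer that does nothing but count. CountingBuf and CountOverflowDemo() are hypothetical names used only in this sketch.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>

// A stream buffer without a put area that only counts overflow() calls.
class CountingBuf : public std::streambuf {
public:
    int calls = 0;

protected:
    int_type overflow(int_type c) override {
        ++calls;
        return traits_type::not_eof(c);  // report success, discard the character
    }
};

inline void CountOverflowDemo() {
    CountingBuf buffer;
    std::ostream out(&buffer);

    out << "abc" << 'd';

    // Four characters were written and there is no put area to hold any
    // of them, so overflow() ran once per character.
    assert(buffer.calls == 4);
}
```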

Also, note that overflow() returns an unspecified value not equal to traits_type::eof() on success and traits_type::eof() on failure. The base class version of the function returns traits_type::eof().[3]
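To see what a failing overflow() means for the stream on top of it, consider a stream buffer that overrides nothing and therefore inherits the base class overflow(). RejectingBuf and RejectDemo() are names invented for this sketch.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>

// No overrides: the inherited overflow() always returns traits_type::eof().
// Deriving is still necessary because the std::streambuf constructor is protected.
class RejectingBuf : public std::streambuf {};

inline void RejectDemo() {
    RejectingBuf buffer;
    std::ostream out(&buffer);
    assert(out.good());

    out.put('x');  // there is no put area, so this ends up in overflow()

    // The character could not be inserted, so put() sets badbit.
    assert(out.bad());
}
```

This is why returning traits_type::eof() from overflow() when write() fails is enough to make the error visible to the user of the stream.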

Unbuffered output stream buffer improved

Although our simple output stream buffer HexOutBuf works perfectly fine, it is not very flexible, as it can only write to the standard output channel. Here is how we can improve it.

// hex-out-stream-nobuf-improved.hpp

#pragma once

#include <fcntl.h>
#include <unistd.h>

#include <streambuf>
#include <utility>  // std::swap

#include "str-utils.hpp"

class HexOutBuf : public std::streambuf {
public:
    using char_type = std::streambuf::char_type;
    using int_type = std::streambuf::int_type;
    using traits_type = std::streambuf::traits_type;

    HexOutBuf(const int fd = INVALID_FD) : m_fd(fd), m_own(false) {
    }

    HexOutBuf(const char *pathname, const int flags, const mode_t mode = 0) :
        m_fd(open(pathname, flags, mode)), m_own(true) {
    }

    HexOutBuf(const HexOutBuf &) = delete;
    HexOutBuf &operator=(const HexOutBuf &) = delete;

    HexOutBuf(HexOutBuf &&source) : m_fd(source.m_fd), m_own(source.m_own) {
        source.m_fd = INVALID_FD;
    }
    HexOutBuf &operator=(HexOutBuf &&source) {
        if (this != &source) {
            closeFile();
            std::swap(m_fd, source.m_fd);
            std::swap(m_own, source.m_own);
        }

        return *this;
    }

    virtual ~HexOutBuf() {
        closeFile();
    }

    auto IsOpen() const {
        return m_fd != INVALID_FD;
    }

protected:
    static constexpr int INVALID_FD = -1;
    static constexpr int WIDTH = sizeof(char_type) * 2;

    void closeFile() {
        if (IsOpen() and m_own) {
            close(m_fd);
        }
        m_fd = INVALID_FD;
        m_own = false;
    }

    virtual int_type overflow(int_type c) override {
        if (not traits_type::eq_int_type(c, traits_type::eof())) {
            const auto hex_str = ToHex(c, WIDTH);

            if (write(m_fd, hex_str.c_str(), hex_str.size()) == -1) {
                return traits_type::eof();
            }
        }

        return traits_type::not_eof(c);
    }

private:
    int m_fd = INVALID_FD;
    bool m_own = false;
};

The main improvement over the previous version is the added constructors and destructor. One of the constructors takes a file descriptor fd and assumes that the file descriptor is owned by someone else. The other constructor takes a few arguments, which are used to open() a new file descriptor. The destructor simply close()s the associated file descriptor if it is owned by the stream buffer. This version of HexOutBuf is a good example of how a stream buffer can either own its underlying I/O channel or not.

Also, since copy and move semantics are not the main topic of this post, I will not discuss them further here. The related functionalities are provided in the above example just for completeness.

A sample program to show a stream buffer that owns the I/O channel:

// hex-out-stream-nobuf-improved-path.cpp

#include "hex-out-stream-nobuf-improved.hpp"
#include "test-utils.hpp"

int main() {
    const char *pathname = "/tmp/hex-out-test-file.txt";

    HexOutBuf buffer {
        pathname, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH};

    if (buffer.IsOpen()) {
        std::ostream out(&buffer);
        TestHelper(out);
    }
}

A sample application to show a stream buffer which does not own its I/O channel:

// hex-out-stream-nobuf-improved-fd.cpp

#include "hex-out-stream-nobuf-improved.hpp"
#include "test-utils.hpp"


int main() {
    HexOutBuf buffer {STDERR_FILENO};
    std::ostream out(&buffer);

    TestHelper(out);
}

Output stream

Although not strictly required, it is convenient to also define a special stream class that mainly forwards the constructor arguments to the corresponding stream buffer. The following example demonstrates that.[1:§15.13.3]

// hex-out-stream.hpp

#pragma once

#include <ostream>
#include <utility>  // std::forward

#include "hex-out-stream-nobuf-improved.hpp"

class HexOStream : public std::ostream {
public:
    template<typename... Args>
    HexOStream(Args &&...args) : std::ostream(nullptr), m_buf(std::forward<Args>(args)...) {
        if (m_buf.IsOpen()) {
            rdbuf(&m_buf);
        } else {
            setstate(ios_base::failbit);
        }
    }

private:
    HexOutBuf m_buf;
};

Note that in the member initializer list, nullptr has to be passed to the base class first, because at that point the stream buffer member has not been initialized yet. Later, in the body of the constructor, you can associate the fully initialized stream buffer with the stream by calling the member function rdbuf().

setstate() is used to set the stream error flags in the event of a failure.
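
The effect of constructing a stream with nullptr and attaching the buffer later can be checked in isolation. The sketch below uses a std::stringbuf as a stand-in for HexOutBuf; RdbufDemo() is a made-up name.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

inline void RdbufDemo() {
    std::ostream out(nullptr);
    // Without a stream buffer the stream starts out with badbit set.
    assert(out.bad());

    std::stringbuf buffer;  // stand-in for a fully constructed HexOutBuf
    out.rdbuf(&buffer);     // attaches the buffer and clears the error state
    assert(out.good());

    out << "ok";
    assert(buffer.str() == "ok");
}
```

Because rdbuf() resets the state flags, the constructor only needs to call setstate() on the failure path.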

This output stream may be used like:

// hex-out-stream.cpp

#include "hex-out-stream.hpp"
#include "test-utils.hpp"


int main() {
    HexOStream out {STDERR_FILENO};

    TestHelper(out);
}

That is pretty much all you need to know about the unbuffered output stream buffer, except for the member functions sputc() and sputn(). We will talk about them in the next section, where we are going to look at a stream buffer that actually buffers.

Buffered output stream buffer

The put area is defined by three pointers that can be accessed by the following three member functions:[1:§15.13.3]

  1. pbase(): (“put base”) points at the beginning of the output buffer.
  2. pptr(): (“put pointer”) points at the position that is the next candidate for writing.
  3. epptr(): (“end put pointer”) points to one past the end of the buffer.

Those pointers can be initialized by calling the member function setp(begin, end). The constructor of the base class sets up no buffer, that is, all three pointers are null.

pbump(offset) can be used to reposition pptr() by offset characters relative to its current position. The offset may be positive or negative.
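
The three pointers and pbump() are protected, so the easiest way to watch them move is through a small derived class. PutAreaProbe, its member functions and PutAreaDemo() are hypothetical names for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <streambuf>

class PutAreaProbe : public std::streambuf {
public:
    PutAreaProbe() {
        // pbase() and pptr() point to the start, epptr() one past the end.
        setp(m_storage.data(), m_storage.data() + m_storage.size());
    }

    std::ptrdiff_t Used() const { return pptr() - pbase(); }
    std::ptrdiff_t Capacity() const { return epptr() - pbase(); }

    // Move pptr() back to pbase() with a negative offset.
    void Rewind() { pbump(static_cast<int>(-Used())); }

private:
    std::array<char, 8> m_storage {};
};

inline void PutAreaDemo() {
    PutAreaProbe probe;
    assert(probe.Used() == 0);
    assert(probe.Capacity() == 8);

    probe.sputc('a');
    probe.sputc('b');
    probe.sputc('c');
    assert(probe.Used() == 3);  // pptr() advanced by three characters

    probe.Rewind();
    assert(probe.Used() == 0);
}
```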

The p in the names of the above-mentioned functions stands for “put”. It is needed because we can create an iostream derived from both istream and ostream, and such a stream needs to keep track of both a get position and a put position.[2:§38.6.2]

An ostream can send one character to its associated output stream buffer by calling its member function sputc(c). If pptr() != epptr(), that character is copied to *pptr(), then pptr() is incremented. Otherwise, if pptr() == epptr(), overflow() is called.[1:§15.13.3]
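Here is a small sketch of that rule with a four-character put area. TinyBuf and SputcDemo() are made-up names, and the "flush" simply throws the buffered characters away, which is an assumption made to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <streambuf>

class TinyBuf : public std::streambuf {
public:
    TinyBuf() { setp(m_storage.data(), m_storage.data() + m_storage.size()); }

    int OverflowCalls() const { return m_overflow_calls; }
    int Pending() const { return static_cast<int>(pptr() - pbase()); }

protected:
    int_type overflow(int_type c) override {
        ++m_overflow_calls;
        // "Flush" by discarding the put area, then store the new character.
        setp(m_storage.data(), m_storage.data() + m_storage.size());
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
        }
        return traits_type::not_eof(c);
    }

private:
    std::array<char, 4> m_storage {};
    int m_overflow_calls = 0;
};

inline void SputcDemo() {
    TinyBuf buffer;

    // The first four characters fit into the put area.
    for (int i = 0; i < 4; ++i) buffer.sputc('x');
    assert(buffer.OverflowCalls() == 0);

    // The 5th and the 9th character find the put area full.
    for (int i = 0; i < 6; ++i) buffer.sputc('y');
    assert(buffer.OverflowCalls() == 2);
    assert(buffer.Pending() == 2);
}
```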

An ostream can send multiple characters to the output stream buffer at once by using the member function sputn(s, n), which simply calls the virtual function xsputn(s, n) of the most derived class. The base class version of the function calls sputc() for each character. Often, overriding xsputn() is only necessary if writing multiple characters can be implemented more efficiently than writing characters one at a time.[1:§15.13.3] For example, in some implementations, std::ofstream::write() simply passes the pointer to the suitable system call without intermediate buffering,[3] something equivalent to this:

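class filebuf : public std::streambuf {
protected:
    virtual std::streamsize xsputn(const char *s, std::streamsize n) override {
        // Hand the whole block to the OS in one call, bypassing the put area.
        return write(m_fd, s, n);
    }

private:
    int m_fd = STDOUT_FILENO;  // illustrative descriptor only
};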

Also note that there is no corresponding xsputc() virtual function, thus it is not possible to alter the behavior of sputc() through a derived class.
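To illustrate the relationship between sputn() and xsputn() described above, the following sketch collects characters in a std::string and overrides xsputn() to append a whole block at once. StringSinkBuf and BulkWriteDemo() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>

class StringSinkBuf : public std::streambuf {
public:
    const std::string &Data() const { return m_data; }
    int BulkCalls() const { return m_bulk_calls; }

protected:
    // Single characters arrive here because there is no put area.
    int_type overflow(int_type c) override {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            m_data.push_back(traits_type::to_char_type(c));
        }
        return traits_type::not_eof(c);
    }

    // Blocks of characters are appended in one go.
    std::streamsize xsputn(const char_type *s, std::streamsize n) override {
        ++m_bulk_calls;
        m_data.append(s, static_cast<std::size_t>(n));
        return n;  // number of characters consumed
    }

private:
    std::string m_data;
    int m_bulk_calls = 0;
};

inline void BulkWriteDemo() {
    StringSinkBuf sink;
    sink.sputc('>');         // goes through overflow()
    sink.sputn("hello", 5);  // goes through xsputn(), exactly once
    assert(sink.Data() == ">hello");
    assert(sink.BulkCalls() == 1);
}
```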

With all this information, now we can implement our buffering stream buffer.

// hex-out-stream-buffer.hpp

#pragma once

#include <unistd.h>

#include <algorithm>  // std::copy
#include <array>
#include <streambuf>

#include "str-utils.hpp"

class HexOutBuf : public std::streambuf {
public:
    using char_type = std::streambuf::char_type;
    using int_type = std::streambuf::int_type;
    using traits_type = std::streambuf::traits_type;

    HexOutBuf(const int fd = STDOUT_FILENO) : m_fd(fd) {
        static_assert(SIZE % WIDTH == 0);

        // data() yields a char_type*; begin() is only guaranteed to be an iterator.
        std::streambuf::setp(m_buffer.data(), m_buffer.data() + SIZE / WIDTH - 1);
    }

    virtual ~HexOutBuf() {
        sync();
    }

protected:
    static constexpr int SIZE = 1024;
    static constexpr int WIDTH = sizeof(char_type) * 2;

    auto flushBuffer() {
        const auto n = pptr() - pbase();
        for (int i = n * WIDTH - WIDTH; i >= 0; i -= WIDTH) {
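            // Go through to_int_type() so that a negative char does not
            // sign-extend into more than WIDTH hex digits.
            const auto hex_str = ToHex(traits_type::to_int_type(pbase()[i / WIDTH]), WIDTH);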
            std::copy(hex_str.cbegin(), hex_str.cend(), pbase() + i);
        }

        if (write(m_fd, pbase(), n * WIDTH) != n * WIDTH) {
            return false;
        }
        pbump(-n);

        return true;
    }

    virtual int_type overflow(int_type c) override {
        if (not traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
        }

        return flushBuffer() ? traits_type::not_eof(c) : traits_type::eof();
    }

    virtual int sync() override {
        return flushBuffer() ? 0 : -1;
    }

private:
    std::array<char_type, SIZE> m_buffer {};
    int m_fd = STDOUT_FILENO;
};

Note the -1 when calling setp() in the constructor. It is there because, when overflow() gets called, it has to flush not only the current content of the buffer but also the given character. Thus, it is convenient to reserve one extra slot for this character, so that it can also be stored in the buffer and the whole buffer can then be written to the output channel with just one system call.

Also note that the POSIX API write() used in flushBuffer() returns the number of bytes written on success. It is not uncommon for write() to transfer fewer than the requested number of bytes, especially for sockets or pipes. Normally, when a partial write happens, the caller should make another write() call to transfer the remaining bytes. However, here, to keep things simple, I just treat all partial writes as errors.
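For completeness, a retry loop for partial writes could look roughly like the sketch below. WriteAll() and WriteAllDemo() are hypothetical helpers that are not part of HexOutBuf; the demo assumes a POSIX system and pushes a short string through a pipe.

```cpp
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical helper: keeps calling write() until every byte is transferred.
inline bool WriteAll(const int fd, const char *data, std::size_t size) {
    while (size > 0) {
        const ssize_t written = write(fd, data, size);
        if (written == -1) {
            if (errno == EINTR) {
                continue;  // interrupted before anything was written: retry
            }
            return false;  // a real error
        }
        // Partial write: skip what was transferred and try again.
        data += written;
        size -= static_cast<std::size_t>(written);
    }
    return true;
}

inline void WriteAllDemo() {
    int fds[2];
    const int rc = pipe(fds);
    assert(rc == 0);

    const std::string text = "3132333400";
    const bool ok = WriteAll(fds[1], text.data(), text.size());
    close(fds[1]);
    assert(ok);

    char chunk[64] = {};
    const ssize_t got = read(fds[0], chunk, sizeof chunk);
    close(fds[0]);
    assert(got == 10);
    assert(std::string(chunk, 10) == text);
}
```

flushBuffer() could call such a helper instead of comparing the result of a single write() with the requested size.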

I override the virtual function sync() as well. For output streams, this function is responsible for flushing the buffer. For the unbuffered versions of the stream buffer, overriding this function is not necessary, because there is no buffer to be flushed. sync() is also called by the destructor to ensure that the buffer gets flushed when the stream buffer is destroyed.[1:§15.13.3] sync() returns 0 on success and -1 otherwise. The base class version of this function has no effect and returns 0.[3]
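From the stream's point of view, sync() is what runs behind std::flush and std::endl. The following sketch counts the calls; SyncCountingBuf and FlushDemo() are made-up names.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>

class SyncCountingBuf : public std::streambuf {
public:
    int syncs = 0;

protected:
    int_type overflow(int_type c) override {
        return traits_type::not_eof(c);  // accept and discard everything
    }

    int sync() override {
        ++syncs;
        return 0;  // 0 signals success
    }
};

inline void FlushDemo() {
    SyncCountingBuf buffer;
    std::ostream out(&buffer);

    out << "abc";
    assert(buffer.syncs == 0);  // plain output does not flush

    out << std::flush;
    assert(buffer.syncs == 1);  // flush() forwards to sync()

    out << std::endl;
    assert(buffer.syncs == 2);  // endl writes '\n' and then flushes
}
```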

Although not implemented in our HexOutBuf, the virtual functions seekoff() and seekpos() may be overridden to allow manipulation of the write position if such operations are supported by the underlying I/O device. The base class versions of these functions have no effect and return pos_type(off_type(-1)).
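As a rough idea of what such an override could look like, the sketch below supports only the query that tellp() makes, namely "where is the put position right now?". TellableBuf, PlainBuf and TellpDemo() are hypothetical names, and real seeking is deliberately left out.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <streambuf>

// Accepts every character but keeps the base class seekoff().
class PlainBuf : public std::streambuf {
protected:
    int_type overflow(int_type c) override { return traits_type::not_eof(c); }
};

// Counts the characters written and reports the count as the put position.
class TellableBuf : public std::streambuf {
protected:
    int_type overflow(int_type c) override {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            ++m_written;
        }
        return traits_type::not_eof(c);
    }

    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        const bool is_query = off == 0 && dir == std::ios_base::cur &&
                              (which & std::ios_base::out) == std::ios_base::out;
        // Anything other than a position query is reported as unsupported.
        return is_query ? pos_type(m_written) : pos_type(off_type(-1));
    }

private:
    off_type m_written = 0;
};

inline void TellpDemo() {
    PlainBuf plain;
    std::ostream plain_out(&plain);
    plain_out << "abcd";
    assert(static_cast<std::streamoff>(plain_out.tellp()) == -1);

    TellableBuf tellable;
    std::ostream out(&tellable);
    out << "abcd";
    assert(static_cast<std::streamoff>(out.tellp()) == 4);
}
```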

Conclusion

In today’s post, I have shown you a few examples of user-defined output stream buffers, from a simple unbuffered one to a slightly more complete buffered one. The key to implementing an output stream buffer is understanding when and how to override the corresponding virtual functions to manipulate the put area, if any, appropriately. The complete code for this article can be found on my GitHub.

Unfortunately, output stream buffers are only half of the story. Guess what the other half is? Yes, the input stream buffers. For various reasons, input stream buffers are a bit more involved than output stream buffers. But do not worry. I will have you covered in my next post.

References

  1. The C++ Standard Library, Second Edition by Nicolai Josuttis
  2. The C++ Programming Language, 4th Edition by Bjarne Stroustrup
  3. std::basic_streambuf