
Understanding Go: io Package

A translation of one of Ben Johnson's articles from the "Go Walkthrough" series, an in-depth look at the standard library in the context of real-world tasks.


Go is a language built for working with bytes. Whether you have lists of bytes, streams of bytes, or just individual bytes, Go makes it easy to work with them. These are the primitives on which we build our abstractions and services.


The io package is one of the most fundamental in the entire standard library. It provides a set of interfaces and auxiliary functions for working with byte streams.


This post is part of a series taking a deeper look at the standard library. Although the standard documentation provides plenty of useful information, in the context of real-world tasks it can be hard to figure out what to use and when. This series of articles aims to show how standard library packages are used in real applications.


Reading bytes


When working with bytes, there are two fundamental operations: reading and writing. Let's first take a look at reading bytes.


Reader Interface


The simplest design for reading bytes from a stream is the Reader interface:


  type Reader interface {
      Read(p []byte) (n int, err error)
  }

This interface is implemented over and over throughout the standard library for everything from network connections to files to in-memory slice wrappers.


The reader takes a buffer, p, as an argument to Read() so that it does not have to allocate memory. If Read() returned a new slice instead of accepting one as an argument, the reader would have to allocate on every Read() call. That would be a disaster for the garbage collector.


One of the problems with the Reader interface is that it comes with a set of rather subtle rules. First, it returns an io.EOF error in the normal course of operation, simply when the stream has ended. This can be confusing for newcomers. Second, there is no guarantee that your buffer will be filled. If you pass in an 8-byte slice, you can actually receive anywhere from 0 to 8 bytes. Handling partial reads can be messy and error-prone. Fortunately, there are quite a few helper functions for these problems.


Improving reading guarantees


Imagine you have a protocol to parse and you need to read an 8-byte uint64 value from a reader. Since you know exactly how much you want to read, io.ReadFull() is the better choice:


 func ReadFull(r Reader, buf []byte) (n int, err error) 

This function checks that the buffer is full before returning. If fewer bytes are received than the buffer holds, you get an io.ErrUnexpectedEOF error (or io.EOF if nothing was read at all). This simple guarantee simplifies the code considerably. To read 8 bytes, you only need:


  buf := make([]byte, 8)
  if _, err := io.ReadFull(r, buf); err != nil {
      return err
  }

There are also quite a few higher-level parsers, such as binary.Read(), that can decode specific types. We will get to those in future posts about other packages.


Another, slightly less used helper function is ReadAtLeast():


  func ReadAtLeast(r Reader, buf []byte, min int) (n int, err error) 

This function reads available data into your buffer, but never fewer than the specified number of bytes. I have not personally needed this function, but I can easily see its value when you want to reduce the number of Read() calls and buffer extra data.


Concatenating streams


You will often run into situations where you need to combine several readers. That's easy to do with MultiReader:


  func MultiReader(readers ...Reader) Reader 

For example, suppose you want to send an HTTP response whose header is read from memory while the body comes from a file. Many people would first read the whole file into an in-memory buffer before sending it, but that is slow and can require a lot of memory.


Here is a simpler approach:


  r := io.MultiReader(
      bytes.NewReader([]byte("...my header...")),
      myFile,
  )
  http.Post("http://example.com", "application/octet-stream", r)

MultiReader lets http.Post() treat both readers as one.


Duplicating streams


One thing you may run into when working with readers is that once data has been read, it cannot be read again. For example, suppose your application fails to parse an HTTP request body: you cannot inspect the body afterwards, because the parser has already consumed it and the data is gone from the reader.


TeeReader is a good solution here - it lets you capture the data as it is read without interfering with the reading process.


  func TeeReader(r Reader, w Writer) Reader 

This function creates a new reader that wraps your reader r. Any read from the new reader is also written to w. The writer can be anything from an in-memory buffer to a log file to the standard error stream STDERR.


For example, you can capture the body of a failed request like this:


  var buf bytes.Buffer
  body := io.TeeReader(req.Body, &buf)

  // ... process body ...

  if err != nil {
      // inspect buf
      return err
  }

However, be careful to bound the size of the body you capture so you don't exhaust memory.


Limiting stream length


Since streams are unbounded, reading from them can sometimes lead to memory or disk space problems. A typical example is a file upload handler: there is usually a maximum size for the uploaded file so it does not fill the disk, but implementing that limit by hand is tedious.


LimitReader gives us this functionality by wrapping a reader and limiting the number of bytes that can be read from it.


  func LimitReader(r Reader, n int64) Reader 

One gotcha with LimitReader is that it will not tell you whether r contained more than n bytes; it simply returns io.EOF once n bytes have been read. Alternatively, you can set the limit to n+1 and then check at the end whether you read more than n bytes.


Writing bytes


Now, after we got acquainted with reading bytes from streams, let's see how to write them to streams.


Writer interface


The Writer interface is essentially an inverted Reader: we provide a set of bytes to be written to the stream:


  type Writer interface {
      Write(p []byte) (n int, err error)
  }

In general, writing bytes is simpler than reading them. With readers the difficulty is handling partial and incomplete reads correctly; a partial or failed write simply returns an error.


Duplicating writes


Sometimes you need to send data to several writers at once - for example, to a log file and to STDERR. This is like TeeReader, except that we want to duplicate writes rather than reads.


This is where MultiWriter comes in:


  func MultiWriter(writers ...Writer) Writer 

The name may be a little confusing, because it is not quite the writer counterpart of MultiReader. Where MultiReader combines several readers into one, MultiWriter returns a writer that duplicates each write to all of its writers.


I actively use MultiWriter in unit tests, where I want to make sure that the services write to the log correctly:


  type MyService struct {
      LogOutput io.Writer
  }

  ...

  var buf bytes.Buffer
  var s MyService
  s.LogOutput = io.MultiWriter(&buf, os.Stderr)

Using MultiWriter lets me check the contents of buf while also seeing the full log output in the terminal for debugging.


Copying bytes


Now that we’ve dealt with both reading and writing bytes, it’s logical to figure out how we can combine these two operations together and copy the data between them.


Combining readers & writers


The easiest way to copy from a reader to a writer is the Copy() function:


  func Copy(dst Writer, src Reader) (written int64, err error) 

This function uses a 32KB buffer internally to read from src and write to dst. If any error other than io.EOF occurs, copying stops and the error is returned.


One problem with Copy() is that it gives you no way to guarantee a maximum number of bytes copied. Say you want to copy a log file up to its current size: if the log keeps growing during the copy, you will end up with more bytes than expected. In that case you can use CopyN(), which copies at most the specified number of bytes:


  func CopyN(dst Writer, src Reader, n int64) (written int64, err error) 

Another important point about Copy() is that it allocates a 32KB buffer on every call. If you perform many copy operations, you can allocate a buffer once and reuse it with CopyBuffer():


  func CopyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) 

The overhead of Copy () is actually very small, so I personally do not use CopyBuffer ().


Optimizing copying


To avoid an intermediate buffer entirely, types can implement special interfaces for reading and writing directly. When a type implements them, the Copy() function skips its buffer and uses those methods instead.


If the type implements the WriterTo interface, then it can write data directly:


  type WriterTo interface {
      WriteTo(w Writer) (n int64, err error)
  }

I used it in the BoltDB Tx.WriteTo () function, which allows users to create a database snapshot from a transaction.


On the other hand, the ReaderFrom interface allows the type to directly read data from the reader:


  type ReaderFrom interface {
      ReadFrom(r Reader) (n int64, err error)
  }
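You can see the fast path in action with standard types: strings.Reader implements WriterTo and *bytes.Buffer implements ReaderFrom, so Copy() moves the data directly (copyDirect is an illustrative name):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// copyDirect copies a string into a bytes.Buffer. Since both ends
// implement the fast-path interfaces, Copy skips its intermediate
// 32KB buffer here.
func copyDirect(s string) (string, int64, error) {
	var buf bytes.Buffer
	n, err := io.Copy(&buf, strings.NewReader(s))
	return buf.String(), n, err
}

func main() {
	out, n, err := copyDirect("no intermediate buffer")
	fmt.Println(out, n, err)
}
```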

Adapting readers and writers


Sometimes you find yourself with a function that accepts a Reader when all you have is a Writer. You might want to generate the data for an HTTP request dynamically, but http.NewRequest() only accepts a Reader.


You can invert a writer into a reader using io.Pipe():


  func Pipe() (*PipeReader, *PipeWriter) 

Here you get a matched reader and writer. Every write to the PipeWriter is forwarded to the PipeReader; note that a pipe write blocks until the data is consumed on the reading side.
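Because writes block until they are read, the producing side must run concurrently with the consumer. A minimal sketch (pipeDemo is an illustrative name):

```go
package main

import (
	"fmt"
	"io"
)

// pipeDemo writes msg into the writer end of a pipe from a
// goroutine and reads it back from the reader end.
func pipeDemo(msg string) (string, error) {
	pr, pw := io.Pipe()
	go func() {
		pw.Write([]byte(msg))
		pw.Close() // signals EOF to the reading side
	}()
	b, err := io.ReadAll(pr)
	return string(b), err
}

func main() {
	s, err := pipeDemo("streamed body")
	fmt.Println(s, err)
}
```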


I have rarely used this function myself, but exec.Cmd uses it to implement the Stdin, Stdout, and Stderr pipes, which can be very useful when working with spawned processes.


Closing streams


All good things come to an end, and streams are no exception. The Closer interface provides a general way to close a stream:


  type Closer interface {
      Close() error
  }

There's not much to say about such a simple interface, but I always try to return an error from my Close() methods so that my types implement it when needed. Closer is not always used directly; it often appears in combination with other interfaces, as in ReadCloser, WriteCloser, and ReadWriteCloser.


Moving around within streams


Streams usually deliver a continuous flow of data from start to finish, but there are exceptions. A file, for example, can be treated as a stream, yet you can also jump to an arbitrary position within it.


The Seeker interface provides the ability to navigate within the stream:


  type Seeker interface {
      Seek(offset int64, whence int) (int64, error)
  }

There are three ways to move to a position: relative to the current position, relative to the start of the stream, and relative to the end. You choose the mode with the whence argument (io.SeekCurrent, io.SeekStart, and io.SeekEnd, respectively), and offset says how many bytes to move.


Seeking is useful if you work with fixed-size blocks or if your file contains an index of offsets. Sometimes the data you need is in a header, and it makes sense to seek from the start of the stream; other times it is in a trailer, and seeking from the end is more convenient.


Optimizing for specific data types


Reading and writing in chunks can be tedious when all you need is a single byte or rune. Go has interfaces that make this easier.


Working with individual bytes


The ByteReader and ByteWriter interfaces provide simple methods for reading and writing one byte:


  type ByteReader interface {
      ReadByte() (c byte, err error)
  }

  type ByteWriter interface {
      WriteByte(c byte) error
  }

Note that there is no byte-count parameter, since it would always be 0 or 1; if a byte is not read or written, an error is returned.


There is also a ByteScanner interface that allows you to conveniently work with buffered readers for bytes:


  type ByteScanner interface {
      ByteReader
      UnreadByte() error
  }

This interface lets you push a byte back onto the stream, which is convenient when writing LL(1) parsers, since it gives you one byte of lookahead.


Working with individual runes


If you are parsing Unicode data, you will want to work with runes instead of individual bytes. In that case, use the RuneReader and RuneScanner interfaces:


  type RuneReader interface {
      ReadRune() (r rune, size int, err error)
  }

  type RuneScanner interface {
      RuneReader
      UnreadRune() error
  }
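A small sketch showing why runes matter for multi-byte UTF-8 text (firstRunes is an illustrative name; strings.Reader implements io.RuneScanner):

```go
package main

import (
	"fmt"
	"strings"
)

// firstRunes decodes up to n runes from s. Multi-byte UTF-8
// sequences come back as whole runes rather than raw bytes.
func firstRunes(s string, n int) []rune {
	rr := strings.NewReader(s)
	var out []rune
	for i := 0; i < n; i++ {
		r, _, err := rr.ReadRune()
		if err != nil {
			break
		}
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(string(firstRunes("héllo", 2))) // hé
}
```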

Conclusion


Byte streams are important for many Go programs. These are interfaces for everything from network connections to disk files to user input from the keyboard. The io package provides basic primitives for working with all of this.


We looked at reading, writing and copying bytes, as well as optimizing these operations for specific tasks. These primitives may look simple, but they are the basic building blocks for applications that are actively working with data.


Please study the io package carefully and use its interfaces in your programs. Also, I’ll be happy if you share your interesting ways of using the io package, as well as any tips on how to improve this series of articles.



Source: https://habr.com/ru/post/306914/

