The author of this article on buffers, streams, and binary data in Node.js, whose translation we are publishing, says he understands how novice developers without formal training feel when all these entities seem mysterious and incomprehensible. According to him, this can lead beginners to put off learning Node's internal mechanisms, on the grounds that such things are meant not for them but only for top-level professionals and package developers. Today he sets out to fix that and to help anyone who wants to understand what buffers, streams, and binary data in Node.js are, and how to work with them.
About Node internal mechanisms
Unfortunately, many Node.js tutorials and books pay too little attention to the platform's internal mechanisms and make no attempt to explain why they exist. As a rule, such publications come down to building web applications with ready-made packages, without going into how those packages are implemented. Some even brazenly declare that the reader does not need to understand any of this, since he will most likely never have to work directly with, say, objects of the Buffer class.
For someone who does not plan to go beyond using ready-made libraries in their projects, this approach is probably justified. But those whose curiosity is awakened by puzzles, and who want to take their understanding of JS to a new level, should dig deeper and get to grips with the many internal features of Node.js, such as the Buffer class.
You can read the following in the official Node.js documentation on the Buffer class:
Before the appearance of the TypedArray object in ECMAScript 2015 (ES6), JavaScript had no mechanism for reading binary data streams or for performing other operations with them. The Buffer class was introduced as part of the Node.js API, which allows you to interact with arbitrary binary data streams in the context of, for example, TCP streams and file system operations.
Yes, if you do not know the terms used in this definition, it may read as an unintelligible pile of programmer jargon. Let's simplify it a little by paraphrasing, so that we can work with it without being distracted by the details. The definition can be boiled down to the following:
The Buffer class was introduced as part of the Node.js API, which allows working with binary data streams.
Now everything looks a little simpler. But “Buffer class”, “streams”, “binary data” - that is still a lot of complicated concepts. Let's deal with them, starting with the last one.
What is binary data?
You may already know that computers store and represent data in binary form. Binary data is simply a set of ones and zeros. For example, here are five different sets of binary data made up of the values “1” and “0”:
10, 01, 001, 1110, 00101011
Each digit in a binary value, each “1” and “0” in the set, is called a bit (short for Binary digIT).
To work with any piece of data, the computer must convert it into its binary representation. For example, to store the decimal number 12, the computer must convert it to binary form, namely 1100.
How does the computer know how to perform such conversions? It is pure mathematics: the binary number system that is taught in school. There are rules for converting decimal numbers into binary ones, and the computer follows these rules.
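As a small illustration of that conversion, JavaScript can produce the binary digits of a number itself. The sketch below relies on the standard Number.prototype.toString and parseInt with a radix of 2; the variable names are chosen only for this example.

```javascript
// Convert the decimal number 12 to its binary digits.
const binaryTwelve = (12).toString(2);
console.log(binaryTwelve); // '1100'

// Convert the binary digits back to a decimal number.
const decimalTwelve = parseInt('1100', 2);
console.log(decimalTwelve); // 12
```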
However, numbers are not the only data type we work with. We also have strings, images, and even videos. The computer knows how to represent any type of data in binary form. Take strings, for example. How does a computer represent the string “L” in binary form? To store a string in binary form, the computer first converts the characters of the string to numbers, and then converts those numbers to their binary representation. So, for our one-character string, the computer first needs to convert “L” to the number that represents this character. Let's see how this is done in JavaScript.
Open the browser developer tools console and paste this code there:
"L".charCodeAt(0)
Now press Enter. What did you see? The number 76? This is the so-called numeric representation, or code, or code point of the character L. But how does the computer know which number corresponds to a given character? How does it know that the number 76 corresponds to the letter L?
Character sets
Character sets are predefined rules that map characters to numeric codes. There are many varieties of such rules; Unicode and ASCII, for example, are very popular. JavaScript works very well with the Unicode character set. In fact, it is the Unicode character table that the browser uses to convert the character L to the number 76, because that mapping is recorded in it.
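The mapping works in both directions. As a minimal sketch, the standard String.prototype.codePointAt and String.fromCodePoint functions go from a character to its code and back; the variable names are illustrative only.

```javascript
// From a character to its numeric code.
const codeOfL = 'L'.codePointAt(0);
console.log(codeOfL); // 76

// From a numeric code back to the character.
const charFor76 = String.fromCodePoint(76);
console.log(charFor76); // 'L'
```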
So, we saw how the computer represents characters as numbers. Now let's talk about how the number 76 turns into its binary representation. It may seem that it is enough to convert 76 from decimal to binary number system, but not everything is so simple.
Character encoding
Just as there are rules that map characters to their numeric codes, there are rules for converting those numbers into their binary representation. In particular, they determine how many bits should be used to represent a number. This is called character encoding.
One of the character encoding rule sets is called UTF-8. UTF-8 defines the rules for converting characters to bytes. A byte is a set of eight bits: eight ones and zeros. So the code point of a character is stored in whole bytes: one byte for characters in the ASCII range, such as L, and up to four bytes for other characters. Let's look at what this means.
As already mentioned, the binary representation of the decimal number 12 is 1100. So, when UTF-8 indicates that the number 12 must be represented by an eight-bit value, this means that the computer needs to add a few bits to the left of the actual binary representation of the number 12 in order to represent it as one byte. As a result, 12 should be stored as 00001100. And the number 76 will look like 01001100.
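The padding described above can be reproduced in a few lines. This is only a sketch: toByteString is a hypothetical helper written for this example, built on the standard padStart method. The last lines use Buffer.from, which encodes a string as UTF-8 by default, to confirm that “L” occupies a single byte with the value 76.

```javascript
// Hypothetical helper: the binary digits of a number (0-255),
// padded with zeros on the left to fill one byte.
function toByteString(n) {
  return n.toString(2).padStart(8, '0');
}

console.log(toByteString(12)); // '00001100'
console.log(toByteString(76)); // '01001100'

// Buffer.from encodes the string as UTF-8 by default.
const bytesOfL = Buffer.from('L');
console.log(bytesOfL.length);           // 1
console.log(toByteString(bytesOfL[0])); // '01001100'
```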
This is how a computer stores strings or individual characters in binary form. In the same way, the machine has special rules for converting images and video to binary form. The point is that the computer stores every type of data in binary form, and all of it is called binary data.
If you are interested in the subtleties of character encodings, take a look at this material, which covers them in detail.
Now we understand what binary data is, but what are the binary data streams we mentioned above?
Stream
A stream in Node.js is a sequence of data moving from one place to another. Data transfer is not instantaneous, it takes some time. The basic idea here is that streams allow you to process large data sets in parts.
If we recall that the definition of a buffer mentions “binary data streams ... in the context of ... the file system”, we can see that it is talking about moving the binary data of files, for example when reading files in order to work with their contents later. Let's say we read the text from file1.txt, convert it, and save it to file2.txt.
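A scenario like this could be sketched with Node's fs and stream modules. Everything specific here is an assumption made for illustration: convertFile is a hypothetical helper, and upper-casing stands in for the unspecified “convert” step. The sketch also assumes plain ASCII text, because a multi-byte character split across two chunks would not be decoded correctly by chunk.toString().

```javascript
const fs = require('fs');
const { Transform, pipeline } = require('stream');

// Hypothetical helper: reads src part by part, upper-cases the text,
// and writes the result to dest. callback receives an error, if any.
function convertFile(src, dest, callback) {
  const upperCase = new Transform({
    transform(chunk, encoding, done) {
      // chunk is a Buffer holding one part of the file (ASCII assumed).
      done(null, chunk.toString().toUpperCase());
    },
  });

  // pipeline connects the streams and reports completion or failure.
  pipeline(
    fs.createReadStream(src),
    upperCase,
    fs.createWriteStream(dest),
    callback
  );
}
```

Calling convertFile('file1.txt', 'file2.txt', callback) would then move the file's contents through the transform one chunk at a time, rather than loading the whole file into memory first.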
So where does the buffer come in? How does it help when working with binary data in the form of a stream?
Buffer
Recall that a “data stream” is the movement of data from one place to another. Now let us ask ourselves how exactly this data is transferred.
Usually, data is moved in order to, at a minimum, read it and make it available for further processing, for example to make a decision based on it. Processing speed is limited, so there is a minimum and a maximum amount of data that a process can handle in a given period of time. If data arrives faster than it is consumed, the excess data has to wait somewhere for its turn to be processed.
On the other hand, if the system is able to process data faster than it arrives, the data that arrived first has to wait until a certain additional amount of data has arrived before all of it is sent for processing.
This “waiting area” is the buffer! Physically, a buffer can be a region of RAM where data is temporarily accumulated while working with a stream, waits its turn, and is eventually sent for processing.
All this can be imagined as a bus station. At some stations, buses cannot be sent until they have a certain number of passengers, or until the time of departure. In addition, passengers can arrive at the station at different speeds. At the same time, nobody clearly controls the arrival of passengers at the station.
In any case, passengers who arrived before the departure of the bus must wait until the station administration decides that it is time for their bus to leave. And passengers who arrived when the bus is already full, or when it has already departed, must wait for the next bus.
Either way, we are talking about a kind of “waiting room”. The buffer in Node.js plays the same role. Node.js cannot control the speed at which data arrives or the time of its arrival. It can only decide when to send data that has already arrived for processing. If the time to send the data for processing has not yet come, Node.js puts it in the buffer, the “waiting area”.
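The “waiting area” can be observed on a readable stream. In the sketch below, data is pushed into a stream before any consumer asks for it, and it sits in the stream's internal buffer until read() is called. The highWaterMark value of 1024 bytes and the pushed strings are arbitrary choices for this example, not anything prescribed by Node.js.

```javascript
const { Readable } = require('stream');

// A readable stream with an assumed buffer threshold of 1024 bytes.
// read() is a no-op because the data is pushed in manually below.
const source = new Readable({ highWaterMark: 1024, read() {} });

// Data "arrives" before anything is ready to process it.
source.push('first ');
source.push('second ');

// The bytes that arrived are waiting in the internal buffer.
const waitingBytes = source.readableLength;
console.log(waitingBytes); // 13

// When the consumer is ready, it takes the waiting data out as a Buffer.
const chunk = source.read();
console.log(Buffer.isBuffer(chunk)); // true
console.log(chunk.toString());       // 'first second '
```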
A typical example of a buffer in action is watching video online. If your Internet connection is fast enough, the stream is fast enough to fill the video player's buffer immediately and let the player show the video, then fill the next buffer and send it for viewing, and so on until the video transfer is complete. This is an example of a system in which data arrives faster than it is processed.
However, if the connection is slow, then after processing the first set of data that arrived, the player will show a loading icon or display “buffering”, which means it is waiting for more data to arrive before the video continues. Once the buffer is filled and the data in it has been processed, the player shows the video. While the video is playing, new data keeps arriving and waits its turn in the buffer. This is the case in which the system is able to process data faster than it arrives.
If the player has finished playing the data received earlier and the buffer has not yet been filled, “buffering” appears again while the system waits for the amount of data it needs. Working with buffers in Node looks essentially the same.
From the initial definition of the buffer, you can see that when the data is in the buffer, we can work with it. What can be done with raw binary data?
Work with buffers
The implementation of the buffer in Node.js gives us a lot of options for working with data. In addition, you can create buffers yourself, setting their characteristics. So, in addition to the buffer that Node.js will automatically create during data transfer, you can create your own buffer and manipulate it. There are different ways to create buffers. Take a look at some of them.
// Create an empty buffer of size 10.
// It can hold only 10 bytes.
const buf1 = Buffer.alloc(10);
// Create a buffer with initial content.
const buf2 = Buffer.from("hello buffer");
After creating the buffer, you can start working with it.
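// Examine the structure of the buffers
buf1.toJSON()
// { type: 'Buffer', data: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ] }
// an empty (zero-filled) buffer
buf2.toJSON()
// { type: 'Buffer', data: [ 104, 101, 108, 108, 111, 32, 98, 117, 102, 102, 101, 114 ] }
// toJSON() lists the bytes; for these ASCII characters each byte equals the character's Unicode code point
// Check the size of a buffer
buf1.length // 10
buf2.length // 12, set automatically from the initial content
// Write to a buffer
buf1.write("Buffer really rocks!")
// Decode a buffer
buf1.toString() // 'Buffer rea'
// buf1 holds only 10 bytes, so only the first 10 characters fit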
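The truncation in the last step can be detected and avoided. As a hedged sketch: Buffer's write method returns the number of bytes actually written, and Buffer.byteLength reports how many bytes a string needs, so a buffer can be sized to fit. The variable names below are illustrative only.

```javascript
const text = 'Buffer really rocks!';

// Writing into a 10-byte buffer stores only the first 10 bytes.
const small = Buffer.alloc(10);
const written = small.write(text);
console.log(written);          // 10
console.log(small.toString()); // 'Buffer rea'

// Allocating by the string's byte length leaves room for all of it.
const roomy = Buffer.alloc(Buffer.byteLength(text));
roomy.write(text);
console.log(roomy.length);     // 20
console.log(roomy.toString()); // 'Buffer really rocks!'
```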
Results
Now that you understand what a “buffer”, a “stream”, and “binary data” are, you can open the Buffer documentation and experiment with everything described there.
In addition, to see how buffers work in practice, read the source code of the zlib.js library. It is one of the Node.js core libraries. Look at how it uses buffers to interact with binary data streams; in this case, the data comes from files that are gzip archives.
We hope that what you learned from this material, what you found in the documentation, and learned from analyzing the code, will help raise your professional level and will be useful to you in your projects.
Dear readers! In your opinion, which basic aspects of Node.js should novice developers pay attention to?
