📜 ⬆️ ⬇️

[Translation] Arrays, slices (and lines): The mechanism of 'insertion'

Introduction


One of the most common features of procedural programming languages ​​is the concept of an array. Arrays may seem something simple, but on the other hand, before adding them to a language, several issues need to be solved, such as:

Answers to these questions will define arrays as a simple language feature, or as the main part of its design.

In the early days of Go, finding answers to these questions took years of discussion before their concept began to look the way we thought it needed. The key step was the creation of the concept of slices , which are based on fixed-size arrays in order to create a flexible and extensible data structure. However, many newcomers to Go stumble over the principles of the slices, perhaps this is due to the fact that their experience with other languages ​​has left its mark.

In this publication we will try to dispel all these misunderstandings. We will achieve this piece by piece by explaining how the append function works, and why it works this way and no other way.

Arrays


Arrays are an important piece of the Go language, but, like the foundation of a building, it is hidden under more visible parts. We must get you up to date before moving on to more interesting, powerful, and noticeable slice features.
')
Arrays are not often seen in Go programs, because the size of an array includes its size, which limits its use.

For example the announcement:
var buffer [256]byte 

creates a variable buffer that contains 256 bytes. The type of the buffer variable includes the size and looks like this: [256] byte . An array of 512 bytes will be of type [512] byte .

The data associated with the array is simply an array of elements. Schematically, our buffer in memory will look something like this:
 buffer: byte byte byte ... 256 times ... byte byte byte 

That is, the variable contains 256 bytes of data and nothing more. We can access the elements using the usual indexing syntax buffer [0] , buffer [1] , and so on until buffer [255] (the index from 0 to 255 covers 256 elements). Attempting to go beyond this range will cause the program to crash.

There is a built-in len function that returns the number of elements in an array, a slice, and some other types. Obviously, what exactly will return len for an array. In our case, len (buffer) will return the value 256.

For arrays you can find your place of application. For example, they are good for transforming matrices, but their most frequent use in Go is storing slices.

Slices: Slice Header


Cuts where something interesting happens, but before you start using them, you will need to understand their need and what they are doing.

A slice is a data structure that describes the multiple division of an array and is stored separately from a variable. Slice is not an array . A slice describes a portion of the array.

If we take the buffer array from the previous section, then we can create a slice that will describe elements from 100 to 150 (to be exact, then from 100 to 149 inclusive) by cutting the array:
 var slice []byte = buffer[100:150] 

In this piece of code, to be precise, we used the full declaration of the variable. The variable slice is of type [] byte , reads as a “slice of bytes” and is created from the buffer array, by cutting starting from element 100 (inclusive) to 150 (exclusively). In a more “canonical” syntax, we would omit the type that will be defined during the initialization process:
 var slice = buffer[100:150] 

And inside the function, we would use the short form of the declaration:
 slice := buffer[100:150] 

What is a slice? This is not a complete description, but from now on, think of a slice as a small structure consisting of two elements: a length and a pointer to an array element. Consider that "behind the scenes" it looks like this:
 type sliceHeader struct { Length int ZerothElement *byte } slice := sliceHeader{ Length: 50, ZerothElement: &buffer[100], } 

Of course, this is just an illustration. Despite this, from this example it can be understood that the sliceHeader structure is inaccessible to the programmer, and the type of the pointer depends on the type of elements, but it makes it possible to understand the basic idea of ​​the mechanics of the slices.

So far, we have used the array cutting operation, however we can cut and cut:
 slice2 := slice[5:10] 


Just as before, this operation creates a new slice, but in this case from elements from 5 to 9 (inclusive) relative to the original slice, which means elements from 105 to 109 of the original array. The basic sliceHeader structure for the slice2 variable will look like this:
 slice2 := sliceHeader{ Length: 5, ZerothElement: &buffer[105], } 

Notice that the header still points to the underlying array in the buffer variable.

We can also re-cut , which means to cut a slice and save the result in the structure of the slice cut. Those. after:
 slice = slice[5:10] 

the sliceHeader structure for the slice variable will look the same as for the slice2 variable. You will often see re-slicing, for example to shorten the cut. In this example, the first and last element of our slice will be omitted:
 slice = slice[1:len(slice)-1] 

You can often hear from experienced Go programmers about the “slice header” because this is what is stored in the slice variable. For example, when you call a function that takes a slice as an argument, such as bytes.IndexRune , then a header will be passed to the function. In this example:
 slashPos := bytes.IndexRune(slice, '/') 

the slice argument will be passed to the IndexRune function and, in fact, this is just a “slice header”.

There is another data element in the "slice header" that we will discuss below, but first, let's take a look at what the "slice header" means when you write a program that uses slices.

Passing Cuts to Functions


It is very important to understand that even if the slice contains a pointer, in itself it is a value. Under the hood, it is a structure containing a pointer and a length. This is not a pointer to a structure.

It is important.

When we call IndexRune in the previous example, it takes a copy of the “header top”. This behavior has an important consequence.

Consider a simple function:
 func AddOneToEachElement(slice []byte) { for i := range slice { slice[i]++ } } 

She does exactly what is written in the title, i.e. bypasses all elements of the slice (using the for range loop), increasing its elements.

Try:
 func main() { slice := buffer[10:20] for i := 0; i < len(slice); i++ { slice[i] = byte(i) } fmt.Println("before", slice) AddOneToEachElement(slice) fmt.Println("after", slice) } 

Despite the fact that the “ slice header ” is passed to the function, it includes a pointer to the array elements, therefore, the original slice header and its copy describe the same array. As a consequence, when the function is completed, the changed elements can be seen through the original slice.

The function argument is actually a copy, and this example shows it:
 func SubtractOneFromLength(slice []byte) []byte { slice = slice[0 : len(slice)-1] return slice } func main() { fmt.Println("Before: len(slice) =", len(slice)) newSlice := SubtractOneFromLength(slice) fmt.Println("After: len(slice) =", len(slice)) fmt.Println("After: len(newSlice) =", len(newSlice)) } 

Here we see that the contents of the argument can be changed through the function, but not its title . The length is stored in the variable slice and does not change as a result of a function call, since a copy of the slice header, not the original, is passed to the function. Thus, if we want to write a function that changes the title, we must return it, as we did in the example. The variable slice does not change, but the return value has a new length, which is stored in newSlice .

Pointers to slices: Methods for obtaining


There is another way of writing a function that changes the title of the slice, and this is the transfer to the function a pointer to the slice. Here is a variation of our example, to demonstrate this possibility:
 func PtrSubtractOneFromLength(slicePtr *[]byte) { slice := *slicePtr *slicePtr = slice[0 : len(slice)-1] } func main() { fmt.Println("Before: len(slice) =", len(slice)) PtrSubtractOneFromLength(&slice) fmt.Println("After: len(slice) =", len(slice)) } 

The example may seem a bit clumsy, given the additional level of abstraction (temporary variable), but there is one case where you will use pointers to slices. It is accepted to use the pointer as a receiver when writing a method that changes the slice.

Let's say we wanted a method that eliminates the last slash. We can write it like this:
 type path []byte func (p *path) TruncateAtFinalSlash() { i := bytes.LastIndex(*p, []byte("/")) if i >= 0 { *p = (*p)[0:i] } } func main() { pathName := path("/usr/bin/tso") //     path pathName.TruncateAtFinalSlash() fmt.Printf("%s\n", pathName) } 

If you run the example, you will see what happens, what is required, that is, the method will change the slice.

On the other hand, if we want to write a method for path that sets upper case of ASCII characters (no behavior is not defined in English letters), the method can operate on a value, not a pointer, because the value of the receiver still points to the array we need.
 type path []byte func (p path) ToUpper() { for i, b := range p { if 'a' <= b && b <= 'z' { p[i] = b + 'A' - 'a' } } } func main() { pathName := path("/usr/bin/tso") pathName.ToUpper() fmt.Printf("%s\n", pathName) } 

Here, the ToUpper method uses two variables in for range in order to use the index of the element and, directly, the slice element itself. This will avoid re-writing to p [i] .

Capacity


Consider the following function, which increases the ints slice by one element:
 func Extend(slice []int, element int) []int { n := len(slice) slice = slice[0 : n+1] slice[n] = element return slice } 

Now run:
 func main() { var iBuffer [10]int slice := iBuffer[0:0] for i := 0; i < 20; i++ { slice = Extend(slice, i) fmt.Println(slice) } } 

Let's see how the cut grows until ... it does not grow.

It's time to talk about the third component of the slice header: its capacity . In addition to the pointer to the array and its length, the slice header contains its capacity:
 type sliceHeader struct { Length int Capacity int ZerothElement *byte } 

The Capacity field contains a record of how much space the array actually occupies - this is the maximum value that Length can reach. An attempt to increase the slice above its capacity will lead to going beyond the array and will cause an emergency program termination.

This example creates a slice
 slice := iBuffer[0:0] 

and its title looks like:
 slice := sliceHeader{ Length: 0, Capacity: 10, ZerothElement: &iBuffer[0], } 

The Capacity field is equivalent to the length of the original array minus the index of the array element, which is the first slice element (zero in this case). If you want to know the capacity of the slice, then use the function cap :
 if cap(slice) == len(slice) { fmt.Println("slice is full!") } 


Make


What if we want to increase the cut more than its capacity? We can not! By definition, capacity is the limit of growth. But you can get the same result by creating a new array, copying the data and changing the slice describing the new array.

Let's start with the selection. We can use the new function to select a larger array and as a result of a larger slice, but it will be easier to use the make function. It allocates a new array and creates a slice header. The make function has three arguments: the type of the slice, the initial length, and its capacity, where the length of the array is what make allocates for the slice data. As a result, this function call creates a slice of length 10 and the possibility of expanding by 5 (15-10), which you can see by running this:
  slice := make([]int, 10, 15) fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice)) 

This fragment doubles the capacity of our int slice, but leaves the same length:
  slice := make([]int, 10, 15) fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice)) newSlice := make([]int, len(slice), 2*cap(slice)) for i := range slice { newSlice[i] = slice[i] } slice = newSlice fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice)) 

After this, the cut has much more room to grow before it needs another redistribution.

When creating cuts, it often happens that the length and capacity are the same. The make function has an abbreviated version. The default length becomes capacity, so you can specify them in one value. After
 gophers := make([]Gopher, 10) 

at the cut of gophers the length and capacity will be equal to 10.

Copying


After we doubled the capacity of our slice in the previous section, we rewrote the cycle to copy the old data into the new slice. Go has a built-in copy function that simplifies this task. Her arguments are two slices and she copies the data from the right to the left. Here is our example rewritten to use copy :
  newSlice := make([]int, len(slice), 2*cap(slice)) copy(newSlice, slice) 

The copy function is smart. It copies only what it can, paying attention to the length of both arguments. In other words, the number of elements to be copied is equal to the minimum of the lengths of both slices. This can save a bit of “bureaucracy”. In addition, copy returns an integer value - the number of elements that were copied, although this is not always worth checking.

The copy function also takes into account cases where the source and receiver intersect (note lane is like memmove () in C), which means that the function can be used to move elements inside one slice. Below is an example of how to use the copy function to insert a value in the middle of the slice
 //        , //      . //        . func Insert(slice []int, index, value int) []int { //      slice = slice[0 : len(slice)+1] //  copy           copy(slice[index+1:], slice[index:]) //   . slice[index] = value //  . return slice } 

There are a few points you can notice in this function. First, obviously, it must return the cut, because its length has changed. Secondly, a convenient abbreviation is used. Expression
 slice[i:] 

means the same as
 slice[i:len(slice)] 

In addition, we did not use another trick; we can also leave the first element of the expression empty; by default it will be zero. In this way
 slice[:] 

Means just a slice of oneself, which is useful when slicing an array. This expression is the shortest to say: “a slice that describes all the elements of the array”:
 array[:] 

But it was between times, let's try our function Insert .
  slice := make([]int, 10, 20) // ,  capacity > length:     . for i := range slice { slice[i] = i } fmt.Println(slice) slice = Insert(slice, 5, 99) fmt.Println(slice) 


Insert: Example


Several sections ago, we wrote the Extend function, which expanded the slice by one element. It was wrong, at least for the reason that in the case where the cutoff capacity was too small, the function could fail with an error (in our example, the Insert function is subject to the same problem). Now we know how to fix it, so let's write a reliable implementation of the Extend function for integer slices.
 func Extend(slice []int, element int) []int { n := len(slice) if n == cap(slice) { //  ;  . //       1,     ,      newSlice := make([]int, len(slice), 2*len(slice)+1) copy(newSlice, slice) slice = newSlice } slice = slice[0 : n+1] slice[n] = element return slice } 

In this case, it is especially important to return the slice, since when we redistributed it, the slice that we got describes a completely different array. Here is a small piece to show what happens if the cut is filled:
  slice := make([]int, 0, 5) for i := 0; i < 10; i++ { slice = Extend(slice, i) fmt.Printf("len=%d cap=%d slice=%v\n", len(slice), cap(slice), slice) fmt.Println("address of 0th element:", &slice[0]) } 

Note the redistribution when the original array of size 5 is filled. The capacity and address of the zero element changes when a new array is allocated.

Using the robust Extend function as a basis, we can write an even better function that will allow us to extend the slice with several elements. To do this, we use the Go option by reversing the list of arguments in an array. That is, we use the Go feature to use a variable number of function arguments.

Let's call the function Append . For the first version, we can simply call Extend until the function arguments have run out. The prototype for the Append function looks like this:
 func Append(slice []int, items ...int) []int 

This tells us that Append takes one argument, a slice, followed by an int argument from zero to infinity. These elements are future pieces of the slice, as you can see:
 // Append    . //  :    Extend. func Append(slice []int, items ...int) []int { for _, item := range items { slice = Extend(slice, item) } return slice } 

Notice that the for range loop is executed for each element of the items argument, which is of type [] int . Also note the use of an empty identifier _, which drops the index, since we do not need it in the loop.

Try it:
  slice := []int{0, 1, 2, 3, 4} fmt.Println(slice) slice = Append(slice, 5, 6, 7, 8) fmt.Println(slice) 

Another new technique in this example is that the slice initialization is produced by a compound literal, which consists of the type and slice elements enclosed in braces:
  slice := []int{0, 1, 2, 3, 4} 

The function Append is also interesting for another reason. We can not just add elements, we can pass whole slices as arguments of the function using ...:
  slice1 := []int{0, 1, 2, 3, 4} slice2 := []int{55, 66, 77} fmt.Println(slice1) slice1 = Append(slice1, slice2...) // '...' ! fmt.Println(slice1) 

Of course, we can make Append more efficient, by single selection, based on Extend :
 // Append    . //  . func Append(slice []int, elements ...int) []int { n := len(slice) total := len(slice) + len(elements) if total > cap(slice) { // .    1.5 ,     . newSize := total*3/2 + 1 newSlice := make([]int, total, newSize) copy(newSlice, slice) slice = newSlice } slice = slice[:total] copy(slice[n:], elements) return slice } 

Notice that here we use copy twice to move the data slice to a new memory location and then copy the added elements to the end of the old data.

Try it, the behavior is the same as before:
  slice1 := []int{0, 1, 2, 3, 4} slice2 := []int{55, 66, 77} fmt.Println(slice1) slice1 = Append(slice1, slice2...) // '...' ! fmt.Println(slice1) 


Append: Built-in function


So, we came to the conclusion that in Go you need to add the built-in function append . It does the same thing as our function Append from the example, with the same efficiency, but it works for any type of slices.

The weakness of Go is that any operation defined on a "generic type" must be provided at run time. Someday this may change, but now, in order to simplify working with slices, Go provides the built-in common append function. It works the same as our integer version, but for any type of slice.

Remember that since the slice header is updated every time you call append , you need to save the resulting slice after the call. In fact, the compiler will not allow you to call append without saving the result.

Here are some single lines with a conclusion. Try them, change and explore:
  //    . slice := []int{1, 2, 3} slice2 := []int{55, 66, 77} fmt.Println("Start slice: ", slice) fmt.Println("Start slice2:", slice2) //    . slice = append(slice, 4) fmt.Println("Add one item:", slice) //     . slice = append(slice, slice2...) fmt.Println("Add one slice:", slice) //    (int). slice3 := append([]int(nil), slice...) fmt.Println("Copy a slice:", slice3) //      . fmt.Println("Before append to self:", slice) slice = append(slice, slice...) fmt.Println("After append to self:", slice) 


It is worthwhile to stop and think about the last lines of the example and understand how the slice design allows you to make such simple calls correct.

There are many examples on the " Slice Tricks " wiki (community created) that use append , copy, and other ways to use slices.

Nil


In addition, using the newly acquired knowledge, we can understand what a “zero” ( nil ) slice is. Naturally, this is the zero value of the slice header:
 sliceHeader{ Length: 0, Capacity: 0, ZerothElement: nil, } 

or simply
 sliceHeader{} 

The key is that the pointer is also nil . This cut
 array[0:0] 

It has zero length (and may even have zero capacity), but its pointer is not nil , so this is still not a zero slice.

Obviously, an empty cut can grow (assuming that it is not a zero capacity), but the “zero” ( nil ) cut does not contain an array where values ​​can be put and cannot grow, even to contain at least one element.

It should be noted that the “zero” ( nil ) cut is, in fact, equivalent to a cut of zero length, even if it does not indicate anything. It has zero length and you can add something to it with a selection. , , , «» ( nil ) .

Strings


Go .

, : , .

, ( ), .

, , :
 slash := "/usr/ken"[0] //   '/' 

:
 usr := "/usr/ken"[0:4] //   "/usr" 

, , .

- , :
 str := string(slice) 

- :
 slice := []byte(usr) 

, , . , . Go , - . , .

, . , . , - .

: , , . .

, , .

Conclusion


, . , , . , , .

, , , , copy append .


blog.golang.org/slices

Source: https://habr.com/ru/post/202948/


All Articles