Welcome to new things

[Tech] [Electronics projects] [Gadgets] [Games] notes

How to use string, byte, and rune in Go

Introductory Go books always cover string, byte, and rune.

However, once you start writing real programs, the frequent conversions between them can be confusing.

In such cases, it is tempting to convert types more or less at random and assume everything is fine because the code works. Here I would like to summarize how string, byte, and rune are used, so that we can convert between them with a better understanding of what each conversion means.

What is byte?

byte is another name for uint8.

Programs that handle binary data usually store it in a slice of uint8.

However, uint8 reads like just another integer type alongside uint16, uint32, and so on. To make it clear that the data is binary data, byte is used instead when handling binary data.
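To make the alias concrete, here is a small sketch: since byte and uint8 are the same type, a []uint8 can be passed where a []byte is expected, with no conversion. The function name sum is hypothetical, chosen only for this example.

```go
package main

import "fmt"

// sum adds up the values in a byte slice. Because byte is an alias
// for uint8, a []uint8 can be passed here without any conversion.
func sum(data []byte) int {
	total := 0
	for _, b := range data {
		total += int(b)
	}
	return total
}

func main() {
	raw := []uint8{0x01, 0x02, 0xff}
	fmt.Println(sum(raw)) // 258

	var b byte = 200
	var u uint8 = b // no conversion needed: byte and uint8 are the same type
	fmt.Println(u)        // 200
	fmt.Printf("%T\n", b) // uint8
}
```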

What is string?

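A string is a built-in type that the language treats specially. Internally, a string value is a small structure holding a pointer to the start of a byte array and the length of that array; the reflect package exposes this layout as reflect.StringHeader: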

type StringHeader struct {
    Data uintptr
    Len  int
}

Of particular note is that strings are immutable: once a string is created, its contents are fixed for life and cannot be changed, and the language guarantees this.

For example, it is not possible to change a single character in the middle of a string. The following assignment does not compile:

a := "ABC"

// NG
a[0] = 'a'
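Because the string itself cannot be modified, the usual workaround is to copy it into a []byte, change the copy, and build a new string from it. A minimal sketch with a hypothetical helper setByte; it replaces a single byte, so it is only safe for single-byte (ASCII) characters.

```go
package main

import "fmt"

// setByte is a hypothetical helper: it returns a copy of s with the byte at
// index i replaced by c. The original string is never modified.
func setByte(s string, i int, c byte) string {
	b := []byte(s) // copy the string's bytes into a mutable slice
	b[i] = c
	return string(b) // copy them back into a new string
}

func main() {
	a := "ABC"
	a = setByte(a, 0, 'a')
	fmt.Println(a) // aBC
}
```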

String operations always create a new string. In the following example, the memory holding a is not reused; three strings, 123, ABC, and 123ABC, are created in memory.

a := "123"
b := "ABC"
a = a + b

Unlike slices, a string cannot be extended with append(). The following code does not compile:

a := "123"
b := "ABC"

// NG
a = append(a, b)
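append does work on a []byte, and as a special case the language allows a string to be appended to a []byte using the ... syntax. A sketch with a hypothetical helper joinBytes:

```go
package main

import "fmt"

// joinBytes is a hypothetical helper: it copies a into a []byte and appends
// the bytes of b. append([]byte, string...) is allowed as a special case.
func joinBytes(a, b string) []byte {
	buf := []byte(a)
	buf = append(buf, b...)
	return buf
}

func main() {
	fmt.Printf("%s\n", joinBytes("123", "ABC")) // 123ABC
}
```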

Because every operation creates a new string, string operations are not very efficient.
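When many pieces are concatenated, for example in a loop, the standard library's strings.Builder avoids creating a new string at every step by writing into a single growing buffer. This is a swapped-in technique that the text above does not cover, and the helper buildList is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// buildList is a hypothetical helper that builds "0,1,...,n-1".
// strings.Builder appends into one buffer instead of allocating
// a new string on every concatenation.
func buildList(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, "%d", i)
	}
	return sb.String()
}

func main() {
	fmt.Println(buildList(5)) // 0,1,2,3,4
}
```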

What is rune?

Text stored in a string is, by convention, encoded in UTF-8 (Go source files are UTF-8, so string literals are UTF-8).

UTF-8 is a variable-length encoding: a character takes anywhere from 1 to 4 bytes.

len() on a string returns the length of the underlying byte array, which does not necessarily match the actual number of characters.

a := "ABCあいう"
fmt.Println(len(a)) // 12
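The sketch below shows where the 12 comes from by listing how many bytes each character occupies, using utf8.RuneLen from the standard unicode/utf8 package. The helper byteWidths is hypothetical; it loops over the string with range, which is covered later in this section.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths is a hypothetical helper that returns how many bytes
// each character of s occupies in UTF-8.
func byteWidths(s string) []int {
	var widths []int
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	a := "ABCあいう"
	fmt.Println(byteWidths(a)) // [1 1 1 3 3 3]
	fmt.Println(len(a))        // 12
}
```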

On the other hand, there is an encoding called UTF-32, which represents every character with a fixed length of 4 bytes.

Programs often need to work with individual characters in a string, for example to count the characters or to extract the character at a specific position.

For this purpose, Go converts UTF-8 text to UTF-32 when you refer to individual characters, and handles each character as a UTF-32 value.

In Go, such a UTF-32 character is called a rune. rune is an alias for int32, and each rune holds one Unicode code point.

For example, the number of characters in a string can be counted as follows:

a := "ABCあいう"
r := []rune(a)
fmt.Println(len(r)) // 6
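Converting to []rune allocates a new slice just to count it. If only the count is needed, the standard unicode/utf8 package can count characters directly in the UTF-8 data. A short sketch comparing both; countChars is a hypothetical wrapper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countChars is a hypothetical wrapper that counts characters
// without building a []rune.
func countChars(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	a := "ABCあいう"
	fmt.Println(len([]rune(a))) // 6 (allocates a rune slice)
	fmt.Println(countChars(a))  // 6 (counts in place)
}
```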

In Go, converting a UTF-8 string to UTF-32 means converting the string to a slice of rune ([]rune).

The converted characters are stored in the resulting rune slice.
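Extracting the character at a specific position works the same way: convert to []rune first and index the slice. Indexing the string directly returns a byte instead. A sketch with a hypothetical helper charAt:

```go
package main

import "fmt"

// charAt is a hypothetical helper that returns the character at character
// position i (not byte offset i). It converts the whole string first.
func charAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	a := "ABCあいう"
	fmt.Printf("%c\n", charAt(a, 3)) // あ
	// Indexing the string directly yields a byte, not a character:
	fmt.Printf("%#x\n", a[3]) // 0xe3 (first byte of あ in UTF-8)
}
```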

Also, when you loop over a string with range, the string is decoded automatically: each iteration yields the next UTF-8 character as a rune, without building a rune slice.

a := "ABCあいう"
for _, r := range a {
    fmt.Printf("%c", r) // ABCあいう
}

In the above example, range decodes the string one character at a time, and each rune is printed as it is extracted.
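One consequence worth knowing: the first value range yields is the byte offset where each character starts, not a character count. The hypothetical helper runeOffsets below collects those offsets:

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper that returns the byte offset at which
// each character of s starts, as reported by range.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("ABCあいう")) // [0 1 2 3 6 9]
}
```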

Also, a rune is not a string but an integer. To display a rune as a character, use %c.

Conversion between types

Normally, a slice of one type cannot be converted to a slice of another type, but conversions from string to []byte or []rune, and from []byte or []rune back to string, are specifically defined in the language.
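The allowed conversions do not include []byte to []rune directly; that conversion does not compile, so UTF-8 bytes are usually decoded by going through string. A sketch with a hypothetical helper bytesToRunes (the intermediate string may cost an extra allocation):

```go
package main

import "fmt"

// bytesToRunes is a hypothetical helper that decodes UTF-8 bytes into runes.
// []rune(b) with b of type []byte does not compile, so go through string.
func bytesToRunes(b []byte) []rune {
	return []rune(string(b))
}

func main() {
	b := []byte("ABCあいう")
	fmt.Println(len(b), len(bytesToRunes(b))) // 12 6
}
```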

string to []byte

Converting a string to []byte copies the string's data into a new []byte.

Because the data is a copy, changing the []byte after the conversion leaves the original string unchanged.

s := "ABC"
b := []byte(s)
b[0] = 'a'
fmt.Printf("%s\n", b) // aBC
fmt.Printf("%s\n", s) // ABC

[]byte to string

Converting a []byte to a string creates a new string from the data in the []byte.

Again, the data is copied from the []byte, so even if the []byte is modified after the conversion, the resulting string stays the same.

b := []byte{'A', 'B', 'C'}
s := string(b)
b[0] = 'a'
fmt.Printf("%s\n", b) // aBC
fmt.Printf("%s\n", s) // ABC

To display text stored in a []byte, either convert it to a string or print the []byte directly with %s.

b := []byte{'A', 'B', 'C'}
s := string(b)
fmt.Printf("%s\n", s) // ABC
fmt.Printf("%s\n", b) // ABC

string to rune

This is easy to confuse: a string is a sequence of characters, whereas a rune represents a single character.

Therefore, the counterpart of string is []rune, not rune.

As mentioned above, converting a string to []rune creates a new rune slice holding the string's characters as UTF-32.

// rune slice
rs := []rune("ABC")
for _, r := range rs {
    fmt.Printf("%c", r) // ABC
}

To get a single rune, convert a single character written as a rune literal, that is, one character enclosed in single quotation marks.

That single UTF-8 character becomes one UTF-32 value, a rune.

Text enclosed in double quotation marks is a string, even if it is only one character long, and cannot be converted to a rune.

Two or more characters enclosed in single quotation marks are not allowed either: a rune literal must contain exactly one character, so 'AB' does not compile.

// OK
r := rune('A')

// NG
r = rune("A")

// NG
r = rune('AB')

rune to string

You can convert []rune to string by type conversion.

The UTF-32 characters are encoded as UTF-8, and a new string is created from the result.

r := []rune{'A', 'B', 'C'}
s := string(r)
fmt.Println(s) // ABC
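A typical use of the round trip from string to []rune and back is reversing a string by character, which would corrupt multi-byte characters if done on bytes. A minimal sketch; the helper reverse is hypothetical, and it does not handle combining characters, which span several runes.

```go
package main

import "fmt"

// reverse is a hypothetical helper that reverses s character by character.
// Reversing the bytes instead would break multi-byte characters such as あ.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println(reverse("ABCあいう")) // ういあCBA
}
```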

A single rune can also be converted to a string. In this case, a new one-character string is created.

r := rune('A')
s := string(r)
fmt.Println(s) // A

Convenience comes at a cost.

Conversions to and from string are costly because they generally involve creating a new array and copying the values into it.

String operations are costly for the same reason: each one creates a new string, which again means creating a new array and copying the values into it.

Normally you do not need to think about these costs, but if they become significant, you may want to consider using []byte instead of string.

Once you start handling text as []byte, you have to do some of the tedious work that string was doing for you behind the scenes. Common operations are provided as functions in the following package, so it is a good idea to take a quick look at it before implementing them yourself.

www.ekwbtblog.com
