Welcome to new things

Notes on [Technology] [Electronics projects] [Gadgets] [Games]

Go Language Memo: Regular Expressions and String Manipulation

Go code is usually written by combining simple functions and checking for an error after each call, so it tends to be long overall.

Even small tasks take a fair amount of code, and writing it from scratch every time is tedious. So I am writing down the operations I use often, so that I can copy and paste them later.

Here are my own notes on how to use regular expressions and string manipulation in the Go language.

Regular expressions

Let's start with the most elaborate case.

Retrieving all matches, with submatches

  • Create a regular expression object with regexp.MustCompile() (it panics if the pattern is invalid, which is fine for fixed patterns written in the source)
  • Get every match, together with its submatch strings, with Regexp.FindAllStringSubmatch()
  • Each match is returned as a []string, where index 0 is the whole match and indexes 1 and up are the submatches
  • Because every match is returned, the result is a [][]string: a slice of per-match []string values
  • The second argument of Regexp.FindAllStringSubmatch() is the maximum number of matches to return; -1 returns all matches
str := `
2020-01-01_test
# comment
2020-02-02_TEST
2020-03-03_TEST`

re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})_(TEST)`)

res := re.FindAllStringSubmatch(str, -1)

for _, v := range res {
    fmt.Println(v)
    // [2020-02-02_TEST 2020 02 02 TEST]
    // [2020-03-03_TEST 2020 03 03 TEST]
}
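A small sketch of what the second argument changes, using the same sample text as above and written as a complete program so it runs as-is. firstN is a hypothetical helper name made up for this example. FindAllStringSubmatchIndex() is the "Index" variant of the same method: instead of strings, it returns byte offsets as [start, end) pairs, first for the whole match and then for each submatch.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same sample input as above: a leading newline, a lower-case "test" line,
// a comment line, and two upper-case "TEST" lines.
const sample = `
2020-01-01_test
# comment
2020-02-02_TEST
2020-03-03_TEST`

var dateRe = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})_(TEST)`)

// firstN (hypothetical helper) returns at most n matches; n < 0 means all.
func firstN(n int) [][]string {
	return dateRe.FindAllStringSubmatch(sample, n)
}

func main() {
	fmt.Println(firstN(1))       // [[2020-02-02_TEST 2020 02 02 TEST]]
	fmt.Println(len(firstN(-1))) // 2

	// Each loc holds [start, end) byte offsets: loc[0:2] for the whole match,
	// then one pair per submatch (5 pairs = 10 ints for this pattern).
	for _, loc := range dateRe.FindAllStringSubmatchIndex(sample, -1) {
		fmt.Println(loc[0], loc[1], sample[loc[0]:loc[1]])
	}
}
```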

Flags

To use search flags, write them inline at the start of the regular expression in the form (?flags), for example (?mi).

  • m

    • If the m flag is not set, ^ and $ match only at the beginning and end of the input string
    • If the m flag is set, ^ and $ also match at the beginning and end of each line
  • i

    • If the i flag is not set, matching is case-sensitive
    • If the i flag is set, matching is case-insensitive
str := `
2020-01-01_test
# comment
2020-02-02_TEST
2020-03-03_TEST`

re := regexp.MustCompile(`(?mi)(\d{4})-(\d{2})-(\d{2})_(TEST)$`)

res := re.FindAllStringSubmatch(str, -1)

for _, v := range res {
    fmt.Println(v)
    // [2020-01-01_test 2020 01 01 test]
    // [2020-02-02_TEST 2020 02 02 TEST]
    // [2020-03-03_TEST 2020 03 03 TEST]
}
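To see the effect of each flag in isolation, here is a sketch that counts matches on a two-line input with and without m and i. countMatches is a hypothetical helper written only for this comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two lines separated by a newline; the first ends in lower-case "test".
const lines = "2020-01-01_test\n2020-02-02_TEST"

// countMatches (hypothetical helper) compiles pattern and counts
// the non-overlapping matches it finds in lines.
func countMatches(pattern string) int {
	return len(regexp.MustCompile(pattern).FindAllString(lines, -1))
}

func main() {
	// Without m, ^ matches only at the start of the input...
	fmt.Println(countMatches(`^\d{4}`)) // 1
	// ...with m, it also matches at the start of every line.
	fmt.Println(countMatches(`(?m)^\d{4}`)) // 2

	// Without m, $ matches only at the end of the input, which ends in "TEST".
	fmt.Println(countMatches(`test$`)) // 0
	// With m, $ also matches at the end of the first line.
	fmt.Println(countMatches(`(?m)test$`)) // 1
	// Adding i lets "test" match "TEST" as well.
	fmt.Println(countMatches(`(?mi)test$`)) // 2
}
```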

Substitution

  • Replace matches with Regexp.ReplaceAllString()
  • In the replacement string, $0 refers to the whole match and $1, $2, ... refer to the submatches
str := `2020-01-01_test`

re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2}).+$`)

res := re.ReplaceAllString(str, "$1/$2/$3")

fmt.Println(res) // 2020/01/01
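One pitfall with $1 in the replacement string: the name after $ is read as long as possible (letters, digits, and underscores), so "$1_" means a group named "1_", not group 1 followed by "_". A sketch of that behavior, plus named groups and the literal variant; the date pattern with (?P<name>...) groups is just an example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Named capture groups use the (?P<name>...) syntax.
var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

func main() {
	const s = "2020-01-01"

	// "$1_" refers to a group named "1_", which does not exist, so it expands to "".
	fmt.Println(dateRe.ReplaceAllString(s, "$1_$2")) // 01
	// Braces end the group name explicitly.
	fmt.Println(dateRe.ReplaceAllString(s, "${1}_$2")) // 2020_01
	// Named groups can be referenced by name.
	fmt.Println(dateRe.ReplaceAllString(s, "${day}.${month}.${year}")) // 01.01.2020
	// ReplaceAllLiteralString inserts the replacement as-is, without expanding $.
	fmt.Println(dateRe.ReplaceAllLiteralString(s, "$1")) // $1
}
```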

"All" and "String" in function names

Many Regexp methods come in versions with or without "All" and with or without "String" in their names.

Example: FindStringSubmatch(), FindAllSubmatch(), etc.

All

If you want to get only the first match, leave "All" off.
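A sketch comparing the two on the same input: the version without "All" returns a single []string for the first match, and both return nil when nothing matches, so it is worth checking before indexing into the result.

```go
package main

import (
	"fmt"
	"regexp"
)

var re = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})_(TEST)`)

const input = "2020-02-02_TEST\n2020-03-03_TEST"

func main() {
	// Without "All": only the first match, as one []string.
	fmt.Println(re.FindStringSubmatch(input)) // [2020-02-02_TEST 2020 02 02 TEST]

	// With "All": every match, as a [][]string.
	fmt.Println(len(re.FindAllStringSubmatch(input, -1))) // 2

	// Both return nil when there is no match, so check before indexing.
	if m := re.FindStringSubmatch("no date here"); m == nil {
		fmt.Println("no match")
	}
}
```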

String

Because Go strings are immutable, converting a []byte to a string generally copies the data from the []byte into the string. Converting in the opposite direction copies it as well.

If your data is already held as a []byte, converting it to a string for every search is inefficient because of that copying. The methods without "String" in their names take and return []byte instead of string, so you can work with the data as []byte directly.
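A sketch of the []byte variants, assuming the data is already a []byte (for example, file contents; here it is just a literal). FindAllSubmatch returns [][][]byte, and in the standard library implementation the returned slices are sub-slices of the input rather than copies.

```go
package main

import (
	"fmt"
	"regexp"
)

var re = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// Stand-in for data that is already a []byte (for example, file contents).
var data = []byte("2020-01-01\n2020-02-02")

func main() {
	// FindAllSubmatch takes a []byte and returns [][][]byte,
	// so the input never has to be converted to a string.
	for _, m := range re.FindAllSubmatch(data, -1) {
		fmt.Printf("%s/%s/%s\n", m[1], m[2], m[3])
	}
	// 2020/01/01
	// 2020/02/02

	// ReplaceAll is the []byte counterpart of ReplaceAllString.
	fmt.Printf("%s\n", re.ReplaceAll(data, []byte("$1$2$3")))
	// 20200101
	// 20200202
}
```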

Checking only whether there is a match

If you only need to know whether the input matches, use Regexp.MatchString().

str := "TEST"

re := regexp.MustCompile(`(?i)test`)

res := re.MatchString(str)

fmt.Println(res) // true

Split string by delimiter (Split)

str := `a, b; c| d e`

re := regexp.MustCompile(`[,;\|\s]+`)

res := re.Split(str, -1)

fmt.Println(res) // [a b c d e]
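The second argument of Split() works much like the one for FindAllStringSubmatch(): -1 splits at every delimiter, a positive n returns at most n pieces with the unsplit remainder as the last piece, and 0 returns nil. A sketch on the same input:

```go
package main

import (
	"fmt"
	"regexp"
)

var sep = regexp.MustCompile(`[,;\|\s]+`)

const input = `a, b; c| d e`

func main() {
	// n = -1: split at every delimiter.
	fmt.Println(len(sep.Split(input, -1))) // 5
	// n = 2: at most 2 pieces; the last piece is the unsplit remainder.
	fmt.Printf("%q\n", sep.Split(input, 2)) // ["a" "b; c| d e"]
	// n = 0: returns nil.
	fmt.Println(sep.Split(input, 0) == nil) // true
}
```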

String manipulation (strings)

I also use the strings package a lot for string manipulation, so here is a summary of that as well.

Extract characters from a string by index

A string holds its data as a sequence of bytes (UTF-8 for text such as string literals).

Indexing a string returns the byte at the given position, not the character at that position.

Likewise, len() on a string returns the number of bytes, not the number of characters.

To extract characters by index, the easy approach is to convert the string to []rune once, which splits it into characters (runes).

str := "日本語"

runes := []rune(str)

for i := 0; i < len(runes); i++ {
    fmt.Println(string(runes[i]), "---", string(runes[i:]))
    // 日 --- 日本語
    // 本 --- 本語
    // 語 --- 語
}

Note, however, that converting a string to []rune copies the string data, and converting a []rune back to a string copies the data again.

Trim leading and trailing whitespace

strings.TrimSpace() removes leading and trailing whitespace from a string. Whitespace here means Unicode white space, which includes tabs, newlines, and full-width spaces.

str := " \t TEST "
fmt.Println(strings.TrimSpace(str)) // TEST

Upper-case and lower-case conversion

str := "TEST"
fmt.Println(strings.ToLower(str)) // test

str = "test"
fmt.Println(strings.ToUpper(str)) // TEST

Check whether a string contains a substring

If you want to check whether a string contains a certain substring, and a regular expression would be overkill, you can use strings.Index().

strings.Index() returns the position of the first match, or -1 if there is none, so checking whether the result is -1 tells you whether the substring is included.

str := "TEST"
fmt.Println(strings.Index(str, "E")) // 1
fmt.Println(strings.Index(str, "e")) // -1

Joining

strings.Join() concatenates a slice of strings, inserting the given separator between elements.

chars := []string{"T", "E", "S", "T"}
fmt.Println(strings.Join(chars, ",")) // T,E,S,T

Impressions

Unlike C, Go guarantees that strings are immutable, so converting a string to a slice ([]byte or []rune), or a slice back to a string, generally copies the data. That is something I want to keep in mind.

www.ekwbtblog.com
