I have a slice of strings, and each string contains multiple `key=value` formatted messages. I want to pull all the keys out of the strings so I can use them as the header for a CSV file. I do not know all of the potential key fields in advance, so I have to use regular expression matching to find them.
Here is my code.
    package main

    import (
        "fmt"
        "regexp"
    )

    func GetKeys(logs []string) []string {
        // topMatches is the final array to be returned.
        // midMatches contains no duplicates, but the data is `key=`.
        // subMatches contains all initial matches.
        // initialRegex matches for anthing that matches `key=`. this is because the matching patterns.
        // cleanRegex massages `key=` to `key`
        topMatches := []string{}
        midMatches := []string{}
        subMatches := []string{}
        initialRegex := regexp.MustCompile(`([a-zA-Z]{1,}\=)`)
        cleanRegex := regexp.MustCompile(`([a-zA-Z]{1,})`)

        // the nested loop for matches is because FindAllString
        // returns []string
        for _, i := range logs {
            matches := initialRegex.FindAllString(i, -1)
            for _, m := range matches {
                subMatches = append(subMatches, m)
            }
        }

        // remove duplicates.
        seen := map[string]string{}
        for _, x := range subMatches {
            if _, ok := seen[x]; !ok {
                midMatches = append(midMatches, x)
                seen[x] = x
            }
        }

        // this is where I remove the `=` character.
        for _, y := range midMatches {
            clean := cleanRegex.FindAllString(y, 1)
            topMatches = append(topMatches, clean[0])
        }

        return topMatches
    }

    func main() {
        y := []string{"key=value", "msg=payload", "test=yay", "msg=payload"}
        y = GetKeys(y)
        fmt.Println(y)
    }
I think my code is inefficient because I cannot work out how to optimise the `initialRegex` regular expression so that it matches just the key in the `key=value` format, without matching the value as well.
Can my first regular expression, `initialRegex`, be optimised so I do not have to do a second matching loop to remove the `=` character?
Playground: http://play.golang.org/p/ONMf_cympM
`{1,}` is equivalent to `+`, and if the Go `regexp` package supports it, you can use positive look-ahead to detect but not capture the `=`: `[a-zA-Z]+(?=\=)`. IIRC, the second `=` doesn't need to be escaped since it has no special meaning outside of this context. Finally, I doubt you need the capturing group around the whole expression.