Go package for splitting strings (aware of enclosing braces and quotes)
The problem with the standard Go `strings.Split` is that it does not take into account that the string being split may contain enclosing braces and/or quotes, within which the separator should not be treated as a separator.
Take, for example, a string representing a slice of comma-separated strings:
```go
str := `"aaa","bbb","this, for sanity, should not be split"`
```
Running `strings.Split` on that:
```go
package main

import "strings"

func main() {
	str := `"aaa","bbb","this, for sanity, should not be split"`
	parts := strings.Split(str, `,`)
	println(len(parts))
}
```
would yield 5 (try it on go-playground) instead of the desired 3.
However, with splitter, the result is different:
```go
package main

import "github.com/go-andiamo/splitter"

func main() {
	commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotes)
	str := `"aaa","bbb","this, for sanity, should not be split"`
	parts, _ := commaSplitter.Split(str)
	println(len(parts)) // 3
}
```
which yields the desired 3! (try it on go-playground)
Note: the variadic arguments after the first (separator) argument are the desired enclosures (e.g. quotes, brackets, etc.) to be taken into account.
While splitting, any enclosures specified are checked for balance!
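As an illustration of the idea only (not how the package is implemented), a quote-aware split with a balance check can be sketched with the standard library alone. `splitOutsideQuotes` is a hypothetical helper that handles a single quote character with no escaping.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// splitOutsideQuotes is a hypothetical helper, not part of the splitter
// package. It splits s at sep, except where sep falls between a pair of
// quote runes, and reports an unterminated quote as an error (a simple
// form of the balance check described above).
func splitOutsideQuotes(s string, sep, quote rune) ([]string, error) {
	var parts []string
	inQuote := false
	start := 0 // byte offset where the current part begins
	for i, r := range s {
		switch {
		case r == quote:
			inQuote = !inQuote
		case r == sep && !inQuote:
			parts = append(parts, s[start:i])
			start = i + utf8.RuneLen(r)
		}
	}
	if inQuote {
		return nil, errors.New("unbalanced quote")
	}
	return append(parts, s[start:]), nil
}

func main() {
	parts, err := splitOutsideQuotes(`"aaa","bbb","this, for sanity, should not be split"`, ',', '"')
	fmt.Println(len(parts), err) // 3 <nil>

	_, err = splitOutsideQuotes(`"aaa,bbb`, ',', '"')
	fmt.Println(err) // unbalanced quote
}
```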
To install Splitter, use go get:
```sh
go get github.com/go-andiamo/splitter
```
To update Splitter to the latest version, run:
```sh
go get -u github.com/go-andiamo/splitter
```
Enclosures tell the splitter about specific start/end sequences within which the separator is not to be treated as a separator. An enclosure can be one of two types: quotes or brackets.
Quote-type enclosures differ from bracket types only in that the end quote can optionally be 'escaped' within the quoted sequence.
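The escape handling can be sketched in a few lines of stdlib-only Go. This is a hypothetical helper, not part of the splitter API: given the position of an opening quote, `findClosingQuote` returns the position of the quote that really ends the sequence, treating either `\"` (backslash-escaped) or `""` (double-escaped) as part of the content. It scans bytes, so it assumes an ASCII quote character.

```go
package main

import "fmt"

// findClosingQuote is a hypothetical helper, not part of the splitter API.
// s[open] must be the opening quote. It returns the index of the quote that
// ends the sequence, or -1 if the sequence is never closed.
//   - doubled == false: a backslash followed by the quote (\") is escaped
//   - doubled == true: two quotes in a row ("") are an escaped quote
//
// It scans bytes, so quote must be an ASCII character.
func findClosingQuote(s string, open int, quote byte, doubled bool) int {
	for i := open + 1; i < len(s); i++ {
		next := i+1 < len(s) && s[i+1] == quote // is the following byte a quote?
		switch {
		case !doubled && s[i] == '\\' && next:
			i++ // skip the escaped quote
		case doubled && s[i] == quote && next:
			i++ // skip the second half of ""
		case s[i] == quote:
			return i
		}
	}
	return -1
}

func main() {
	s1 := `"say \"hi\", ok",next`
	end := findClosingQuote(s1, 0, '"', false)
	fmt.Println(end, s1[:end+1]) // 15 "say \"hi\", ok"

	s2 := `"say ""hi"", ok",next`
	end = findClosingQuote(s2, 0, '"', true)
	fmt.Println(end, s2[:end+1]) // 15 "say ""hi"", ok"
}
```

Note that this sketch only treats a backslash as an escape when it directly precedes the quote; an escaped backslash (`\\`) is not handled.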
The Splitter provides many pre-defined enclosures:
Var Name | Type | Start | End | Escaped end |
---|---|---|---|---|
DoubleQuotes | Quote | `"` | `"` | none |
DoubleQuotesBackSlashEscaped | Quote | `"` | `"` | `\"` |
DoubleQuotesDoubleEscaped | Quote | `"` | `"` | `""` |
SingleQuotes | Quote | `'` | `'` | none |
SingleQuotesBackSlashEscaped | Quote | `'` | `'` | `\'` |
SingleQuotesDoubleEscaped | Quote | `'` | `'` | `''` |
SingleInvertedQuotes | Quote | `` ` `` | `` ` `` | none |
SingleInvertedQuotesBackSlashEscaped | Quote | `` ` `` | `` ` `` | `` \` `` |
SingleInvertedQuotesDoubleEscaped | Quote | `` ` `` | `` ` `` | ``` `` ``` |
SinglePointingAngleQuotes | Quote | `‹` | `›` | none |
SinglePointingAngleQuotesBackSlashEscaped | Quote | `‹` | `›` | `\›` |
DoublePointingAngleQuotes | Quote | `«` | `»` | none |
LeftRightDoubleDoubleQuotes | Quote | `“` | `”` | none |
LeftRightDoubleSingleQuotes | Quote | `‘` | `’` | none |
LeftRightDoublePrimeQuotes | Quote | `〝` | `〞` | none |
SingleLowHigh9Quotes | Quote | `‚` | `‛` | none |
DoubleLowHigh9Quotes | Quote | `„` | `‟` | none |
Parenthesis | Brackets | `(` | `)` | n/a |
CurlyBrackets | Brackets | `{` | `}` | n/a |
SquareBrackets | Brackets | `[` | `]` | n/a |
LtGtAngleBrackets | Brackets | `<` | `>` | n/a |
LeftRightPointingAngleBrackets | Brackets | `〈` | `〉` | n/a |
SubscriptParenthesis | Brackets | `₍` | `₎` | n/a |
SuperscriptParenthesis | Brackets | `⁽` | `⁾` | n/a |
SmallParenthesis | Brackets | `﹙` | `﹚` | n/a |
SmallCurlyBrackets | Brackets | `﹛` | `﹜` | n/a |
DoubleParenthesis | Brackets | `⸨` | `⸩` | n/a |
MathWhiteSquareBrackets | Brackets | `⟦` | `⟧` | n/a |
MathAngleBrackets | Brackets | `⟨` | `⟩` | n/a |
MathDoubleAngleBrackets | Brackets | `⟪` | `⟫` | n/a |
MathWhiteTortoiseShellBrackets | Brackets | `⟬` | `⟭` | n/a |
MathFlattenedParenthesis | Brackets | `⟮` | `⟯` | n/a |
OrnateParenthesis | Brackets | `﴾` | `﴿` | n/a |
AngleBrackets | Brackets | `〈` | `〉` | n/a |
DoubleAngleBrackets | Brackets | `《` | `》` | n/a |
FullWidthParenthesis | Brackets | `(` | `)` | n/a |
FullWidthSquareBrackets | Brackets | `[` | `]` | n/a |
FullWidthCurlyBrackets | Brackets | `{` | `}` | n/a |
SubstitutionBrackets | Brackets | `⸂` | `⸃` | n/a |
SubstitutionQuotes | Quote | `⸂` | `⸃` | none |
DottedSubstitutionBrackets | Brackets | `⸄` | `⸅` | n/a |
DottedSubstitutionQuotes | Quote | `⸄` | `⸅` | none |
TranspositionBrackets | Brackets | `⸉` | `⸊` | n/a |
TranspositionQuotes | Quote | `⸉` | `⸊` | none |
RaisedOmissionBrackets | Brackets | `⸌` | `⸍` | n/a |
RaisedOmissionQuotes | Quote | `⸌` | `⸍` | none |
LowParaphraseBrackets | Brackets | `⸜` | `⸝` | n/a |
LowParaphraseQuotes | Quote | `⸜` | `⸝` | none |
SquareWithQuillBrackets | Brackets | `⁅` | `⁆` | n/a |
WhiteParenthesis | Brackets | `⦅` | `⦆` | n/a |
WhiteCurlyBrackets | Brackets | `⦃` | `⦄` | n/a |
WhiteSquareBrackets | Brackets | `〚` | `〛` | n/a |
WhiteLenticularBrackets | Brackets | `〖` | `〗` | n/a |
WhiteTortoiseShellBrackets | Brackets | `〘` | `〙` | n/a |
FullWidthWhiteParenthesis | Brackets | `⦅` | `⦆` | n/a |
BlackTortoiseShellBrackets | Brackets | `⦗` | `⦘` | n/a |
BlackLenticularBrackets | Brackets | `【` | `】` | n/a |
PointingCurvedAngleBrackets | Brackets | `⧼` | `⧽` | n/a |
TortoiseShellBrackets | Brackets | `〔` | `〕` | n/a |
SmallTortoiseShellBrackets | Brackets | `﹝` | `﹞` | n/a |
ZNotationImageBrackets | Brackets | `⦇` | `⦈` | n/a |
ZNotationBindingBrackets | Brackets | `⦉` | `⦊` | n/a |
MediumOrnamentalParenthesis | Brackets | `❨` | `❩` | n/a |
LightOrnamentalTortoiseShellBrackets | Brackets | `❲` | `❳` | n/a |
MediumOrnamentalFlattenedParenthesis | Brackets | `❪` | `❫` | n/a |
MediumOrnamentalPointingAngleBrackets | Brackets | `❬` | `❭` | n/a |
MediumOrnamentalCurlyBrackets | Brackets | `❴` | `❵` | n/a |
HeavyOrnamentalPointingAngleQuotes | Quote | `❮` | `❯` | none |
HeavyOrnamentalPointingAngleBrackets | Brackets | `❰` | `❱` | n/a |
Quotes within quotes can be handled by using an enclosure that specifies how escaping works. For example, the following uses \ (backslash) prefixed escaping:
```go
package main

import "github.com/go-andiamo/splitter"

func main() {
	commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesBackSlashEscaped)
	// the \" sequences inside the third part do not end the quoted sequence
	str := `"aaa","bbb","this, for sanity, \"should\" not be split"`
	parts, _ := commaSplitter.Split(str)
	println(len(parts)) // 3
}
```
Or with double escaping...
```go
package main

import "github.com/go-andiamo/splitter"

func main() {
	commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesDoubleEscaped)
	// each "" inside the third part is an escaped quote, not the end of the sequence
	str := `"aaa","bbb","this, for sanity, """"should,,,,"" not be split"`
	parts, _ := commaSplitter.Split(str)
	println(len(parts)) // 3
}
```
Multiple enclosures can be used together, and bracket enclosures can be nested:
```go
package main

import (
	"fmt"
	"github.com/go-andiamo/splitter"
)

func main() {
	encs := []*splitter.Enclosure{
		splitter.Parenthesis, splitter.SquareBrackets, splitter.CurlyBrackets,
		splitter.DoubleQuotesDoubleEscaped, splitter.SingleQuotesDoubleEscaped,
	}
	commaSplitter, _ := splitter.NewSplitter(',', encs...)
	str := `do(not,)split,'don''t,split,this',[,{,(a,"this has "" quotes")}]`
	parts, _ := commaSplitter.Split(str)
	println(len(parts)) // 3
	// print each part with its index
	for i, pt := range parts {
		fmt.Printf("\t[%d]%s\n", i, pt)
	}
}
```
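Nesting is what makes brackets harder than a single quote character: each closer must match the most recent unclosed opener. The following stdlib-only sketch is a hypothetical `splitNested`, not the package's implementation; it handles only `()`, `[]` and `{}` (no quotes), tracks open brackets on a stack, splits only at depth zero, and reports mismatched or unclosed brackets as errors.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// openers maps each opening bracket to the closer it expects.
var openers = map[rune]rune{'(': ')', '[': ']', '{': '}'}

// splitNested is a hypothetical sketch, not the splitter implementation.
// Open brackets push their expected closer onto a stack; a closer must match
// the top of the stack, and sep only splits when the stack is empty.
func splitNested(s string, sep rune) ([]string, error) {
	var parts []string
	var want []rune // stack of expected closers, innermost last
	start := 0
	for i, r := range s {
		if c, ok := openers[r]; ok {
			want = append(want, c)
			continue
		}
		switch {
		case len(want) > 0 && r == want[len(want)-1]:
			want = want[:len(want)-1]
		case r == ')' || r == ']' || r == '}':
			return nil, fmt.Errorf("unexpected %q at byte %d", r, i)
		case r == sep && len(want) == 0:
			parts = append(parts, s[start:i])
			start = i + utf8.RuneLen(r)
		}
	}
	if len(want) > 0 {
		return nil, fmt.Errorf("unclosed bracket, expected %q", want[len(want)-1])
	}
	return append(parts, s[start:]), nil
}

func main() {
	for _, s := range []string{`do(not,)split,[,{,(a,b)}]`, `a,(b]`, `a,(b`} {
		parts, err := splitNested(s, ',')
		fmt.Printf("%q %v\n", parts, err)
	}
}
```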
Options define behaviours that are carried out on each part found during splitting.
An option, by virtue of its return values from `.Apply()`, can do one of three things:
- return a modified string of what is to be added to the split parts
- return `false` to indicate that the split part is not to be added to the split result
- return an `error` to indicate that the split part is unacceptable (this ceases further splitting, and the error is returned from the `Split` method)
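The three outcomes can be modelled with a small, self-contained sketch. The `partOption` type and the `applyOptions`, `upper`, `skipEmpty` and `rejectX` functions are hypothetical, chosen only to illustrate the idea; the package's real `Option` interface and its `Apply` signature may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// partOption is a hypothetical stand-in for a splitter option (the real
// Option interface may differ). It mirrors the three outcomes: a possibly
// modified part, a keep flag, and an error that aborts splitting.
type partOption func(part string) (string, bool, error)

// applyOptions runs the options over each part in order. A false keep flag
// drops the part; an error stops processing and is returned to the caller.
func applyOptions(parts []string, opts ...partOption) ([]string, error) {
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		keep := true
		for _, opt := range opts {
			var err error
			p, keep, err = opt(p)
			if err != nil {
				return nil, err
			}
			if !keep {
				break
			}
		}
		if keep {
			out = append(out, p)
		}
	}
	return out, nil
}

// One example option per outcome: modify, drop, reject.
func upper(p string) (string, bool, error)     { return strings.ToUpper(p), true, nil }
func skipEmpty(p string) (string, bool, error) { return p, p != "", nil }
func rejectX(p string) (string, bool, error) {
	if p == "x" {
		return "", false, errors.New(`part "x" is not allowed`)
	}
	return p, true, nil
}

func main() {
	fmt.Println(applyOptions(strings.Split("a,,b", ","), skipEmpty, upper)) // [A B] <nil>
	fmt.Println(applyOptions(strings.Split("a,x,b", ","), rejectX))         // [] part "x" is not allowed
}
```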
Options can be added directly to the Splitter using the `.AddDefaultOptions()` method; these options are applied on every call to the splitter's `.Split()` method.
Options can also be specified when calling the splitter's `.Split()` method; these options are only carried out for that call (and after any options already specified on the splitter).
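The ordering rule for default and per-call options can be sketched with a toy splitter. Everything here (`sketchSplitter`, `option`, `trim`, `dropEmpty`) is hypothetical and much simpler than the real package: options are plain functions returning the new part and a keep flag, and `Split` runs the defaults before the per-call options. It also shows why order matters: trimming before dropping empties removes a part that is only spaces, while the reverse order keeps it as an empty string.

```go
package main

import (
	"fmt"
	"strings"
)

// option is a hypothetical, simplified option: new part plus keep flag.
type option func(string) (string, bool)

// sketchSplitter is a toy stand-in for a splitter (no enclosure handling).
type sketchSplitter struct {
	sep      string
	defaults []option // analogous to options added via AddDefaultOptions
}

// Split applies the default options first, then the per-call options.
func (s sketchSplitter) Split(str string, opts ...option) []string {
	all := append(append([]option{}, s.defaults...), opts...)
	var out []string
	for _, p := range strings.Split(str, s.sep) {
		keep := true
		for _, o := range all {
			if p, keep = o(p); !keep {
				break
			}
		}
		if keep {
			out = append(out, p)
		}
	}
	return out
}

func trim(p string) (string, bool)      { return strings.TrimSpace(p), true }
func dropEmpty(p string) (string, bool) { return p, p != "" }

func main() {
	s := sketchSplitter{sep: "/", defaults: []option{trim}}
	fmt.Printf("%q\n", s.Split("/a/ /c/", dropEmpty)) // ["a" "c"]

	r := sketchSplitter{sep: "/", defaults: []option{dropEmpty}}
	fmt.Printf("%q\n", r.Split("/a/ /c/", trim)) // ["a" "" "c"]
}
```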
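The `IgnoreEmpties` option drops any empty parts:
```go
package main

import (
	"fmt"
	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.IgnoreEmpties)
	// the empty first, middle and last parts of "/a//c/" are all dropped
	parts, _ := s.Split(`/a//c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```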
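For comparison, when the input contains no enclosures, the standard library can already produce the same result as `IgnoreEmpties`: `strings.FieldsFunc` splits at every rune matching a predicate and never returns empty fields. The wrapper name `splitNoEmpties` is hypothetical, and unlike splitter it is not aware of quotes or brackets.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNoEmpties is a hypothetical wrapper: it splits s at every sep rune and,
// like strings.FieldsFunc always does, omits empty fields. It has no notion
// of enclosures, so it only matches IgnoreEmpties for enclosure-free input.
func splitNoEmpties(s string, sep rune) []string {
	return strings.FieldsFunc(s, func(r rune) bool { return r == sep })
}

func main() {
	fmt.Printf("%q\n", splitNoEmpties(`/a//c/`, '/')) // ["a" "c"]
}
```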
`IgnoreEmptyFirst` and `IgnoreEmptyLast` drop only an empty first or last part; empty parts in between are kept:
```go
package main

import (
	"fmt"
	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.IgnoreEmptyFirst, splitter.IgnoreEmptyLast)
	// the empty part between "a" and "c" is kept in each case
	parts, _ := s.Split(`/a//c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`a//c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`/a//c`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```
`TrimSpaces` trims the spaces around each part:
```go
package main

import (
	"fmt"
	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.TrimSpaces)
	// trimming alone does not drop the empty first and last parts
	parts, _ := s.Split(`/a/b/c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(` / a /b / c/ `)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`/ a / b / c /`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```
Options can be combined, e.g. `TrimSpaces` with `IgnoreEmpties`, so that parts containing only spaces are also dropped:
```go
package main

import (
	"fmt"
	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.TrimSpaces, splitter.IgnoreEmpties)
	parts, _ := s.Split(`/a/ /c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(` / a // c/ `)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`/ a / / c /`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```
`NoEmpties` is similar to `IgnoreEmpties`, but instead of dropping an empty part it makes `Split` return an error:
```go
package main

import (
	"fmt"
	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.TrimSpaces, splitter.NoEmpties)
	// the first three inputs each contain an empty part, so Split returns an error
	if parts, err := s.Split(`/a/ /c/`); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
	if parts, err := s.Split(` / a // c/ `); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
	if parts, err := s.Split(`/ a / / c /`); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
	// no empty parts here, so the split succeeds
	if parts, err := s.Split(` a / b/c `); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
}
```