Package: go/scanner

package scanner

Import Path
    go/scanner

Dependency Relation
    imports 9 packages, and imported by 3 packages

Involved Source Files
    errors.go
    scanner.go

    Package scanner implements a scanner for Go source text.
    It takes a []byte as source which can then be tokenized
    through repeated calls to the Scan method.

Code Examples
    Scanner_Scan

        package main

        import (
            "fmt"
            "go/scanner"
            "go/token"
        )

        func main() {
            // src is the input that we want to tokenize.
            src := []byte("cos(x) + 1i*sin(x) // Euler")

            // Initialize the scanner.
            var s scanner.Scanner
            fset := token.NewFileSet()                      // positions are relative to fset
            file := fset.AddFile("", fset.Base(), len(src)) // register input "file"
            s.Init(file, src, nil /* no error handler */, scanner.ScanComments)

            // Repeated calls to Scan yield the token sequence found in the input.
            for {
                pos, tok, lit := s.Scan()
                if tok == token.EOF {
                    break
                }
                fmt.Printf("%s\t%s\t%q\n", fset.Position(pos), tok, lit)
            }
        }

Package-Level Type Names (total 5, all are exported)

type Error (struct)
    In an ErrorList, an error is represented by an *Error.
    The position Pos, if valid, points to the beginning of
    the offending token, and the error condition is described
    by Msg.

    Fields (total 2, both are exported)
        Msg string
        Pos token.Position

    Methods (only one, which is exported)
        (Error) Error() string
            Error implements the error interface.

    Implements (at least one exported)
        Error : error

type ErrorHandler (func)
    An ErrorHandler may be provided to Scanner.Init. If a syntax error is
    encountered and a handler was installed, the handler is called with a
    position and an error message. The position points to the beginning of
    the offending token.

    As Inputs Of (at least one exported)
        func (*Scanner).Init(file *token.File, src []byte, err ErrorHandler, mode Mode)

type ErrorList ([])
    ErrorList is a list of *Errors.
    The zero value for an ErrorList is an empty ErrorList ready to use.
    Methods (total 9, all are exported)
        (*ErrorList) Add(pos token.Position, msg string)
            Add adds an Error with given position and error message to an ErrorList.
        (ErrorList) Err() error
            Err returns an error equivalent to this error list.
            If the list is empty, Err returns nil.
        (ErrorList) Error() string
            An ErrorList implements the error interface.
        (ErrorList) Len() int
            ErrorList implements the sort Interface.
        (ErrorList) Less(i, j int) bool
        (*ErrorList) RemoveMultiples()
            RemoveMultiples sorts an ErrorList and removes all but the first error per line.
        (*ErrorList) Reset()
            Reset resets an ErrorList to no errors.
        (ErrorList) Sort()
            Sort sorts an ErrorList. *Error entries are sorted by position,
            other errors are sorted by error message, and before any *Error
            entry.
        (ErrorList) Swap(i, j int)

    Implements (at least 2, both are exported)
        ErrorList : error
        ErrorList : sort.Interface

type Mode uint (basic type)
    A mode value is a set of flags (or 0).
    They control scanner behavior.

    As Inputs Of (at least one exported)
        func (*Scanner).Init(file *token.File, src []byte, err ErrorHandler, mode Mode)

    As Types Of (total 2, of which 1 is exported)
        const ScanComments
        /* one unexported: */
        const dontInsertSemis

type Scanner (struct)
    A Scanner holds the scanner's internal state while processing
    a given text. It can be allocated as part of another data
    structure but must be initialized via Init before use.

    Fields (total 11, of which 1 is exported)
        ErrorCount int      // number of errors encountered (public state - ok to modify)
        /* 10 unexporteds: */
        ch rune             // current character (scanning state)
        dir string          // directory portion of file.Name()
        err ErrorHandler    // error reporting; or nil
        file *token.File    // source file handle (immutable state)
        insertSemi bool     // insert a semicolon before next newline
        lineOffset int      // current line offset
        mode Mode           // scanning mode
        offset int          // character offset
        rdOffset int        // reading offset (position after current character)
        src []byte          // source

    Methods (total 20, of which 2 are exported)
        (*Scanner) Init(file *token.File, src []byte, err ErrorHandler, mode Mode)
            Init prepares the scanner s to tokenize the text src by setting the
            scanner at the beginning of src. The scanner uses the file set file
            for position information and it adds line information for each line.
            It is ok to re-use the same file when re-scanning the same file as
            line information which is already present is ignored. Init causes a
            panic if the file size does not match the src size.

            Calls to Scan will invoke the error handler err if they encounter a
            syntax error and err is not nil. Also, for each error encountered,
            the Scanner field ErrorCount is incremented by one. The mode parameter
            determines how comments are handled.

            Note that Init may call err if there is an error in the first character
            of the file.

        (*Scanner) Scan() (pos token.Pos, tok token.Token, lit string)
            Scan scans the next token and returns the token position, the token,
            and its literal string if applicable. The source end is indicated by
            token.EOF.

            If the returned token is a literal (token.IDENT, token.INT, token.FLOAT,
            token.IMAG, token.CHAR, token.STRING) or token.COMMENT, the literal string
            has the corresponding value.

            If the returned token is a keyword, the literal string is the keyword.

            If the returned token is token.SEMICOLON, the corresponding
            literal string is ";" if the semicolon was present in the source,
            and "\n" if the semicolon was inserted because of a newline or
            at EOF.
            If the returned token is token.ILLEGAL, the literal string is the
            offending character.

            In all other cases, Scan returns an empty literal string.

            For more tolerant parsing, Scan will return a valid token if
            possible even if a syntax error was encountered. Thus, even
            if the resulting token sequence contains no illegal tokens,
            a client may not assume that no error occurred. Instead it
            must check the scanner's ErrorCount or the number of calls
            of the error handler, if there was one installed.

            Scan adds line information to the file added to the file
            set with Init. Token positions are relative to that file
            and thus relative to the file set.

        /* 18 unexporteds: */
        (*Scanner) digits(base int, invalid *int) (digsep int)
            digits accepts the sequence { digit | '_' }.
            If base <= 10, digits accepts any decimal digit but records the
            offset (relative to the source start) of a digit >= base in *invalid,
            if *invalid < 0.
            digits returns a bitset describing whether the sequence contained
            digits (bit 0 is set), or separators '_' (bit 1 is set).
        (*Scanner) error(offs int, msg string)
        (*Scanner) errorf(offs int, format string, args ...any)
        (*Scanner) findLineEnd() bool
        (*Scanner) next()
            Read the next Unicode char into s.ch.
            s.ch < 0 means end-of-file.

            For optimization, there is some overlap between this method and
            s.scanIdentifier.
        (*Scanner) peek() byte
            peek returns the byte following the most recently read character without
            advancing the scanner. If the scanner is at EOF, peek returns 0.
        (*Scanner) scanComment() string
        (*Scanner) scanEscape(quote rune) bool
            scanEscape parses an escape sequence where rune is the accepted
            escaped quote. In case of a syntax error, it stops at the offending
            character (without consuming it) and returns false. Otherwise
            it returns true.
        (*Scanner) scanIdentifier() string
            scanIdentifier reads the string of valid identifier characters at s.offset.
            It must only be called when s.ch is known to be a valid letter.
            Be careful when making changes to this function: it is optimized and affects
            scanning performance significantly.
        (*Scanner) scanNumber() (token.Token, string)
        (*Scanner) scanRawString() string
        (*Scanner) scanRune() string
        (*Scanner) scanString() string
        (*Scanner) skipWhitespace()
        (*Scanner) switch2(tok0, tok1 token.Token) token.Token
        (*Scanner) switch3(tok0, tok1 token.Token, ch2 rune, tok2 token.Token) token.Token
        (*Scanner) switch4(tok0, tok1 token.Token, ch2 rune, tok2, tok3 token.Token) token.Token
        (*Scanner) updateLineInfo(next, offs int, text []byte)
            updateLineInfo parses the incoming comment text at offset offs
            as a line directive. If successful, it updates the line info table
            for the position next per the line directive.

Package-Level Functions (total 11, of which 1 is exported)
    func PrintError(w io.Writer, err error)
        PrintError is a utility function that prints a list of errors to w,
        one error per line, if the err parameter is an ErrorList. Otherwise
        it prints the err string.

    /* 10 unexporteds: */
    func digitVal(ch rune) int
    func invalidSep(x string) int
        invalidSep returns the index of the first invalid separator in x, or -1.
    func isDecimal(ch rune) bool
    func isDigit(ch rune) bool
    func isHex(ch rune) bool
    func isLetter(ch rune) bool
    func litname(prefix rune) string
    func lower(ch rune) rune
    func stripCR(b []byte, comment bool) []byte
    func trailingDigits(text []byte) (int, int, bool)

Package-Level Variables (only one, which is unexported)
    /* one unexported: */
    var prefix []byte

Package-Level Constants (total 4, of which 1 is exported)
    const ScanComments Mode = 1     // return comments as COMMENT tokens

    /* 3 unexporteds: */
    const bom = 65279               // byte order mark, only permitted as very first character
    const dontInsertSemis Mode = 2  // do not automatically insert semicolons - for testing only
    const eof = -1                  // end of file