Class IndentationAwareTokenBuilder<Terminals, KeywordName>

A token builder that is sensitive to indentation in the input text. It generates indentation and dedentation tokens based on changes in the indentation level.

The first generic parameter corresponds to the names of terminal tokens, the second to the names of keyword tokens. Both parameters are optional; the matching types can be imported from ./generated/ast.js.

Inspired by https://github.com/chevrotain/chevrotain/blob/master/examples/lexer/python_indentation/python_indentation.js
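To make the indent/dedent idea concrete, here is a minimal, line-based sketch. It is not the Langium implementation: the function name markIndentation and the string markers "INDENT" and "DEDENT" are hypothetical, blank lines are skipped, and inconsistent dedents are not reported as diagnostics.

```typescript
// Simplified sketch of indentation-sensitive tokenizing; all names are illustrative.
function markIndentation(text: string): string[] {
    const stack: number[] = [0];              // indentation levels seen so far (assumed base level 0)
    const out: string[] = [];
    for (const line of text.split("\n")) {
        if (line.trim() === "") continue;     // blank lines do not affect nesting in this sketch
        const level = /^[ \t]*/.exec(line)![0].length;
        if (level > stack[stack.length - 1]) {
            stack.push(level);                // deeper than before: one indent
            out.push("INDENT");
        }
        while (level < stack[stack.length - 1]) {
            stack.pop();                      // shallower than before: one dedent per closed level
            out.push("DEDENT");
        }
        out.push(line.trim());
    }
    while (stack.length > 1) {                // close blocks still open at the end of the input
        stack.pop();
        out.push("DEDENT");
    }
    return out;
}

const markers = markIndentation("if a:\n    b\n    if c:\n        d\ne");
```

A single line can close several blocks at once, which is why dedents are emitted in a loop while indents are emitted at most once per line.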

Type Parameters

  • Terminals extends string = string
  • KeywordName extends string = string

Constructors

Properties

dedentTokenType: TokenType

The token type used for dedentation tokens.

diagnostics: LexingDiagnostic[] = []

The list of diagnostics stored during the lexing process of a single text.

indentationStack: number[] = ...

The stack stores all previously matched indentation levels, so the builder can tell how deeply the next tokens are nested. The stack is only valid while a single text is being lexed; it is reset between runs of the lexer.

indentTokenType: TokenType

The token type used for indentation tokens.

whitespaceRegExp: RegExp = ...

A regular expression to match a series of tabs and/or spaces. Override this to customize what the indentation is allowed to consist of.
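The following sketch shows how such a pattern could be applied at a fixed offset, and how a stricter pattern changes the result. It assumes a sticky (y flag) regular expression; the helper matchAt and both pattern constants are hypothetical and not taken from the library.

```typescript
// Hypothetical helper: applies a sticky pattern at a fixed offset in the text.
function matchAt(pattern: RegExp, text: string, offset: number): string | null {
    pattern.lastIndex = offset;               // a sticky pattern only matches exactly at lastIndex
    const match = pattern.exec(text);
    return match ? match[0] : null;
}

const tabsAndSpaces = /[ \t]+/y;              // assumed shape: any run of tabs and/or spaces
const spacesOnly = / +/y;                     // stricter variant: indentation must be spaces

const sample = "a\n\t  b";                    // second line is indented with a tab and two spaces
const mixed = matchAt(tabsAndSpaces, sample, 2);
const strict = matchAt(spacesOnly, sample, 2);
```

Here mixed holds the full leading whitespace, while strict is null because the line starts with a tab. A subclass that overrides the property with a stricter pattern would reject such lines in the same way.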

Methods

  • Helper function to create an instance of an indentation token.

    Parameters

    • tokenType: TokenType

      Indent or dedent token type

    • text: string

      Full input string, used to calculate the line number

    • image: string

      The original image of the token (tabs or spaces)

    • offset: number

      Current position in the input string

    Returns IToken

    The indentation token instance

  • A custom pattern for matching dedents

    Parameters

    • text: string

      The full input string.

    • offset: number

      The offset at which to attempt a match

    • tokens: IToken[]

      Previously scanned tokens

    • groups: Record<string, IToken[]>

      Token Groups

    Returns null | RegExpExecArray | CustomPatternMatcherReturn

  • Resets the indentation stack between different runs of the lexer

    Parameters

    • text: string

      Full text that was tokenized

    Returns IToken[]

    Remaining dedent tokens to match all previous indents at the end of the file
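A rough sketch of the end-of-file step follows. It assumes the stack starts with a base level of 0 and uses a hypothetical SketchToken type in place of IToken; popRemaining is an illustrative name, not the library's method.

```typescript
// Hypothetical stand-in for a lexer token; the documented method returns IToken[].
interface SketchToken {
    kind: "DEDENT";
    line: number;
}

function popRemaining(stack: number[], text: string): SketchToken[] {
    const lastLine = text.split(/\r\n|\r|\n/).length;
    const remaining: SketchToken[] = [];
    while (stack.length > 1) {                // the bottom entry is the assumed base level
        stack.pop();
        remaining.push({ kind: "DEDENT", line: lastLine });
    }
    stack.length = 0;
    stack.push(0);                            // reset so the next lexer run starts clean
    return remaining;
}

const openLevels = [0, 4, 8];
const closing = popRemaining(openLevels, "a:\n    b:\n        c");
```

With two levels still open, closing contains two dedent tokens and openLevels is back to the base level only.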

  • Helper function to get the line number at a given offset.

    Parameters

    • text: string

      Full input string, used to calculate the line number

    • offset: number

      Current position in the input string

    Returns number

    The line number at the given offset
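One way to compute this is to count the line breaks that precede the offset. The sketch below assumes 1-based line numbers and treats \r\n, \r and \n as line breaks; lineNumberAt is an illustrative name.

```typescript
// Illustrative sketch: 1-based line number of the character at `offset`.
function lineNumberAt(text: string, offset: number): number {
    // Splitting the prefix yields one piece per line started before the offset.
    return text.substring(0, offset).split(/\r\n|\r|\n/).length;
}

const mixedBreaks = "a\r\nb\nc";
const firstLine = lineNumberAt(mixedBreaks, 0);   // offset of "a"
const secondLine = lineNumberAt(mixedBreaks, 3);  // offset of "b"
const thirdLine = lineNumberAt(mixedBreaks, 5);   // offset of "c"
```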

  • A custom pattern for matching indents

    Parameters

    • text: string

      The full input string.

    • offset: number

      The offset at which to attempt a match

    • tokens: IToken[]

      Previously scanned tokens

    • groups: Record<string, IToken[]>

      Token Groups

    Returns null | RegExpExecArray | CustomPatternMatcherReturn

  • Helper function to check if the current position is the start of a new line.

    Parameters

    • text: string

      The full input string.

    • offset: number

      The current position at which to check

    Returns boolean

    Whether the current position is the start of a new line
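A minimal sketch of such a check, assuming a position starts a line when it is the very beginning of the text or directly follows a line-break character. The name startsLine is illustrative.

```typescript
// Illustrative sketch: true if `offset` is at the beginning of a line.
function startsLine(text: string, offset: number): boolean {
    if (offset === 0) return true;            // start of the input
    const previous = text.charAt(offset - 1);
    return previous === "\n" || previous === "\r";
}

const twoLines = "a\n  b";
const atStart = startsLine(twoLines, 0);      // beginning of the text
const afterBreak = startsLine(twoLines, 2);   // first space of the second line
const midLine = startsLine(twoLines, 3);      // second space, preceded by a space
```

Indentation only matters at such positions, so whitespace in the middle of a line does not produce indent or dedent tokens.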

  • A helper function used in matching both indents and dedents.

    Parameters

    • text: string

      The full input string.

    • offset: number

      The current position at which to attempt a match

    • tokens: IToken[]

      Previously scanned tokens

    • groups: Record<string, IToken[]>

      Token Groups

    Returns {
        currIndentLevel: number;
        match: null | RegExpExecArray;
        prevIndentLevel: number;
    }

    The current and previous indentation levels and the matched whitespace

    • currIndentLevel: number
    • match: null | RegExpExecArray
    • prevIndentLevel: number
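The sketch below shows how a result of this shape could be produced and used. It is an assumption-laden simplification: the stack is passed in explicitly instead of the tokens and groups parameters, the whitespace pattern is hard-coded, and matchLeadingWhitespace is a hypothetical name.

```typescript
interface WhitespaceMatch {
    currIndentLevel: number;
    match: RegExpExecArray | null;
    prevIndentLevel: number;
}

// Illustrative sketch: compares the whitespace at `offset` with the top of the stack.
function matchLeadingWhitespace(text: string, offset: number, stack: number[]): WhitespaceMatch {
    const pattern = /[ \t]+/y;                // assumed whitespace pattern (sticky)
    pattern.lastIndex = offset;
    const match = pattern.exec(text);
    return {
        currIndentLevel: match ? match[0].length : 0,
        match,
        prevIndentLevel: stack.length > 0 ? stack[stack.length - 1] : 0,
    };
}

const deeper = matchLeadingWhitespace("a:\n    b", 3, [0]);
const shallower = matchLeadingWhitespace("a:\nb", 3, [0, 4]);
```

An indent matcher would accept when currIndentLevel is greater than prevIndentLevel (as in deeper), and a dedent matcher when it is smaller (as in shallower); equal levels produce neither token.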