Getting Started
Before diving into Langium itself, let’s get your environment ready for development:
- You have a working Node environment with version 16 or higher.
- Install Yeoman and the Langium extension generator.
npm i -g yo generator-langium
For this getting started example, we also recommend installing the latest version of VS Code.
To create your first working DSL, execute the Yeoman generator:
yo langium
Yeoman will prompt you with a few basic questions about your DSL:
- Extension name: Will be used as the folder name of your extension and its package.json.
- Language name: Will be used as the name of the grammar and as a prefix for some generated files and service classes.
- File extensions: A comma-separated list of file extensions for your DSL.
Afterwards, it will generate a new project and start installing all dependencies, including the langium framework as well as the langium-cli command line tool required for generating code based on your grammar definition.
After everything has successfully finished running, open your newly created Langium project with VS Code via the UI (File > Open Folder…) or execute the following command, replacing hello-world with your chosen project name:
code hello-world
Press F5 or open the debug view and start the available debug configuration to launch the extension in a new Extension Development Host window. Open a folder and create a file with your chosen file extension (.hello is the default). The hello-world language accepts two kinds of entities: person and Hello. Here’s a quick example of how to use them both:
person Alice
Hello Alice!
person Bob
Hello Bob!
The file src/language/hello-world.langium in your newly created project contains your grammar.
If you’re already familiar with the terms used in parsing or DSL frameworks, you can skip this short excursion and go straight to the next part. However, anyone who is new to DSL development should carefully read the following primer on the terms we are using in our documentation:
abstract syntax tree: A tree of elements that represents a text document. Each element is a simple JS object that combines multiple input tokens into a single object. Commonly abbreviated as AST.
document: An abstract term to refer to a text file on your file system or an open editor document in your IDE.
grammar: Defines the form of your language. In Langium, a grammar is also responsible for describing how the AST is built.
parser: A program that takes a document as its input and computes an abstract syntax tree as its output.
parser rule: A parser rule describes how a certain AST element is supposed to be parsed. This is done by invoking other parser rules or terminals.
terminal: A terminal is the smallest parseable part of a document. It usually represents small pieces of text like names, numbers, keywords or comments.
token: A token is a substring of the document that matches a certain terminal. It contains information about which kind of terminal it represents as well as its location in the document.
Here’s the grammar that parses the previous text snippet:
grammar HelloWorld
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w]*/;
entry Model: (persons+=Person | greetings+=Greeting)*;
Person:
'person' name=ID;
Greeting:
'Hello' person=[Person] '!';
Let’s go through this one by one:
grammar HelloWorld
Before we tell Langium anything about our grammar contents, we first need to give it a name, in this case HelloWorld. The langium-cli will pick this up and use it as a prefix for any generated services.
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w]*/;
Here we define the two terminals needed for this grammar: the whitespace terminal WS and the identifier terminal ID. Terminals parse a part of our document by matching it against their regular expression. The WS terminal matches any run of whitespace characters with the regex /\s+/. This allows us to consume whitespace in our document. As the terminal is declared as hidden, the parser will parse any whitespace and discard the results. That way, we don’t have to care about how much whitespace a user puts in their document. Secondly, we define our ID terminal. It parses any string that starts with an underscore or letter and continues with any number of characters that match the \w character class. It will match Alice, _alice, or _al1c3 but not 4lice or #alice. Langium uses the JS regex dialect for terminal definitions.
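To make the terminal definitions more concrete, the following TypeScript sketch applies the two regular expressions from the grammar by hand. It is not how Langium’s lexer is implemented: the Token shape, the tokenize function and the isId helper are hypothetical names introduced for illustration, and keywords are left out entirely.

```typescript
// Hypothetical token shape for this sketch only.
interface Token {
    terminal: string; // name of the terminal that matched
    text: string;     // matched substring
    offset: number;   // start position in the document
}

// The two terminals from the grammar, anchored with ^ so that they
// only match at the start of the remaining input.
const terminals = [
    { name: 'WS', regex: /^\s+/, hidden: true },
    { name: 'ID', regex: /^[_a-zA-Z][\w]*/, hidden: false }
];

function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    let offset = 0;
    while (offset < input.length) {
        const rest = input.slice(offset);
        let matched = false;
        for (const t of terminals) {
            const m = t.regex.exec(rest);
            if (m) {
                // Hidden terminals are consumed but produce no token.
                if (!t.hidden) {
                    tokens.push({ terminal: t.name, text: m[0], offset: offset });
                }
                offset += m[0].length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error('Unexpected character at offset ' + offset);
        }
    }
    return tokens;
}

// Whole-string check against the ID pattern (anchored at both ends for this check only).
function isId(text: string): boolean {
    return /^[_a-zA-Z][\w]*$/.test(text);
}

// Whitespace between the two identifiers is discarded.
console.log(tokenize('Alice   _al1c3').map(t => t.text));
// The first three candidates are valid identifiers, the last two are not.
console.log(['Alice', '_alice', '_al1c3', '4lice', '#alice'].map(isId));
```

Because the sketch has no notion of keywords, it would also report person as an ID token, which a real lexer for this grammar would not do.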
entry Model: (persons+=Person | greetings+=Greeting)*;
The Model parser rule is the entry point to our grammar. Parsing always starts with the entry rule. Here we define a repeating group of alternatives: persons+=Person | greetings+=Greeting. This will always try to parse either a Person or a Greeting and add it to the respective list of persons or greetings in the Model object. Since the alternative is wrapped in a repeating group *, the parser will continue until all input has been consumed.
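As a rough illustration of what the += assignments do, the following TypeScript sketch collects already parsed elements into the two lists of a Model object. The interfaces are simplified, hand-written assumptions and not the types that langium-cli generates; the collect function is hypothetical, and the cross reference is reduced to a plain name string.

```typescript
// Simplified, hand-written shapes (assumptions for this sketch).
interface Person { $type: 'Person'; name: string; }
interface Greeting { $type: 'Greeting'; person: string; } // referenced name, unresolved
interface Model { $type: 'Model'; persons: Person[]; greetings: Greeting[]; }

type ModelElement = Person | Greeting;

// Mimics `(persons+=Person | greetings+=Greeting)*`:
// each element is appended to the list that belongs to its alternative.
function collect(elements: ModelElement[]): Model {
    const model: Model = { $type: 'Model', persons: [], greetings: [] };
    for (const element of elements) {
        if (element.$type === 'Person') {
            model.persons.push(element);   // persons+=Person
        } else {
            model.greetings.push(element); // greetings+=Greeting
        }
    }
    return model;
}

// The elements of the example document, in document order.
const model = collect([
    { $type: 'Person', name: 'Alice' },
    { $type: 'Greeting', person: 'Alice' },
    { $type: 'Person', name: 'Bob' },
    { $type: 'Greeting', person: 'Bob' }
]);

console.log(model.persons.map(p => p.name));
console.log(model.greetings.map(g => g.person));
```

Note that the order between persons and greetings is not preserved in the Model object; only the order within each list is.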
Person: 'person' name=ID;
The Person rule starts off with the 'person' keyword. Keywords are like terminals, in the sense that they parse a part of the document. Together, the keywords and terminals define the tokens that your language is able to parse. You can imagine that the 'person' keyword here is like an indicator to tell the parser that an object of type Person should be parsed. After the keyword, we assign the Person a name by parsing an ID.
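Written out by hand, the Person rule could look roughly like the following TypeScript sketch. This is only meant to show the idea of "keyword first, then an assignment"; it is not the parser that Langium builds from the grammar. The parsePerson function and its result shape are hypothetical, and the input is assumed to be a list of token texts with whitespace already removed.

```typescript
// Simplified, hand-written shape (an assumption for this sketch).
interface Person { $type: 'Person'; name: string; }

// Same pattern as the ID terminal, anchored to test a whole token.
const ID = /^[_a-zA-Z][\w]*$/;

// Parses `'person' name=ID` from `tokens`, starting at index `pos`.
// Returns the new Person and the index of the next unread token.
function parsePerson(tokens: string[], pos: number): { person: Person; next: number } {
    // The keyword tells us that a Person is expected here.
    if (tokens[pos] !== 'person') {
        throw new Error("Expected keyword 'person'");
    }
    // name=ID: the next token must match ID and is stored in `name`.
    const name = tokens[pos + 1];
    if (name === undefined || !ID.test(name)) {
        throw new Error('Expected an ID after the keyword');
    }
    return { person: { $type: 'Person', name: name }, next: pos + 2 };
}

const result = parsePerson(['person', 'Alice', 'Hello', 'Alice', '!'], 0);
console.log(result.person.name, result.next);
```

Unlike a real lexer for this grammar, the sketch does not treat keywords as reserved, so it would accept person as a name.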
Greeting: 'Hello' person=[Person] '!';
Like the previous rule, the Greeting starts with a keyword. With the person assignment we introduce the cross reference, indicated by the brackets []. A cross reference will allow your grammar to reference other elements that are contained in your file or workspace. By default, Langium will try to resolve this cross reference by parsing the terminal that is associated with its name property. In this case, we are looking for a Person whose name property matches the parsed ID.
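The name-based lookup described above can be sketched in TypeScript as follows. This is a deliberately naive illustration that only searches a single list of persons; Langium’s own linking and scoping are more involved. The PersonReference shape, the resolve and link functions, and the error message are all assumptions made for this example.

```typescript
interface Person { name: string; }

// Hypothetical reference shape: the text found in the document
// plus the resolved target, if one was found.
interface PersonReference { refText: string; ref?: Person; }

interface Greeting { person: PersonReference; }

// Finds the Person whose `name` equals the parsed ID.
// In this sketch the first declaration with a matching name wins.
function resolve(persons: Person[], refText: string): Person | undefined {
    return persons.filter(p => p.name === refText)[0];
}

// Resolves every greeting and reports the references that found no target.
function link(persons: Person[], greetings: Greeting[]): string[] {
    const errors: string[] = [];
    for (const greeting of greetings) {
        greeting.person.ref = resolve(persons, greeting.person.refText);
        if (!greeting.person.ref) {
            errors.push("Could not resolve Person '" + greeting.person.refText + "'");
        }
    }
    return errors;
}

const persons: Person[] = [{ name: 'Alice' }, { name: 'Bob' }];
const greetings: Greeting[] = [
    { person: { refText: 'Alice' } },
    { person: { refText: 'Carol' } } // no such person is declared
];

console.log(link(persons, greetings));
```

After linking, the first greeting points at the Alice object itself rather than at a copy, which is what makes a cross reference different from a plain string property.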
That finishes the short introduction to Langium! Feel free to play around with the grammar and use npm run langium:generate to regenerate the generated TypeScript files. To go further, we suggest that you continue with our tutorials.