---
title: grammar | CodeWeaver Docs
description: API reference for codeweaver.semantic.grammar
url: "https://docs.knitli.com/api/semantic/grammar"
type: static
generatedAt: "2026-04-17T17:21:09.580Z"
---

# `codeweaver.semantic.grammar`
Parser for tree-sitter `node-types.json` files with intuitive terminology.

This module provides CodeWeaver’s internal representation and API for tree-sitter grammars. After some frustrating experiences with tree-sitter’s terminology and structure, we created these types to make working with tree-sitter grammars more intuitive.

## Background
tl;dr: **This is the parser we wish we had when we started working with tree-sitter. We hope it makes your experience with tree-sitter grammars smoother and more intuitive.**

When developing CodeWeaver and our Rust-based future backend, Thread, we spent a lot of time with tree-sitter and its quirks. While tree-sitter is a powerful tool, its vocabulary and structure, combined with the lack of comprehensive documentation, can make it challenging to work with. Simply put: it’s not intuitive.

This is on full display in the `node-types.json` file, which describes the different node types in a grammar. The `node-types.json` file is crucial for understanding how to interact with parse trees, but its structure and terminology are confusing. It *conflates* several distinct concepts (meaning it treats them as if they are the same):

 - It doesn’t clearly differentiate between **nodes** (vertices) and **edges** (relationships)
 - It uses “named” to describe both nodes and edges, meaning “has a grammar rule”, not “has a name” (everything has a name!)
 - It flattens hierarchies and structural patterns in ways that obscure their meaning

When I wrote the previous version of this parser, my misunderstandings of these concepts cost a week of lost time and led to incorrect assumptions. After that, I decided to write this parser using terminology and structure that describe the concepts at play more intuitively, departing completely from tree-sitter’s terminology.

Knitli is fundamentally about making complex systems more intuitive and accessible, and this is a perfect example of that philosophy in action. By using clearer terminology and structure, we’re making it easier for developers to understand and work with tree-sitter grammars. This saves time, reduces frustration, and empowers developers to build better tools.

**For tree-sitter experts:** We provide a translation guide below to help bridge the gap between the two terminologies. If you find this frustrating, we understand — but we believe clarity for newcomers is more important than tradition.

## CodeWeaver’s Terminology
We clarify and separate concepts that tree-sitter conflates: nodes vs edges, abstract vs concrete, structural roles vs semantic meaning. Here’s our approach:

### Abstract Groupings
**Category** - Abstract classification that groups Things with shared characteristics.

 - Categories do NOT appear in parse trees (abstract only)
 - Used for polymorphic type constraints and classification (identifying what something can be used as)
 - Example: `expression` is a Category containing `binary_expression`, `unary_expression`, etc.
 - **Tree-sitter equivalent**: Nodes with `subtypes` field (abstract types)
 - **Empirical finding**: ~110 unique Categories across 25 languages; after normalizing names across languages, roughly 16 Categories have members in many languages

**Multi-Category Membership:**

 - Things can belong to multiple Categories (uncommon but important)
 - **13.5%** of Things belong to 2+ Categories
 - **86.5%** belong to exactly 1 Category
 - Common in C/C++ (declarators serving multiple roles)
 - Example: `identifier` → `[_declarator, expression]`
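
To make the membership statistics concrete, the sketch below inverts a Category-to-members mapping into Thing-to-Categories and measures how many Things belong to more than one Category. The data is a hypothetical toy fragment loosely modeled on C, and `invert_membership` is an illustrative helper, not part of `codeweaver.semantic.grammar`.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy data (not real grammar output): Category name -> member Thing names.
CATEGORIES: dict[str, set[str]] = {
    "_declarator": {"identifier", "pointer_declarator", "function_declarator"},
    "expression": {"identifier", "binary_expression", "unary_expression"},
    "statement": {"if_statement", "return_statement"},
}


def invert_membership(categories: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each Thing name to the set of Categories that include it."""
    membership: dict[str, set[str]] = defaultdict(set)
    for category, members in categories.items():
        for thing in members:
            membership[thing].add(category)
    return dict(membership)


membership = invert_membership(CATEGORIES)
multi = {thing for thing, cats in membership.items() if len(cats) > 1}
share = len(multi) / len(membership)
# prints: 1 of 7 Things (14.3%) belong to 2+ Categories
print(f"{len(multi)} of {len(membership)} Things ({share:.1%}) belong to 2+ Categories")
```

Run over real grammars, the same computation is what produces figures like the 13.5% above.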

### Concrete Parse Tree Nodes
**Thing** - A concrete element that appears in the parse tree.

 - Two kinds: **Token** (leaf) or **Composite** (non-leaf)
 - What you actually see when you parse code
 - **Tree-sitter equivalent**: Named or unnamed “nodes” (named does not correlate to our Composite vs Token distinction)
 - Name chosen for clarity: “it’s a thing in your code” (considered: Entity, Element, Construct)

**Token** - Leaf Thing with no structural children.

 - Represents keywords, identifiers, literals, punctuation
 - What you literally **see** in the source code
 - Classified by purpose: keyword, identifier, literal, punctuation, comment
 - **Tree-sitter equivalent**: Node with no `fields` or `children`

**Composite Node** - Non-leaf Thing with structural children.

 - Has Direct and/or Positional connections to child Things
 - Represents complex structures: functions, classes, expressions
 - **Tree-sitter equivalent**: Node with `fields` and/or `children`

### Structural Relationships
**Connection** - Directed relationship from parent Thing to child Thing(s).

 - Graph terminology: an “edge”
 - Three classes: Direct, Positional, Loose
 - **Tree-sitter equivalent**: `fields` (Direct), `children` (Positional), `extras` (Loose)

**ConnectionClass** - Classification of connection types:

 1. **DIRECT** - Named semantic relationship with a **Role**
    - Has a specific semantic function (e.g., “condition”, “body”, “parameters”)
    - Most precise type of structural relationship
    - **Tree-sitter equivalent**: Grammar “fields”
    - **Empirical finding**: 9,606 Direct connections across all languages
 2. **POSITIONAL** - Ordered structural relationship without semantic naming
    - Position matters but no explicit role name
    - Example: function arguments in some languages
    - **Tree-sitter equivalent**: Grammar “children”
    - A Thing with fields may also have children, but never the reverse: every Thing with children also has fields
    - All children are named (`is_explicit_rule = True`)
    - **Empirical finding**: 6,029 Positional connections across all languages

*Note: Direct and Positional Connections describe **structure**, while Loose Connections describe **permission**.*

**Role** - Named semantic function of a Direct connection.

 - Only Direct connections have Roles (Positional and Loose do not)
 - Describes **what purpose** a child serves, not just that it exists
 - Examples: “condition”, “body”, “parameters”, “left”, “right”, “operator”
 - **Tree-sitter equivalent**: Field name in grammar
 - **Empirical finding**: ~90 unique role names across all languages
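
The mapping from `node-types.json` to Connections and Roles can be sketched in a few lines. The entry below is made up, but it follows the shape tree-sitter uses: `fields` keyed by field name, each with `multiple`, `required`, and `types`, plus an optional `children` object with the same keys. `SketchConnection` and `connections_from_node_type` are hypothetical names for illustration, not the module's API; the real entry points are the `from_node_dto` methods, whose parameters this page does not document.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from enum import Enum


class ConnectionClass(Enum):
    DIRECT = "direct"
    POSITIONAL = "positional"


@dataclass(frozen=True)
class SketchConnection:
    """Hypothetical, simplified stand-in for CodeWeaver's Connection classes."""

    connection_class: ConnectionClass
    role: str | None  # only DIRECT connections carry a Role
    target_thing_names: frozenset[str]
    allows_multiple: bool
    requires_presence: bool


# A made-up entry in the shape of tree-sitter's node-types.json.
NODE_TYPE = json.loads("""
{
  "type": "if_statement",
  "named": true,
  "fields": {
    "condition": {"multiple": false, "required": true,
                  "types": [{"type": "expression", "named": true}]},
    "consequence": {"multiple": false, "required": true,
                    "types": [{"type": "block", "named": true}]}
  },
  "children": {"multiple": true, "required": false,
               "types": [{"type": "else_clause", "named": true}]}
}
""")


def _to_connection(cls: ConnectionClass, role: str | None, spec: dict) -> SketchConnection:
    return SketchConnection(
        connection_class=cls,
        role=role,
        target_thing_names=frozenset(t["type"] for t in spec["types"]),
        allows_multiple=spec["multiple"],    # tree-sitter `multiple`
        requires_presence=spec["required"],  # tree-sitter `required`
    )


def connections_from_node_type(node: dict) -> list[SketchConnection]:
    """`fields` become DIRECT connections (field name -> Role); `children` becomes POSITIONAL."""
    conns = [
        _to_connection(ConnectionClass.DIRECT, role, spec)
        for role, spec in node.get("fields", {}).items()
    ]
    if "children" in node:
        conns.append(_to_connection(ConnectionClass.POSITIONAL, None, node["children"]))
    return conns


for conn in connections_from_node_type(NODE_TYPE):
    print(conn.connection_class.name, conn.role, sorted(conn.target_thing_names))
```

The field name becomes the Role, which is why only Direct connections have one.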

### Connection Target References
**Polymorphic Type Constraints:** Connections can reference either Categories (abstract) OR concrete Things, enabling flexible type constraints:

**Category References** (polymorphic constraints):

 - Connection accepts ANY member of a Category
 - Example: `condition` field → `expression` (accepts any expression type)
 - **Empirical finding**:
   - **7.9%** of field references are to Categories
   - **10.3%** of children references are to Categories
 - Common pattern: `argument_list.children → expression` (any expression type accepted)

**Concrete Thing References** (specific constraints):

 - Connection accepts only specific Thing types
 - Example: `operator` field → `["+", "-", "*", "/"]` (specific operators only)
 - **Empirical finding**:
   - **92.1%** of field references are to concrete Things
   - **89.7%** of children references are to concrete Things
 - Common pattern: Structural components like `parameter_list`, `block`, specific tokens

**Mixed References** (both in same connection):

 - Single connection can reference both Categories AND concrete Things
 - Example: `body` field → `[block, expression]` (a concrete `block`, or any member of the `expression` Category)
 - Design principle: Store references as-is, provide resolution utilities when needed
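
Following the store-as-is principle, resolution can be a separate step. A minimal sketch, assuming Categories are available as a plain name-to-members mapping (the real module's resolution utilities are not documented on this page): `resolve_targets` is a hypothetical helper that expands Category references, recursively because in some grammars a Category lists another Category as a member, and leaves concrete Thing references untouched.

```python
from __future__ import annotations

# Hypothetical toy Categories; here `expression` contains another Category.
CATEGORIES: dict[str, frozenset[str]] = {
    "expression": frozenset({"binary_expression", "primary_expression"}),
    "primary_expression": frozenset({"identifier", "call"}),
}


def resolve_targets(target_names: set[str], categories: dict[str, frozenset[str]]) -> set[str]:
    """Expand Category references into concrete Thing names.

    Concrete names pass through unchanged; Category names are replaced by
    their members, recursively, with a `seen` set guarding against cycles.
    """
    resolved: set[str] = set()
    seen: set[str] = set()
    stack = list(target_names)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in categories:
            stack.extend(categories[name])  # Category reference: expand it
        else:
            resolved.add(name)  # concrete Thing reference: keep as-is
    return resolved


# Mixed reference, like `body` -> [block, expression]:
print(sorted(resolve_targets({"block", "expression"}, CATEGORIES)))
# ['binary_expression', 'block', 'call', 'identifier']
```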

### Attributes
**Thing Attributes:**

 - **can_be_anywhere** (bool)
   - Whether the Thing can appear anywhere in the parse tree (usually comments)
   - **Tree-sitter equivalent**: the `extra` attribute
   - **Data notes**:
     - Only used in 11 of the 25 languages analyzed
     - *Almost always* a **comment**, with two exceptions:
       - Python: `line_continuation` token (1 of 2; the other is `comment`)
       - PHP: `text_interpolation` (1 of 2; the other is `comment`)
   - **Empirical finding**: each of those languages has 1 or 2 Things with `can_be_anywhere`. `comment` is one of them in all 11; where there is a second, it is usually another kind of comment (e.g., `html_comment` in JavaScript, for JSX), apart from the Python and PHP exceptions above
 - **is_explicit_rule** (bool)
   - Whether the Thing has a dedicated named production rule in the grammar
   - True: Named grammar rule (appears with semantic name)
   - False: Anonymous grammar construct or synthesized node
   - **Tree-sitter equivalent**: `named = True/False`
   - **Note**: Included for completeness; limited practical utility for semantic analysis
 - **kind** (ThingKind enum)
   - Classification of Thing type: TOKEN or COMPOSITE
   - TOKEN: Leaf Thing with no structural children
   - COMPOSITE: Non-leaf Thing with structural children
 - **is_file** (bool, Composite only)
   - Whether this Composite is the root of the parse tree (i.e., the start symbol)
 - **is_significant** (bool, Token only)
   - Whether the Token carries semantic/structural meaning or is formatting trivia
   - True: keywords, identifiers, literals, operators, comments
   - False: whitespace, line continuations, formatting tokens
   - Similar in practice to `is_explicit_rule`, but focused on semantic importance
   - Used to filter Tokens out during semantic analysis while preserving them for formatting
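
The Thing attributes above line up with keys in a `node-types.json` entry. A minimal sketch, with hypothetical names `ThingAttrs` and `thing_attrs`, that translates one entry. It assumes the entry uses tree-sitter's `named`, `extra`, and `root` keys as listed in the translation guide, and it treats an empty `fields` or `children` object as "no structural children". It does not attempt `is_significant`, which needs token-level classification.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ThingKind(Enum):
    TOKEN = "token"
    COMPOSITE = "composite"


@dataclass(frozen=True)
class ThingAttrs:
    """Hypothetical container for the Thing attributes described above."""

    name: str
    kind: ThingKind
    is_explicit_rule: bool
    can_be_anywhere: bool
    is_file: bool


def thing_attrs(node: dict) -> ThingAttrs:
    """Translate one node-types.json entry into Thing attributes."""
    if "subtypes" in node:
        # Entries with `subtypes` are abstract Categories, not Things.
        raise ValueError(f"{node['type']!r} is a Category, not a Thing")
    # Missing or empty `fields`/`children` -> leaf (TOKEN); otherwise COMPOSITE.
    has_children = bool(node.get("fields")) or bool(node.get("children"))
    kind = ThingKind.COMPOSITE if has_children else ThingKind.TOKEN
    return ThingAttrs(
        name=node["type"],
        kind=kind,
        is_explicit_rule=bool(node["named"]),            # tree-sitter `named`
        can_be_anywhere=bool(node.get("extra", False)),  # tree-sitter `extra`
        is_file=kind is ThingKind.COMPOSITE and bool(node.get("root", False)),  # `root`
    )


comment = thing_attrs({"type": "comment", "named": True, "extra": True})
plus = thing_attrs({"type": "+", "named": False})
module = thing_attrs({
    "type": "module",
    "named": True,
    "root": True,
    "children": {"multiple": True, "required": False,
                 "types": [{"type": "statement", "named": True}]},
})
```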

**Connection Attributes:**

 - **allows_multiple** (bool)
   - Whether the Connection permits multiple children of the specified type(s)
   - Defines the cardinality upper bound (0 or 1 vs 0 or many)
   - **Tree-sitter equivalent**: `multiple = True/False`
   - **Note**: Specifies CAN have multiple, not MUST have multiple
 - **requires_presence** (bool)
   - Whether at least one child of the specified type(s) MUST be present
   - Defines the cardinality lower bound (0 or more vs 1 or more)
   - **Tree-sitter equivalent**: `required = True/False`
   - **Note**: Doesn’t require any one specific target type, just at least one child from the allowed list

**Cardinality Matrix:**

| requires_presence | allows_multiple | Meaning |
| --- | --- | --- |
| False | False | 0 or 1 (optional single) |
| False | True | 0 or more (optional multiple) |
| True | False | exactly 1 (required single) |
| True | True | 1 or more (required multiple) |
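
The matrix above maps directly to numeric bounds. A minimal sketch, using hypothetical helpers `cardinality` and `count_is_valid` that are not part of the module, of how a validator might check an observed number of matching children against a Connection's bounds:

```python
from __future__ import annotations

import math


def cardinality(requires_presence: bool, allows_multiple: bool) -> tuple[int, float]:
    """Return (min, max) child counts for a Connection, per the matrix above."""
    lower = 1 if requires_presence else 0
    upper = math.inf if allows_multiple else 1
    return lower, upper


def count_is_valid(count: int, requires_presence: bool, allows_multiple: bool) -> bool:
    """Check an observed number of matching children against the bounds."""
    lower, upper = cardinality(requires_presence, allows_multiple)
    return lower <= count <= upper


print(cardinality(True, False))  # (1, 1): exactly one
print(count_is_valid(2, True, False))  # False: a required single child appeared twice
```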

## Tree-sitter Translation Guide
For developers familiar with tree-sitter terminology:

| Tree-sitter Term | CodeWeaver Term | Notes |
| --- | --- | --- |
| Abstract type (with subtypes) | Category | Doesn’t appear in parse trees |
| Named/unnamed node | Thing | Concrete parse tree node |
| Node with no fields or children | Token | Leaf node |
| Node with fields/children | Composite Thing | Non-leaf node |
| Field | Direct Connection | Has semantic Role |
| Child | Positional Connection | Ordered, no Role |
| Field name | Role | Semantic function |
| Extra | `can_be_anywhere` | Can be anywhere in the AST |
| `named` attribute | `is_explicit_rule` | Has named grammar rule |
| `multiple` attribute | `allows_multiple` | Upper cardinality bound |
| `required` attribute | `requires_presence` | Lower cardinality bound |
| `root` attribute | `is_file` | The starting node of the parse tree |

## Design Rationale
**Why these names?**

 - **Thing**: Simple, clear, unpretentious. “It’s a thing in your code.”
 - **Category**: Universally understood as abstract grouping
 - **Connection**: Graph theory standard; clearer than conflating fields/children/extras
 - **Role**: Describes purpose, not just presence
 - **ConnectionClass**: Explicit enumeration of relationship types

**Empirical validation:**

 - Analysis of 25 languages, 5,000+ node types
 - ~110 unique Categories, ~736 unique Things with category membership
 - 7.9-10.3% of references are polymorphic (Category references)
 - 13.5% of Things have multi-category membership
 - Patterns consistent across language families

**Benefits:**

 - **Clearer mental model**: Separate nodes, edges, and attributes explicitly
 - **Easier to learn**: Intuitive names reduce cognitive load
 - **Better tooling**: Explicit types enable better type checking and validation
 - **Future-proof**: Accommodates real-world patterns (multi-category, polymorphic references)

## Class: `AllThingsDict`
TypedDict for all Things (CompositeThings and Tokens) in a grammar.

## Class: `Category`
A Category is an abstract classification that groups Things with shared characteristics.

Categories do not appear in parse trees. They are primarily for classification of related Things. For example, `expression` is a Category containing `binary_expression`, `unary_expression`, etc.

### Method: `from_node_dto`

```
from_node_dto()
```

Create a Category from the given node DTOs.

### Method: `includes`

```
includes()
```

Check if this Category includes the specified Thing name.

### Method: `overlap_with`

```
overlap_with()
```

Check if this Category shares any member Things with another Category. Returns the overlapping member Thing names.

Used for analyzing multi-category membership.
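
As an illustration of the `includes` and `overlap_with` semantics described above, here is a sketch on a hypothetical `MiniCategory` that stores member names in a frozenset. The real Category exposes a `member_things` attribute, but its storage and method signatures may differ from this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniCategory:
    """Hypothetical stand-in for Category, holding member Thing names."""

    name: str
    member_things: frozenset[str]

    def includes(self, thing_name: str) -> bool:
        return thing_name in self.member_things

    def overlap_with(self, other: MiniCategory) -> frozenset[str]:
        # Members shared by both Categories, i.e. multi-category Things.
        return self.member_things & other.member_things


declarator = MiniCategory("declarator", frozenset({"identifier", "pointer_declarator"}))
expression = MiniCategory("expression", frozenset({"identifier", "binary_expression"}))
print(sorted(declarator.overlap_with(expression)))  # ['identifier']
```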

### Method: `serialize_for_cli`

```
serialize_for_cli()
```

Serialize the Category for CLI output.

## Class: `CompositeThing`
A CompositeThing is a concrete element that appears in the parse tree. Tokens and CompositeThings are both Things, but a CompositeThing (this class) is never a Token.

Tree-sitter equivalent: Node with fields and/or children

Attributes:

 - `name`: Thing identifier (e.g., “if_statement”)
 - `kind`: Structural classification (always COMPOSITE)
 - `language`: Programming language this Thing belongs to
 - `categories`: Set of Category names this Thing belongs to
 - `is_explicit_rule`: Whether the Thing has a named grammar rule
 - `can_be_anywhere`: Whether the Thing can appear anywhere in the parse tree (usually comments)
 - `is_file`: Whether this Composite is the root of the parse tree (i.e., the start symbol)

Relationships:

 - Thing → Many Categories (via categories attribute)
 - Categories reference Things via their `member_things` attribute

A CompositeThing represents complex structures like functions, classes, and expressions, which have direct and/or positional connections to child Things.

Empirical findings:

 - Average 3-5 possible Direct Connections per CompositeThing
 - Average 1-2 possible Positional Connections per CompositeThing

## Class: `Connection`
Base class for Connections between Things in a parse tree.

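A Connection is a relationship from a parent Thing to child Thing(s) (an ‘edge’ in graph terminology). There are two classes of Connections, Direct and Positional, and both describe structure. Loose connections (tree-sitter extras) have no Connection class here; they are expressed through the `can_be_anywhere` attribute on Things.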

Attributes:

 - `connection_class`: Classification of connection type (DIRECT, POSITIONAL)
 - `target_thing_names`: Set of names of target Things this Connection can point to
 - `allows_multiple`: Whether this Connection permits multiple children of the specified type(s)
 - `requires_presence`: Whether at least one child of the specified type(s) MUST be present

Relationships:

 - Connection → Many Things (via target_thing_names attribute)
 - Things reference Connections via their `direct_connections` or `positional_connections` attributes

Empirical findings:

 - Average 3-5 Direct Connections per CompositeThing
 - Average 1-2 Positional Connections per CompositeThing

### Method: `can_connect_to`

```
can_connect_to()
```

Check if this Connection can point to the specified Thing.

This method differs slightly from a membership test with `in` (`__contains__`) because it treats Things that can be anywhere (extras) as always connectable.
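
A sketch of that difference, using hypothetical `MiniThing` and `MiniConnection` classes rather than the real ones: `in` checks only the listed targets, while `can_connect_to` also accepts any Thing with `can_be_anywhere` set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniThing:
    name: str
    can_be_anywhere: bool = False


@dataclass(frozen=True)
class MiniConnection:
    """Hypothetical stand-in for Connection, keeping only target_thing_names."""

    target_thing_names: frozenset[str]

    def __contains__(self, thing: MiniThing) -> bool:
        # Strict membership: only the listed target names.
        return thing.name in self.target_thing_names

    def can_connect_to(self, thing: MiniThing) -> bool:
        # Like `in`, but extras (can_be_anywhere) may attach to any parent.
        return thing.can_be_anywhere or thing in self


body = MiniConnection(frozenset({"block"}))
comment = MiniThing("comment", can_be_anywhere=True)
print(comment in body, body.can_connect_to(comment))  # False True
```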

### Method: `serialize_for_cli`

```
serialize_for_cli()
```

Serialize the Connection for CLI output.

## Class: `DirectConnection`
A DirectConnection is a named semantic relationship with a Role.

Tree-sitter equivalent: Grammar “fields”.

Attributes:

 - `role`: Semantic function name (e.g., “condition”, “body”)
 - `_connection_class`: Always `ConnectionClass.DIRECT`

Characteristics:

 - Most precise type of structural relationship
 - Role describes what purpose the child serves
 - Only Direct connections have Roles

Empirical findings:

 - ~90 unique role names across all languages
 - Most common: name (381), body (281), type (217), condition (102)
 - Average 3-5 Direct connections per Composite Thing

### Method: `from_node_dto`

```
from_node_dto()
```

Create DirectConnections from the given node DTOs.

## Class: `Grammar`
Grammar provides the primary API for evaluating defined grammar rules for a language. We use grammars to analyze observed AST nodes and determine their semantic meaning.

A grammar represents the complete set of Things, Categories, and Connections for a specific programming language.

Grammars are the primary objects for semantic analysis and code understanding (comparing observed ASTs to expected structures — grammars are the expected structure half).

### Method: `from_registry`

```
from_registry()
```

Create a Grammar for the specified language from the ThingRegistry.

Args: language: The programming language for the Grammar.

Returns: A Grammar instance for the specified language.

### Method: `get_category_by_name`

```
get_category_by_name()
```

Get a Category by its name in this Grammar.

Args: name: The name of the Category to retrieve.

Returns: The Category instance if found; otherwise, None.

### Method: `get_thing_by_name`

```
get_thing_by_name()
```

Get a Thing (CompositeThing or Token) by its name in this Grammar.

Args: name: The name of the Thing to retrieve.

Returns: The Thing instance if found; otherwise, None.
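
To show how name lookups support comparing an observed AST against the expected structure, here is a hypothetical in-memory `MiniGrammar` that reuses the two lookup names. It returns `None` for unknown names, as documented above, and adds an illustrative `allows` check that is not part of the real Grammar API. For brevity it expands Categories one level only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniGrammar:
    """Hypothetical in-memory stand-in for Grammar (not the real class)."""

    # Composite Thing name -> Role -> allowed target names (Things or Categories)
    things: dict[str, dict[str, frozenset[str]]]
    # Category name -> member Thing names
    categories: dict[str, frozenset[str]] = field(default_factory=dict)

    def get_thing_by_name(self, name: str) -> dict[str, frozenset[str]] | None:
        return self.things.get(name)

    def get_category_by_name(self, name: str) -> frozenset[str] | None:
        return self.categories.get(name)

    def allows(self, parent: str, role: str, child: str) -> bool:
        """Does the grammar permit `child` under `parent` via the Direct connection `role`?"""
        roles = self.get_thing_by_name(parent)
        if roles is None or role not in roles:
            return False
        for target in roles[role]:
            members = self.get_category_by_name(target)
            if target == child or (members is not None and child in members):
                return True
        return False


grammar = MiniGrammar(
    things={"if_statement": {"condition": frozenset({"expression"}),
                             "consequence": frozenset({"block"})}},
    categories={"expression": frozenset({"binary_expression", "identifier"})},
)
print(grammar.allows("if_statement", "condition", "identifier"))  # True
```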

## Class: `PositionalConnections`
`PositionalConnections` represents an ordered structural relationship without a Role.

Tree-sitter equivalent: Grammar “children”.

Characteristics:

 - Less precise than DirectConnection (no Role)
 - Ordered relationship: a child’s position may imply a role, but no Role is recorded

Empirical findings:

 - Average 1-2 Positional connections per Composite Thing

### Method: `from_node_dto`

```
from_node_dto()
```

Create PositionalConnections from the given node DTOs.

## Class: `Thing`
Base class for Things: CompositeThings (also called Composites) and Tokens.

There are two kinds of Things: Token (leaf) or Composite (non-leaf). Things are what you actually see in the AST produced by parsing code. A Token is what you literally see in the source code (keywords, identifiers, literals, punctuation). A Composite represents complex structures like functions, classes, and expressions, which have direct and/or positional connections to child Things.

We keep Token as a separate class for clarity, type safety, and to enforce that Tokens cannot have children.

### Method: `classification_confidence`

```
classification_confidence()
```

Get the confidence score of this Thing’s classification.

### Method: `from_node_dto`

```
from_node_dto()
```

Create a Thing (Token or Composite) from a NodeTypeDTO and category names.

### Method: `serialize_for_cli`

```
serialize_for_cli()
```

Serialize the Thing for CLI output.

## Class: `Token`
A Token is a leaf Thing with no structural children.

A Token represents keywords, identifiers, literals, and punctuation — what you literally see in the source code. Tokens are classified by their purpose, indicating whether they carry semantic or structural meaning versus being mere formatting trivia.

Tree-sitter equivalent: Node with no fields or children (i.e., a leaf node)

Attributes:

 - `name`: Thing identifier (e.g., “if”, “identifier”, “string”)
 - `language`: Programming language this Token belongs to
 - `categories`: Set of Category names this Token belongs to, if any
 - `kind`: Structural classification (always TOKEN)
 - `purpose`: Semantic purpose of the Token (e.g., KEYWORD, IDENTIFIER)
 - `can_be_anywhere`: Whether the Token can appear anywhere in the parse tree (usually comments)

## Function: `cat_name_normalizer`

```
cat_name_normalizer()
```

Normalize category names by stripping leading underscores.

## Function: `get_all_grammars`

```
get_all_grammars()
```

Get all available Grammars for all supported programming languages.

Returns: dict: A dictionary mapping each SemanticSearchLanguage to its corresponding Grammar.

## Function: `get_grammar`

```
get_grammar()
```

Get the Grammar for the specified programming language.

Args: language (SemanticSearchLanguage): The programming language to get the Grammar for.

Returns: Grammar: The Grammar for the specified programming language.

## Function: `name_normalizer`

```
name_normalizer()
```

Normalize names by stripping leading underscores.

## Function: `role_name_normalizer`

```
role_name_normalizer()
```

Normalize role names by stripping leading underscores.

## Function: `thing_name_normalizer`

```
thing_name_normalizer()
```

Normalize thing names by stripping leading underscores.
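
All four normalizers are documented as stripping leading underscores. In tree-sitter, rule names beginning with an underscore are hidden rules, and supertypes are commonly named that way (for example `_declarator`), so normalizing lets the same concept compare equal across grammars. A minimal sketch, assuming every leading underscore is removed and nothing else changes; `strip_leading_underscores` is a hypothetical name, and the real functions may do more.

```python
def strip_leading_underscores(name: str) -> str:
    """Sketch of the documented behavior: remove leading underscores, keep the rest."""
    return name.lstrip("_")


print(strip_leading_underscores("_declarator"))  # declarator
print(strip_leading_underscores("snake_case_name"))  # snake_case_name
```

Only the leading underscores go; underscores inside a name are kept.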