---
title: node_type_parser | CodeWeaver Docs
description: API reference for codeweaver.semantic.node_type_parser
url: "https://docs.knitli.com/api/semantic/node_type_parser"
type: static
generatedAt: "2026-04-17T17:21:09.591Z"
---

# node_type_parser
# `codeweaver.semantic.node_type_parser`
Parser for tree-sitter node-types.json files with intuitive terminology.

This module provides functionality to parse tree-sitter `node-types.json` files and extract grammar information using clear, intuitive terminology instead of tree-sitter’s confusing vocabulary.

## Background
tl;dr: **This is the parser we wish we had when we started working with tree-sitter. We hope it makes your experience with tree-sitter grammars smoother and more intuitive.**

When developing CodeWeaver and our Rust-based future backend, Thread, we spent a lot of time with tree-sitter and its quirks. While tree-sitter is a powerful tool, its vocabulary and structure, combined with the lack of comprehensive documentation, can make it challenging to work with. Simply put: it’s not intuitive.

This is on full display in the `node-types.json` file, which describes the different node types in a grammar. The `node-types.json` file is crucial for understanding how to interact with parse trees, but its structure and terminology are confusing. It *conflates* several distinct concepts (meaning it treats them as if they are the same):

 - It doesn’t clearly differentiate between **nodes** (vertices) and **edges** (relationships)
 - It uses “named” to describe both nodes and edges, meaning “has a grammar rule”, not “has a name” (everything has a name!)
 - It flattens hierarchies and structural patterns in ways that obscure their meaning

When I wrote the previous version of this parser, my misunderstanding of these concepts cost me a week of lost time and incorrect assumptions. After that, I decided to write this parser using terminology and structure that describe the concepts at play more intuitively, departing completely from tree-sitter’s terminology.

Knitli is fundamentally about making complex systems more intuitive and accessible, and this is a perfect example of that philosophy in action. By using clearer terminology and structure, we’re making it easier for developers to understand and work with tree-sitter grammars. This saves time, reduces frustration, and empowers developers to build better tools.

**For tree-sitter experts:** We provide a translation guide below to help bridge the gap between the two terminologies. If you find this frustrating, we understand — but we believe clarity for newcomers is more important than tradition.

## CodeWeaver’s Terminology
We clarify and separate concepts that tree-sitter conflates: nodes vs edges, abstract vs concrete, structural roles vs semantic meaning. Here’s our approach:

### Abstract Groupings
**Category** - Abstract classification that groups Things with shared characteristics.

 - Categories do NOT appear in parse trees (abstract only)
 - Used for polymorphic type constraints and classification (identifying what something can be used as)
 - Example: `expression` is a Category containing `binary_expression`, `unary_expression`, etc.
 - **Tree-sitter equivalent**: Nodes with `subtypes` field (abstract types)
 - **Empirical finding**: ~110 unique Categories across 25 languages; after normalizing names across languages, this shrinks to about 16 Categories that have members in many languages

**Multi-Category Membership:**

 - Things can belong to multiple Categories (uncommon but important)
 - **13.5%** of Things belong to 2+ Categories
 - **86.5%** belong to exactly 1 Category
 - Common in C/C++ (declarators serving multiple roles)
 - Example: `identifier` → `[_declarator, expression]`
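
The membership rules above can be sketched against raw `node-types.json` data. This is a minimal illustration, not CodeWeaver's implementation: the sample entries are hand-written, and `category_membership` is a hypothetical helper. It only assumes the tree-sitter convention that an entry with a `subtypes` key is an abstract type.

```python
from collections import defaultdict

# Hand-written excerpt in the node-types.json shape (illustrative data only).
RAW_NODE_TYPES = [
    {
        "type": "expression",
        "named": True,
        "subtypes": [
            {"type": "binary_expression", "named": True},
            {"type": "identifier", "named": True},
        ],
    },
    {
        "type": "_declarator",
        "named": True,
        "subtypes": [
            {"type": "identifier", "named": True},
            {"type": "pointer_declarator", "named": True},
        ],
    },
    {"type": "binary_expression", "named": True, "fields": {}},
    {"type": "identifier", "named": True},
]


def category_membership(raw_nodes):
    """Map each Thing name to the sorted list of Categories that contain it.

    Hypothetical helper: only entries with a `subtypes` key count as Categories.
    """
    membership = defaultdict(set)
    for node in raw_nodes:
        for member in node.get("subtypes", []):
            membership[member["type"]].add(node["type"])
    return {thing: sorted(cats) for thing, cats in membership.items()}


membership = category_membership(RAW_NODE_TYPES)
# Things that belong to two or more Categories.
multi_category = {thing: cats for thing, cats in membership.items() if len(cats) > 1}
```

In this sample, `identifier` is the only Thing with multi-category membership, matching the `identifier` → `[_declarator, expression]` example.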

### Concrete Parse Tree Nodes
**Thing** - A concrete element that appears in the parse tree.

 - Two kinds: **Token** (leaf) or **Composite** (non-leaf)
 - What you actually see when you parse code
 - **Tree-sitter equivalent**: Named or unnamed “nodes” (named does not correlate to our Composite vs Token distinction)
 - Name chosen for clarity: “it’s a thing in your code” (considered: Entity, Element, Construct)

**Token** - Leaf Thing with no structural children.

 - Represents keywords, identifiers, literals, punctuation
 - What you literally **see** in the source code
 - Classified by purpose: keyword, identifier, literal, punctuation, comment
 - **Tree-sitter equivalent**: Node with no `fields` or `children`

**Composite Node** - Non-leaf Thing with structural children.

 - Has Direct and/or Positional connections to child Things
 - Represents complex structures: functions, classes, expressions
 - **Tree-sitter equivalent**: Node with `fields` and/or `children`
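
As a rough sketch of the Token/Composite split, the function below classifies raw `node-types.json` entries by whether they declare any `fields` or `children`. The `classify` helper, the enum values, and the sample entries are assumptions made for illustration; only the `TOKEN`/`COMPOSITE` names come from this page.

```python
from enum import Enum
from typing import Optional


class ThingKind(Enum):
    """TOKEN/COMPOSITE split as described above (the string values are assumed)."""

    TOKEN = "token"
    COMPOSITE = "composite"


def classify(raw_node: dict) -> Optional[ThingKind]:
    """Classify one raw entry; return None for Categories (abstract types)."""
    if "subtypes" in raw_node:
        # Abstract grouping: never appears in a parse tree, so it is not a Thing.
        return None
    has_connections = bool(raw_node.get("fields")) or bool(raw_node.get("children"))
    return ThingKind.COMPOSITE if has_connections else ThingKind.TOKEN


# Hand-written sample entries (illustrative only).
examples = [
    {"type": "if", "named": False},
    {"type": "identifier", "named": True},
    {
        "type": "argument_list",
        "named": True,
        "fields": {},
        "children": {
            "multiple": True,
            "required": False,
            "types": [{"type": "expression", "named": True}],
        },
    },
    {
        "type": "expression",
        "named": True,
        "subtypes": [{"type": "identifier", "named": True}],
    },
]
kinds = {node["type"]: classify(node) for node in examples}
```

Note that the unnamed `if` and the named `identifier` both come out as Tokens, which is the point made above: `named` does not line up with the Token/Composite distinction.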

### Structural Relationships
**Connection** - Directed relationship from parent Thing to child Thing(s).

 - Graph terminology: an “edge”
 - Three classes: Direct, Positional, Loose
 - **Tree-sitter equivalent**: `fields` (Direct), `children` (Positional), `extras` (Loose)

**ConnectionClass** - Classification of connection types:

 1. **DIRECT** - Named semantic relationship with a **Role**

    - Has a specific semantic function (e.g., “condition”, “body”, “parameters”)
    - Most precise type of structural relationship
    - **Tree-sitter equivalent**: Grammar “fields”
    - **Empirical finding**: 9,606 Direct connections across all languages

 2. **POSITIONAL** - Ordered structural relationship without semantic naming

    - Position matters but no explicit role name
    - Example: function arguments in some languages
    - **Tree-sitter equivalent**: Grammar “children”
    - A Thing with fields can also have children, but not vice versa: every Thing with children also has fields
    - All children are named (`is_explicit_rule = True`)
    - **Empirical finding**: 6,029 Positional connections across all languages

*Note: Direct and Positional Connections describe **structure**, while Loose Connections describe **permission**.*

**Role** - Named semantic function of a Direct connection.

 - Only Direct connections have Roles (Positional and Loose do not)
 - Describes **what purpose** a child serves, not just that it exists
 - Examples: “condition”, “body”, “parameters”, “left”, “right”, “operator”
 - **Tree-sitter equivalent**: Field name in grammar
 - **Empirical finding**: ~90 unique role names across all languages
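
To make the Direct/Positional split concrete, here is a minimal sketch that turns one raw entry's `fields` and `children` into connection records. The `Connection` dataclass, the `connections_for` helper, and the sample entries are hypothetical and do not reflect CodeWeaver's actual classes; Loose connections are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConnectionClass(Enum):
    DIRECT = "direct"          # tree-sitter `fields`
    POSITIONAL = "positional"  # tree-sitter `children`


@dataclass(frozen=True)
class Connection:
    """Hypothetical record for one parent-to-child edge."""

    source: str                        # parent Thing
    connection_class: ConnectionClass
    role: Optional[str]                # only Direct connections carry a Role
    targets: tuple                     # names of allowed Things or Categories


def connections_for(raw_node: dict) -> list:
    """Translate one raw entry's `fields` and `children` into Connections."""
    source = raw_node["type"]
    found = []
    for role, spec in raw_node.get("fields", {}).items():
        targets = tuple(t["type"] for t in spec["types"])
        found.append(Connection(source, ConnectionClass.DIRECT, role, targets))
    children = raw_node.get("children")
    if children:
        targets = tuple(t["type"] for t in children["types"])
        found.append(Connection(source, ConnectionClass.POSITIONAL, None, targets))
    return found


# Hand-written sample entries (illustrative only).
while_statement = {
    "type": "while_statement",
    "named": True,
    "fields": {
        "condition": {"multiple": False, "required": True,
                      "types": [{"type": "expression", "named": True}]},
        "body": {"multiple": False, "required": True,
                 "types": [{"type": "block", "named": True}]},
    },
}
argument_list = {
    "type": "argument_list",
    "named": True,
    "fields": {},
    "children": {"multiple": True, "required": False,
                 "types": [{"type": "expression", "named": True}]},
}

direct = connections_for(while_statement)
positional = connections_for(argument_list)
```

Each field name becomes the Role of a Direct connection, while the single `children` entry becomes one Positional connection with no Role.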

### Connection Target References
**Polymorphic Type Constraints:** Connections can reference either Categories (abstract) OR concrete Things, enabling flexible type constraints:

**Category References** (polymorphic constraints):

 - Connection accepts ANY member of a Category
 - Example: `condition` field → `expression` (accepts any expression type)
 - **Empirical finding**:
   - **7.9%** of field references are to Categories
   - **10.3%** of children references are to Categories
 - Common pattern: `argument_list.children → expression` (any expression type accepted)

**Concrete Thing References** (specific constraints):

 - Connection accepts only specific Thing types
 - Example: `operator` field → `["+", "-", "*", "/"]` (specific operators only)
 - **Empirical finding**:
   - **92.1%** of field references are to concrete Things
   - **89.7%** of children references are to concrete Things
 - Common pattern: Structural components like `parameter_list`, `block`, specific tokens

**Mixed References** (both in same connection):

 - Single connection can reference both Categories AND concrete Things
 - Example: `body` field → `[block, expression]` (either the concrete `block` Thing or any member of the `expression` Category)
 - Design principle: Store references as-is, provide resolution utilities when needed
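
One way a resolution utility could work is sketched below: references are stored as-is, and Category names are expanded into concrete Things only on request. `resolve_targets` and the `CATEGORIES` mapping are hypothetical names with hand-written data, not CodeWeaver's API. The sketch also assumes that Categories may contain other Categories.

```python
def resolve_targets(targets, categories):
    """Expand Category names into the concrete Things they contain.

    `targets` is a list of names as stored on a connection; `categories`
    maps each Category name to its member names (possibly other Categories).
    """
    resolved = set()
    seen = set()
    pending = list(targets)
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # guards against repeated or cyclic membership
        seen.add(name)
        if name in categories:
            pending.extend(categories[name])
        else:
            resolved.add(name)
    return resolved


# Hand-written sample data (illustrative only).
CATEGORIES = {
    "expression": ["binary_expression", "unary_expression", "primary_expression"],
    "primary_expression": ["identifier"],
}
body_targets = ["block", "expression"]  # a mixed reference, stored as-is
concrete = resolve_targets(body_targets, CATEGORIES)
```

Keeping the stored references unresolved preserves the grammar's intent (for example, "any expression"), while the helper gives the full list of concrete Things when a caller needs it.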

### Attributes
**Thing Attributes:**

 - **can_be_anywhere** (bool)

   - Whether the Thing can appear anywhere in the parse tree (usually comments)
   - **Tree-sitter equivalent**: the `extra` attribute
   - Data notes:
     - Only used in a minority of languages (11 of 25)
     - *Almost always* a **comment**. Two exceptions:
       - Python: `line_continuation` token (1/2, other is `comment`)
       - PHP: `text_interpolation` (1/2, other is `comment`)
   - **Empirical finding**: 1 or 2 Things per language have the `can_be_anywhere` attribute. `comment` is one of them in all 11 languages; where there are 2, the second is usually another kind of comment, such as `html_comment` in JavaScript (for JSX)

 - **is_explicit_rule** (bool)

   - Whether the Thing has a dedicated named production rule in the grammar
   - True: Named grammar rule (represented in grammar with semantic name)
   - False: Anonymous grammar construct or synthesized node
   - **Tree-sitter equivalent**: `named = True/False` (i.e. “named nodes”)
   - **Note**: Included for completeness; it has limited practical utility for semantic analysis. In practice, most significant nodes are named and most unnamed nodes are trivial (punctuation, formatting), but the correlation is not perfect. Other tools and libraries tend to treat unnamed nodes as synonymous with “insignificant”, but we don’t make that assumption here.

 - **kind** (ThingKind enum)

   - Classification of Thing type: TOKEN or COMPOSITE
   - TOKEN: Leaf Thing with no structural children
   - COMPOSITE: Non-leaf Thing with structural children

 - **is_file** (bool, Composite only)

   - Whether this Composite is the root of the parse tree (i.e., the start symbol)

 - **is_significant** (bool, Token only)

   - Whether the Token carries semantic/structural meaning vs formatting trivia
   - True: keywords, identifiers, literals, operators, comments
   - False: whitespace, line continuations, formatting tokens
   - Practically similar to `is_explicit_rule` but focuses on semantic importance
   - Used for filtering during semantic analysis vs preserving for formatting
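
The attribute mapping can be sketched as a small translation from raw `node-types.json` keys to the names used here. `thing_attributes` is a hypothetical helper with hand-written sample entries. It covers only the attributes that map directly onto tree-sitter keys (`extra`, `named`, `root`) and leaves out `kind` and `is_significant`, which need more context than a single key lookup.

```python
def thing_attributes(raw_node: dict) -> dict:
    """Translate tree-sitter keys on one raw entry into the attribute names above."""
    return {
        # `extra` and `root` are only present when true, so default to False.
        "can_be_anywhere": bool(raw_node.get("extra", False)),
        "is_explicit_rule": bool(raw_node["named"]),
        "is_file": bool(raw_node.get("root", False)),
    }


# Hand-written sample entries (illustrative only).
comment = {"type": "comment", "named": True, "extra": True}
module = {"type": "module", "named": True, "root": True, "fields": {}}
comma = {"type": ",", "named": False}

comment_attrs = thing_attributes(comment)
module_attrs = thing_attributes(module)
comma_attrs = thing_attributes(comma)
```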

**Connection Attributes:**

 - **allows_multiple** (bool)

 - Whether the Connection permits multiple children of specified type(s)
 - Defines cardinality upper bound (0 or 1 vs 0 or many)
 - **Tree-sitter equivalent**: `multiple = True/False`
 - **Note**: Specifies CAN have multiple, not MUST have multiple
 - **requires_presence** (bool)

 - Whether at least one child of specified type(s) MUST be present
 - Defines cardinality lower bound (0 or more vs 1 or more)
 - **Tree-sitter equivalent**: `required = True/False`
 - **Note**: Doesn’t require a specific Connection, just ≥1 from the allowed list

**Cardinality Matrix:**

| requires_presence | allows_multiple | Meaning |
| --- | --- | --- |
| False | False | 0 or 1 (optional single) |
| False | True | 0 or more (optional multiple) |
| True | False | exactly 1 (required single) |
| True | True | 1 or more (required multiple) |
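
The matrix above reduces to a pair of bounds. The helpers below are a minimal sketch with hypothetical names (`cardinality_bounds`, `count_is_valid`) showing how the two flags could be used to check a child count.

```python
def cardinality_bounds(requires_presence: bool, allows_multiple: bool):
    """Return (lower, upper) child counts; an upper bound of None means unbounded."""
    lower = 1 if requires_presence else 0
    upper = None if allows_multiple else 1
    return lower, upper


def count_is_valid(count: int, requires_presence: bool, allows_multiple: bool) -> bool:
    """Check whether `count` children satisfy the connection's cardinality."""
    lower, upper = cardinality_bounds(requires_presence, allows_multiple)
    return count >= lower and (upper is None or count <= upper)


# One row per line of the matrix, keyed by (requires_presence, allows_multiple).
matrix = {
    (req, multi): cardinality_bounds(req, multi)
    for req in (False, True)
    for multi in (False, True)
}
```

For example, a required single connection (`requires_presence=True`, `allows_multiple=False`) rejects both zero and two children.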

## Tree-sitter Translation Guide
For developers familiar with tree-sitter terminology:

| Tree-sitter Term | CodeWeaver Term | Notes |
| --- | --- | --- |
| Abstract type (with subtypes) | Category | Doesn’t appear in parse trees |
| Named/unnamed node | Thing | Concrete parse tree node |
| Node with no fields or children | Token | Leaf node |
| Node with fields/children | Composite Thing | Non-leaf node |
| Field | Direct Connection | Has semantic Role |
| Child | Positional Connection | Ordered, no Role |
| Field name | Role | Semantic function |
| Extra | `can_be_anywhere` | Can be anywhere in the AST |
| `named` attribute | `is_explicit_rule` | Has named grammar rule |
| `multiple` attribute | `allows_multiple` | Upper cardinality bound |
| `required` attribute | `requires_presence` | Lower cardinality bound |
| `root` attribute | `is_file` | The starting node of the parse tree |

## Design Rationale
**Why these names?**

 - **Thing**: Simple, clear, unpretentious. “It’s a thing in your code.”
 - **Category**: Universally understood as abstract grouping
 - **Connection**: Graph theory standard; clearer than conflating fields/children/extras
 - **Role**: Describes purpose, not just presence
 - **ConnectionClass**: Explicit enumeration of relationship types

**Empirical validation:**

 - Analysis of 25 languages, 5,000+ node types
 - ~110 unique Categories, ~736 unique Things with category membership
 - 7.9-10.3% of references are polymorphic (Category references)
 - 13.5% of Things have multi-category membership
 - Patterns consistent across language families

**Benefits:**

 - **Clearer mental model**: Separate nodes, edges, and attributes explicitly
 - **Easier to learn**: Intuitive names reduce cognitive load
 - **Better tooling**: Explicit types enable better type checking and validation
 - **Future-proof**: Accommodates real-world patterns (multi-category, polymorphic references)

## Class: `NodeArray`
Root object for node types file containing array of node type objects.

Attributes: nodes: List of node type objects

### Method: `from_json_data`

```
from_json_data()
```

Create NodeArray from JSON data.

## Class: `NodeTypeFileLoader`
Container for node types files in a directory structure.

Attributes:

 - directory: Directory containing node types files (None to use package resources)
 - files: List of node types file paths

### Method: `get_all_nodes`

```
get_all_nodes()
```

Get all nodes from the node types files.

Returns: List of dictionaries containing the language and list of NodeTypeDTOs for that language.

### Method: `get_all_types`

```
get_all_types()
```

Get all types from the node types files.

Returns: List of dictionaries containing raw data from node types files. This is in the tree-sitter node-types.json format.

### Method: `get_node`

```
get_node()
```

Get the NodeArray for a specific language.

Args: language: The language to get the NodeArray for.

Returns: The NodeArray for the specified language, or None if not found.

## Class: `NodeTypeParser`
Parses and translates node types files into CodeWeaver’s internal representation.

### Method: `cache_complete`

```
cache_complete()
```

Check if the internal cache is fully populated for all specified languages.

### Method: `parse_all_nodes`

```
parse_all_nodes()
```

Parse and translate all node types files into internal representation.

### Method: `parse_for_language`

```
parse_for_language()
```

Parse and translate node types files for a specific language into internal representation.

Args: language: The language to parse node types for.

Returns: List of parsed and translated node types for the specified language.

### Method: `parse_languages`

```
parse_languages()
```

Parse and translate node types files for a specific set of languages into internal representation.

Args: languages: The languages to parse node types for. If None, will use internal self._languages or all languages if self._languages is empty.

Returns: List of parsed and translated node types for the specified languages.

## Function: `get_things`

```
get_things()
```

Get all Things and Categories from the registry, optionally filtered by language.

Args: languages: Optional list of languages to filter by; if None, returns all Things and Categories.

Returns: List of Things and Categories matching the specified languages.