Splitters Crate
Recoco Splitters
Intelligent text splitting and parsing for Recoco.
This crate implements sophisticated text splitting strategies, primarily leveraging Tree-sitter to perform syntax-aware chunking of source code and structured documents.
Why Tree-sitter?
Standard text splitters often break code in the middle of functions or classes, destroying context. recoco-splitters understands the syntax of the language it is processing, so chunks respect logical boundaries (e.g., keeping a whole function together).
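To make the difference concrete, here is a minimal, standard-library-only sketch. It does not use Tree-sitter or any recoco-splitters API: `split_fixed` and `split_on_blank_lines` are hypothetical helpers, and blank lines stand in for the syntax-node boundaries a real parser would provide.

```rust
/// Naive splitter: cut every `size` characters, ignoring structure.
/// `size` must be greater than zero.
fn split_fixed(text: &str, size: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(size)
        .map(|c| c.iter().collect::<String>())
        .collect()
}

/// Toy boundary-aware splitter: treats blank lines as item boundaries.
/// This is only a stand-in for syntax nodes (an assumption for illustration),
/// not how recoco-splitters works internally.
fn split_on_blank_lines(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

const SOURCE: &str = "fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n\nfn sub(a: i32, b: i32) -> i32 {\n    a - b\n}\n";

fn main() {
    // The fixed-size splitter cuts the first function in the middle of its body.
    for (i, chunk) in split_fixed(SOURCE, 40).iter().enumerate() {
        println!("fixed chunk {i}:\n{chunk}\n---");
    }
    // The boundary-aware splitter keeps each function whole.
    for (i, chunk) in split_on_blank_lines(SOURCE).iter().enumerate() {
        println!("boundary chunk {i}:\n{chunk}\n---");
    }
}
```

With a 40-character window, the first fixed chunk ends partway through the body of `add`, while the boundary-aware version returns one chunk per function.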
Supported Languages
To minimize binary size, Recoco feature-gates every language parser. Enable only what you need in your Cargo.toml.
```toml
[dependencies]
recoco-splitters = { version = "...", features = ["python", "rust"] }
```

| Feature | Language |
|---|---|
| `all` | All languages |
| `c` | C |
| `c-sharp` | C# |
| `cpp` | C++ |
| `css` | CSS |
| `fortran` | Fortran |
| `go` | Go |
| `html` | HTML |
| `java` | Java |
| `javascript` | JavaScript |
| `json` | JSON |
| `kotlin` | Kotlin |
| `markdown` | Markdown |
| `php` | PHP |
| `python` | Python |
| `r` | R |
| `ruby` | Ruby |
| `rust` | Rust |
| `scala` | Scala |
| `solidity` | Solidity |
| `sql` | SQL |
| `swift` | Swift |
| `toml` | TOML |
| `typescript` | TypeScript |
| `xml` | XML |
| `yaml` | YAML |
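For readers unfamiliar with Cargo feature gates, the sketch below shows the general mechanism in plain Rust. It is an illustration only: `enabled_languages` is a hypothetical function, not part of recoco-splitters, and the feature names are borrowed from the table above.

```rust
/// Returns the languages compiled into this (hypothetical) build.
/// Each `#[cfg(feature = "...")]` statement is removed at compile time
/// unless the matching Cargo feature is enabled, which is how
/// feature gates keep unused code out of the binary.
fn enabled_languages() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut langs: Vec<&'static str> = Vec::new();

    #[cfg(feature = "python")]
    langs.push("python");

    #[cfg(feature = "rust")]
    langs.push("rust");

    langs
}

fn main() {
    // Built without any features, this prints an empty list.
    println!("enabled languages: {:?}", enabled_languages());
}
```

In a crate that declares these features in its own Cargo.toml, building with `--features python` would include only the `python` branch.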
Splitter Strategies
- Recursive Character Splitter: Standard splitting by separators (paragraphs, newlines, etc.).
- Recursive Syntax Splitter: Tree-sitter based splitting that respects code blocks and syntax nodes.
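The recursive character strategy can be sketched in a few lines of standard-library Rust. This is a generic illustration of the technique, not the crate's implementation: the function name `recursive_split`, its signature, and the choice to measure length in characters are all assumptions made for the example.

```rust
/// Recursively split `text` so that every chunk has at most `max_len` characters.
/// Coarse separators are tried first; finer ones are used only for pieces
/// that are still too long. Hypothetical helper for illustration.
fn recursive_split(text: &str, separators: &[&str], max_len: usize) -> Vec<String> {
    if text.chars().count() <= max_len {
        return if text.trim().is_empty() { vec![] } else { vec![text.to_string()] };
    }
    let (sep, rest) = match separators.split_first() {
        Some(pair) => pair,
        None => {
            // No separators left: fall back to a hard cut on character boundaries.
            let chars: Vec<char> = text.chars().collect();
            return chars
                .chunks(max_len.max(1))
                .map(|c| c.iter().collect::<String>())
                .collect();
        }
    };
    let sep_len = sep.chars().count();
    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep) {
        let piece_len = piece.chars().count();
        let cur_len = current.chars().count();
        // Greedily merge neighbouring pieces while they still fit.
        if !current.is_empty() && cur_len + sep_len + piece_len <= max_len {
            current.push_str(sep);
            current.push_str(piece);
            continue;
        }
        // Flush the chunk built so far before starting a new one.
        if !current.trim().is_empty() {
            chunks.push(std::mem::take(&mut current));
        } else {
            current.clear();
        }
        if piece_len <= max_len {
            current.push_str(piece);
        } else {
            // Piece is still too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, rest, max_len));
        }
    }
    if !current.trim().is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let text = "alpha beta\ngamma delta\n\nepsilon zeta eta theta";
    // Paragraphs first, then lines, then words.
    for chunk in recursive_split(text, &["\n\n", "\n", " "], 12) {
        println!("{chunk:?}");
    }
}
```

A syntax-aware splitter follows the same top-down idea, but descends through parser nodes (functions, blocks, statements) instead of a fixed list of separator strings.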
License
Apache-2.0. See the main repository for details.