---
title: discovery | CodeWeaver Docs
description: API reference for codeweaver.core.discovery
url: "https://docs.knitli.com/api/core/discovery"
type: static
generatedAt: "2026-04-17T17:21:08.011Z"
---

# discovery
# `codeweaver.core.discovery`
Defines the `DiscoveredFile` dataclass, which represents files found during project scanning.

## Class: `DiscoveredFile`
Represents a file discovered during project scanning.

`DiscoveredFile` instances are immutable and hashable: their state cannot change after creation, so they can be used in sets and as dictionary keys. CodeWeaver creates them with the `from_path` method when scanning and indexing a codebase.
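To illustrate what immutable and hashable mean in practice, here is a minimal sketch using a frozen dataclass. `FileRecord` and its fields are hypothetical stand-ins chosen for illustration; they are not the actual `DiscoveredFile` definition.

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for DiscoveredFile; the field names are assumptions."""

    path: Path
    file_hash: str


a = FileRecord(Path("src/app.py"), "abc123")
b = FileRecord(Path("src/app.py"), "abc123")

# Equal field values give equal hashes, so duplicates collapse in a set.
unique = {a, b}

# Instances can also serve as dictionary keys.
chunk_counts = {a: 3}

# Assigning to a field after creation raises FrozenInstanceError.
try:
    a.file_hash = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because equality and hashing are derived from the field values, `b` can be used to look up an entry stored under `a`.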

### Method: `from_chunk`

```
from_chunk()
```

Create a `DiscoveredFile` from a `CodeChunk`, provided the chunk has a valid `file_path`.

### Method: `from_path`

```
from_path()
```

Create a `DiscoveredFile` from a file path.

### Method: `is_path_binary`

```
is_path_binary()
```

Check if a file at path is binary by reading its first 1024 bytes.
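The reference does not say how the first 1024 bytes are judged. One common heuristic, shown here as a sketch under that assumption, is to treat a file as binary when the sample contains a NUL byte. The name `looks_binary` and the `sample_size` parameter are hypothetical and may not match what `is_path_binary` actually does.

```python
import tempfile
from pathlib import Path


def looks_binary(path: Path, sample_size: int = 1024) -> bool:
    """Heuristic sketch: report binary if the leading sample contains a NUL byte."""
    with path.open("rb") as handle:
        sample = handle.read(sample_size)
    return b"\x00" in sample


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "notes.txt"
    blob_file = Path(tmp) / "image.bin"
    text_file.write_text("plain text\n", encoding="utf-8")
    blob_file.write_bytes(b"\x89PNG\x00\x01\x02")

    text_is_binary = looks_binary(text_file)
    blob_is_binary = looks_binary(blob_file)
```

Reading only a fixed-size prefix keeps the check cheap for large files, at the cost of missing binary data that appears later in the file.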

### Method: `is_path_text`

```
is_path_text()
```

Check if a file at path is text.

### Method: `is_same`

```
is_same()
```

Check whether the file at `other_path` has the same content as this one by comparing blake3 hashes.

The other file may be in a different location (the paths need not match). This makes the check useful for detecting files that have been moved or copied, and for deduplication, where duplicates can all point to a single copy.
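The idea of comparing files by content hash rather than by path can be sketched as follows. To stay within the standard library, this sketch substitutes `hashlib.blake2b` for blake3; `content_hash` and `same_content` are hypothetical helper names, not part of the CodeWeaver API.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    # blake2b from the standard library stands in for blake3 in this sketch.
    return hashlib.blake2b(path.read_bytes()).hexdigest()


def same_content(first: Path, second: Path) -> bool:
    """Return True when both files hold identical bytes, wherever they live."""
    return content_hash(first) == content_hash(second)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "a.py"
    copy = Path(tmp) / "nested" / "b.py"
    copy.parent.mkdir()

    original.write_text("print('hi')\n", encoding="utf-8")
    copy.write_text("print('hi')\n", encoding="utf-8")
    copies_match = same_content(original, copy)

    # Editing the copy changes its hash, so the files no longer match.
    copy.write_text("print('bye')\n", encoding="utf-8")
    edited_match = same_content(original, copy)
```

Since only the bytes are hashed, the file name and directory play no part in the comparison.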

### Method: `normalize_content`

```
normalize_content()
```

Normalize file content by ensuring it’s a UTF-8 string.
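A minimal sketch of this kind of normalization, assuming the input may arrive as either `str` or `bytes` and that undecodable bytes should be replaced instead of raising. Both assumptions, and the name `to_utf8_text`, are illustrative; the real method may handle errors and encodings differently.

```python
from typing import Union


def to_utf8_text(content: Union[str, bytes]) -> str:
    """Return content as a str, decoding bytes as UTF-8."""
    if isinstance(content, bytes):
        # errors="replace" swaps invalid sequences for U+FFFD instead of raising.
        return content.decode("utf-8", errors="replace")
    return content


decoded = to_utf8_text(b"caf\xc3\xa9")
unchanged = to_utf8_text("already text")
```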

## Function: `compute_semantic_file_hash`

```
compute_semantic_file_hash()
```

Compute a file hash using AST-based hashing for supported semantic languages.

For files with a supported AST language (Python, JavaScript, etc.), parse the file to an AST and hash the canonical tree representation. This ignores comments, whitespace, and formatting changes so that only genuine semantic modifications trigger a different hash.

Fall back to a raw content blake3 hash for unsupported languages or when AST parsing fails.
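The approach can be sketched for Python source alone with the standard `ast` module. This is an illustration under stated assumptions, not CodeWeaver's implementation: `semantic_hash` is a hypothetical name, `ast.dump` stands in for whatever canonical tree representation the library uses, and `hashlib.blake2b` stands in for blake3.

```python
import ast
import hashlib


def semantic_hash(source: str) -> str:
    """Hash the parsed tree when possible, otherwise hash the raw text."""
    try:
        # ast.dump omits comments, whitespace, and (by default) line numbers.
        canonical = ast.dump(ast.parse(source))
    except (SyntaxError, ValueError):
        # Fallback: parsing failed, so hash the raw content instead.
        canonical = source
    # blake2b stands in for blake3 to keep the sketch dependency-free.
    return hashlib.blake2b(canonical.encode("utf-8")).hexdigest()


compact = "def add(a, b):\n    return a + b\n"
reformatted = "# adds two numbers\ndef add(a,   b):\n\n    return a + b   # sum\n"
changed = "def add(a, b):\n    return a - b\n"

formatting_only = semantic_hash(compact) == semantic_hash(reformatted)
real_change = semantic_hash(compact) != semantic_hash(changed)
```

In this sketch, `compact` and `reformatted` differ only in comments and spacing and so share a hash, while `changed` alters an operator and hashes differently.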