Local Knowledge Base

Overview

The local knowledge base lets you link local folders to ClawSky and uses vector embeddings and full-text search to help the AI understand and use your file content. All operations run locally, and your data never leaves your device.

Core Features

  • Quick Association — One-click selection of a local folder as a knowledge base
  • Vectorization — Automatically vectorizes file content to support semantic search (based on a BGE model)
  • Full-Text Search — Fast and precise keyword matching (using Jieba tokenizer)
  • Local Execution — All operations run locally, protecting your data privacy
  • Intelligent Context — AI chat automatically retrieves relevant knowledge as context
  • Multi-Format Support — Supports PDF, Word, Excel, PowerPoint, Markdown, and plain text

Adding a Knowledge Base

  1. Click Knowledge Base in the ClawSky sidebar
  2. Click the + Add Knowledge Base button
  3. Select a local folder
  4. The system automatically scans, parses, and vectorizes file content
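
The scanning step can be pictured as a recursive walk over the selected folder that keeps only files in a supported format. The following is a minimal sketch, not ClawSky's actual code: the function name `find_supported_files` and the exact extension set are assumptions made for illustration, inferred from the formats listed on this page.

```python
from pathlib import Path

# Hypothetical extension set, inferred from the formats listed on this page.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".md", ".txt"}


def find_supported_files(root: str) -> list[Path]:
    """Recursively collect files under `root` whose extension is supported."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f"{root} is not a folder")
    return sorted(
        path
        for path in root_path.rglob("*")
        # Compare extensions case-insensitively so "REPORT.PDF" is also matched.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Each returned path would then be handed to a format-specific parser before vectorization; that part is omitted here because it depends on the parsing libraries in use.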

Using in Chat

Once you have added a knowledge base, each question you ask in AI chat goes through these steps:

  1. The system automatically retrieves relevant knowledge base content
  2. The retrieved context is sent to the AI
  3. The AI generates more accurate answers based on your knowledge base content
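
The steps above can be sketched as a small prompt-assembly function. This is an illustration under stated assumptions rather than ClawSky's implementation: the name `build_prompt`, the character budget `max_chars`, and the prompt layout are all hypothetical, and the retrieved chunks are assumed to arrive ordered by relevance.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Prepend retrieved chunks to the question, stopping at a character budget."""
    selected = []
    used = 0
    for chunk in chunks:  # assumed to be ordered from most to least relevant
        if used + len(chunk) > max_chars:
            break  # stop once the next chunk would exceed the budget
        selected.append(chunk)
        used += len(chunk)
    # Number each chunk so the answer can refer back to its source passage.
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(selected, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A budget of this kind keeps the context within the model's input limit; a real system would more likely count tokens than characters.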

You can also select explicitly, in the chat interface, which knowledge bases to search.

Privacy & Security

ClawSky's local knowledge base runs entirely on your device. File contents are never uploaded to any remote server, and all vectorization and search operations are completed locally. This ensures your sensitive information, code, documents, and other data remain completely private.

Technical Implementation

  • Document Parsing — Automatic parsing of multiple formats: PDF, DOCX, XLSX, PPTX, Markdown, and plain text
  • Tokenization — Jieba tokenizer library for Chinese and English word segmentation
  • Vector Embedding — BGE model-based ONNX inference, completely offline
  • Storage — DuckDB for efficient vector and text data storage and querying
  • Search — Supports both vector similarity search and full-text keyword search
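
To make the two search modes concrete, here is a minimal pure-Python sketch that ranks documents by vector similarity and by keyword overlap, then merges the two rankings. Several things here are assumptions, not confirmed details of ClawSky: the merge uses reciprocal rank fusion (this page does not say how, or whether, the two result sets are combined), the function names are hypothetical, and the `terms` lists are assumed to have been produced by a tokenizer beforehand. In the real system the embeddings would come from the BGE model and the data would live in DuckDB.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query_terms: list[str], doc_terms: list[str]) -> int:
    """Count how many query terms occur in the document's token list."""
    doc_set = set(doc_terms)
    return sum(1 for term in query_terms if term in doc_set)


def hybrid_search(query_vec, query_terms, docs, k=60):
    """Return document ids, best first.

    `docs` is a list of dicts with 'id', 'vector', and 'terms' keys
    (a hypothetical in-memory layout used only for this sketch).
    """
    by_vector = sorted(
        docs, key=lambda d: cosine_similarity(query_vec, d["vector"]), reverse=True
    )
    by_keyword = sorted(
        docs, key=lambda d: keyword_score(query_terms, d["terms"]), reverse=True
    )
    # Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document.
    fused = {}
    for ranking in (by_vector, by_keyword):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc["id"]] = fused.get(doc["id"], 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

Rank fusion is used in this sketch because it needs no score normalization: cosine similarities and keyword counts are on different scales, while ranks are directly comparable.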

Next Steps