
Resource Indexing

Whenever a new document or file is added to the resources/ directory, we should automatically generate low-level indexes for it, and optionally (or more slowly, in the background) generate semantic embeddings for the documents.
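
A minimal polling sketch of this trigger (the function names and the polling approach are illustrative assumptions; a real implementation would hook into the editor's file-watcher API rather than poll):

```python
import os
import time

def scan_resources(directory, seen_mtimes):
    """Return files in `directory` that are new or modified since the last scan.

    `seen_mtimes` maps file paths to their last observed modification time
    and is updated in place.
    """
    changed = []
    for entry in os.scandir(directory):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            if seen_mtimes.get(entry.path) != mtime:
                seen_mtimes[entry.path] = mtime
                changed.append(entry.path)
    return changed

def watch(directory, on_change, interval=2.0):
    """Poll `directory`, invoking `on_change(path)` for each new/updated file."""
    seen = {}
    while True:
        for path in scan_resources(directory, seen):
            on_change(path)  # e.g. build low-level indexes now, queue embeddings for later
        time.sleep(interval)
```

The split between the two steps mirrors the paragraph above: the fast, cheap indexing runs immediately in `on_change`, while the slower embedding work can be queued for a background worker.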

Low-level indexes might include TF-IDF indexes, topic models, or latent semantic indexing (LSI). We could also use compression-based methods such as gzip.
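
As one concrete possibility, a small TF-IDF index can be built with nothing but the standard library (the whitespace tokenizer and scoring below are a sketch, not the editor's actual implementation):

```python
import math
from collections import Counter

def tfidf_index(docs):
    """Build per-document TF-IDF vectors from a {doc_id: text} mapping."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each token appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    index = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        index[doc_id] = {
            tok: (count / len(tokens)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
    return index

def search(index, query):
    """Rank documents by the summed TF-IDF weight of the query tokens."""
    terms = query.lower().split()
    scores = {
        doc_id: sum(vec.get(t, 0.0) for t in terms)
        for doc_id, vec in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Terms that occur in every document get an IDF of zero, so common function words contribute nothing to the ranking; rare, content-bearing terms dominate.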

Keyword indexing is handled automatically by VS Code, so we don't need to worry about that.

The reason we don't want to rely solely on semantic indexes is that we may not have an embedding model that works well for the language of the document. Token-based indexes, by contrast, will work consistently across all languages.
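
To illustrate why byte-level methods generalize across languages, here is a sketch of gzip-based similarity via normalized compression distance (NCD); it needs no tokenizer or embedding model, only raw bytes:

```python
import gzip

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: lower means more similar.

    If a and b share structure, compressing them together costs little
    more than compressing the larger one alone.
    """
    ca = len(gzip.compress(a.encode("utf-8")))
    cb = len(gzip.compress(b.encode("utf-8")))
    cab = len(gzip.compress((a + " " + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)

def nearest(query: str, docs: dict) -> str:
    """Return the doc_id whose text compresses best together with the query."""
    return min(docs, key=lambda doc_id: ncd(query, docs[doc_id]))
```

Because gzip operates on repeated byte sequences rather than on a vocabulary, this works the same way for any script or language, which is exactly the property the paragraph above asks for.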

