What are Matryoshka Representation Learning (MRL) embeddings?

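MRL embeddings (such as OpenAI's text-embedding-3 models) are trained so that the most important semantic information is concentrated in the earliest dimensions. You can therefore truncate a vector to its leading prefix (e.g., from 1536 to 256 dimensions, a 6x reduction) and re-normalize it to unit length. This retains most of the original semantic accuracy (often 95% or more, depending on the model, the target size, and the task) and shrinks the index proportionally.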
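A minimal sketch of the truncation step, assuming the input is a plain `number[]` from an MRL-trained model. `truncateAndNormalize` and `l2Norm` are hypothetical helper names, not part of any library. The example illustrates why re-normalization matters: the leading prefix of a unit vector is generally shorter than 1, so it must be rescaled before cosine or dot-product search.

```typescript
// Hypothetical helper: Euclidean (L2) length of a vector.
function l2Norm(v: number[]): number {
  return Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
}

// Hypothetical helper: keep the first `dim` components (the Matryoshka prefix)
// and rescale them to unit L2 length.
function truncateAndNormalize(vector: number[], dim: number): number[] {
  const prefix = vector.slice(0, Math.min(dim, vector.length));
  const norm = l2Norm(prefix);
  // A zero prefix cannot be normalized; return it unchanged.
  return norm > 0 ? prefix.map((x) => x / norm) : prefix;
}

// Toy 4-D unit vector standing in for a real 1536-D embedding.
const full = [0.8, 0.4, 0.4, 0.2];
const truncated = truncateAndNormalize(full, 2);

console.log(l2Norm(full.slice(0, 2))); // ≈ 0.894: the raw prefix is not unit length
console.log(l2Norm(truncated));        // ≈ 1 after re-normalization
```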

What is the difference between Int8 Scalar and Binary Quantization?

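Int8 scalar quantization maps each Float32 component to an 8-bit integer. With the symmetric scaling used by the reducer below, the component with the largest absolute value maps to ±127, and the scale factor is returned so the values can be approximately recovered. This reduces vector size by 4x (32 bits to 8 bits per dimension). Binary (1-bit) quantization maps non-negative components to 1 and negative components to 0, packing 8 dimensions into each byte and reducing size by 32x. Binary vectors are compared with Hamming distance (the number of differing bits) instead of cosine distance, which makes search extremely fast at the cost of some accuracy.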
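The size ratios follow directly from the bits stored per dimension: 32 for Float32, 8 for Int8, and 1 for binary. The sketch below is a hypothetical illustration, not any specific database's storage format: it computes the raw payload bytes for one vector (ignoring row headers and index overhead) and shows an Int8 round trip with symmetric absmax scaling. `bytesPerVector`, `quantizeInt8`, and `dequantizeInt8` are assumed names.

```typescript
type Format = 'float32' | 'int8' | 'binary';

// Raw payload size in bytes for one vector (no headers or index overhead).
function bytesPerVector(dim: number, format: Format): number {
  if (format === 'float32') return dim * 4; // 4 bytes per component
  if (format === 'int8') return dim;        // 1 byte per component
  return Math.ceil(dim / 8);                // binary: 8 components per byte
}

// Symmetric absmax Int8 quantization: the largest |x| maps to 127.
function quantizeInt8(v: number[]): { q: Int8Array; scale: number } {
  const maxAbs = v.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  const scale = maxAbs > 0 ? 127 / maxAbs : 1;
  return { q: Int8Array.from(v, (x) => Math.round(x * scale)), scale };
}

// Dequantize: divide by the stored scale to approximate the original floats.
function dequantizeInt8(q: Int8Array, scale: number): number[] {
  return Array.from(q, (x) => x / scale);
}

console.log(bytesPerVector(1536, 'float32')); // 6144
console.log(bytesPerVector(1536, 'int8'));    // 1536
console.log(bytesPerVector(1536, 'binary'));  // 192

const v = [0.5, -0.25, 0.1, -0.8];
const { q, scale } = quantizeInt8(v);
const restored = dequantizeInt8(q, scale);
// Rounding bounds the per-component error by half a step, i.e. 0.5 / scale.
console.log(restored);
```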

Vector Reducer & Optimizer

Vector Database (RAG) Embeddings Dimension Reducer & Optimizer is a local tool. Simulate L2-normalized truncation and 1-bit binary quantization directly in your browser.


/**
 * client-reducer.ts
 * Browser-safe embedding dimensionality reducer & optimizer.
 * Designed for Matryoshka Representation Learning (MRL) and local quantization.
 */

export function optimizeEmbedding(
  vector: number[],
  targetDim: number = 256,
  format: 'float32' | 'int8' | 'binary' = 'float32'
): {
  data: Float32Array | Int8Array | Uint8Array;
  scale?: number;
} {
  // Clamp the target so a targetDim larger than the input does not read
  // undefined entries (which would turn the whole result into NaN).
  const dim = Math.min(Math.max(0, Math.floor(targetDim)), vector.length);

  // 1. Matryoshka truncation: keep only the leading `dim` components.
  const sliced = vector.slice(0, dim);

  // 2. L2 re-normalization: a truncated prefix is no longer unit length,
  //    so rescale it before cosine / dot-product comparisons.
  let sumSq = 0;
  for (let i = 0; i < dim; i++) {
    sumSq += sliced[i] * sliced[i];
  }
  const norm = Math.sqrt(sumSq);
  const normalized = new Float32Array(dim); // stays all-zero if norm is 0
  if (norm > 0) {
    for (let i = 0; i < dim; i++) {
      normalized[i] = sliced[i] / norm;
    }
  }

  // 3. Apply the quantization format.
  if (format === 'float32') {
    return { data: normalized };
  }

  if (format === 'int8') {
    // Symmetric absmax scaling: the largest |value| maps to 127.
    let maxVal = 0;
    for (let i = 0; i < dim; i++) {
      const abs = Math.abs(normalized[i]);
      if (abs > maxVal) maxVal = abs;
    }
    const scale = maxVal > 0 ? 127 / maxVal : 1;
    const int8Data = new Int8Array(dim);
    for (let i = 0; i < dim; i++) {
      int8Data[i] = Math.round(normalized[i] * scale); // in [-127, 127]
    }
    // Divide int8Data by `scale` to approximately recover the floats.
    return { data: int8Data, scale };
  }

  if (format === 'binary') {
    // One bit per dimension (1 if >= 0, else 0), packed LSB-first, 8 per byte.
    const byteLen = Math.ceil(dim / 8);
    const binaryData = new Uint8Array(byteLen);
    for (let i = 0; i < dim; i++) {
      if (normalized[i] >= 0) {
        const byteIdx = Math.floor(i / 8);
        const bitIdx = i % 8;
        binaryData[byteIdx] |= 1 << bitIdx;
      }
    }
    return { data: binaryData };
  }

  throw new Error('Unsupported quantization format');
}

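/**
 * Helper to calculate the Hamming distance (number of differing bits)
 * between two packed binary vectors of the same byte length.
 */
export function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new Error('Binary vectors must have the same length');
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let xor = a[i] ^ b[i];
    // Kernighan's trick: each iteration clears the lowest set bit.
    while (xor !== 0) {
      xor &= xor - 1;
      distance++;
    }
  }
  return distance;
}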
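The `scale` returned for Int8 output is what makes quantized vectors comparable. As a minimal sketch, assuming both vectors were L2-normalized and quantized with their own absmax scale (as `optimizeEmbedding` does), the integer dot product divided by the product of the two scales approximates the original cosine similarity. `approxCosineInt8` and `quantize` are hypothetical helper names.

```typescript
// Hypothetical helper: approximate cosine similarity of two unit-length vectors
// from their Int8 codes. Since q ≈ x * scale, dividing the integer dot product
// by both scales undoes the quantization scaling.
function approxCosineInt8(
  a: Int8Array, scaleA: number,
  b: Int8Array, scaleB: number
): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0; // accumulate in a JS number, so no Int8 overflow
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot / (scaleA * scaleB);
}

// Symmetric absmax quantization of an already-normalized vector.
function quantize(v: number[]): { q: Int8Array; scale: number } {
  const maxAbs = v.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  const scale = maxAbs > 0 ? 127 / maxAbs : 1;
  return { q: Int8Array.from(v, (x) => Math.round(x * scale)), scale };
}

const x = [0.6, 0.8]; // unit length
const y = [0.8, 0.6]; // unit length
const exact = x[0] * y[0] + x[1] * y[1]; // ≈ 0.96
const qx = quantize(x);
const qy = quantize(y);
const approx = approxCosineInt8(qx.q, qx.scale, qy.q, qy.scale);
console.log(exact, approx);
```

The approximation error comes only from rounding each component to the nearest integer step, which is why keeping a per-vector scale is usually worth the extra 4 bytes.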

Instructions

1. Provide two vectors as JSON float arrays (Vector A and Vector B).
2. Select a target MRL truncation size (Original, 512, 256, or 128).
3. Choose the quantization format (Float32, Int8 Scalar, or 1-bit Binary).
4. Review the cosine similarity retention rate and the memory compression factor, then download the code blocks.
5. Copy the client reducer code or the pgvector SQL schema.
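Step 4 reports a cosine similarity retention rate and a memory compression factor, but the page does not say how they are computed. A plausible sketch, assuming retention means the truncated pair's cosine similarity divided by the full-dimension cosine similarity, and compression means full-size Float32 bytes divided by reduced bytes, could look like this. `cosine`, `retention`, and `compressionFactor` are hypothetical names.

```typescript
// Cosine similarity of two equal-length float vectors (0 if either is all-zero).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Assumed definition: similarity after truncation relative to the original.
// Cosine is scale-invariant, so the prefixes need no re-normalization here.
function retention(a: number[], b: number[], dim: number): number {
  return cosine(a.slice(0, dim), b.slice(0, dim)) / cosine(a, b);
}

// Assumed definition: full-size Float32 bytes / bytes after truncation + quantization.
function compressionFactor(fullDim: number, dim: number, bitsPerDim: 32 | 8 | 1): number {
  const fullBytes = fullDim * 4;
  const reducedBytes = Math.ceil((dim * bitsPerDim) / 8);
  return fullBytes / reducedBytes;
}

console.log(compressionFactor(1536, 256, 32)); // 6
console.log(compressionFactor(1536, 256, 8));  // 24
console.log(compressionFactor(1536, 256, 1));  // 192
```

Under this definition the ratio is unstable when the full-dimension similarity is close to zero, and it can exceed 1 when truncation happens to make two vectors look more alike.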


Related Tools

AI Agent Rule & SKILL.md Builder

LLM Prompt Caching Structure Optimizer

Local Entity Extractor (NLP)

Engineering Guides

Master This Tool

In-depth guides and tutorials for experts.

Decoding React Server Components Flight Data: A Local-First Audit Guide

Discover the hidden data leaks in your Next.js network stream. Learn how to debug Next.js RSC data and inspect RSC payloads 100% locally.

Vector Dimensionality: Why Misaligned Embeddings Break RAG

Discover why projecting 3072-D embeddings into 1536-D indices destroys semantic retrieval. Learn to audit vector math using Cosine Similarity to prevent AI hallucinations.

Securing AI Agents: How to Detect & Prevent Prompt Injection

A Cybersecurity Architect's guide to prompt injection in 2026. Learn about Token to Shell vectors, RAG poisoning, and embedding-based anomaly detection.

Understanding MCP Transport Layers: stdio vs. HTTP vs. WebSockets

A technical deep dive into Model Context Protocol (MCP) transport mechanisms. Compare stdio, HTTP with SSE, and WebSockets for secure AI agent integration.

Debugging RAG: Cosine vs Euclidean Distance

A technical guide for AI Architects on measuring embedding proximity. Learn to debug RAG retrieval errors using vector math and Cosine Similarity metrics.
