What are Matryoshka Representation Learning (MRL) embeddings?

MRL embeddings (such as OpenAI's text-embedding-3 models) are trained so that the most important semantic information is concentrated in the earliest dimensions. This lets you truncate a vector (e.g., from 1536 to 256 dimensions, a 6x reduction) and re-normalize it while retaining most of the original semantic accuracy (often 95% or more, depending on the model and task), drastically reducing index size.
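The truncation step can be sketched in a few lines. This is a minimal illustration, assuming the input comes from an MRL-trained model; `truncateAndNormalize` and `cosineSimilarity` are hypothetical helper names chosen for this example, not part of any library.

```typescript
// Hypothetical helpers for illustration only.

// Keep the first `dim` values and rescale the prefix to unit length.
function truncateAndNormalize(vector: number[], dim: number): number[] {
  const prefix = vector.slice(0, Math.min(dim, vector.length));
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  // A zero vector has no direction, so it is returned unchanged.
  return norm > 0 ? prefix.map((x) => x / norm) : prefix;
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError('Length mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA > 0 && normB > 0 ? dot / Math.sqrt(normA * normB) : 0;
}
```

Comparing `cosineSimilarity` on a pair of full vectors with the same pair after `truncateAndNormalize` gives a rough sense of how much similarity survives a given truncation size for your own data.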

What is the difference between Int8 Scalar and Binary Quantization?

Int8 scalar quantization maps Float32 values (within -1.0 to 1.0 for a unit-length vector) to 8-bit integers (-128 to 127), reducing vector size by 4x. Binary (1-bit) quantization keeps only the sign of each value, mapping positive floats to 1 and negative floats to 0, which reduces size by 32x. Binary vectors are compared with Hamming distance instead of cosine distance, which makes search extremely fast at the cost of some accuracy.
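The size reductions follow directly from the bytes each format needs per dimension. A small sketch of that arithmetic, assuming 4 bytes per Float32 value, 1 byte per Int8 value, and 8 dimensions per byte for binary; `bytesPerVector` is a hypothetical helper name.

```typescript
type QuantFormat = 'float32' | 'int8' | 'binary';

// Storage needed for one vector of `dim` dimensions in the given format.
function bytesPerVector(dim: number, format: QuantFormat): number {
  switch (format) {
    case 'float32':
      return dim * 4; // 32 bits per value
    case 'int8':
      return dim; // 8 bits per value
    case 'binary':
      return Math.ceil(dim / 8); // 1 bit per value, packed into bytes
  }
}

const fullFloat32 = bytesPerVector(1536, 'float32'); // 6144 bytes
const fullInt8 = bytesPerVector(1536, 'int8'); // 1536 bytes (4x smaller)
const fullBinary = bytesPerVector(1536, 'binary'); // 192 bytes (32x smaller)
const truncatedBinary = bytesPerVector(256, 'binary'); // 32 bytes
```

Truncation and quantization multiply: going from 1536 Float32 dimensions to 256 binary dimensions shrinks each vector from 6144 bytes to 32 bytes.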

Vector Dimension Optimizer

Vector Database (RAG) Embeddings Dimension Reducer & Optimizer is a local vector simulation tool. Perform L2-normalized truncation and 1-bit quantization directly in your browser.


/**
 * client-reducer.ts
 * Browser-safe embedding dimensionality reducer & optimizer.
 * Designed for Matryoshka Representation Learning (MRL) and local quantization.
 */

export function optimizeEmbedding(
  vector: number[],
  targetDim: number = 256,
  format: 'float32' | 'int8' | 'binary' = 'float32'
): {
  data: Float32Array | Int8Array | Uint8Array;
  scale?: number;
} {
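  // 1. Matryoshka Truncation (Slice prefix)
  if (!Number.isInteger(targetDim) || targetDim < 1) {
    throw new RangeError('targetDim must be a positive integer');
  }
  // Clamp to the input length; reading past the end would produce NaN values.
  targetDim = Math.min(targetDim, vector.length);
  const sliced = vector.slice(0, targetDim);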

  // 2. L2 Re-normalization (Ensure unit length)
  let sumSq = 0;
  for (let i = 0; i < targetDim; i++) {
    sumSq += sliced[i] * sliced[i];
  }
  const norm = Math.sqrt(sumSq);
  const normalized = new Float32Array(targetDim);
  if (norm > 0) {
    for (let i = 0; i < targetDim; i++) {
      normalized[i] = sliced[i] / norm;
    }
  }

  // 3. Apply Quantization Format
  if (format === 'float32') {
    return { data: normalized };
  }

  if (format === 'int8') {
    // Find absolute maximum value for scaling
    let maxVal = 0;
    for (let i = 0; i < targetDim; i++) {
      const abs = Math.abs(normalized[i]);
      if (abs > maxVal) maxVal = abs;
    }
    // Map the largest magnitude to 127; approximate floats are recovered as value / scale
    const scale = maxVal > 0 ? 127 / maxVal : 1;
    const int8Data = new Int8Array(targetDim);
    for (let i = 0; i < targetDim; i++) {
      int8Data[i] = Math.round(normalized[i] * scale);
    }
    return { data: int8Data, scale };
  }

  if (format === 'binary') {
    // Pack 8 sign bits per byte, least-significant bit first (bit = 1 when value >= 0)
    const byteLen = Math.ceil(targetDim / 8);
    const binaryData = new Uint8Array(byteLen);
    for (let i = 0; i < targetDim; i++) {
      if (normalized[i] >= 0) {
        const byteIdx = Math.floor(i / 8);
        const bitIdx = i % 8;
        binaryData[byteIdx] |= (1 << bitIdx);
      }
    }
    return { data: binaryData };
  }

  throw new Error('Unsupported quantization format');
}

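/**
 * Helper to calculate the Hamming distance (number of differing bits)
 * between two packed binary vectors of equal byte length.
 */
export function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new RangeError('Packed vectors must have the same byte length');
  }
  let distance = 0;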
  for (let i = 0; i < a.length; i++) {
    let xor = a[i] ^ b[i];
    while (xor > 0) {
      if (xor & 1) distance++;
      xor >>= 1;
    }
  }
  return distance;
}
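To consume the quantized output, the Int8 values can be mapped back to approximate floats by dividing by the returned scale, and packed binary vectors can be compared bit by bit. The sketch below assumes the scale convention used above (largest magnitude maps to 127); `dequantizeInt8` and `popcount8` are hypothetical helper names, and the packed bytes are hand-written sample values.

```typescript
// Hypothetical helper: reverse Int8 scalar quantization (lossy).
function dequantizeInt8(data: Int8Array, scale: number): Float32Array {
  if (scale === 0) throw new RangeError('scale must be non-zero');
  const out = new Float32Array(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] / scale;
  }
  return out;
}

// A vector whose largest magnitude was 0.5 has scale = 127 / 0.5 = 254.
const restored = dequantizeInt8(Int8Array.of(127, -64, 0), 254);
// restored[0] is 0.5; restored[1] is about -0.252; restored[2] is 0

// Hypothetical helper: count the set bits in a single byte.
function popcount8(byte: number): number {
  let count = 0;
  for (let x = byte & 0xff; x > 0; x >>= 1) {
    count += x & 1;
  }
  return count;
}

// Worked Hamming example on one hand-packed byte per vector.
const packedA = Uint8Array.of(0b10110010);
const packedB = Uint8Array.of(0b10011010);
// XOR marks the differing positions: 0b00101000, so two bits differ.
const differingBits = popcount8(packedA[0] ^ packedB[0]);
```

Dequantized values only approximate the originals, since rounding to the nearest integer discards precision.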

Instructions

1. Enter two vectors as JSON float arrays (Vector A and Vector B).
2. Select the MRL truncation size (Original, 512, 256, or 128).
3. Choose the quantization format (Float32, Int8 Scalar, or 1-bit Binary).
4. Review the cosine similarity retention and the memory compression factor, then download the code blocks.


