name: langchain-embeddings-search
description: 'Build and query vector stores with LangChain 1.0 without getting burned by
flipped score semantics, embedding-dim mismatches, reranker quirks, and
chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS /
Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker.
Trigger with "langchain embeddings", "vector store similarity search",
"langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".
'
allowed-tools: Read, Write, Edit, Bash(python:), Bash(pip:), Grep
version: 2.0.0
license: MIT
author: Jeremy Longshore jeremy@intentsolutions.io
tags:
- saas
- langchain
- python
- langchain-1.0
- embeddings
- rag
- vector-store
compatibility: Designed for Claude Code, also compatible with Codex
LangChain Embeddings and Vector Search (Python)
Overview
FAISS.similarity_search_with_score() returns L2 distance — lower is better.
Pinecone.similarity_search_with_score() returns cosine similarity — higher is
better. Swap your vector store and your if score > 0.8 filter now keeps the
garbage and drops the good results, silently. This is pain-catalog entry P12,
and it is the single most common reason a "we migrated from FAISS to Pinecone
for scale" project loses retrieval quality overnight.
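To make the flip concrete, here is a minimal sketch with made-up (doc_id, score) pairs shaped like the output of similarity_search_with_score. The scores and the keep_above helper are illustrative assumptions, not real retrieval output:

```python
# Hypothetical (doc_id, raw_score) pairs; the numbers are invented for illustration.
faiss_hits = [("good", 0.31), ("okay", 0.62), ("junk", 1.24)]     # L2 distance: lower is better
pinecone_hits = [("good", 0.91), ("okay", 0.74), ("junk", 0.38)]  # cosine similarity: higher is better


def keep_above(hits, threshold):
    """The naive filter: keep everything whose raw score exceeds the threshold."""
    return [doc_id for doc_id, score in hits if score > threshold]


# Same filter, opposite outcome depending on the backend.
on_pinecone = keep_above(pinecone_hits, 0.8)  # keeps only the best match
on_faiss = keep_above(faiss_hits, 0.8)        # keeps only the worst match
```

Nothing raises in either case, which is why the regression tends to show up in answer quality rather than in logs.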
The sibling gotchas:
- P13: RecursiveCharacterTextSplitter default separators break inside code fences, so RAG over Markdown docs truncates code examples mid-function
- P14: Embedding-dim mismatch crashes at insert time (after 10 minutes of processing), not at VectorStore.__init__; the failure reports "dim mismatch: 1536 != 3072" with no earlier warning
- P15: Cohere/Jina reranker scores are within-query relative, so a 0.34 top-1 is not worse than a 0.92 top-1 on a different query; filtering by threshold is the wrong heuristic
This skill walks through embedding model selection, vector store creation with
the version-safe dim guard, score normalization, hybrid keyword+vector search,
and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x,
langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, langchain-pinecone.
Pain-catalog anchors: P12, P13, P14, P15, P49, P50.
Prerequisites
- Python 3.10+
- langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
- Embedding provider: pip install langchain-openai (text-embedding-3-small/large)
- Vector store: pip install faiss-cpu OR pip install langchain-pinecone
- Provider API keys: OPENAI_API_KEY, PINECONE_API_KEY
Instructions
Step 1 — Initialize embeddings with an explicit dim
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims; must match the index
)
# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"
Changing models (-small 1536 → -large 3072) is a migration, not a config tweak.
Plan it: re-embed and back-fill the index, don't just update the config.
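The startup assertion hard-codes 1536. A slightly more general guard compares one probe embedding against whatever dimension the target index was created with, so the check runs before any documents are processed. This is a sketch: check_embedding_dim and fake_embed_query are hypothetical names, and in a real pipeline the callable would be embeddings.embed_query and index_dim would come from your index configuration.

```python
from typing import Callable, Sequence


def check_embedding_dim(embed_query: Callable[[str], Sequence[float]], index_dim: int) -> int:
    """Embed one probe string and compare its length with the index dimension.

    Raises ValueError before any documents are processed if the two disagree.
    """
    actual = len(embed_query("dimension probe"))
    if actual != index_dim:
        raise ValueError(
            f"embedding dim {actual} != index dim {index_dim}; "
            "re-embed into a new index instead of inserting"
        )
    return actual


# Stand-in embedder for illustration only (assumption: a 1536-dim model).
def fake_embed_query(text: str) -> list[float]:
    return [0.0] * 1536


checked_dim = check_embedding_dim(fake_embed_query, index_dim=1536)
```

Calling the guard once at startup costs a single embedding request, which is cheap compared with failing ten minutes into an ingest job.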
Step 2 — Choose a vector store
| Store | Score metric | Latency (1M vectors) | When to use |
|---|---|---|---|
| FAISS | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process |
| Chroma | Distance (lower = better; L2 by default) | ~10ms | Small multi-user, persistent local |
| PGVector | Cosine distance by default (lower = better) | ~20ms | Existing Postgres, transactional needs |
| PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |
from langchain_community.vectorstores import FAISS
store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR
vs.
from langchain_pinecone import PineconeVectorStore
store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR
See Vector Store Comparison for the feature matrix and the migration gotchas.
Step 3 — Normalize scores before any threshold filter
Write a normalizer at the retriever boundary, so downstream code never sees raw store-specific scores:
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type in {"faiss_l2", "chroma_l2"}:
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type == "pgvector_cosine":
        # LangChain's PGVector wrapper returns cosine distance in [0, 2]; flip it
        return max(0.0, min(1.0, 1.0 - score))
    if store_type == "pinecone":
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")
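Now a higher score always means a closer match, regardless of backend. The
scales still differ between metrics, so retune any cutoff such as score > 0.7 on
your eval set. See Score Semantics for the per-store derivation.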
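When the embeddings are unit length and the store reports a squared L2 distance, the distance can be converted to cosine similarity exactly rather than squashed through 1/(1+d), because ||a - b||^2 = 2 - 2*cos(a, b). The sketch below checks that identity on two small vectors. Both preconditions are assumptions you should verify for your own model and index; l2_squared_to_cosine and unit are illustrative helpers, not LangChain APIs.

```python
import math


def l2_squared_to_cosine(d2: float) -> float:
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d2 / 2."""
    return 1.0 - d2 / 2.0


def unit(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = unit([1.0, 2.0, 3.0])
b = unit([2.0, 1.0, 0.5])

cosine = sum(x * y for x, y in zip(a, b))      # direct cosine similarity
d2 = sum((x - y) ** 2 for x, y in zip(a, b))   # squared L2 distance
recovered = l2_squared_to_cosine(d2)           # should match `cosine`
```

If either assumption fails (unnormalized vectors, or a non-squared distance), the conversion is no longer exact and a monotone mapping plus a retuned threshold is the safer route.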
Step 4 — Chunk text with language-aware splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language
# BAD: breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
# GOOD: respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)
# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
PDF pipelines have their own pain: PyPDFLoader splits by page, tearing tables
in half (P49). Use PyMuPDFLoader or UnstructuredPDFLoader for documents
with tables.
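Whichever splitter you pick, it is cheap to verify the result before embedding. The following sketch flags chunks that contain an odd number of Markdown fence lines, which usually means a code block was torn across a chunk boundary. The helper names are hypothetical and the check is a heuristic: it only looks at lines starting with three backticks.

```python
FENCE = "`" * 3  # Markdown code fence marker


def has_split_fence(chunk: str) -> bool:
    """Return True if the chunk contains an odd number of fence lines."""
    fence_lines = [ln for ln in chunk.splitlines() if ln.lstrip().startswith(FENCE)]
    return len(fence_lines) % 2 == 1


def split_fence_report(chunks: list[str]) -> list[int]:
    """Indices of chunks that open a code fence without closing it, or the reverse."""
    return [i for i, chunk in enumerate(chunks) if has_split_fence(chunk)]


# Hand-written chunks for illustration: one intact, two halves of a torn block.
whole = f"Intro\n{FENCE}python\nprint('hi')\n{FENCE}\n"
torn_a = f"Intro\n{FENCE}python\ndef f():\n"
torn_b = f"    return 1\n{FENCE}\nOutro\n"
bad_indices = split_fence_report([whole, torn_a, torn_b])
```

In a pipeline you would pass [d.page_content for d in chunks] and fail the ingest, or increase chunk_size, when the report is non-empty.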
Step 5 — Hybrid search (keyword + vector)
Pure vector search misses exact-match keywords (product SKUs, error codes, function names). Combine BM25 + vector:
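# LangChain 1.0 moved EnsembleRetriever out of langchain.retrievers into the
# langchain-classic package (pip install langchain-classic)
from langchain_classic.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs: pip install rank_bm25
bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})
ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)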
See Hybrid Search for the eval harness and the weight-tuning procedure.
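To build intuition for what the weights do, here is a standalone sketch of weighted Reciprocal Rank Fusion, the rank-based scheme EnsembleRetriever is documented to use. The constant c=60 and the doc ids are assumptions for illustration, and weighted_rrf is not a LangChain function:

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: each list adds weight / (c + rank) to a doc's score."""
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["sku-123", "intro", "faq"]      # hypothetical keyword results
vector_ranking = ["faq", "sku-123", "pricing"]  # hypothetical vector results
fused = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.4, 0.6])
```

Because fusion works on ranks rather than raw scores, it sidesteps the score-semantics problem from Step 3: BM25 scores and vector scores never need to share a scale.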
Step 6 — Rerank by rank, not by score
from langchain_cohere import CohereRerank
reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative; 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n], i.e. trust the rank order
Filter by rank (keep top-k), not by threshold. Calibration per-query is possible but rarely worth the engineering cost.
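The difference between the two filters is easiest to see on a hard query where every score is low. This sketch uses a small stand-in Doc class instead of LangChain's Document, and the scores are invented; only the relevance_score metadata key is taken from the snippet above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def keep_top_k(reranked: list[Doc], k: int) -> list[Doc]:
    """Keep the k best documents by reranker score, regardless of absolute values."""
    ordered = sorted(reranked, key=lambda d: d.metadata["relevance_score"], reverse=True)
    return ordered[:k]


# Hypothetical scores for a hard query: every score is below 0.5.
candidates = [
    Doc("a", {"relevance_score": 0.12}),
    Doc("b", {"relevance_score": 0.34}),
    Doc("c", {"relevance_score": 0.29}),
]
by_threshold = [d for d in candidates if d.metadata["relevance_score"] > 0.5]  # drops everything
by_rank = keep_top_k(candidates, k=2)                                          # keeps the two best
```

The threshold filter returns nothing and the LLM gets no context, while the rank filter still passes along the best available evidence.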
Output
- Embeddings initialized with dim assertion at startup
- Vector store chosen from the comparison matrix with score-semantics awareness
- Score normalizer applied at retriever boundary (no raw scores downstream)
- Language-aware text splitter that respects code fences, with a table-aware PDF loader where needed
- Hybrid retriever combining BM25 and vector with tuned weights
- Reranker filtering by rank, not threshold
Error Handling
| Error | Cause | Fix |
|---|---|---|
| PineconeApiException: dim mismatch: 1536 != 3072 | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job |
| Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply normalize() at boundary; retune threshold on eval set |
| RAG answers misquote tables | PyPDFLoader tore table across pages (P49) | Switch to PyMuPDFLoader or UnstructuredPDFLoader |
| RAG retrieval drops code examples mid-function | RecursiveCharacterTextSplitter broke code fence (P13) | Use from_language(Language.MARKDOWN/PYTHON) |
| Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (reranked[:k]), not threshold |
| WebBaseLoader returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass header_template={"User-Agent": "Mozilla/5.0 ..."}; respect robots.txt |
| ValueError: expected str instance, NoneType found on embed | Empty document content | Filter docs = [d for d in docs if d.page_content.strip()] before embedding |
Examples
Building a RAG retriever with hybrid search
End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI
text-embedding-3-small, index in FAISS for local dev, wrap in an
EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.
See Hybrid Search for the full builder and the weight-tuning procedure on a golden set.
Migrating from FAISS to Pinecone without quality regression
The three gotchas: (a) score semantics flip (P12), (b) the migration needs a re-embed unless the embedding model and dimension are unchanged, (c) threshold filters must be retuned on the new score scale.
See Vector Store Comparison for the migration checklist.
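For gotcha (c), one simple way to retune is to sweep candidate cutoffs over a small labeled set and keep the one with the best F1. This is a sketch under assumptions: the golden pairs below are invented, scores are already normalized so that higher means more similar, and F1 is just one reasonable objective.

```python
def best_threshold(labeled: list[tuple[float, bool]]) -> float:
    """Pick the similarity cutoff that maximizes F1 on (score, is_relevant) pairs."""
    best_f1, best_cut = -1.0, 0.0
    for cut in sorted({score for score, _ in labeled}):
        tp = sum(1 for s, rel in labeled if s >= cut and rel)      # relevant and kept
        fp = sum(1 for s, rel in labeled if s >= cut and not rel)  # irrelevant but kept
        fn = sum(1 for s, rel in labeled if s < cut and rel)       # relevant but dropped
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_f1, best_cut = f1, cut
    return best_cut


# Hypothetical golden set: (normalized score on the new backend, human relevance label).
golden = [(0.92, True), (0.81, True), (0.77, False), (0.64, True), (0.40, False), (0.35, False)]
cutoff = best_threshold(golden)
```

Rerun the sweep whenever the backend, metric, or embedding model changes, since each of those shifts the score distribution.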
Per-tenant vector stores without leakage
Use Pinecone namespaces or PGVector row-level security. Construct the retriever per-request with the tenant ID — never bind a retriever at import time (P33).
See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
Resources
- LangChain Python: Vector stores
- LangChain Python: Retrievers
- LangChain: Text splitters
- FAISS docs (score is L2 distance)
- Pinecone metrics (cosine default)
- Cohere Rerank (score per-query relative)
- Pack pain catalog: docs/pain-catalog.md (entries P12, P13, P14, P15, P49, P50)