name: add-embedding-support
description: Add Qdrant embedding support to v3 WordPress components for RAG chatbot. Implements component-level content chunking for searchable, structured embeddings. Use when adding embedding to new or existing v3 components.
Add Embedding Support Skill
You are helping add Qdrant embedding support to WordPress v3 components. This enables component content to be indexed and searched via a RAG-based chatbot powered by Claude's API.
System Overview
The embedding system:
- Chunks at component level: Each component becomes one or more embedding chunks
- Avoids sub-component loading: Write extraction code directly in the component class
- Supports sections: Complex components add multiple sections (sub-chunks) per instance
- Tracks metadata: Links, dates, and custom metadata stored separately
- Respects skip markers: Components can opt out via ComponentEmbeddingSkipAwareInterface
How It Works
- CLI command wp vendi embedding:generate runs
- Sets global constant VENDI_RENDER_CONTEXT to RenderingContextEnum::EMBEDDING
- Loads each component via vendi_load_component_v3()
- Template detects context and returns component instance (no HTML rendering)
- Component's getEmbedding() method extracts structured data
- ComponentEmbeddingDTO formats data into JSON chunks for Qdrant
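The context switch in the steps above can be pictured with a small standalone sketch. Everything in it is a stand-in: the enum name, its case values, and the render function are hypothetical, and the context is passed as a parameter here rather than read from the VENDI_RENDER_CONTEXT constant so the snippet can run outside WordPress.

```php
<?php
// Hypothetical stand-in for Vendi\Theme\Enums\RenderingContextEnum; the case values are assumed.
enum RenderingContextSketch: string
{
    case HTML = 'html';
    case EMBEDDING = 'embedding';
}

// Stand-in for a component template: in embedding context it hands back the
// component object, otherwise it produces markup.
function render_template_sketch(object $component, string $context): object|string
{
    if ($context === RenderingContextSketch::EMBEDDING->value) {
        return $component; // no HTML is produced
    }

    return '<section>' . htmlspecialchars($component->heading) . '</section>';
}

$component = (object) ['heading' => 'Ask a Researcher'];
$embeddingResult = render_template_sketch($component, RenderingContextSketch::EMBEDDING->value);
$htmlResult = render_template_sketch($component, RenderingContextSketch::HTML->value);
```

In embedding context the caller receives the component object and can call getEmbedding() on it; in any other context it receives markup.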
Output Format
Each component produces a JSON object like this:
{
"content": "Heading: Ask a Researcher\nBody: Are you a CRNA with research questions?\nLinks: Contact us",
"metadata": {
"type": "page",
"url": "https://example.com/page/",
"created": "2022-11-29T21:01:08+00:00",
"updated": "2024-03-07T09:07:06+00:00",
"links": [
{
"text": "Contact us",
"url": "https://example.com/contact/"
}
],
"component_type": "content_callout_full_width"
},
"id": "660-3"
}
Component Type Classification
Embeddable Components
- Implements ComponentEmbeddingAwareInterface
- Provides getEmbedding() method
- Content is indexed for chatbot
Skippable Components
- Implements ComponentEmbeddingSkipAwareInterface (marker interface)
- No getEmbedding() method needed
- Ignored during embedding generation
- Use for: ads, navigation, forms, decorative elements
Simple Components
- Single chunk with heading and/or body
- No repeater fields
- Auto-extraction via interfaces
Complex Components
- Multiple sections from repeater/flexible content
- Each item becomes a separate section (sub-chunk)
- May include links/CTAs tracked in metadata
Implementation Patterns
Pattern 1: Simple Component (Single Chunk)
When to use: Component has just heading and/or body copy, no repeater fields
Choosing the Right Interfaces
IMPORTANT: Inspect the actual template file to determine which interfaces to implement:
- PrimaryHeadingInterface - Use when template displays a component-level heading (outside loops)
  - Example: <h2><?php esc_html_e(get_sub_field('headline')); ?></h2> at the top level
  - NOT for headings inside repeater loops
- PrimaryCopyInterface - Use when template displays component-level body/intro copy (outside loops)
  - Example: <?php echo wp_kses_post(get_sub_field('intro_copy')); ?> before any repeaters
  - NOT for copy inside repeater loops
The interfaces should map to what actually exists in the template structure.
Required Interfaces
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface; // If template has top-level heading
use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface; // If template has top-level copy
use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
Class Implementation
class simple_component extends BaseComponent implements
    ComponentEmbeddingAwareInterface,
    PrimaryHeadingInterface, // Only if template has top-level heading
    PrimaryCopyInterface // Only if template has top-level copy
{
    public function getEmbedding(): ?ComponentEmbeddingInterface
    {
        return ComponentEmbedding::fromComponent($this);
    }

    public function getPrimaryHeadingText(): ?string
    {
        // Return the field that corresponds to the top-level heading in template
        return get_sub_field('headline');
    }

    public function getPrimaryCopy(): ?string
    {
        // Return the field that corresponds to the top-level copy in template
        return get_sub_field('copy');
    }
}
Output
Heading: [from getPrimaryHeadingText() if interface implemented]
Body: [from getPrimaryCopy() if interface implemented]
Key Points:
- Inspect template first to determine which interfaces are needed
- fromComponent() auto-extracts heading and body via interfaces
- Single chunk per component instance
- No manual section creation needed
- Don't guess at structure - base decision on actual template code
Pattern 2: Skippable Component (No Embedding)
When to use: Ads, navigation, forms, decorative/visual-only elements
Required Interface
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingSkipAwareInterface;
Class Implementation
class ad_component extends VendiComponent implements ComponentEmbeddingSkipAwareInterface
{
// No getEmbedding() method needed
// Component completely ignored during embedding generation
}
Key Points:
- Empty marker interface
- No embedding logic required
- Still add template boilerplate (see Template Requirements)
Pattern 3: Complex Component with Sections
When to use: Component has repeater or flexible content fields where each item should be a separate section
Required Interfaces
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;
use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    // Start with base embedding (auto-extracts heading/body from interfaces)
    $ret = ComponentEmbedding::fromComponent($this);

    // Loop through the rows (get_row_layout() is only meaningful for flexible content fields)
    while (have_rows('items')) {
        the_row();
        $layout = get_row_layout();

        // CRITICAL: Filter to relevant layouts only
        if (!in_array($layout, ['content_item', 'text_block'], true)) {
            continue;
        }

        $heading = get_sub_field('heading');
        $copy = get_sub_field('copy');

        // CRITICAL: Always clean HTML from user content
        $cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);

        // Add section with optional custom label
        $ret->addSection(
            $heading . PHP_EOL . $cleanCopy,
            'Section' // Optional: 'FAQ Item', 'Testimonial', etc.
        );
    }

    return $ret;
}
Output
Heading: [component main heading]
Body: [component intro copy]
Section 1: Item 1 Heading
[item 1 copy]
Section 2: Item 2 Heading
[item 2 copy]
Key Points:
- Filter layouts to process only relevant types
- Use stripAllHtmlFromText() for all HTML content
- Each addSection() creates a separate sub-chunk
- Sections are auto-numbered (Section 1, Section 2, etc.)
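To make the auto-numbering concrete, here is a minimal standalone sketch of how labeled, numbered sections could be collected. SectionListSketch is a hypothetical class, not the theme's ComponentEmbedding; it assumes numbering is global across labels and that empty text is skipped, neither of which is confirmed by the real implementation.

```php
<?php
// Hypothetical stand-in for the section handling in ComponentEmbedding; not the theme's actual code.
final class SectionListSketch
{
    /** @var string[] */
    private array $sections = [];

    public function addSection(string $text, string $sectionLabel = 'Section'): void
    {
        $text = trim($text);
        if ($text === '') {
            return; // assumed behaviour: empty sections are ignored
        }

        // Assumed behaviour: one running counter shared by all labels.
        $number = count($this->sections) + 1;
        $this->sections[] = $sectionLabel . ' ' . $number . ': ' . $text;
    }

    public function toContent(): string
    {
        return implode("\n", $this->sections);
    }
}

$sections = new SectionListSketch();
$sections->addSection("Item 1 Heading\nItem 1 copy");
$sections->addSection('   '); // ignored
$sections->addSection("Item 2 Heading\nItem 2 copy", 'FAQ Item');
$content = $sections->toContent();
```

The resulting text follows the same "Label N: ..." shape shown in the Output example above.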
Pattern 4: Component with Links/CTAs
When to use: Component has call-to-action buttons or links that should be tracked in metadata
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    while (have_rows('cards')) {
        the_row();
        $heading = get_sub_field('heading');
        $copy = get_sub_field('copy');
        $link = get_sub_field('cta');

        // Build structured content with labels
        $contentParts = [];
        if ($heading) {
            $contentParts[] = 'Heading: ' . $heading;
        }
        if ($copy) {
            // Clean HTML from user content before it reaches the embedding
            $contentParts[] = 'Body: ' . ComponentEmbedding::stripAllHtmlFromText($copy);
        }
        if ($link && is_array($link)) {
            $contentParts[] = 'Link: ' . ($link['title'] ?? '');
        }

        // Only add section if there's content
        if ($content = implode(PHP_EOL, array_filter($contentParts))) {
            $ret->addSection($content);
        }

        // CRITICAL: Track link separately in metadata
        if ($link && is_array($link)) {
            $ret->addLink(
                linkText: $link['title'] ?? '',
                linkUrl: $link['url'] ?? ''
            );
        }
    }

    return $ret;
}
Output
{
"content": "Heading: Component Title\nBody: Intro text\nLinks: Card 1 CTA, Card 2 CTA\nSection 1:\nHeading: Card 1\nBody: Card 1 copy\nLink: Card 1 CTA",
"metadata": {
"links": [
{
"text": "Card 1 CTA",
"url": "/page1/"
},
{
"text": "Card 2 CTA",
"url": "/page2/"
}
],
"component_type": "card_navigation"
}
}
Key Points:
- Links appear in both content text and metadata
- Metadata links enable advanced RAG features
- Use structured content with labels (Heading:, Body:, Link:)
- Filter empty content before adding sections
Pattern 5: Component with HTML Content Containing Links
When to use: Component has HTML content (bios, articles, descriptions) with embedded <a> tags that should be tracked
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    while (have_rows('items')) {
        the_row();
        $name = get_sub_field('name');
        $bio = get_sub_field('bio'); // Contains HTML with links

        // CRITICAL: Extract links BEFORE stripping HTML
        // Use name as prefix for context
        ComponentEmbedding::extractAndAddLinksFromHtml($ret, $bio, $name);

        // Now strip HTML for text content
        $cleanBio = ComponentEmbedding::stripAllHtmlFromText($bio);

        $ret->addSection(
            'Name: ' . $name . PHP_EOL . 'Bio: ' . $cleanBio,
            'Person'
        );
    }

    return $ret;
}
Output
If bio contains: <p>Follow me on <a href="https://twitter.com/jdoe">Twitter</a></p>
{
"content": "Person 1: Name: John Doe\nBio: Follow me on Twitter",
"metadata": {
"links": [
{"text": "John Doe Twitter", "url": "https://twitter.com/jdoe"}
]
}
}
Key Points:
- Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText()
- Use contextual prefix (name, title, etc.) to avoid duplicate generic link text
- Links preserved in metadata even after HTML is stripped from content
Pattern 6: Component with Related Posts
When to use: Component displays content from related WP_Post objects (testimonials, people, etc.)
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    foreach ($this->getRelatedPosts() as $post) {
        // CRITICAL: Validate post object before accessing fields.
        // The leading backslash makes the check resolve to the global WP_Post class
        // even when the component class is namespaced and has no "use WP_Post;".
        if (!$post instanceof \WP_Post) {
            continue;
        }

        $name = get_field('name', $post->ID);
        $bio = get_field('bio', $post->ID);

        // Clean HTML and add with custom section label
        $ret->addSection(
            $name . PHP_EOL . ComponentEmbedding::stripAllHtmlFromText($bio),
            'Person' // Custom label: 'Testimonial', 'Team Member', etc.
        );
    }

    return $ret;
}
Key Points:
- Always check instanceof WP_Post before accessing post fields
- Access fields with post ID: get_field('field_name', $post->ID)
- Use descriptive section labels
Template File Requirements
CRITICAL: Every embeddable component template must include this boilerplate at the top.
Required Boilerplate
<?php
use Vendi\Theme\Component\{component_name};
use Vendi\Theme\ComponentUtility;
use Vendi\Theme\Enums\RenderingContextEnum;

/** @var {component_name} $component */
$component = ComponentUtility::get_new_component_instance({component_name}::class);

// CRITICAL: Early return for embedding context
if (defined('VENDI_RENDER_CONTEXT') && VENDI_RENDER_CONTEXT === RenderingContextEnum::EMBEDDING->value) {
    return $component;
}

if (!$component->renderComponentWrapperStart()) {
    return;
}
?>
<!-- HTML template here -->
<?php
$component->renderComponentWrapperEnd();
Why This Matters
Without the embedding context check:
- Template will render HTML instead of returning component instance
- getEmbedding() method will never be called
- Component will be skipped in embedding output
This boilerplate is required even for skippable components (for consistency).
Key Methods & Utilities
ComponentEmbedding Static Factory
fromComponent($this)
Purpose: Create base embedding with auto-extraction
Auto-extracts:
- Component type (class short name)
- Post ID and URL
- Creation and modification dates
- Primary heading (if PrimaryHeadingInterface implemented - based on template inspection)
- Primary body copy (if PrimaryCopyInterface implemented - based on template inspection)
Usage: Always first line of getEmbedding()
public function getEmbedding(): ?ComponentEmbeddingInterface
{
$ret = ComponentEmbedding::fromComponent($this);
// ... add sections, links, etc.
return $ret;
}
Note: The heading and body auto-extraction only works if you've implemented the corresponding interfaces based on what actually exists in the template (see Pattern 1 for details).
Content Building Methods
addSection(string $text, string $sectionLabel = 'Section')
Adds a labeled section to the embedding. Sections are auto-numbered (Section 1, Section 2, etc.).
Best Practice: Use descriptive labels
// Good: Descriptive
$ret->addSection($content, 'Testimonial');
$ret->addSection($content, 'FAQ Item');
$ret->addSection($content, 'Team Member');
// Acceptable: Default auto-numbering
$ret->addSection($content); // "Section 1", "Section 2", etc.
addLink(string $linkText, string $linkUrl)
Adds a link to metadata. Links stored separately from content text for advanced RAG features.
if ($link && is_array($link)) {
$ret->addLink(
linkText: $link['title'] ?? '',
linkUrl: $link['url'] ?? ''
);
}
extractAndAddLinksFromHtml(ComponentEmbedding $embedding, ?string $html, string $linkPrefix = '')
Purpose: Extracts all <a> tags from HTML content and adds them to the embedding's link metadata.
When to use: When content contains HTML with embedded links that should be tracked separately (e.g., biographical text with social media links, articles with reference links).
Parameters:
- $embedding - The ComponentEmbedding instance to add links to
- $html - HTML content to parse for links
- $linkPrefix - Optional prefix to add context to link text (e.g., person name)
Features:
- Uses DOMDocument for reliable HTML parsing
- Extracts both href and link text
- Filters out links missing href or text
- Adds contextual prefix when provided (useful for avoiding duplicate generic link text)
Usage:
// Basic usage - extract links from HTML
ComponentEmbedding::extractAndAddLinksFromHtml($ret, $htmlContent);
// With prefix for context (recommended when looping through items)
foreach ($persons as $person) {
$name = $person->name;
$bio = $person->bio; // Contains <a href="...">Twitter</a>, <a href="...">LinkedIn</a>
// Prefix links with person name: "John Doe Twitter", "John Doe LinkedIn"
ComponentEmbedding::extractAndAddLinksFromHtml($ret, $bio, $name);
// Clean HTML after extracting links
$cleanBio = ComponentEmbedding::stripAllHtmlFromText($bio);
$ret->addSection("Name: $name\nBio: $cleanBio", 'Person');
}
Why use linkPrefix: Without a prefix, 20 people with Twitter links produce 20 identical "Twitter" entries. With a prefix, you get "Chris Haas Twitter", "Jane Smith Twitter", etc., so each link carries the context needed to tell them apart.
Important: Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText() to preserve the links before HTML is removed.
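For orientation, the following standalone sketch shows one way link extraction with DOMDocument and a prefix might work. It is not the theme's implementation: extract_links_sketch() is a hypothetical free function that returns the links as an array instead of adding them to a ComponentEmbedding, and the exact filtering rules are assumed from the feature list above.

```php
<?php
// Sketch only: a free function standing in for ComponentEmbedding::extractAndAddLinksFromHtml().
/** @return array<int, array{text: string, url: string}> */
function extract_links_sketch(?string $html, string $linkPrefix = ''): array
{
    if ($html === null || trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings for imperfect markup
    // The XML encoding hint keeps UTF-8 text intact when loadHTML() parses the fragment.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $url = trim($anchor->getAttribute('href'));
        $text = trim($anchor->textContent);
        if ($url === '' || $text === '') {
            continue; // drop links missing href or text
        }

        $links[] = [
            'text' => $linkPrefix !== '' ? $linkPrefix . ' ' . $text : $text,
            'url' => $url,
        ];
    }

    return $links;
}

$links = extract_links_sketch(
    '<p>Follow me on <a href="https://twitter.com/jdoe">Twitter</a> <a>no href</a></p>',
    'John Doe'
);
```

Here $links holds a single entry with the text "John Doe Twitter"; the anchor without an href is dropped. Running the same input through an HTML stripper first would leave nothing to extract, which is why the ordering matters.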
HTML Cleaning Utility
stripAllHtmlFromText(?string $text, bool $preserveLists = false)
CRITICAL: Always use this for user-entered HTML content
Features:
- Removes <script>, <style>, <form> tags and HTML comments
- Strips all remaining HTML tags
- Decodes HTML entities (&amp; → &)
- Collapses whitespace
- Optional: Preserves list structure with proper formatting
Usage:
// DO THIS:
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
$ret->addSection($cleanCopy);
// NOT THIS:
$ret->addSection($copy); // May contain <div>, <p>, <br> tags
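As a rough illustration of the cleaning steps listed above, here is a standalone approximation. strip_all_html_sketch() is hypothetical and only mirrors the documented behaviour; it omits the $preserveLists option, assumes script/style/form blocks are removed together with their contents, and inserts a space at every tag boundary, which a real implementation may handle differently.

```php
<?php
// Sketch only: approximates the documented behaviour of ComponentEmbedding::stripAllHtmlFromText()
// without the $preserveLists option.
function strip_all_html_sketch(?string $text): string
{
    if ($text === null) {
        return '';
    }

    // Remove script/style/form blocks together with their contents (assumed behaviour).
    $text = preg_replace('#<(script|style|form)\b[^>]*>.*?</\1>#is', ' ', $text) ?? '';
    // Remove HTML comments.
    $text = preg_replace('/<!--.*?-->/s', ' ', $text) ?? '';
    // Put a space before each tag so adjacent blocks do not run together, then strip tags.
    $text = strip_tags(str_replace('<', ' <', $text));
    // Decode entities such as &amp; and &nbsp;.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace, including non-breaking spaces produced by &nbsp;.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? '';

    return trim($text);
}

$clean = strip_all_html_sketch(
    '<div><p>Fish &amp; chips</p><script>alert(1)</script><p>Served&nbsp;daily</p><!-- note --></div>'
);
```

With this input, $clean is the plain sentence "Fish & chips Served daily": the script body and the comment are gone and the entities are decoded.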
Best Practices
1. Avoid Loading Sub-Components
VERY IMPORTANT: Write extraction code directly in getEmbedding(). Do NOT load sub-components.
Strongly Preferred:
public function getEmbedding(): ?ComponentEmbeddingInterface
{
    $ret = ComponentEmbedding::fromComponent($this);

    // Write code directly - NO sub-component loading
    while (have_rows('items')) {
        the_row();
        // Clean HTML here as well, as in every other pattern
        $ret->addSection(ComponentEmbedding::stripAllHtmlFromText(get_sub_field('copy')));
    }

    return $ret;
}
Avoid:
// DON'T load sub-components during embedding
vendi_load_component_v3(['parent', 'child']);
Why: The system hasn't found a good pattern for sub-component loading in embeddings yet. Keep it simple and direct.
2. Always Clean HTML from User Content
// CORRECT:
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
$ret->addSection($cleanCopy);
// WRONG:
$ret->addSection($copy); // HTML tags leak into embedding
3. Use Structured Content with Labels
Makes content more parseable by the RAG system:
$contentParts = [];
if ($heading) {
    $contentParts[] = 'Heading: ' . $heading;
}
if ($subheading) {
    $contentParts[] = 'Subheading: ' . $subheading;
}
if ($copy) {
    $contentParts[] = 'Body: ' . ComponentEmbedding::stripAllHtmlFromText($copy);
}
if ($link && is_array($link)) {
    $contentParts[] = 'Link: ' . ($link['title'] ?? '');
}
// Skip the section entirely when no part has content
if ($contentParts) {
    $ret->addSection(implode(PHP_EOL, $contentParts));
}
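When several components repeat the same label-building logic, it can be condensed into a small helper. The sketch below is one possible shape; build_labeled_content_sketch() is a hypothetical function, not part of the theme, and it expects values that have already been cleaned of HTML.

```php
<?php
// Sketch only: a hypothetical helper; the label names mirror the ones used in this document.
function build_labeled_content_sketch(array $fields): string
{
    $parts = [];
    foreach ($fields as $label => $value) {
        $value = trim((string) $value);
        if ($value !== '') {
            $parts[] = $label . ': ' . $value; // keep only fields that have content
        }
    }

    return implode(PHP_EOL, $parts);
}

$content = build_labeled_content_sketch([
    'Heading' => 'Card 1',
    'Subheading' => '',
    'Body' => 'Card 1 copy',
    'Link' => null,
]);
```

An empty return value signals that there is nothing worth passing to addSection().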
Implementation Checklist
Step 1: Inspect Template File
- Read the component's template file (.php) to understand its structure
- Identify if there's a top-level heading (outside any loops) → Consider PrimaryHeadingInterface
- Identify if there's top-level body/intro copy (outside any loops) → Consider PrimaryCopyInterface
- Note any repeater fields that should become sections
- Note any links/CTAs that should be tracked in metadata
Step 2: Class File Changes
- Add use statements at top of file:
  use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
  use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
  use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
- Only if template has top-level heading: Add interface use statement:
  use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface;
- Only if template has top-level copy: Add interface use statement:
  use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;
- Implement ComponentEmbeddingAwareInterface in class declaration
- Only if template has top-level heading: Implement PrimaryHeadingInterface
- Only if template has top-level copy: Implement PrimaryCopyInterface
- Add getEmbedding(): ?ComponentEmbeddingInterface method
- If using PrimaryHeadingInterface: Add getPrimaryHeadingText(): ?string returning the appropriate field
- If using PrimaryCopyInterface: Add getPrimaryCopy(): ?string returning the appropriate field
Step 3: Template File Changes
- Add use statement at top:
  use Vendi\Theme\Enums\RenderingContextEnum;
- Add embedding context check after component instantiation:
  if (defined('VENDI_RENDER_CONTEXT') && VENDI_RENDER_CONTEXT === RenderingContextEnum::EMBEDDING->value) {
      return $component;
  }
Step 4: getEmbedding() Implementation
- Start with $ret = ComponentEmbedding::fromComponent($this);
- Loop through any repeater/flexible content fields
- Filter layouts to relevant types only (in_array() check)
- Use stripAllHtmlFromText() for all HTML content
- Add sections with addSection() for each logical chunk
- Add links with addLink() if component has CTAs
- Validate WP_Post objects with instanceof before accessing fields
- Filter empty content before adding sections
- Write code directly (do NOT load sub-components)
- Return $ret
For Skippable Components Only
- Add use statement:
  use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingSkipAwareInterface;
- Implement ComponentEmbeddingSkipAwareInterface in class declaration
- Do NOT implement ComponentEmbeddingAwareInterface
- Still add template boilerplate (for consistency)
- No getEmbedding() method needed
Testing
After implementation, test with the CLI command:
wp vendi embedding:generate
This command:
- Iterates through all published posts/pages
- Sets VENDI_RENDER_CONTEXT to EMBEDDING
- Loads each component
- Calls getEmbedding() on embeddable components
- Outputs structured JSON for Qdrant
Verify Output
Check the JSON output for:
- ✅ Component appears in embedding data
- ✅ Heading and body extracted correctly
- ✅ Sections appear as separate chunks (Section 1, Section 2, etc.)
- ✅ Links tracked in metadata
- ✅ HTML stripped from content (no <div>, <p>, <br> tags)
- ✅ Content is readable and well-structured
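Parts of this checklist can be automated. The sketch below checks a single JSON chunk for leftover tags and missing metadata. check_chunk_sketch() is a hypothetical helper, the key names are taken from the sample output in this document, and how the CLI output is split into individual chunks is left open because that depends on the command's actual output format.

```php
<?php
// Sketch only: checks one JSON chunk against parts of the checklist above.
/** @return string[] list of problems; empty when the chunk passes */
function check_chunk_sketch(string $json): array
{
    $chunk = json_decode($json, true);
    if (!is_array($chunk)) {
        return ['not valid JSON'];
    }

    $problems = [];
    $content = $chunk['content'] ?? '';
    if (!is_string($content) || trim($content) === '') {
        $problems[] = 'content is empty';
    } elseif ($content !== strip_tags($content)) {
        $problems[] = 'content still contains HTML tags';
    }

    if (empty($chunk['metadata']['component_type'])) {
        $problems[] = 'metadata.component_type is missing';
    }

    foreach ($chunk['metadata']['links'] ?? [] as $link) {
        if (empty($link['text']) || empty($link['url'])) {
            $problems[] = 'link is missing text or url';
        }
    }

    return $problems;
}

$good = check_chunk_sketch('{"content":"Heading: Research Topics","metadata":{"component_type":"accordion"},"id":"660-2"}');
$bad = check_chunk_sketch('{"content":"<p>Heading</p>","metadata":{}}');
```

The tag check is deliberately crude: text that legitimately contains a "<" character may be flagged, so treat a reported problem as a prompt to look at the chunk, not as proof of a bug.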
Sample Output Format
{
"content": "Heading: Research Topics\nSection 1: AANA's Current Priorities\nWhat are healthcare executives' perceptions...",
"metadata": {
"type": "page",
"url": "https://example.com/page/",
"created": "2022-11-29T21:01:08+00:00",
"updated": "2024-03-07T09:07:06+00:00",
"component_type": "accordion"
},
"id": "660-2"
}
Common Pitfalls
- Forgetting to clean HTML: Always use stripAllHtmlFromText() on user content
- Loading sub-components: Write extraction code directly in getEmbedding()
- Missing template boilerplate: Component will render HTML instead of being embedded
- Not filtering layouts: Process only relevant flexible content layouts
- Not validating WP_Post: Check instanceof WP_Post before accessing post fields
- Adding empty sections: Filter content before calling addSection()
- Forgetting to return component: Template must return $component; in embedding context
- Extracting links after stripping HTML: Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText()
- Missing link context: Use linkPrefix parameter when looping through items to avoid duplicate generic link text
Reference Examples
Examine these components for real-world patterns:
- basic_copy_block - Simple: Single chunk with heading/body
- ad_row - Skippable: Marked with skip interface
- accordion - Complex: Multiple accordion items as sections
- card_navigation - Complex: Cards with CTAs tracked as links
- testimonial - Related Posts: WP_Post objects as sections with custom label
- people_image_grid - Complex: Loops through people, extracts links from bio HTML with name prefix, creates person sections
All located in: vendi-theme-parts/components/[component_name]/[component_name].class.php
Your Role
Guide the user through implementing embedding support for a v3 component:
- Read the template file: Inspect the actual .php template to understand structure
- Identify top-level content: Determine if component has top-level heading and/or copy (outside loops)
- Determine pattern: Is it simple, complex, skippable? Does it have repeaters? Links?
- Choose interfaces: Based on template inspection, decide which interfaces to implement
- Present implementation plan: Describe changes needed with specific field names from template
- Implement changes: Update class and template files
- Test: Run wp vendi embedding:generate and verify output
Remember:
- Always start by reading the template file - don't guess at structure
- Implement PrimaryHeadingInterface only if template has top-level heading (outside loops)
- Implement PrimaryCopyInterface only if template has top-level copy (outside loops)
- User handles git add and git commit - you should NOT run these
- Write embedding extraction code directly (avoid sub-component loading)
- Always clean HTML from user content
- Use structured content with labels for better RAG performance