Overview
The Pinecone plugin provides the PineconeIndexer, which sends vector embeddings from documents to Pinecone’s vector database. It supports upsert and update operations, namespaces, and metadata storage.
Maven Module: lucille-pinecone
Java Class: com.kmwllc.lucille.pinecone.indexer.PineconeIndexer
Source: PineconeIndexer.java
Installation
Add the plugin dependency to your pom.xml:
Configuration
Basic Configuration
With Namespaces
Parameters
- Pinecone API key for authentication. Best practice: use the environment variable ${PINECONE_API_KEY}.
- Name of the Pinecone index to write vectors to. Example: "embeddings", "product-vectors".
- Document field containing the vector embeddings when not using namespaces. Required if namespaces is not set. Example: "embedding", "vector".
- Mapping of namespace names to embedding field names. Allows indexing different vector types to different namespaces. Mutually exclusive with defaultEmbeddingField (one or the other must be set).
- List of document fields to store as metadata with each vector. Metadata can be used for filtering during queries. Example: ["title", "category", "timestamp"].
- Operation mode:
  - upsert: Insert or replace vectors (recommended)
  - update: Only update existing vectors
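The rule that either namespaces or defaultEmbeddingField must be set, but not both, can be checked before a pipeline starts. The following is a minimal sketch of such a check; the class PineconeSettingsCheck, its validate method, and the mode argument name are hypothetical and are not part of Lucille.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucille: it only mirrors the rules listed above.
final class PineconeSettingsCheck {

  /** Returns a list of problems; an empty list means the settings are consistent. */
  static List<String> validate(Map<String, String> namespaces,
                               String defaultEmbeddingField,
                               String mode) {
    List<String> problems = new ArrayList<>();

    // Assumption: an empty map or empty string counts as "not set".
    boolean hasNamespaces = namespaces != null && !namespaces.isEmpty();
    boolean hasDefaultField = defaultEmbeddingField != null && !defaultEmbeddingField.isEmpty();

    // Exactly one of the two must be set.
    if (hasNamespaces == hasDefaultField) {
      problems.add("set exactly one of namespaces or defaultEmbeddingField");
    }

    // Assumption: null means "mode not specified"; no default is claimed here.
    if (mode != null && !mode.equals("upsert") && !mode.equals("update")) {
      problems.add("mode must be \"upsert\" or \"update\", got: " + mode);
    }
    return problems;
  }
}
```

Returning a list of problems instead of throwing on the first one lets a caller report every misconfiguration in a single pass.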
Features
Upsert Operations
Insert new vectors or replace existing ones:
- If vector ID exists, it’s replaced
- If vector ID is new, it’s inserted
- Includes metadata fields
Update Operations
Modify only the embeddings:
- Updates only the vector values
- Does not modify metadata
- No error if vector doesn’t exist (Pinecone returns 200 OK)
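The difference between the two modes can be modeled with a toy in-memory store. This sketch does not call Pinecone; the class VectorStoreModel and its methods are hypothetical and only reproduce the behavior described in the two lists above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of the two modes; it does not call Pinecone.
final class VectorStoreModel {

  /** One stored vector: its values plus its metadata. */
  static final class Entry {
    final float[] values;
    final Map<String, String> metadata;

    Entry(float[] values, Map<String, String> metadata) {
      this.values = values;
      this.metadata = metadata;
    }
  }

  private final Map<String, Entry> vectors = new HashMap<>();

  /** Upsert: insert a new ID or replace an existing one, metadata included. */
  void upsert(String id, float[] values, Map<String, String> metadata) {
    vectors.put(id, new Entry(values, metadata));
  }

  /** Update: replace only the values of an existing ID; unknown IDs are ignored without error. */
  void update(String id, float[] values) {
    Entry existing = vectors.get(id);
    if (existing != null) {
      vectors.put(id, new Entry(values, existing.metadata));
    }
  }

  /** Returns the stored entry, or null if the ID is unknown. */
  Entry get(String id) {
    return vectors.get(id);
  }
}
```

In this model, calling update for an ID that was never upserted leaves the store unchanged, which is why update mode alone cannot populate an empty index.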
Namespace Support
Organize vectors into namespaces:
- Isolate different vector types
- Query specific namespaces
- Delete namespace contents independently
- Same document can have vectors in multiple namespaces
Metadata Storage
Store filterable metadata with vectors. Metadata is useful for:
- Filtering query results
- Post-retrieval processing
- Displaying context
Deletion Support
Delete vectors using marker fields:
Batch Size Limits
Connection Validation
The indexer validates connectivity during startup:
- API key is valid
- Index exists
- Connection is stable
Example Configurations
Simple vector indexing
Multi-namespace with different embeddings
Update-only mode
With deletion support
Document Requirements
Example: Filter Documents Without Embeddings
Metadata Format
Metadata fields are stored as strings: each value is converted with toString().
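As an illustration of the string conversion, the sketch below builds a metadata map from a document represented as a plain Map. The class MetadataStrings and the choice to skip missing fields are assumptions made for the example, not the indexer's actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of toString()-based metadata conversion.
final class MetadataStrings {

  /** Copies the listed fields from the document, converting each value with toString(). */
  static Map<String, String> toMetadata(Map<String, Object> document, List<String> metadataFields) {
    Map<String, String> metadata = new LinkedHashMap<>();
    for (String field : metadataFields) {
      Object value = document.get(field);
      // Assumption: fields that are missing from the document are skipped.
      if (value != null) {
        metadata.put(field, value.toString());
      }
    }
    return metadata;
  }
}
```

Because every value becomes a string, a numeric field such as a count of 3 is stored as "3", so numeric range filters on that field may not behave as they would on a number.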
Best Practices
Filter out documents without embeddings
Always use a Drop stage to exclude documents missing embeddings:
Use namespaces for organization
Organize vectors by type, tenant, or use case:
- Product vectors vs. user vectors
- Different embedding models
- Multi-tenant isolation
- Test vs. production data
Limit metadata fields
Only include metadata that will be used for filtering:
- Reduces storage costs
- Faster queries
- Smaller payloads
Use maximum batch size
Set batchSize: 1000 for best performance:
- Fewer API requests
- Better throughput
- Lower latency
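To see why a larger batch size means fewer API requests, consider how a list of vectors splits into batches. The sketch below is illustrative only: Lucille handles batching itself, and the class Batcher is a hypothetical name. With 2,500 vectors, a batch size of 1000 yields 3 requests, while a batch size of 100 yields 25.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of batching under the 1000-vector limit.
final class Batcher {

  static final int MAX_BATCH_SIZE = 1000;

  /** Splits items into consecutive batches of at most batchSize elements. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
      throw new IllegalArgumentException("batchSize must be between 1 and " + MAX_BATCH_SIZE);
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the sublist so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }
}
```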
Handle deletion carefully
- Only delete-by-ID is supported (no metadata queries)
- Deletions happen across all namespaces
- Consider soft deletes (metadata flag) as alternative
Troubleshooting
Embedding field is null error
- Add Drop stage to filter out documents without embeddings
- Verify embedding generation stage runs successfully
- Check field name matches configuration
Batch size exceeds 1000
Set batchSize to 1000 or less.
API key authentication failed
Solutions:
- Verify API key is correct
- Check environment variable is set
- Ensure key has write permissions to the index
- Confirm Pinecone account is active
Index not found
Solutions:
- Create the index in Pinecone console first
- Verify index name matches exactly (case-sensitive)
- Check dimension matches embedding size
- Ensure metric (cosine, euclidean, dotproduct) is appropriate
Upserted count mismatch
Possible causes:
- Network issues
- Rate limiting
- Pinecone service issues
Solutions:
- Check Pinecone status page
- Reduce batch size
- Implement retry logic
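Retry logic for transient failures is commonly written as a bounded loop with exponential backoff. The following is a generic sketch, assuming the failing call surfaces errors as runtime exceptions; the class Retry and its withBackoff method are hypothetical and are not provided by Lucille or the Pinecone client.

```java
import java.util.function.Supplier;

// Hypothetical retry helper with exponential backoff.
final class Retry {

  /** Runs call up to maxAttempts times, doubling the delay after each failure. */
  static <T> T withBackoff(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long delay = initialDelayMillis;
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.get();
      } catch (RuntimeException e) {
        last = e;
        if (attempt < maxAttempts) {
          sleep(delay);
          delay *= 2; // back off: 1x, 2x, 4x, ...
        }
      }
    }
    // Every attempt failed: rethrow the most recent failure.
    throw last;
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      // Restore the interrupt flag and stop retrying.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting to retry", e);
    }
  }
}
```

A caller would wrap whatever operation reported the mismatch in the supplier. Capping maxAttempts keeps a persistent outage from stalling the pipeline indefinitely.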
Vector ID Strategy
The indexer uses the document ID as the Pinecone vector ID:
- IDs must be unique within each namespace
- Same ID in different namespaces = different vectors
- Use idOverrideField if you need different IDs
Performance Characteristics
- Batch size: Up to 1000 vectors or 2MB per request
- Throughput: Depends on Pinecone plan and index type
- Latency: Typically 50-200ms per batch
- Parallel workers: Safe to run multiple workers
See Also
- Weaviate Plugin - Alternative vector database
- Indexers Overview
- Pinecone Documentation