Overview
The Weaviate plugin provides the WeaviateIndexer, which sends documents and optional vector embeddings to Weaviate’s object-based vector database. It supports schema-based object storage, vector search, and property mapping.
Maven Module: lucille-weaviate
Java Class: com.kmwllc.lucille.weaviate.indexer.WeaviateIndexer
Source: WeaviateIndexer.java
Installation
Add the lucille-weaviate plugin dependency to your pom.xml.
Configuration
Basic Configuration
With Vectors
Parameters
- API key: Weaviate API key for authentication. Best practice: use an environment variable such as ${WEAVIATE_API_KEY}.
- Host: Weaviate instance hostname (without protocol or port). Examples: "my-cluster.weaviate.network", "localhost". Note: the protocol is assumed to be https, and the port defaults to the standard port.
- className: Name of the Weaviate class (object type) in the schema. Must match a class defined in your Weaviate schema. Examples: "Article", "Product", "User".
- idDestinationName: Property name used to store the document’s original ID. Weaviate requires UUIDs for object IDs, so the indexer generates a UUID from the document ID and stores the original ID in this property. Note: id is a reserved property in Weaviate.
- vectorField: Document field containing the vector embeddings to store with the object. If not specified, only document properties are indexed (no vectors). Examples: "embedding", "vector".
Features
UUID Generation
Weaviate requires UUID identifiers. The indexer automatically generates UUIDs:
- Deterministic: Same document ID always produces the same UUID
- Idempotent: Re-indexing the same document updates it (doesn’t duplicate)
- Traceable: Original ID is preserved in the idDestinationName property
Property Mapping
All document fields (except id and children) are mapped to Weaviate properties:
- Document ID → UUID (object ID)
- Document ID → id_original property (or custom name)
- All other fields → Weaviate properties
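The mapping above can be sketched with plain Java maps. This is a minimal illustration, not the plugin's code: the class name PropertyMappingSketch and the method toProperties are hypothetical, and a document is modeled as a simple Map rather than a Lucille Document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PropertyMappingSketch {

  // Copies every field except "id" and "children" into the property map,
  // then stores the original document ID under idDestinationName.
  static Map<String, Object> toProperties(Map<String, Object> fields, String idDestinationName) {
    Map<String, Object> properties = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : fields.entrySet()) {
      String name = entry.getKey();
      if (name.equals("id") || name.equals("children")) {
        continue;
      }
      properties.put(name, entry.getValue());
    }
    properties.put(idDestinationName, fields.get("id"));
    return properties;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("id", "doc-1");
    doc.put("title", "Hello");
    doc.put("children", List.of());
    System.out.println(toProperties(doc, "id_original")); // {title=Hello, id_original=doc-1}
  }
}
```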
Vector Support
Optionally attach vector embeddings:
- Vector is extracted from the specified field
- Stored with the object for similarity search
- Vector field is removed from properties (not duplicated)
Batch Processing
Documents are sent in batches using Weaviate’s ObjectsBatcher. Batches are written with ConsistencyLevel.ALL for strong consistency.
Error Handling
The indexer tracks failed documents with detailed errors.
Connection Validation
The indexer validates connectivity during startup:
- Hostname
- Version
- Available modules
Example Configurations
Simple document indexing
With vector embeddings
Multi-tenant products
With field filtering
Schema Requirements
Before indexing, define the class schema in Weaviate:
- Class name must match the className configuration
- Include a property for idDestinationName
- Set vectorizer: "none" if providing your own vectors
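As a rough sketch of these requirements, the snippet below builds a class definition that could be sent to Weaviate's schema endpoint (POST /v1/schema). The class name SchemaPayloadSketch and the method classDefinition are hypothetical, the "text" data type for the ID property is an assumption, and the inputs are assumed to be plain identifiers that need no JSON escaping.

```java
class SchemaPayloadSketch {

  // Builds a minimal class definition: the class name, vectorizer "none"
  // (for externally supplied vectors), and one text property that holds
  // the original document ID.
  static String classDefinition(String className, String idDestinationName) {
    return "{"
        + "\"class\": \"" + className + "\", "
        + "\"vectorizer\": \"none\", "
        + "\"properties\": [{\"name\": \"" + idDestinationName + "\", \"dataType\": [\"text\"]}]"
        + "}";
  }

  public static void main(String[] args) {
    System.out.println(classDefinition("Article", "id_original"));
  }
}
```

A real schema would also list a property for each document field you intend to index, with a data type that matches the field.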
Best Practices
Create schema before indexing
Define the Weaviate class schema first:
- Match property names to document fields
- Use appropriate data types (text, int, date, etc.)
- Set vectorizer to “none” if providing embeddings
- Include property for original document ID
Use descriptive class names
Choose class names that reflect your data:
- Article for news articles
- Product for e-commerce
- User for user profiles
Avoid using a generic Document class for multiple types.
Store original IDs
Always configure idDestinationName:
- Enables lookup by original ID
- Facilitates debugging
- Supports external system integration
Use ignoreFields for cleanup
Exclude internal or temporary fields so they are not sent to Weaviate.
Optimize batch size
Default is 100 documents per batch:
- Increase to 200-500 for small documents
- Decrease for large documents or vectors
- Monitor Weaviate server performance
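To see how batch size changes the number of requests, the sketch below splits a list of documents into fixed-size batches. It only illustrates the arithmetic: the class name BatchPartitionSketch and the method partition are hypothetical, and the indexer does its own batching internally.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPartitionSketch {

  // Splits items into consecutive batches of at most batchSize elements.
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> docs = new ArrayList<>();
    for (int i = 0; i < 250; i++) {
      docs.add(i);
    }
    // 250 documents at the default size of 100 give batches of 100, 100, and 50.
    for (List<Integer> batch : partition(docs, 100)) {
      System.out.println(batch.size());
    }
  }
}
```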
UUID Generation Details
The indexer uses deterministic UUID generation:
- Deterministic: Same input → same UUID
- Collision-resistant: Different inputs → different UUIDs
- Idempotent: Safe to reindex same document
- Standard: Uses Java’s UUID v3 (name-based)
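The properties listed above can be demonstrated with Java's built-in name-based UUID factory, UUID.nameUUIDFromBytes, which produces version 3 UUIDs. This is a minimal sketch: the class name DeterministicUuidSketch and the method uuidFor are hypothetical, and hashing the UTF-8 bytes of the document ID is an assumption about how the indexer derives its input.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class DeterministicUuidSketch {

  // Derives a name-based (version 3) UUID from the document ID.
  // Assumption: the ID is hashed as UTF-8 bytes.
  static UUID uuidFor(String documentId) {
    return UUID.nameUUIDFromBytes(documentId.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    UUID first = uuidFor("doc-1");
    UUID second = uuidFor("doc-1");
    UUID other = uuidFor("doc-2");
    System.out.println(first.equals(second)); // true: same input, same UUID
    System.out.println(first.equals(other));  // false: different input
    System.out.println(first.version());      // 3
  }
}
```

Because the UUID depends only on the document ID, re-indexing a document targets the same Weaviate object instead of creating a new one.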
Vector Handling
When vectorField is specified:
- Extract vector from document field
- Convert to Float array
- Attach to Weaviate object
- Remove vector field from properties
The vector field is expected to hold a list of floats (List<Float>).
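The steps above can be sketched as follows. This is an illustration under stated assumptions, not the indexer's actual code: the class name VectorHandlingSketch and the method extractVector are hypothetical, and the object's properties are modeled as a plain Map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VectorHandlingSketch {

  // Removes vectorField from the properties and returns its values as a
  // Float array. Returns null if the field is absent or is not a list.
  static Float[] extractVector(Map<String, Object> properties, String vectorField) {
    Object raw = properties.remove(vectorField);
    if (!(raw instanceof List)) {
      return null;
    }
    List<?> values = (List<?>) raw;
    Float[] vector = new Float[values.size()];
    for (int i = 0; i < vector.length; i++) {
      vector[i] = ((Number) values.get(i)).floatValue();
    }
    return vector;
  }

  public static void main(String[] args) {
    Map<String, Object> properties = new HashMap<>();
    properties.put("title", "Hello");
    properties.put("embedding", List.of(0.1f, 0.2f, 0.3f));
    Float[] vector = extractVector(properties, "embedding");
    System.out.println(vector.length);                       // 3
    System.out.println(properties.containsKey("embedding")); // false
  }
}
```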
Troubleshooting
Class not found error
- Create the class in Weaviate first
- Verify className matches exactly (case-sensitive)
- Check schema with: GET /v1/schema/{className}
Property type mismatch
Cause: Document field type doesn’t match schema.
Solutions:
- Update Weaviate schema to match document fields
- Transform fields in pipeline stages
- Use ignoreFields to exclude incompatible fields
Vector dimension mismatch
- Verify embedding model outputs correct dimension
- Check Weaviate class vectorizer configuration
- Ensure all vectors have same dimension
Authentication failed
Solutions:
- Verify API key is correct
- Check environment variable is set: echo $WEAVIATE_API_KEY
- Confirm key has write permissions
- Test connection: curl -H "Authorization: Bearer $WEAVIATE_API_KEY" https://host/v1/meta
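The same connection test can be run from Java with the JDK's built-in HTTP client (Java 11 or later). This is a hedged sketch: the class name MetaRequestSketch and the method metaRequest are hypothetical, the fallback key "dummy-key" is a placeholder, and the 6-second request timeout is an assumption. The main method only builds and prints the request, so nothing is sent over the network.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

class MetaRequestSketch {

  // Builds a GET request for the /v1/meta endpoint with a bearer token.
  static HttpRequest metaRequest(String host, String apiKey) {
    return HttpRequest.newBuilder()
        .uri(URI.create("https://" + host + "/v1/meta"))
        .header("Authorization", "Bearer " + apiKey)
        .timeout(Duration.ofSeconds(6))
        .GET()
        .build();
  }

  public static void main(String[] args) {
    String apiKey = System.getenv().getOrDefault("WEAVIATE_API_KEY", "dummy-key");
    HttpRequest request = metaRequest("localhost", apiKey);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and check the status code: a 401 or 403 response suggests a key problem rather than a network problem.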
Connection timeout
Causes:
- Network connectivity issues
- Weaviate server overloaded
- Incorrect host configuration
Solutions:
- Verify host is accessible
- Check firewall rules
- Test manually: curl https://host/v1/meta
- Reduce batch size to lower server load
Weaviate Client Configuration
The indexer creates a Weaviate client with:
- Protocol: HTTPS
- Connection timeout: 6 seconds
- Read timeout: 6 seconds
- Write timeout: 6 seconds
No Deletion Support
Child Documents
Child documents (nested documents) are not currently supported.
See Also
- Pinecone Plugin - Alternative vector database
- Indexers Overview
- Weaviate Documentation