Indexer Structure
The indexer configuration consists of two parts:
- General indexer settings - control batching, field handling, and deletion behavior
- Backend-specific settings - connection details for Solr, OpenSearch, Elasticsearch, etc.
General Indexer Settings
Indexer type: Solr, OpenSearch, Elasticsearch, or CSV. Can be omitted if you provide indexer.class instead.
Fully qualified indexer implementation class. Use for plugins and custom implementations.
Maximum number of documents in a batch before it is flushed to the destination
Milliseconds since the previous add or flush before a batch is considered expired and flushed regardless of size
Enable or disable indexing. Set to false for testing or when no indexer is required.
Field Handling
Document field containing an ID to send to the index instead of the default document ID
Document field containing the destination index/collection name to use instead of the default
Fields that should never be sent to the destination
Deletion Handling
Deletion features are supported in Solr, OpenSearch, and Pinecone indexers.
Document field that indicates whether a document represents a deletion request
Value in deletionMarkerField that marks a document as a deletion request
Document field containing the name of the field to use in a delete-by-query request. Only supported in Solr and OpenSearch indexers.
Document field containing the value to match in a delete-by-query request. Only supported in Solr and OpenSearch indexers.
Deletion Behavior
When a document has deletionMarkerField set to deletionMarkerFieldValue:
- Delete by ID (default): The document with the same ID is deleted from the index
- Delete by Query: If deleteByFieldField and deleteByFieldValue are also present, all documents matching that field/value are deleted
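The decision described above can be sketched in a few lines of Python. This is an illustration of the rules only, not Lucille's implementation; the four field names and the marker value are hypothetical stand-ins for whatever deletionMarkerField, deletionMarkerFieldValue, deleteByFieldField, and deleteByFieldValue are set to in your configuration.

```python
from typing import Any, Dict, Tuple

# Hypothetical values; a real pipeline takes these from the indexer settings
# deletionMarkerField, deletionMarkerFieldValue, deleteByFieldField and
# deleteByFieldValue.
MARKER_FIELD = "is_deleted"
MARKER_VALUE = "true"
DELETE_BY_FIELD_FIELD = "delete_field"
DELETE_BY_FIELD_VALUE = "delete_value"


def classify(doc: Dict[str, Any]) -> Tuple[str, ...]:
    """Return the action implied by the deletion rules for one document."""
    # No marker, or a marker with a different value: a normal index request.
    if str(doc.get(MARKER_FIELD)) != MARKER_VALUE:
        return ("index", doc["id"])

    # Marker present: delete by query only if both query fields are present.
    field = doc.get(DELETE_BY_FIELD_FIELD)
    value = doc.get(DELETE_BY_FIELD_VALUE)
    if field is not None and value is not None:
        return ("delete_by_query", field, value)

    # Default: delete the document that has the same ID.
    return ("delete_by_id", doc["id"])
```

For instance, a document carrying only the marker resolves to a delete by ID, while one that also carries both query fields resolves to a delete by query.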
Solr Configuration
Lucille supports both basic Solr and SolrCloud configurations.
Basic Solr (HTTP2SolrClient)
Solr URL including the collection name (e.g., http://localhost:8983/solr/collection1)
SolrCloud with URL
Use CloudHTTP2SolrClient for SolrCloud deployments
One or more Solr base URLs. For SolrCloud, URLs should NOT include the collection name.
Default collection name for SolrCloud. Required when using useCloudClient.
SolrCloud with ZooKeeper
ZooKeeper connection strings for SolrCloud. Alternative to url.
ZooKeeper chroot path for Solr
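The URL conventions above (collection name in the URL for basic Solr, base URLs only for SolrCloud) are easy to get backwards. The following Python sketch checks a URL against them before a run. The function name is hypothetical, and it assumes the Solr context path is /solr, as in the example URL; it is a sanity check, not part of Lucille.

```python
from typing import Optional
from urllib.parse import urlparse


def url_problem(url: str, use_cloud_client: bool) -> Optional[str]:
    """Return a description of the problem with a Solr URL, or None if it looks fine.

    Assumes the Solr context path is "/solr" (an assumption; adjust if your
    deployment uses a different path).
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments[:1] != ["solr"]:
        return "path does not start with /solr"

    # Anything after /solr is treated as a collection name.
    has_collection = len(segments) > 1
    if use_cloud_client and has_collection:
        return "SolrCloud URLs should not include the collection name"
    if not use_cloud_client and not has_collection:
        return "basic Solr URL should include the collection name"
    return None
```

Calling `url_problem("http://localhost:8983/solr/collection1", use_cloud_client=False)` returns `None`, while the same URL with `use_cloud_client=True` returns the SolrCloud message.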
Authentication and SSL
Username for HTTP basic authentication
Password for HTTP basic authentication
Allow invalid TLS certificates. Only enable for testing SSL/HTTPS against localhost.
OpenSearch Configuration
OpenSearch HTTP endpoint. Can include credentials in the URL. Use environment variables for production.
Target OpenSearch index name. Can be overridden per-document using indexer.indexOverrideField.
Use partial update API instead of index/replace operation
Allow invalid TLS certificates. Only enable for testing SSL/HTTPS against localhost.
Document field that supplies the routing key for OpenSearch
Versioning type when using external versions (e.g., EXTERNAL)
Environment Variable Example
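As a language-neutral illustration of keeping credentials out of the configuration file, the Python sketch below assembles an OpenSearch URL with embedded credentials from environment variables. The variable names OPENSEARCH_USER, OPENSEARCH_PASSWORD, and OPENSEARCH_HOST are hypothetical, as is the default host, and this is not Lucille configuration syntax; it only shows the idea of supplying the secret parts at runtime.

```python
import os
from typing import Mapping
from urllib.parse import quote


def opensearch_url(env: Mapping[str, str] = os.environ) -> str:
    """Build an https URL with credentials taken from the environment.

    The variable names are assumptions made for this sketch.
    """
    user = env["OPENSEARCH_USER"]
    password = env["OPENSEARCH_PASSWORD"]
    host = env.get("OPENSEARCH_HOST", "localhost:9200")
    # Percent-encode credentials so characters such as "@" or "/" cannot
    # break the URL structure.
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}"
```

Percent-encoding matters here: a password containing `@` or `/` would otherwise be misread as part of the host or path.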
Elasticsearch Configuration
Elasticsearch HTTP endpoint
Target Elasticsearch index name
Document type (deprecated in newer Elasticsearch versions)
CSV Indexer
Outputs documents to CSV files instead of a search engine. Useful for testing and data export.
CSV indexer configuration is minimal. Documents are written to CSV format based on their fields.
Plugin Indexers
Lucille supports plugin indexers for additional destinations, including Pinecone and Weaviate.
Complete Examples
- Solr Basic
- SolrCloud
- OpenSearch
- Elasticsearch
- Testing (No Indexing)
Performance Tuning
Batch Size
Larger batch sizes reduce network overhead but increase memory usage:
- Small batches (10-50): Lower latency, more network requests
- Medium batches (100-250): Balanced for most use cases
- Large batches (500-1000): Better throughput for bulk indexing
Batch Timeout
Controls how long to wait before flushing a partial batch:
- Short timeout (100-1000ms): Lower latency, more frequent flushes
- Long timeout (5000-10000ms): Better batching, higher latency
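How batch size and batch timeout interact can be seen in a small Python model. It follows the descriptions in this document (flush when the batch is full, or when the timeout has elapsed since the previous add or flush) but it is a sketch, not Lucille's code; the class and method names are hypothetical, and the injectable clock exists only so the behavior can be exercised without sleeping.

```python
import time
from typing import Callable, List, Optional


class Batch:
    """Collects documents and flushes when full or when the timeout expires."""

    def __init__(self, batch_size: int, batch_timeout_ms: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.timeout_s = batch_timeout_ms / 1000.0
        self.clock = clock
        self.docs: List[dict] = []
        self.last_activity = clock()  # time of the previous add or flush

    def add(self, doc: dict) -> Optional[List[dict]]:
        """Add a document; return the flushed batch if the size limit is hit."""
        self.docs.append(doc)
        self.last_activity = self.clock()
        if len(self.docs) >= self.batch_size:
            return self.flush()
        return None

    def flush_if_expired(self) -> Optional[List[dict]]:
        """Flush a partial batch if the timeout has elapsed since last activity."""
        if self.docs and self.clock() - self.last_activity >= self.timeout_s:
            return self.flush()
        return None

    def flush(self) -> List[dict]:
        """Return the current documents and start a new, empty batch."""
        flushed, self.docs = self.docs, []
        self.last_activity = self.clock()
        return flushed
```

In this model a large batch size with a short timeout behaves like a small batch whenever documents arrive slowly, which is why the two settings are best tuned together.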
Field Filtering
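Remove unnecessary fields before indexing to reduce payload size. The general indexer setting for fields that should never be sent to the destination (see Field Handling) serves this purpose.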
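A minimal Python sketch of what field filtering amounts to, and of its effect on payload size. The helper name, the sample document, and the ignored field name are all made up for illustration; the real filtering is done by the indexer according to its configuration.

```python
import json
from typing import Any, Dict, Iterable


def strip_ignored(doc: Dict[str, Any], ignored: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the document without the ignored fields."""
    ignored_set = set(ignored)
    return {k: v for k, v in doc.items() if k not in ignored_set}


# Hypothetical document with a bulky intermediate field.
doc = {"id": "1", "title": "Report", "raw_html": "<p>" + "x" * 1000 + "</p>"}
slim = strip_ignored(doc, ["raw_html"])

# Compare the serialized sizes before and after filtering.
size_before = len(json.dumps(doc))
size_after = len(json.dumps(slim))
```

Large intermediate fields produced by earlier pipeline stages, such as raw markup or extracted text that has already been summarized, are typical candidates for filtering.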
Troubleshooting
Connection failures
Verify the URL is correct and the search engine is accessible. Check acceptInvalidCert if using self-signed certificates.
Documents not appearing
Common causes:
- Indexing disabled: Check that sendEnabled is set to true
- Wrong index: Verify collection/index name
- Field mapping issues: Check destination schema
- Deletion marker: Ensure documents don’t have deletion marker set
Performance issues
- Increase batchSize for better throughput
- Reduce batchTimeout if documents appear slowly
- Check network latency to search engine
- Monitor search engine resource usage
Next Steps
- Kafka Configuration: Enable distributed mode with Kafka
- Running Lucille: Execute your configured pipeline