Overview
The OpenSearchIndexer sends documents to OpenSearch clusters using the official Java client. It supports bulk operations, partial updates, routing, and external versioning.
Java Class: com.kmwllc.lucille.indexer.OpenSearchIndexer
Source: OpenSearchIndexer.java
Configuration
Basic Configuration
With Authentication
Parameters
Target OpenSearch index name where documents will be stored. Example: "documents", "logs-2024"
OpenSearch HTTP endpoint including protocol and port. Example: "https://localhost:9200", "http://opensearch.local:9200"
Use the partial update API instead of index/replace operations. When true, only specified fields are updated; existing fields are preserved.
Allow invalid TLS certificates. Only use for development/testing environments.
Document field that supplies the routing key. Used to control which shard stores the document. Example: "user_id", "tenant_id"
Versioning type when using external version control. Options:
- External: Use external version number
- ExternalGte: External version must be >= current version
KafkaDocument instances with offset information.
Features
Index Override
Route documents to different indices dynamically. Documents with a target_index field will be sent to that index instead.
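The override decision can be pictured with a minimal sketch. It assumes a document is a plain map and that the override field is named target_index as in the example above; the class and method names are hypothetical and are not part of Lucille's API.

```java
import java.util.Map;

/** Sketch of index-override resolution (hypothetical helper, not Lucille's API). */
final class IndexOverrideSketch {

  /**
   * Returns the value of overrideField when the document carries a non-blank one;
   * otherwise falls back to the default index.
   */
  static String resolveIndex(Map<String, Object> doc, String overrideField, String defaultIndex) {
    Object override = doc.get(overrideField);
    if (override == null || override.toString().isBlank()) {
      return defaultIndex;
    }
    return override.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> plain = Map.of("id", "1");
    Map<String, Object> redirected = Map.of("id", "2", "target_index", "logs-2024");

    // The first document goes to the default index, the second to its override.
    System.out.println(resolveIndex(plain, "target_index", "documents"));
    System.out.println(resolveIndex(redirected, "target_index", "documents"));
  }
}
```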
Partial Updates
Use the update API to modify only specific fields:
- Update = false (default): Document is replaced entirely (index operation)
- Update = true: Only provided fields are modified; other fields remain unchanged
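The difference between the two modes can be illustrated with plain maps. This is a simplified sketch that models only top-level fields; the class and method names are hypothetical and it does not call OpenSearch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch contrasting index (replace) and update (merge) semantics on top-level fields. */
final class UpdateModeSketch {

  /** Index operation: the stored document is discarded and replaced by the incoming one. */
  static Map<String, Object> index(Map<String, Object> stored, Map<String, Object> incoming) {
    return new LinkedHashMap<>(incoming);
  }

  /** Update operation: incoming fields overwrite or extend the stored document. */
  static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> incoming) {
    Map<String, Object> merged = new LinkedHashMap<>(stored);
    merged.putAll(incoming);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = Map.of("title", "Old", "author", "Ann");
    Map<String, Object> incoming = Map.of("title", "New");

    System.out.println(index(stored, incoming));  // author is gone
    System.out.println(update(stored, incoming)); // author is preserved
  }
}
```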
Routing Control
Control document placement across shards. Documents with the same tenant_id value will be stored on the same shard, improving query performance for tenant-specific searches.
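Why equal routing values end up together can be shown with a small sketch. OpenSearch derives the shard from a hash of the routing value; the code below uses String.hashCode as a stand-in for the real hash function, so the shard numbers it prints are illustrative only. The class and method names are hypothetical.

```java
/** Sketch of hash-based shard selection (String.hashCode stands in for the real hash). */
final class RoutingSketch {

  /** Maps a routing value to a shard number in the range [0, numShards). */
  static int shardFor(String routingValue, int numShards) {
    return Math.floorMod(routingValue.hashCode(), numShards);
  }

  public static void main(String[] args) {
    int shards = 5;
    // Two documents of the same tenant always map to the same shard.
    System.out.println(shardFor("tenant-a", shards) == shardFor("tenant-a", shards));
    // A different tenant may or may not share that shard.
    System.out.println(shardFor("tenant-b", shards));
  }
}
```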
External Versioning
Use Kafka offsets or other external version numbers. For KafkaDocument instances, the Kafka offset is used as the version number, so a replayed or out-of-order message cannot overwrite a newer version of the document.
Deletion Support
- Delete by ID
- Delete by Query
Bulk Request Processing
The indexer optimizes operations:
- Batch accumulation: Documents collected up to batchSize
- Grouping by operation: Separate uploads from deletes
- Index grouping: Operations grouped by target index
- Conflict resolution:
  - If a document ID appears in both upload and delete, the most recent operation wins
  - Uploads remove previous delete requests
  - Deletes remove previous upload requests
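The conflict-resolution rules above can be sketched as two collections that cancel each other out. This is a minimal illustration under the assumption that operations are keyed by document ID; the class and method names are hypothetical and do not mirror the indexer's internals.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of "most recent operation wins" bookkeeping within one batch. */
final class BatchConflictSketch {
  private final Map<String, Map<String, Object>> uploads = new LinkedHashMap<>();
  private final Set<String> deletes = new LinkedHashSet<>();

  /** An upload cancels any earlier delete for the same ID. */
  void upload(String id, Map<String, Object> doc) {
    deletes.remove(id);
    uploads.put(id, doc);
  }

  /** A delete cancels any earlier upload for the same ID. */
  void delete(String id) {
    uploads.remove(id);
    deletes.add(id);
  }

  Set<String> uploadIds() {
    return uploads.keySet();
  }

  Set<String> deleteIds() {
    return deletes;
  }

  public static void main(String[] args) {
    BatchConflictSketch batch = new BatchConflictSketch();
    batch.upload("a", Map.of("title", "first"));
    batch.delete("a");                            // delete wins over the earlier upload
    batch.delete("b");
    batch.upload("b", Map.of("title", "second")); // upload wins over the earlier delete

    System.out.println("uploads: " + batch.uploadIds());
    System.out.println("deletes: " + batch.deleteIds());
  }
}
```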
Connection Validation
Startup validation ensures OpenSearch is accessible.
Error Handling
Failed documents are tracked with detailed error messages:
- Index not found
- Mapping type errors
- Routing value missing when required
- Version conflict
Example Configurations
Basic indexing
Multi-tenant with routing
Partial updates with deletion
Kafka integration with versioning
Best Practices
Use routing for multi-tenant applications
Setting routingField ensures all documents for a tenant are on the same shard:
- Faster tenant-specific queries
- Efficient deletion of tenant data
- Better cache utilization
Choose update mode carefully
- Index mode (update=false): Faster, replaces entire document
- Update mode (update=true): Slower, preserves unspecified fields
Optimize batch size
- Default: 100 documents
- Increase to 500-1000 for small documents
- Decrease for large documents with many fields
- Monitor OpenSearch heap and bulk queue
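To make the effect of the batch size concrete, the sketch below splits a list of documents into bulk requests of at most batchSize entries. It is a simplification that ignores any time-based flushing the real indexer may perform; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of splitting documents into batches of at most batchSize entries. */
final class BatchSizeSketch {

  static <T> List<List<T>> partition(List<T> docs, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < docs.size(); start += batchSize) {
      int end = Math.min(start + batchSize, docs.size());
      // Copy the slice so each batch is independent of the source list.
      batches.add(new ArrayList<>(docs.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> docs = new ArrayList<>();
    for (int i = 0; i < 250; i++) {
      docs.add(i);
    }
    // With the default of 100, 250 documents become three bulk requests.
    System.out.println(partition(docs, 100).size());
  }
}
```

A larger batch size means fewer round trips but a bigger request body per bulk call, which is why the guidance above ties the setting to document size.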
Handle version conflicts
When using external versioning:
- Ensure Kafka offsets are monotonically increasing
- Handle version conflict errors appropriately
- Consider using ExternalGte instead of External
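The practical difference between the two version types shows up when the same Kafka message is delivered twice. With external versioning, OpenSearch accepts a write only if its version is strictly greater than the stored one; the "greater than or equal" variant also accepts an equal version. The sketch below models just that comparison; the enum and method names are hypothetical.

```java
/** Sketch of the version comparison behind External and ExternalGte. */
final class ExternalVersionSketch {

  enum VersionType { EXTERNAL, EXTERNAL_GTE }

  /** Returns true when a write carrying the incoming version would be accepted. */
  static boolean accepts(VersionType type, long incoming, long stored) {
    switch (type) {
      case EXTERNAL:
        return incoming > stored;
      case EXTERNAL_GTE:
        return incoming >= stored;
      default:
        throw new IllegalArgumentException("Unknown version type: " + type);
    }
  }

  public static void main(String[] args) {
    long storedOffset = 42L;
    // Re-delivery of offset 42: a conflict under EXTERNAL, accepted under EXTERNAL_GTE.
    System.out.println(accepts(VersionType.EXTERNAL, 42L, storedOffset));
    System.out.println(accepts(VersionType.EXTERNAL_GTE, 42L, storedOffset));
    // An older offset is rejected by both.
    System.out.println(accepts(VersionType.EXTERNAL_GTE, 41L, storedOffset));
  }
}
```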
Troubleshooting
Connection timeout
- Verify OpenSearch is running and accessible
- Check network connectivity and firewall rules
- Confirm correct URL and port
- Test with: curl -k https://localhost:9200
Index not found error
Create the index in OpenSearch before indexing:
Mapping errors
- Ensure index mapping supports your document fields
- Use ignoreFields to exclude incompatible fields
- Check for type mismatches (e.g., sending a string to a long field)
Routing required but missing
If the index has required routing:
- Set routingField in configuration
- Ensure all documents have the routing field
- Check OpenSearch index settings
Child Documents
The indexer processes child documents from the children field: