Overview
The SolrIndexer sends processed documents to Apache Solr using the SolrJ client library. It supports both standalone Solr instances and SolrCloud deployments with ZooKeeper coordination.
Java Class: com.kmwllc.lucille.indexer.SolrIndexer
Source: SolrIndexer.java
Configuration
Basic Configuration
SolrCloud Configuration
Parameters
- One or more Solr base URLs (e.g., https://localhost:8983). Used for standalone Solr instances. Example: ["http://solr1:8983", "http://solr2:8983"]
- Whether to use the SolrCloud client. Set to true when connecting to a SolrCloud cluster with ZooKeeper.
- ZooKeeper connection strings when using SolrCloud. Required when useCloudClient is true. Example: ["zk1:2181", "zk2:2181", "zk3:2181"]
- ZooKeeper chroot path used with SolrCloud. Typically /solr or similar.
- Default Solr collection to index documents into when no indexOverrideField is present on the document.
- Username for HTTP basic authentication.
- Password for HTTP basic authentication.
- Allow invalid TLS certificates. Use with caution - only for development/testing.
Features
Multi-Collection Support
Route documents to different collections using the indexOverrideField. For example, when indexOverrideField is set to target_collection, documents with a target_collection field will be sent to that collection instead of the default.
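The routing rule described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucille's actual implementation: the CollectionRouter class, its constructor, and the resolve method are hypothetical names, and a document is modeled as a simple Map of field names to values.

```java
import java.util.Map;

/** Illustrative only: models the override-or-default routing rule with plain maps. */
final class CollectionRouter {
  private final String indexOverrideField; // e.g. "target_collection"; may be null
  private final String defaultCollection;

  CollectionRouter(String indexOverrideField, String defaultCollection) {
    this.indexOverrideField = indexOverrideField;
    this.defaultCollection = defaultCollection;
  }

  /** Returns the document's override collection if it has a non-blank one, else the default. */
  String resolve(Map<String, ?> doc) {
    Object override = indexOverrideField == null ? null : doc.get(indexOverrideField);
    if (override != null && !override.toString().trim().isEmpty()) {
      return override.toString();
    }
    return defaultCollection;
  }
}
```

With a router built as new CollectionRouter("target_collection", "main"), a document carrying target_collection = "archive" resolves to "archive", while a document without that field resolves to "main".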
Child Documents
Solr’s nested document feature is supported. Documents can contain child documents in the children field, which are converted to Solr’s _childDocuments_ format.
Deletion Support
- Delete by ID
- Delete by Field
Deletion by ID is performed with solrClient.deleteById().
SSL/TLS Configuration
Connect to Solr over HTTPS with custom SSL settings.
Connection Validation
The indexer validates connectivity during startup:
- Standalone Solr: Uses solrClient.ping() to verify the connection
- SolrCloud: Checks cluster status via CollectionAdminRequest.ClusterStatus()
Batch Processing
Documents are sent to Solr in batches:
- Documents accumulate up to batchSize (default 100)
- Within each batch, documents are grouped by collection
- Add/update operations are sent first
- Delete operations are sent after adds/updates
- If an ID appears in both adds and deletes, operations are ordered correctly
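The grouping and ordering rules above can be sketched as follows. This is a hedged illustration rather than the indexer's real code: BatchPlanner, Op, and Step are hypothetical names, operations are reduced to document IDs, and the handling of an ID that appears in both adds and deletes is an assumption. Here, an upsert that follows a delete of the same ID starts a new step, so the delete is not applied after the newer upsert.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: plans how one batch could be sent, collection by collection. */
final class BatchPlanner {

  /** One pending operation; a hypothetical stand-in for a processed document. */
  static final class Op {
    final String collection;
    final String id;
    final boolean delete;

    Op(String collection, String id, boolean delete) {
      this.collection = collection;
      this.id = id;
      this.delete = delete;
    }
  }

  /** One round of requests for a collection: upserts are sent first, then deletes. */
  static final class Step {
    final List<String> upserts = new ArrayList<>();
    final Set<String> deletes = new LinkedHashSet<>();
  }

  /** Groups a batch by collection and splits it into steps that preserve per-ID order. */
  static Map<String, List<Step>> plan(List<Op> batch) {
    Map<String, List<Step>> plans = new LinkedHashMap<>();
    for (Op op : batch) {
      List<Step> steps = plans.computeIfAbsent(op.collection, c -> new ArrayList<>());
      if (steps.isEmpty()) {
        steps.add(new Step());
      }
      Step current = steps.get(steps.size() - 1);
      if (op.delete) {
        current.deletes.add(op.id);
      } else {
        // Within a step, upserts go out before deletes. If this ID already has a
        // pending delete, start a new step so the delete cannot remove the newer upsert.
        if (current.deletes.contains(op.id)) {
          current = new Step();
          steps.add(current);
        }
        current.upserts.add(op.id);
      }
    }
    return plans;
  }
}
```

An add followed by a delete of the same ID fits in a single step, because sending adds before deletes already matches the original order. Only the delete-then-add case needs a second step.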
Error Handling
Failed documents are tracked and reported:
- Connection failures
- Invalid field types
- Collection not found
- Authentication failures
Example Configurations
Basic standalone Solr
SolrCloud with authentication
Multi-collection with deletion
Best Practices
Use SolrCloud for production
SolrCloud provides high availability, automatic failover, and distributed indexing. Use useCloudClient: true and connect via ZooKeeper.
Optimize batch size
- Start with 500-1000 documents per batch
- Reduce for documents with large text fields
- Monitor Solr’s heap usage and adjust
Handle child documents carefully
- Child documents cannot have their own children (nested limit)
- Map types must be flattened to fields
- Child IDs should be unique across the entire index
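One way to satisfy the "Map types must be flattened to fields" rule is to flatten nested maps before the document reaches the indexer. The sketch below is an assumption-laden example, not part of Lucille: MapFlattener is a hypothetical helper, documents are modeled as plain Java maps, and the dot-separated naming of flattened fields (author.name) is only one possible convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: flattens nested maps into single-level, dot-separated field names. */
final class MapFlattener {

  /** Returns a new map in which every nested map entry becomes a "parent.child" field. */
  static Map<String, Object> flatten(Map<String, ?> source) {
    Map<String, Object> flat = new LinkedHashMap<>();
    flattenInto("", source, flat);
    return flat;
  }

  private static void flattenInto(String prefix, Map<?, ?> source, Map<String, Object> out) {
    for (Map.Entry<?, ?> entry : source.entrySet()) {
      String key = prefix.isEmpty()
          ? String.valueOf(entry.getKey())
          : prefix + "." + entry.getKey();
      Object value = entry.getValue();
      if (value instanceof Map) {
        // Recurse so that arbitrarily deep maps end up as flat fields.
        flattenInto(key, (Map<?, ?>) value, out);
      } else {
        out.put(key, value);
      }
    }
  }
}
```

For a document with an id field and an author map containing name, the result has two flat fields, id and author.name, and no map-valued field.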
Use field-based deletion sparingly
Field-based deletion (deleteByFieldField) executes a query before deleting. For high-volume deletions, consider:
- Deleting by ID when possible
- Batching deletions
- Using Solr’s time-to-live (TTL) features
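To make the cost of field-based deletion concrete, the sketch below builds the kind of query string such a deletion might run. It is an assumption, not the indexer's documented behavior: DeleteQueries and byField are hypothetical names, the field name is assumed to be a simple identifier, and the query is an exact phrase match on a single field. A string like this could be passed to SolrJ's deleteByQuery method, which makes Solr evaluate the query against the index, whereas a delete by ID targets documents directly.

```java
/** Illustrative only: builds a quoted field:"value" query for a delete-by-query request. */
final class DeleteQueries {

  /** Escapes backslashes and double quotes so the value is safe inside a quoted phrase. */
  static String byField(String field, String value) {
    String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
    return field + ":\"" + escaped + "\"";
  }
}
```

For example, DeleteQueries.byField("source", "feed-a") yields source:"feed-a".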
Troubleshooting
Connection refused
- Verify Solr is running: curl http://localhost:8983/solr/admin/ping
- Check firewall rules
- Confirm correct port and protocol (HTTP vs HTTPS)
Collection not found
- Create the collection in Solr first
- Verify defaultCollection matches exactly (case-sensitive)
- For SolrCloud, ensure collection exists in ZooKeeper
Authentication failures
- Verify credentials are correct
- Check Solr’s security.json configuration
- Ensure the user has write permissions to the collection
Field type errors
- Ensure Solr schema matches document fields
- Use ignoreFields to exclude problematic fields
- Check for Map/Object fields (not supported directly)