Solr Indexer

Overview

The SolrIndexer sends processed documents to Apache Solr using the SolrJ client library. It supports both standalone Solr instances and SolrCloud deployments with ZooKeeper coordination.

Java Class: com.kmwllc.lucille.indexer.SolrIndexer
Source: SolrIndexer.java

Configuration

Basic Configuration

indexer {
  type: "solr"
  
  solr {
    url: ["http://localhost:8983/solr"]
    defaultCollection: "my_collection"
  }
}

SolrCloud Configuration

indexer {
  type: "solr"
  
  solr {
    useCloudClient: true
    zkHosts: ["zk1:2181", "zk2:2181", "zk3:2181"]
    zkChroot: "/solr"
    defaultCollection: "my_collection"
  }
}

Parameters

url (string[])
One or more Solr base URLs (e.g., https://localhost:8983). Used for standalone Solr instances.
Example: ["http://solr1:8983", "http://solr2:8983"]

useCloudClient (boolean, default: false)
Whether to use the SolrCloud client. Set to true when connecting to a SolrCloud cluster with ZooKeeper.

zkHosts (string[])
ZooKeeper connection strings when using SolrCloud. Required when useCloudClient is true.
Example: ["zk1:2181", "zk2:2181", "zk3:2181"]

zkChroot (string)
ZooKeeper chroot path used with SolrCloud. Typically /solr or similar.

defaultCollection (string)
Default Solr collection to index documents into when no indexOverrideField is present on the document.

userName (string)
Username for HTTP basic authentication.

password (string)
Password for HTTP basic authentication.

acceptInvalidCert (boolean, default: false)
Allow invalid TLS certificates. Use with caution: only for development/testing.

Features

Multi-Collection Support

Route documents to different collections using the indexOverrideField:

indexer {
  indexOverrideField: "target_collection"
  
  solr {
    defaultCollection: "default_docs"
  }
}

Documents with a target_collection field are sent to the collection named by that field's value instead of the default.
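The routing rule above can be sketched in plain Java. This is an illustration only, not Lucille's actual code: the class CollectionRouter, the method resolveCollection, and the use of a Map as a stand-in for a document are all hypothetical, and treating a blank override value as "missing" is an assumption.

```java
import java.util.Map;

/** Sketch of per-document collection routing; hypothetical, not Lucille's code. */
class CollectionRouter {

  /**
   * Returns the collection named by the document's override field,
   * or the default collection when that field is missing or blank
   * (the blank-value handling is an assumption of this sketch).
   */
  static String resolveCollection(Map<String, Object> doc,
                                  String overrideField,
                                  String defaultCollection) {
    if (overrideField != null) {
      Object value = doc.get(overrideField);
      if (value != null && !value.toString().isBlank()) {
        return value.toString();
      }
    }
    return defaultCollection;
  }

  public static void main(String[] args) {
    Map<String, Object> routed = Map.of("id", "1", "target_collection", "archive");
    Map<String, Object> plain = Map.of("id", "2");
    // prints "archive"
    System.out.println(resolveCollection(routed, "target_collection", "default_docs"));
    // prints "default_docs"
    System.out.println(resolveCollection(plain, "target_collection", "default_docs"));
  }
}
```

Resolving the collection per document lets a single batch carry documents for several collections.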

Child Documents

Solr’s nested document feature is supported for one level of nesting. Documents can contain child documents in the children field:

{
  "id": "parent1",
  "title": "Parent Document",
  "children": [
    {
      "id": "child1",
      "content": "Child document content"
    }
  ]
}

Child documents are automatically converted to Solr’s _childDocuments_ format.

Deletion Support

Delete by ID

indexer {
  deletionMarkerField: "delete"
  deletionMarkerFieldValue: "true"
}

Documents marked for deletion are removed by ID using solrClient.deleteById().

Delete by Field

indexer {
  deletionMarkerField: "delete"
  deletionMarkerFieldValue: "true"
  deleteByFieldField: "account_id"
  deleteByFieldValue: "account_value"
}

Deletes all documents matching a field value using Solr’s terms query.
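For orientation, a terms query in Solr's local-params syntax has the form {!terms f=field}value1,value2. The sketch below builds such a string in plain Java. The class TermsQuerySketch and the method termsQuery are hypothetical helpers, and how the SolrIndexer assembles its own query is not shown in this page, so treat this as an assumption about the general shape rather than the actual implementation.

```java
import java.util.List;

/** Sketch of building a Solr terms-query string; hypothetical helper. */
class TermsQuerySketch {

  /**
   * Builds "{!terms f=<field>}v1,v2,..." using the terms parser's default
   * comma separator. Values that themselves contain commas are not handled
   * by this sketch.
   */
  static String termsQuery(String field, List<String> values) {
    if (field == null || field.isBlank() || values.isEmpty()) {
      throw new IllegalArgumentException("field and at least one value are required");
    }
    return "{!terms f=" + field + "}" + String.join(",", values);
  }

  public static void main(String[] args) {
    // prints "{!terms f=account_id}acct-1,acct-2"
    System.out.println(termsQuery("account_id", List.of("acct-1", "acct-2")));
  }
}
```

A string of this shape could be passed to SolrJ's deleteByQuery(collection, query). Collecting several values into one query keeps the number of delete requests per batch small.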

SSL/TLS Configuration

Connect to Solr over HTTPS with custom SSL settings:

solr {
  url: ["https://solr.example.com:8983"]
  
  # SSL settings
  ssl.trustStorePath: "/path/to/truststore.jks"
  ssl.trustStorePassword: "password"
  ssl.keyStorePath: "/path/to/keystore.jks"
  ssl.keyStorePassword: "password"
}

Connection Validation

The indexer validates connectivity during startup:

Standalone Solr: Uses solrClient.ping() to verify the connection
SolrCloud: Checks cluster status via CollectionAdminRequest.ClusterStatus()

If validation fails, the pipeline will not start.

Batch Processing

Documents are sent to Solr in batches:

Documents accumulate up to batchSize (default 100)
Within each batch, documents are grouped by collection
Add/update operations are sent first
Delete operations are sent after adds/updates
If an ID appears in both adds and deletes, operations are ordered correctly
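The grouping steps above can be sketched in plain Java. This is a simplified illustration, not Lucille's actual code: BatchPlanSketch, Doc, Ops, and plan are hypothetical names, and the sketch deliberately leaves out the case where the same ID appears in both adds and deletes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of splitting one batch into per-collection adds and deletes. */
class BatchPlanSketch {

  /** Hypothetical stand-in for a pipeline document. */
  static final class Doc {
    final String id;
    final String collection;
    final boolean delete;

    Doc(String id, String collection, boolean delete) {
      this.id = id;
      this.collection = collection;
      this.delete = delete;
    }
  }

  /** IDs to add and IDs to delete for a single collection. */
  static final class Ops {
    final List<String> adds = new ArrayList<>();
    final List<String> deletes = new ArrayList<>();
  }

  /** Groups a batch by collection, keeping collections in first-seen order. */
  static Map<String, Ops> plan(List<Doc> batch) {
    Map<String, Ops> byCollection = new LinkedHashMap<>();
    for (Doc doc : batch) {
      Ops ops = byCollection.computeIfAbsent(doc.collection, c -> new Ops());
      (doc.delete ? ops.deletes : ops.adds).add(doc.id);
    }
    return byCollection;
  }

  public static void main(String[] args) {
    List<Doc> batch = List.of(
        new Doc("a", "main_docs", false),
        new Doc("b", "archive", false),
        new Doc("c", "main_docs", true));
    plan(batch).forEach((collection, ops) -> {
      // Per collection: report the adds first, then the deletes.
      System.out.println(collection + " add " + ops.adds);
      System.out.println(collection + " delete " + ops.deletes);
    });
  }
}
```

Grouping by collection first means each collection receives at most one add request and one delete request per batch.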

Error Handling

Failed documents are tracked and reported:

// Documents that fail are returned with error details
Set<Pair<Document, String>> failedDocs = sendToIndex(documents);

Errors include:

Connection failures
Invalid field types
Collection not found
Authentication failures

Example Configurations

Basic standalone Solr

indexer {
  type: "solr"
  batchSize: 500
  
  solr {
    url: ["http://localhost:8983/solr"]
    defaultCollection: "documents"
  }
}

SolrCloud with authentication

indexer {
  type: "solr"
  batchSize: 1000
  
  solr {
    useCloudClient: true
    zkHosts: ["zk1:2181", "zk2:2181", "zk3:2181"]
    zkChroot: "/solr"
    defaultCollection: "main_docs"
    userName: "solr_user"
    password: "secret_password"
  }
}

Multi-collection with deletion

indexer {
  type: "solr"
  indexOverrideField: "collection_name"
  deletionMarkerField: "deleted"
  deletionMarkerFieldValue: "yes"
  
  solr {
    useCloudClient: true
    zkHosts: ["localhost:2181"]
    defaultCollection: "default"
  }
}

Best Practices

Use SolrCloud for production

SolrCloud provides high availability, automatic failover, and distributed indexing. Set useCloudClient: true and connect via ZooKeeper using zkHosts.

Optimize batch size

Start with 500-1000 documents per batch
Reduce for documents with large text fields
Monitor Solr’s heap usage and adjust

Handle child documents carefully

Child documents cannot have their own children (nested limit)
Map types must be flattened to fields
Child IDs should be unique across the entire index

Use field-based deletion sparingly

Field-based deletion (deleteByFieldField) executes a query before deleting. For high-volume deletions, consider:

Deleting by ID when possible
Batching deletions
Using Solr’s time-to-live (TTL) features

Troubleshooting

Connection refused

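Verify Solr is running: curl http://localhost:8983/solr/my_collection/admin/ping (Solr serves the ping handler per collection or core; replace my_collection with your own)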
Check firewall rules
Confirm correct port and protocol (HTTP vs HTTPS)

Collection not found

Create the collection in Solr first
Verify defaultCollection matches exactly (case-sensitive)
For SolrCloud, ensure collection exists in ZooKeeper

Authentication failures

Verify credentials are correct
Check Solr’s security.json configuration
Ensure the user has write permissions to the collection

Field type errors

Ensure Solr schema matches document fields
Use ignoreFields to exclude problematic fields
Check for Map/Object fields (not supported directly)

See Also

Connectors
Stages
Indexers
Plugins
