What are Indexers?

Indexers are responsible for sending processed documents from Lucille to their final destination - typically a search engine or vector database. After documents have been extracted from sources and transformed by stages, indexers handle the actual writing of data to external systems.

Core Concepts

Indexer Interface

All indexers extend the Indexer base class and implement:
  • Connection validation: Verify connectivity to the target system
  • Batch processing: Send documents in configurable batches for efficiency
  • Error handling: Track failed documents and provide detailed error messages
  • Connection management: Properly open and close connections
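Conceptually, that contract can be sketched as follows. This is an illustrative Python sketch with invented method names, not Lucille's actual Java `Indexer` class:

```python
from abc import ABC, abstractmethod

class Indexer(ABC):
    """Generic shape of an indexer: validate, batch-send, close."""

    @abstractmethod
    def validate_connection(self) -> bool:
        """Verify connectivity to the target system before indexing."""

    @abstractmethod
    def send_batch(self, docs: list) -> list:
        """Send one batch of documents; return those that failed to index."""

    @abstractmethod
    def close(self) -> None:
        """Release connections to the target system."""
```

A concrete indexer for, say, a search engine would implement each of these against that system's client library.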

Common Configuration

All indexers support these common configuration parameters:
  • indexer.batchSize (integer, default: 100): Number of documents to send in each batch request
  • indexer.ignoreFields (string[]): List of document fields to exclude from indexing
  • indexer.idOverrideField (string): Document field to use as the ID instead of the default document ID
  • indexer.indexOverrideField (string): Document field that specifies which index/collection to send the document to
  • indexer.deletionMarkerField (string): Field name that marks a document for deletion
  • indexer.deletionMarkerFieldValue (string): Value that indicates the document should be deleted

Available Indexers

Search Engine Indexers

Solr

Index to Apache Solr with support for SolrCloud

OpenSearch

Send documents to OpenSearch clusters

Elasticsearch

Index to Elasticsearch with join support

Vector Database Indexers

Pinecone

Vector database indexer for embeddings

Weaviate

Vector search with object-based schema

Utility Indexers

CSV Indexer

Export documents to CSV files for testing and data export

Deletion Handling

Indexers support marking documents for deletion using marker fields:
indexer {
  deletionMarkerField: "delete_flag"
  deletionMarkerFieldValue: "true"
  deleteByFieldField: "account_id"  # Optional: delete by field
  deleteByFieldValue: "account_value"  # Value for field-based deletion
}
When a document has the deletion marker, the indexer will delete it from the target system instead of upserting it.

Batch Processing

Indexers process documents in batches for efficiency:
  1. Documents accumulate until batch size is reached
  2. Batch is sent to the target system
  3. Failed documents are tracked and can be retried
  4. Successful documents are acknowledged
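The accumulate-and-flush cycle above can be sketched generically. This is illustrative Python, not Lucille's Java implementation; the send function and batch size are assumptions:

```python
def index_documents(docs, send_batch, batch_size=100):
    """Accumulate docs and flush in batches; track per-document failures."""
    batch, failed = [], []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:      # 1. batch size reached
            failed += send_batch(batch)   # 2. send; returns any failed docs
            batch = []
    if batch:                             # flush the final partial batch
        failed += send_batch(batch)
    return failed                         # 3. failures are tracked for retry

# Usage: a toy sender that rejects documents missing an "id" field
send = lambda batch: [d for d in batch if "id" not in d]
docs = [{"id": 1}, {"id": 2}, {"bad": True}]
failed = index_documents(docs, send, batch_size=2)  # → [{"bad": True}]
```

A real sender would wrap the target system's bulk API and report per-document errors from its response.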

Error Handling

Indexers track failed documents and return detailed error information:
  • The ID of each failed document
  • The error message from the target system
Failures do not halt the run: the indexer continues processing the remaining documents, and retry logic can optionally be configured for transient errors.

Connection Validation

Before processing begins, indexers validate connectivity:
  • Ping the target system
  • Verify authentication credentials
  • Check cluster health (for distributed systems)
  • Validate index/collection existence

Best Practices

Batch sizing

  • Use larger batches (500-1000) for high-throughput scenarios
  • Use smaller batches (50-100) when documents are large or processing is complex
  • Monitor memory usage and adjust batch size accordingly

Error handling

  • Configure retry logic for transient failures
  • Monitor failed document counts and set up alerts for indexing errors

Field management

  • Use ignoreFields to exclude unnecessary data
  • Map document fields to the index schema appropriately
  • Consider field size limits in the target system
  • Use deletion markers for removing documents

Connection management

  • Keep connections open during pipeline execution
  • Configure appropriate timeouts and use connection pooling when available
  • Implement proper shutdown procedures

Next Steps

Solr Indexer

Configure Apache Solr indexing

OpenSearch Indexer

Set up OpenSearch indexing