Elasticsearch Indexer

Overview

The ElasticsearchIndexer sends documents to Elasticsearch using the official Java Client. It provides advanced features including join relations for parent-child documents, routing, and external versioning.

Java Class: com.kmwllc.lucille.indexer.ElasticsearchIndexer

Source: ElasticsearchIndexer.java

Configuration

Basic Configuration

indexer {
  type: "elasticsearch"
  
  elasticsearch {
    url: "https://localhost:9200"
    index: "documents"
  }
}

With Authentication

indexer {
  type: "elasticsearch"
  
  elasticsearch {
    url: "https://elastic.example.com:9200"
    index: "my_index"
    userName: "elastic"
    password: "${ELASTIC_PASSWORD}"
  }
}

Parameters

index (string, required)

Target Elasticsearch index name. Example: "documents", "logs-2024-01"

url (string, required)

Elasticsearch HTTP endpoint including protocol and port. Example: "https://localhost:9200"

update (boolean, default: false)

Use the partial update API to modify only the specified fields instead of replacing the entire document.

acceptInvalidCert (boolean, default: false)

Allow invalid TLS certificates. Use only for development and testing.

indexer.routingField (string)

Document field that supplies the routing key for shard placement. Example: "user_id", "parent_id"

indexer.versionType (string)

Version control type:

External: Use external version numbers
ExternalGte: External version must be >= current version

Requires KafkaDocument instances.

Join Field Configuration

Elasticsearch supports parent-child relationships using join fields:

elasticsearch.join.joinFieldName (string)

Name of the join field mapped in the index. Example: "document_join"

elasticsearch.join.isChild (boolean, default: false)

Whether documents being indexed are children in the join relation.

elasticsearch.join.childName (string)

Child relation name. Required when isChild is true. Example: "comment"

elasticsearch.join.parentDocumentIdSource (string)

Document field containing the parent document ID. Required when isChild is true. Example: "parent_id"

elasticsearch.join.parentName (string)

Parent relation name. Required when isChild is false and join is used. Example: "article"
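The "required when" rules above can be checked before a pipeline starts. The following is a minimal sketch of such a check, not the indexer's own validation: the class JoinConfigCheck and its method are hypothetical, and treating joinFieldName as mandatory whenever a join block is present is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the elasticsearch.join settings; not a Lucille class. */
final class JoinConfigCheck {

  /** Returns a list of problems; an empty list means the settings are consistent. */
  static List<String> validate(String joinFieldName, boolean isChild,
                               String childName, String parentDocumentIdSource,
                               String parentName) {
    List<String> problems = new ArrayList<>();
    // Assumption: a join block without a field name is never usable.
    if (isBlank(joinFieldName)) {
      problems.add("joinFieldName is required when join is used");
    }
    if (isChild) {
      // Children need both their relation name and the field holding the parent ID.
      if (isBlank(childName)) {
        problems.add("childName is required when isChild is true");
      }
      if (isBlank(parentDocumentIdSource)) {
        problems.add("parentDocumentIdSource is required when isChild is true");
      }
    } else if (isBlank(parentName)) {
      // Parents only need their relation name.
      problems.add("parentName is required when isChild is false");
    }
    return problems;
  }

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }
}
```

For example, validate("doc_relation", true, null, "article_id", null) reports a single problem, the missing childName.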

Features

Join Relations (Parent-Child)

Elasticsearch join fields enable parent-child document relationships:

Parent Documents

elasticsearch {
  join {
    joinFieldName: "doc_relation"
    parentName: "article"
  }
}

The indexer adds the join field automatically:

{
  "id": "article1",
  "title": "Parent Article",
  "doc_relation": "article"
}

Child Documents

elasticsearch {
  join {
    joinFieldName: "doc_relation"
    isChild: true
    childName: "comment"
    parentDocumentIdSource: "article_id"
  }
}
indexer {
  routingField: "article_id"  # Required for children
}

Child documents get the join field with parent reference:

{
  "id": "comment1",
  "text": "Great article!",
  "doc_relation": {
    "name": "comment",
    "parent": "article1"
  }
}

Important: Child documents must be routed to the same shard as their parent. Set routingField to the parent ID field.

Partial Updates

Use update mode to modify only specific fields:

elasticsearch {
  update: true
}

Index mode (update=false): Replaces entire document
Update mode (update=true): Merges fields, preserving unspecified fields
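The difference between the two modes can be illustrated with plain maps. This is a simplified sketch, not the indexer's code: UpdateModeSketch is a hypothetical class, documents are modelled as Map<String, Object>, and only top-level fields are merged (Elasticsearch itself also merges nested objects).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of index mode versus update mode on top-level fields. */
final class UpdateModeSketch {

  /** Index mode: the incoming document replaces whatever was stored. */
  static Map<String, Object> index(Map<String, Object> stored, Map<String, Object> incoming) {
    return new LinkedHashMap<>(incoming);
  }

  /** Update mode: incoming fields overwrite matching fields; all others are preserved. */
  static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> incoming) {
    Map<String, Object> merged = new LinkedHashMap<>(stored);
    merged.putAll(incoming);
    return merged;
  }
}
```

With a stored document {title: "A", views: 1} and an incoming document {views: 2}, index yields {views: 2} while update yields {title: "A", views: 2}.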

Routing

Control shard placement:

indexer {
  routingField: "tenant_id"
}

Essential for:

Parent-child relationships
Multi-tenant applications
Query performance optimization
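Why routing matters for parent-child relationships can be seen in a small sketch. Elasticsearch picks a shard by hashing the routing key, and a document without an explicit routing value is routed by its ID. The code below is only an illustration: RoutingSketch is a hypothetical class, and String.hashCode() stands in for the hash function Elasticsearch actually uses.

```java
/** Hypothetical illustration of routing-based shard selection. */
final class RoutingSketch {

  /**
   * Stand-in for shard selection. Elasticsearch uses its own hash function;
   * String.hashCode() is used here only to show the principle.
   */
  static int shardFor(String routingKey, int numberOfShards) {
    return Math.floorMod(routingKey.hashCode(), numberOfShards);
  }

  public static void main(String[] args) {
    int shards = 5;
    // The parent has no explicit routing, so its ID acts as the routing key.
    int parentShard = shardFor("article1", shards);
    // The child is routed by the value of its article_id field, which is the parent ID.
    int childShard = shardFor("article1", shards);
    // Same key, same shard.
    System.out.println(parentShard == childShard); // prints true
  }
}
```

Because the child's routing key equals the parent's ID, both hash to the same shard regardless of which hash function is used.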

External Versioning

Use external version numbers (e.g., Kafka offsets):

indexer {
  versionType: "External"
}

The indexer takes the document version from KafkaDocument.getOffset(), which is why versionType requires KafkaDocument instances.
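On the Elasticsearch side, an external version is accepted only if it is greater than the stored version, while the "greater than or equal" variant also accepts an equal version. A rejected write surfaces as a version conflict. The sketch below models that rule; VersionCheckSketch and its enum are hypothetical names and not part of the indexer.

```java
/** Hypothetical model of how external version checks decide whether a write is accepted. */
final class VersionCheckSketch {

  enum VersionType { EXTERNAL, EXTERNAL_GTE }

  /** Returns true if a write carrying incomingVersion would be accepted. */
  static boolean accepts(VersionType type, long storedVersion, long incomingVersion) {
    switch (type) {
      case EXTERNAL:
        // Strictly newer versions only.
        return incomingVersion > storedVersion;
      case EXTERNAL_GTE:
        // Equal versions are accepted as well.
        return incomingVersion >= storedVersion;
      default:
        throw new IllegalArgumentException("unknown version type: " + type);
    }
  }
}
```

With Kafka offsets as versions, a replayed message at offset 10 is rejected under External when offset 10 is already stored, and accepted under ExternalGte.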

Index Override Not Supported

The indexOverrideField configuration is not supported by ElasticsearchIndexer. All documents go to the single index specified in the configuration.

Connection Validation

The indexer pings Elasticsearch during startup:

BooleanResponse response = client.ping();

If validation fails, the pipeline will not start.

Error Handling

Failed documents are returned with error details:

Set<Pair<Document, String>> failedDocs;

Common errors:

Join field validation failures
Parent document not found (for children)
Routing value missing
Version conflicts
Mapping type mismatches
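When many documents fail, grouping them by error message makes the dominant cause easier to spot. The following is a sketch under stated assumptions: FailureReportSketch is a hypothetical helper, and Map.Entry<String, String> (document ID, error message) stands in for the Pair<Document, String> elements of failedDocs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper that summarises failed documents by error message. */
final class FailureReportSketch {

  /** Groups failed document IDs by error message; keys and ID lists are sorted. */
  static Map<String, List<String>> groupByError(Set<Map.Entry<String, String>> failedDocs) {
    Map<String, List<String>> byError = new TreeMap<>();
    for (Map.Entry<String, String> failure : failedDocs) {
      // Key of the entry is the document ID, value is the error message.
      byError.computeIfAbsent(failure.getValue(), k -> new ArrayList<>())
             .add(failure.getKey());
    }
    // Sort IDs so the report is stable regardless of set iteration order.
    byError.values().forEach(Collections::sort);
    return byError;
  }
}
```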

Example Configurations

Simple indexing

indexer {
  type: "elasticsearch"
  batchSize: 500
  
  elasticsearch {
    url: "http://localhost:9200"
    index: "documents"
  }
}

Parent-child relationships

# Parent pipeline
indexer {
  type: "elasticsearch"
  
  elasticsearch {
    url: "https://elastic:9200"
    index: "blog"
    userName: "elastic"
    password: "${ES_PASSWORD}"
    
    join {
      joinFieldName: "blog_join"
      parentName: "article"
    }
  }
}

# Child pipeline (separate)
indexer {
  type: "elasticsearch"
  routingField: "article_id"  # Route to parent's shard
  
  elasticsearch {
    url: "https://elastic:9200"
    index: "blog"  # Same index
    userName: "elastic"
    password: "${ES_PASSWORD}"
    
    join {
      joinFieldName: "blog_join"
      isChild: true
      childName: "comment"
      parentDocumentIdSource: "article_id"
    }
  }
}

With routing and updates

indexer {
  type: "elasticsearch"
  routingField: "user_id"
  batchSize: 1000
  
  elasticsearch {
    url: "https://localhost:9200"
    index: "user_activity"
    update: true  # Partial updates
  }
}

Kafka integration

indexer {
  type: "elasticsearch"
  versionType: "External"
  
  elasticsearch {
    url: "https://elastic:9200"
    index: "events"
  }
}

Best Practices

Design join relationships carefully

Keep parent-child relationships shallow (one level)
Avoid too many children per parent (Elasticsearch limit: 10000 per shard)
Use join only when you need parent-child queries (e.g., has_child, has_parent)
Consider denormalization as an alternative

Always route child documents

Child documents must be on the same shard as their parent:

routingField: "parent_id_field"

Without proper routing, a child can land on a different shard from its parent, and parent-child queries such as has_child and has_parent will not match it.

Prepare index mapping first

Create the index with join field mapping before indexing:

{
  "mappings": {
    "properties": {
      "doc_relation": {
        "type": "join",
        "relations": {
          "article": "comment"
        }
      }
    }
  }
}

Use separate pipelines for parents and children

Index parents first, then children. This ensures parents exist before children reference them.

Troubleshooting

Join field already exists error

The indexer automatically adds the join field. If the document already has it:

Remove the join field from your source data
Let the indexer populate it based on configuration

Parent document not found

When indexing children:

Ensure parent documents are indexed first
Verify parentDocumentIdSource field contains correct parent IDs
Check that routing is configured correctly

Index override not supported

If you need multi-index support:

Use separate pipelines for each index
Or use OpenSearchIndexer which supports index override

Routing required but missing

For child documents or routing-required indices:

Set routingField in configuration
Ensure all documents have the routing field
Verify routing field contains non-null values
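The last two checks can be run over a sample of documents before indexing. This is a minimal sketch, assuming documents are available as Map<String, Object> with an "id" field; RoutingPreflight is a hypothetical helper and not part of the indexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for documents that lack a usable routing value. */
final class RoutingPreflight {

  /** Returns the IDs of documents whose routing field is absent, null, or empty. */
  static List<String> missingRouting(List<Map<String, Object>> docs, String routingField) {
    List<String> missing = new ArrayList<>();
    for (Map<String, Object> doc : docs) {
      Object value = doc.get(routingField);
      if (value == null || value.toString().isEmpty()) {
        // Assumption: every document carries its ID under the "id" key.
        missing.add(String.valueOf(doc.get("id")));
      }
    }
    return missing;
  }
}
```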

Join Field Implementation

The indexer’s ElasticJoinData class handles join field population:

public void populateJoinData(Document doc) {
  if (isChild) {
    String parentId = doc.getString(parentDocumentIdSource);
    doc.setField(joinFieldName, getChildNode(parentId));
  } else {
    doc.setField(joinFieldName, parentName);
  }
}

This happens automatically before documents are sent to Elasticsearch.
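The getChildNode helper is not shown in the snippet above. Judging from the child document example earlier on this page, the join value for a child is an object with name and parent keys, while a parent carries the relation name as a plain string. The sketch below builds both shapes; JoinValueSketch and its methods are hypothetical and only mirror that structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the two join field value shapes. */
final class JoinValueSketch {

  /** Parent documents carry the relation name as a plain string. */
  static Object parentJoinValue(String parentName) {
    return parentName;
  }

  /** Child documents carry an object with the relation name and the parent ID. */
  static Map<String, String> childJoinValue(String childName, String parentId) {
    Map<String, String> node = new LinkedHashMap<>();
    node.put("name", childName);
    node.put("parent", parentId);
    return node;
  }
}
```

Calling childJoinValue("comment", "article1") produces the same name/parent pair as the doc_relation value in the child document example.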
