
Overview

The ElasticsearchIndexer sends documents to Elasticsearch using the official Java Client. It supports join relations for parent-child documents, routing, and external versioning.
Java Class: com.kmwllc.lucille.indexer.ElasticsearchIndexer
Source: ElasticsearchIndexer.java

Configuration

Basic Configuration

indexer {
  type: "elasticsearch"
  
  elasticsearch {
    url: "https://localhost:9200"
    index: "documents"
  }
}

With Authentication

indexer {
  type: "elasticsearch"
  
  elasticsearch {
    url: "https://elastic.example.com:9200"
    index: "my_index"
    userName: "elastic"
    password: "${ELASTIC_PASSWORD}"
  }
}

Parameters

index (string, required)
Target Elasticsearch index name. Example: "documents", "logs-2024-01"
url (string, required)
Elasticsearch HTTP endpoint, including protocol and port. Example: "https://localhost:9200"
update (boolean, default: false)
Use the partial update API to modify only the specified fields instead of replacing the entire document.
acceptInvalidCert (boolean, default: false)
Allow invalid TLS certificates. Use only for development and testing.
indexer.routingField (string)
Document field that supplies the routing key for shard placement. Example: "user_id", "parent_id"
indexer.versionType (string)
Version control type:
  • External: The incoming version must be greater than the version already stored
  • ExternalGte: The incoming version must be greater than or equal to the version already stored
Requires KafkaDocument instances, because the version is taken from the Kafka offset.

Join Field Configuration

Elasticsearch supports parent-child relationships using join fields:
elasticsearch.join.joinFieldName (string)
Name of the join field mapped in the index. Example: "document_join"
elasticsearch.join.isChild (boolean, default: false)
Whether the documents being indexed are children in the join relation.
elasticsearch.join.childName (string)
Child relation name. Required when isChild is true. Example: "comment"
elasticsearch.join.parentDocumentIdSource (string)
Document field containing the parent document ID. Required when isChild is true. Example: "parent_id"
elasticsearch.join.parentName (string)
Parent relation name. Required when isChild is false and join is used. Example: "article"

Features

Join Relations (Parent-Child)

Elasticsearch join fields enable parent-child document relationships:
elasticsearch {
  join {
    joinFieldName: "doc_relation"
    parentName: "article"
  }
}
The indexer adds the join field automatically:
{
  "id": "article1",
  "title": "Parent Article",
  "doc_relation": "article"
}
Important: Child documents must be routed to the same shard as their parent. Set routingField to the parent ID field.
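The two shapes a join field can take may be easier to see side by side. The following is a minimal sketch of the values Elasticsearch expects for parent and child documents. It is not Lucille's implementation: the class JoinFieldSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: hypothetical helper, not part of Lucille or the Elasticsearch client.
final class JoinFieldSketch {

  // A parent document carries just its relation name in the join field.
  static String parentValue(String parentName) {
    return parentName;
  }

  // A child document carries an object with its relation name and the parent's ID.
  static Map<String, String> childValue(String childName, String parentId) {
    Map<String, String> node = new LinkedHashMap<>();
    node.put("name", childName);
    node.put("parent", parentId);
    return node;
  }

  public static void main(String[] args) {
    System.out.println(parentValue("article"));
    System.out.println(childValue("comment", "article1"));
  }
}
```

With the configuration above, a parent gets "doc_relation": "article", while a child of article1 would get an object naming both the child relation and the parent ID.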

Partial Updates

Use update mode to modify only specific fields:
elasticsearch {
  update: true
}
  • Index mode (update=false): Replaces the entire document
  • Update mode (update=true): Merges the supplied fields into the existing document, leaving all other fields unchanged
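The difference between the two modes can be sketched with plain maps standing in for stored documents. This is a simplified model under stated assumptions: it only handles top-level fields, and the class UpdateModeSketch and its methods are hypothetical, not Lucille or Elasticsearch APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models documents as maps of top-level fields.
final class UpdateModeSketch {

  // Index mode: the incoming document replaces whatever was stored (stored is ignored).
  static Map<String, Object> index(Map<String, Object> stored, Map<String, Object> incoming) {
    return new LinkedHashMap<>(incoming);
  }

  // Update mode: incoming fields overwrite matching fields, all others are kept.
  static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> incoming) {
    Map<String, Object> merged = new LinkedHashMap<>(stored);
    merged.putAll(incoming);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("id", "user1");
    stored.put("name", "Ada");
    stored.put("logins", 3);

    Map<String, Object> incoming = new LinkedHashMap<>();
    incoming.put("id", "user1");
    incoming.put("logins", 4);

    System.out.println(index(stored, incoming));  // {id=user1, logins=4}
    System.out.println(update(stored, incoming)); // {id=user1, name=Ada, logins=4}
  }
}
```

In index mode the name field is lost because the new document did not include it. In update mode it survives.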

Routing

Control shard placement:
indexer {
  routingField: "tenant_id"
}
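Routing matters for:
  • Parent-child relationships (required: children must share the parent's shard)
  • Multi-tenant applications (keeps each tenant's documents together)
  • Query performance optimization (a routed search can target one shard instead of all of them)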
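To see why a shared routing key keeps related documents together, consider a simplified model of shard selection. Elasticsearch's real formula uses a Murmur3 hash of the routing value plus a routing-shards factor. The sketch below swaps in String.hashCode purely for illustration, and RoutingSketch is a hypothetical name.

```java
// Illustration only: a simplified stand-in for Elasticsearch's shard selection.
final class RoutingSketch {

  // Map a routing key to a shard number in [0, primaryShards).
  // Assumption: String.hashCode replaces the Murmur3 hash Elasticsearch actually uses.
  static int shardFor(String routingKey, int primaryShards) {
    return Math.floorMod(routingKey.hashCode(), primaryShards);
  }

  public static void main(String[] args) {
    int shards = 5;
    // A parent routed by its own ID and a child routed by the parent's ID
    // use the same routing key, so they resolve to the same shard.
    int parentShard = shardFor("article1", shards);
    int childShard = shardFor("article1", shards);
    System.out.println(parentShard == childShard); // true
  }
}
```

Because the shard is a pure function of the routing key, any two documents indexed with the same key end up on the same shard, which is what the join relation requires.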

External Versioning

Use external version numbers (e.g., Kafka offsets):
indexer {
  versionType: "External"
}
The indexer takes the version from KafkaDocument.getOffset(), so this setting only applies to documents that are KafkaDocument instances.
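The acceptance rule for the two version types can be sketched as a small predicate. This models Elasticsearch's documented behavior for external versioning. The class ExternalVersionSketch, its enum, and the accepts method are hypothetical names, not Lucille or client APIs.

```java
// Illustration only: models when Elasticsearch accepts a write under external versioning.
final class ExternalVersionSketch {

  enum VersionType { EXTERNAL, EXTERNAL_GTE }

  // Returns true if a write with incomingVersion is accepted over storedVersion.
  static boolean accepts(VersionType type, long storedVersion, long incomingVersion) {
    switch (type) {
      case EXTERNAL:
        return incomingVersion > storedVersion;   // strictly newer only
      case EXTERNAL_GTE:
        return incomingVersion >= storedVersion;  // same version is allowed
      default:
        throw new IllegalArgumentException("Unknown version type: " + type);
    }
  }

  public static void main(String[] args) {
    // Replaying the Kafka message at offset 5 when offset 5 is already stored:
    System.out.println(accepts(VersionType.EXTERNAL, 5, 5));     // false
    System.out.println(accepts(VersionType.EXTERNAL_GTE, 5, 5)); // true
  }
}
```

Under External, a replayed message with the same offset is rejected as a version conflict. Under ExternalGte it is accepted, which can suit pipelines that reprocess messages.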

Index Override Not Supported

The indexOverrideField configuration is not supported by ElasticsearchIndexer. All documents go to the single index specified in the configuration.

Connection Validation

The indexer pings Elasticsearch during startup:
BooleanResponse response = client.ping();
If validation fails, the pipeline will not start.

Error Handling

Failed documents are returned with error details:
Set<Pair<Document, String>> failedDocs;
Common errors:
  • Join field validation failures
  • Parent document not found (for children)
  • Routing value missing
  • Version conflicts
  • Mapping type mismatches
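When many documents fail, grouping them by error message makes the cause easier to spot. The following is a minimal sketch of that idea. It uses a hypothetical Failure class as a stand-in for Pair<Document, String>, so it runs without Lucille on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: Failure is a hypothetical stand-in for Pair<Document, String>.
final class FailedDocsSketch {

  static final class Failure {
    final String docId;
    final String error;

    Failure(String docId, String error) {
      this.docId = docId;
      this.error = error;
    }
  }

  // Group failed document IDs by their error message, sorted by message.
  static Map<String, List<String>> idsByError(Collection<Failure> failures) {
    Map<String, List<String>> grouped = new TreeMap<>();
    for (Failure failure : failures) {
      grouped.computeIfAbsent(failure.error, key -> new ArrayList<>()).add(failure.docId);
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<Failure> failures = Arrays.asList(
        new Failure("doc1", "version conflict"),
        new Failure("doc2", "routing value missing"),
        new Failure("doc3", "version conflict"));
    System.out.println(idsByError(failures));
  }
}
```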

Example Configurations

# Basic indexing with an explicit batch size
indexer {
  type: "elasticsearch"
  batchSize: 500
  
  elasticsearch {
    url: "http://localhost:9200"
    index: "documents"
  }
}

# Parent pipeline (join relation)
indexer {
  type: "elasticsearch"
  
  elasticsearch {
    url: "https://elastic:9200"
    index: "blog"
    userName: "elastic"
    password: "${ES_PASSWORD}"
    
    join {
      joinFieldName: "blog_join"
      parentName: "article"
    }
  }
}

# Child pipeline (separate)
indexer {
  type: "elasticsearch"
  routingField: "article_id"  # Route to parent's shard
  
  elasticsearch {
    url: "https://elastic:9200"
    index: "blog"  # Same index
    userName: "elastic"
    password: "${ES_PASSWORD}"
    
    join {
      joinFieldName: "blog_join"
      isChild: true
      childName: "comment"
      parentDocumentIdSource: "article_id"
    }
  }
}

# Partial updates with routing
indexer {
  type: "elasticsearch"
  routingField: "user_id"
  batchSize: 1000
  
  elasticsearch {
    url: "https://localhost:9200"
    index: "user_activity"
    update: true  # Partial updates
  }
}

# External versioning
indexer {
  type: "elasticsearch"
  versionType: "External"
  
  elasticsearch {
    url: "https://elastic:9200"
    index: "events"
  }
}

Best Practices

  • Keep parent-child relationships shallow (one level)
  • Avoid too many children per parent (Elasticsearch limit: 10000 per shard)
  • Use join only when you need parent-child queries (e.g., has_child, has_parent)
  • Consider denormalization as an alternative
Child documents must be on the same shard as their parent:
routingField: "parent_id_field"
Without correct routing, a child can land on a different shard than its parent, and has_child and has_parent queries will not match it.
Create the index with join field mapping before indexing:
{
  "mappings": {
    "properties": {
      "doc_relation": {
        "type": "join",
        "relations": {
          "article": "comment"
        }
      }
    }
  }
}
Index parents first, then children. This ensures parents exist before children reference them.

Troubleshooting

The indexer automatically adds the join field. If the document already has it:
  • Remove the join field from your source data
  • Let the indexer populate it based on configuration
When indexing children:
  • Ensure parent documents are indexed first
  • Verify parentDocumentIdSource field contains correct parent IDs
  • Check that routing is configured correctly
If you need multi-index support:
  • Use separate pipelines for each index
  • Or use OpenSearchIndexer, which supports index override
For child documents or routing-required indices:
  • Set routingField in configuration
  • Ensure all documents have the routing field
  • Verify routing field contains non-null values

Join Field Implementation

The indexer’s ElasticJoinData class handles join field population:
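public void populateJoinData(Document doc) {
  if (isChild) {
    // Child: read the parent's ID from the configured source field
    String parentId = doc.getString(parentDocumentIdSource);
    // and store a join node that references that parent
    doc.setField(joinFieldName, getChildNode(parentId));
  } else {
    // Parent: the join field holds only the parent relation name
    doc.setField(joinFieldName, parentName);
  }
}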
This happens automatically before documents are sent to Elasticsearch.
