Overview

Connectors in Lucille Search read data from external sources and generate Documents to be processed by pipelines. Each connector targets a specific data source, extracts its data, and publishes Documents for downstream processing.

Connector Interface

All connectors implement the Connector interface, which defines the contract for data ingestion:
package com.kmwllc.lucille.core;

public interface Connector extends AutoCloseable {
  String getName();
  String getPipelineName();
  boolean requiresCollapsingPublisher();
  void preExecute(String runId) throws ConnectorException;
  void execute(Publisher publisher) throws ConnectorException;
  void postExecute(String runId) throws ConnectorException;
  Spec getSpec();
  String getMessage();
}
Location: com.kmwllc.lucille.core.Connector

Connector Lifecycle

During each run, connectors follow a strict lifecycle:
  1. preExecute(runId) - Always called first for setup operations
  2. execute(publisher) - Called if preExecute() succeeds, performs main data ingestion
  3. postExecute(runId) - Called if preExecute() and execute() succeed
  4. close() - Always called for cleanup, even if errors occur
A new Connector instance is created for each run.
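The ordering guarantees above can be demonstrated with a minimal, self-contained sketch. The MiniConnector type and run() driver below are illustrative stand-ins, not the real com.kmwllc.lucille.core.Connector or the actual Lucille runner:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of the lifecycle ordering. MiniConnector is a
// stand-in for the real Connector interface, reduced to the lifecycle methods.
public class LifecycleSketch {

  interface MiniConnector extends AutoCloseable {
    void preExecute(String runId) throws Exception;
    void execute() throws Exception;
    void postExecute(String runId) throws Exception;
  }

  // Drives one run: postExecute is skipped after a failure, but close()
  // always runs in the finally block.
  static List<String> run(MiniConnector c, String runId) {
    List<String> calls = new ArrayList<>();
    try {
      c.preExecute(runId);
      calls.add("preExecute");
      c.execute();
      calls.add("execute");
      c.postExecute(runId);
      calls.add("postExecute");
    } catch (Exception e) {
      calls.add("error: " + e.getMessage());
    } finally {
      try {
        c.close();
        calls.add("close");
      } catch (Exception ignored) {
        // cleanup errors are swallowed in this sketch
      }
    }
    return calls;
  }

  public static void main(String[] args) {
    // A connector whose execute() fails: postExecute is skipped, close() still runs.
    MiniConnector failing = new MiniConnector() {
      public void preExecute(String runId) {}
      public void execute() throws Exception { throw new Exception("boom"); }
      public void postExecute(String runId) {}
      public void close() {}
    };
    System.out.println(run(failing, "run-1")); // prints "[preExecute, error: boom, close]"
  }
}
```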

AbstractConnector Base Class

Most connectors extend AbstractConnector, which provides common functionality:
package com.kmwllc.lucille.connector;

public abstract class AbstractConnector implements Connector {
  protected final Config config;
  
  public AbstractConnector(Config config);
  public String getName();
  public String getPipelineName();
  public boolean requiresCollapsingPublisher();
  public String getDocIdPrefix();
  public String createDocId(String id);
  public void preExecute(String runId) throws ConnectorException;
  public void postExecute(String runId) throws ConnectorException;
  public void close() throws ConnectorException;
}
Location: com.kmwllc.lucille.connector.AbstractConnector

Base Configuration Parameters

All connectors that extend AbstractConnector support these configuration parameters:
  • name (String, required) - The name of the connector. Connector names should be unique within your Lucille configuration.
  • class (String, required) - The fully qualified class name of the connector implementation.
  • pipeline (String, optional) - The name of the pipeline to feed Documents to. Defaults to null (no pipeline).
  • docIdPrefix (String, optional) - A string to prepend to Document IDs originating from this connector. Defaults to an empty string.
  • collapse (Boolean, optional) - Whether this connector is "collapsing". A collapsing Publisher combines Documents published in sequence that share the same ID into a single document with multi-valued fields. Defaults to false.
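As a concrete illustration, a connector entry in a Lucille HOCON config might look like the following. The class path, pipeline name, and prefix are assumptions for illustration, not verified values:

```hocon
connectors: [
  {
    # Required: unique name and fully qualified implementation class
    name: "local-files"
    class: "com.kmwllc.lucille.connector.FileConnector"  # assumed class path
    # Optional: target pipeline, ID prefix, and collapsing behavior
    pipeline: "pipeline1"
    docIdPrefix: "files-"
    collapse: false
  }
]
```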

Specification and Validation

All connector implementations must declare a public static Spec SPEC that defines the connector’s configuration properties. This Spec is accessed reflectively in the AbstractConnector constructor, and the provided Config is validated against it.
Connectors will not function without declaring a public static Spec SPEC.

Example Spec Declaration

public static final Spec SPEC = SpecBuilder.connector()
    .requiredString("driver", "connectionString", "jdbcUser", "jdbcPassword", "sql", "idField")
    .optionalString("preSQL", "postSQL")
    .optionalNumber("fetchSize", "connectionRetries", "connectionRetryPause")
    .build();

Creating Connectors from Configuration

Connectors are instantiated reflectively using the static fromConfig() method:
List<Connector> connectors = Connector.fromConfig(config);
Connector constructors must accept a single Config parameter.
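The single-Config-argument constructor contract can be illustrated with a self-contained reflection sketch. Config and MiniConnector below are hypothetical stand-ins, not the real Typesafe Config or Lucille types:

```java
import java.lang.reflect.Constructor;

// Illustrative sketch of reflective instantiation through a single-argument
// constructor, mirroring the pattern a fromConfig()-style factory relies on.
public class ReflectiveSketch {

  // Stand-in for a configuration object.
  public static class Config {
    final String name;
    public Config(String name) { this.name = name; }
  }

  // Stand-in for the Connector interface.
  public interface MiniConnector {
    String getName();
  }

  public static class MyConnector implements MiniConnector {
    private final Config config;
    public MyConnector(Config config) { this.config = config; } // single Config parameter
    public String getName() { return config.name; }
  }

  // Looks up the class by name and invokes its (Config) constructor.
  static MiniConnector fromConfig(String className, Config config) throws Exception {
    Constructor<?> ctor = Class.forName(className).getConstructor(Config.class);
    return (MiniConnector) ctor.newInstance(config);
  }

  public static void main(String[] args) throws Exception {
    MiniConnector c = fromConfig("ReflectiveSketch$MyConnector", new Config("my-connector"));
    System.out.println(c.getName()); // prints "my-connector"
  }
}
```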

Publishing Documents

Connectors receive a Publisher instance in their execute() method and use it to publish Documents:
@Override
public void execute(Publisher publisher) throws ConnectorException {
  Document doc = Document.create("doc-id-123");
  doc.setField("title", "Example Document");
  publisher.publish(doc);
}

Built-in Connectors

Lucille provides several built-in connector implementations:
  • FileConnector - Traverses local and cloud storage (S3, GCP, Azure)
  • DatabaseConnector - Queries relational databases via JDBC
  • KafkaConnector - Reads messages from Kafka topics
  • SolrConnector - Queries and indexes Solr collections
  • RSSConnector - Reads items from RSS/Atom feeds
  • SequenceConnector - Generates empty documents for testing

Plugin Connectors

  • ParquetConnector - Reads Parquet files from local or cloud storage

Deprecated Connectors

The following connectors are deprecated. Use FileConnector with appropriate FileHandlers instead:
  • CSVConnector (use FileConnector with CSVFileHandler)
  • JSONConnector (use FileConnector with JSONFileHandler)
  • XMLConnector (use FileConnector with XMLFileHandler)

Document ID Management

The createDocId(String id) method adds the configured docIdPrefix to IDs:
String fullId = createDocId("123"); // Returns "prefix123" if docIdPrefix="prefix"
This helps namespace documents from different sources.
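A standalone sketch of this prefixing behavior (mirroring what createDocId does, outside the real AbstractConnector; the DocIdSketch class is hypothetical):

```java
// Standalone sketch of docIdPrefix behavior. Each "connector" holds its own
// prefix, so the same source-local ID maps to distinct Document IDs.
public class DocIdSketch {
  private final String docIdPrefix;

  public DocIdSketch(String docIdPrefix) { this.docIdPrefix = docIdPrefix; }

  // Prepends the configured prefix to a source-local ID.
  public String createDocId(String id) { return docIdPrefix + id; }

  public static void main(String[] args) {
    DocIdSketch db = new DocIdSketch("db-");
    DocIdSketch files = new DocIdSketch("files-");
    // The same source ID "123" yields distinct namespaced Document IDs.
    System.out.println(db.createDocId("123"));    // prints "db-123"
    System.out.println(files.createDocId("123")); // prints "files-123"
  }
}
```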

Error Handling

Connectors should throw ConnectorException when errors occur:
throw new ConnectorException("Error connecting to database", e);
The framework ensures close() is always called for cleanup, even if exceptions are thrown.

Next Steps

  • FileConnector Reference - Complete reference for file and cloud storage ingestion
  • DatabaseConnector Reference - JDBC-based database querying and ingestion