Overview

Connectors in Lucille Search read data from external sources and generate Documents to be processed by pipelines. Each connector targets a specific data source, extracts its data, and publishes Documents for downstream processing.

Connector Interface

All connectors implement the Connector interface, which defines the contract for data ingestion:
package com.kmwllc.lucille.core;

public interface Connector extends AutoCloseable {
  String getName();
  String getPipelineName();
  boolean requiresCollapsingPublisher();
  void preExecute(String runId) throws ConnectorException;
  void execute(Publisher publisher) throws ConnectorException;
  void postExecute(String runId) throws ConnectorException;
  Spec getSpec();
  String getMessage();
}
Location: com.kmwllc.lucille.core.Connector

Connector Lifecycle

During each run, connectors follow a strict lifecycle:
  1. preExecute(runId) - Always called first for setup operations
  2. execute(publisher) - Called if preExecute() succeeds, performs main data ingestion
  3. postExecute(runId) - Called if preExecute() and execute() succeed
  4. close() - Always called for cleanup, even if errors occur
A new Connector instance is created for each run.
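The ordering guarantees above can be demonstrated with a minimal, self-contained sketch. The MiniConnector type and run() driver below are illustrative stand-ins, not the real com.kmwllc.lucille.core.Connector or the actual Lucille runner:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of the lifecycle ordering. MiniConnector is a
// stand-in for the real Connector interface, reduced to the lifecycle methods.
public class LifecycleSketch {

  interface MiniConnector extends AutoCloseable {
    void preExecute(String runId) throws Exception;
    void execute() throws Exception;
    void postExecute(String runId) throws Exception;
  }

  // Drives one run: postExecute is skipped after a failure, but close()
  // always runs in the finally block.
  static List<String> run(MiniConnector c, String runId) {
    List<String> calls = new ArrayList<>();
    try {
      c.preExecute(runId);
      calls.add("preExecute");
      c.execute();
      calls.add("execute");
      c.postExecute(runId);
      calls.add("postExecute");
    } catch (Exception e) {
      calls.add("error: " + e.getMessage());
    } finally {
      try {
        c.close();
        calls.add("close");
      } catch (Exception ignored) {
        // cleanup errors are swallowed in this sketch
      }
    }
    return calls;
  }

  public static void main(String[] args) {
    // A connector whose execute() fails: postExecute is skipped, close() still runs.
    MiniConnector failing = new MiniConnector() {
      public void preExecute(String runId) {}
      public void execute() throws Exception { throw new Exception("boom"); }
      public void postExecute(String runId) {}
      public void close() {}
    };
    System.out.println(run(failing, "run-1")); // prints "[preExecute, error: boom, close]"
  }
}
```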

AbstractConnector Base Class

Most connectors extend AbstractConnector, which provides common functionality:
package com.kmwllc.lucille.connector;

public abstract class AbstractConnector implements Connector {
  protected final Config config;
  
  public AbstractConnector(Config config);
  public String getName();
  public String getPipelineName();
  public boolean requiresCollapsingPublisher();
  public String getDocIdPrefix();
  public String createDocId(String id);
  public void preExecute(String runId) throws ConnectorException;
  public void postExecute(String runId) throws ConnectorException;
  public void close() throws ConnectorException;
}
Location: com.kmwllc.lucille.connector.AbstractConnector

Base Configuration Parameters

All connectors that extend AbstractConnector support these configuration parameters:
  • name (String, required) - The name of the connector. Connector names should be unique within your Lucille configuration.
  • class (String, required) - The fully qualified class name of the connector implementation.
  • pipeline (String, optional) - The name of the pipeline to feed Documents to. Defaults to null (no pipeline).
  • docIdPrefix (String, optional) - A string to prepend to Document IDs originating from this connector. Defaults to an empty string.
  • collapse (Boolean, optional) - Whether this connector is "collapsing". A collapsing Publisher combines Documents published in sequence that share the same ID into a single document with multi-valued fields. Defaults to false.
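As a concrete illustration, a connector entry in a Lucille HOCON config might look like the following. The class path, pipeline name, and prefix are assumptions for illustration, not verified values:

```hocon
connectors: [
  {
    # Required: unique name and fully qualified implementation class
    name: "local-files"
    class: "com.kmwllc.lucille.connector.FileConnector"  # assumed class path
    # Optional: target pipeline, ID prefix, and collapsing behavior
    pipeline: "pipeline1"
    docIdPrefix: "files-"
    collapse: false
  }
]
```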

Specification and Validation

All connector implementations must declare a public static Spec SPEC that defines the connector’s configuration properties. This Spec is accessed reflectively in the AbstractConnector constructor, and the provided Config is validated against it.
Connectors will not function without declaring a public static Spec SPEC.

Example Spec Declaration

public static final Spec SPEC = SpecBuilder.connector()
    .requiredString("driver", "connectionString", "jdbcUser", "jdbcPassword", "sql", "idField")
    .optionalString("preSQL", "postSQL")
    .optionalNumber("fetchSize", "connectionRetries", "connectionRetryPause")
    .build();

Creating Connectors from Configuration

Connectors are instantiated reflectively using the static fromConfig() method:
List<Connector> connectors = Connector.fromConfig(config);
Connector constructors must accept a single Config parameter.
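The single-Config-argument constructor contract can be illustrated with a self-contained reflection sketch. Config and MiniConnector below are hypothetical stand-ins, not the real Typesafe Config or Lucille types:

```java
import java.lang.reflect.Constructor;

// Illustrative sketch of reflective instantiation through a single-argument
// constructor, mirroring the pattern a fromConfig()-style factory relies on.
public class ReflectiveSketch {

  // Stand-in for a configuration object.
  public static class Config {
    final String name;
    public Config(String name) { this.name = name; }
  }

  // Stand-in for the Connector interface.
  public interface MiniConnector {
    String getName();
  }

  public static class MyConnector implements MiniConnector {
    private final Config config;
    public MyConnector(Config config) { this.config = config; } // single Config parameter
    public String getName() { return config.name; }
  }

  // Looks up the class by name and invokes its (Config) constructor.
  static MiniConnector fromConfig(String className, Config config) throws Exception {
    Constructor<?> ctor = Class.forName(className).getConstructor(Config.class);
    return (MiniConnector) ctor.newInstance(config);
  }

  public static void main(String[] args) throws Exception {
    MiniConnector c = fromConfig("ReflectiveSketch$MyConnector", new Config("my-connector"));
    System.out.println(c.getName()); // prints "my-connector"
  }
}
```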

Publishing Documents

Connectors receive a Publisher instance in their execute() method and use it to publish Documents:
@Override
public void execute(Publisher publisher) throws ConnectorException {
  Document doc = Document.create("doc-id-123");
  doc.setField("title", "Example Document");
  publisher.publish(doc);
}

Built-in Connectors

Lucille provides several built-in connector implementations:
  • FileConnector - Traverses local and cloud storage (S3, GCP, Azure)
  • DatabaseConnector - Queries relational databases via JDBC
  • KafkaConnector - Reads messages from Kafka topics
  • SolrConnector - Queries and indexes Solr collections
  • RSSConnector - Reads items from RSS/Atom feeds
  • SequenceConnector - Generates empty documents for testing

Plugin Connectors

  • ParquetConnector - Reads Parquet files from local or cloud storage

Deprecated Connectors

The following connectors are deprecated. Use FileConnector with appropriate FileHandlers instead:
  • CSVConnector (use FileConnector with CSVFileHandler)
  • JSONConnector (use FileConnector with JSONFileHandler)
  • XMLConnector (use FileConnector with XMLFileHandler)

Document ID Management

The createDocId(String id) method adds the configured docIdPrefix to IDs:
String fullId = createDocId("123"); // Returns "prefix123" if docIdPrefix="prefix"
This helps namespace documents from different sources.
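A standalone sketch of this prefixing behavior (mirroring what createDocId does, outside the real AbstractConnector; the DocIdSketch class is hypothetical):

```java
// Standalone sketch of docIdPrefix behavior. Each "connector" holds its own
// prefix, so the same source-local ID maps to distinct Document IDs.
public class DocIdSketch {
  private final String docIdPrefix;

  public DocIdSketch(String docIdPrefix) { this.docIdPrefix = docIdPrefix; }

  // Prepends the configured prefix to a source-local ID.
  public String createDocId(String id) { return docIdPrefix + id; }

  public static void main(String[] args) {
    DocIdSketch db = new DocIdSketch("db-");
    DocIdSketch files = new DocIdSketch("files-");
    // The same source ID "123" yields distinct namespaced Document IDs.
    System.out.println(db.createDocId("123"));    // prints "db-123"
    System.out.println(files.createDocId("123")); // prints "files-123"
  }
}
```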

Error Handling

Connectors should throw ConnectorException when errors occur:
throw new ConnectorException("Error connecting to database", e);
The framework ensures close() is always called for cleanup, even if exceptions are thrown.

Next Steps

  • FileConnector Reference - Complete reference for file and cloud storage ingestion
  • DatabaseConnector Reference - JDBC-based database querying and ingestion