Overview
Connectors in Lucille Search read data from external sources and generate Documents for pipelines to process. Each connector connects to a specific data source, extracts data, and publishes Documents for downstream processing.
Connector Interface
All connectors implement the Connector interface, which defines the contract for data ingestion:
com.kmwllc.lucille.core.Connector
Connector Lifecycle
During each run, connectors follow a strict lifecycle:
- preExecute(runId) - Always called first for setup operations
- execute(publisher) - Called if preExecute() succeeds, performs main data ingestion
- postExecute(runId) - Called if preExecute() and execute() succeed
- close() - Always called for cleanup, even if errors occur
A new Connector instance is created for each run.
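The lifecycle above can be sketched as a small driver. This is only an illustration under stated assumptions: SketchConnector, LifecycleSketch, runOnce, and recording are hypothetical names, and a plain List<String> stands in for the Publisher. It is not how Lucille's own runner is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Connector contract; not Lucille's actual interface.
interface SketchConnector {
  void preExecute(String runId) throws Exception;
  void execute(List<String> publisher) throws Exception;
  void postExecute(String runId) throws Exception;
  void close() throws Exception;
}

final class LifecycleSketch {

  /** Runs one connector through the four phases; returns true if the first three succeeded. */
  static boolean runOnce(SketchConnector connector, String runId, List<String> publisher) {
    try {
      connector.preExecute(runId);   // always called first
      connector.execute(publisher);  // reached only if preExecute() did not throw
      connector.postExecute(runId);  // reached only if both earlier phases succeeded
      return true;
    } catch (Exception e) {
      return false;
    } finally {
      try {
        connector.close();           // always called, even after a failure
      } catch (Exception e) {
        // A real runner would log this; ignored here to keep the sketch short.
      }
    }
  }

  /** Connector that records each phase it enters and optionally fails in execute(). */
  static SketchConnector recording(List<String> calls, boolean failInExecute) {
    return new SketchConnector() {
      public void preExecute(String runId) { calls.add("preExecute"); }
      public void execute(List<String> publisher) throws Exception {
        calls.add("execute");
        if (failInExecute) throw new Exception("simulated failure");
        publisher.add("doc-1");
      }
      public void postExecute(String runId) { calls.add("postExecute"); }
      public void close() { calls.add("close"); }
    };
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    runOnce(recording(calls, true), "run-1", new ArrayList<>());
    System.out.println(calls); // [preExecute, execute, close]
  }
}
```

When execute() fails, postExecute() is skipped but close() still runs, which is why cleanup logic belongs in close() rather than postExecute().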
AbstractConnector Base Class
Most connectors extend AbstractConnector, which provides common functionality:
com.kmwllc.lucille.connector.AbstractConnector
Base Configuration Parameters
All connectors that extend AbstractConnector support these configuration parameters:
- The name of the connector. Connector names should be unique within your Lucille configuration.
- The fully qualified class name of the connector implementation.
- The name of the pipeline to feed Documents to. Defaults to null (no pipeline).
- A string to prepend to Document IDs originating from this connector. Defaults to an empty string.
- Whether this connector is "collapsing". A collapsing Publisher combines Documents published in sequence that share the same ID into a single document with multi-valued fields. Defaults to false.
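To make the collapsing behavior concrete, here is a minimal sketch that merges consecutive documents sharing an ID. It assumes "in sequence" means adjacent in publish order. The Doc record, CollapseSketch, and collapse are hypothetical names used only for illustration (records require Java 16 or later), and Lucille's actual Publisher may handle details differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {

  // Hypothetical minimal document: an ID plus multi-valued fields. Not Lucille's Document class.
  record Doc(String id, Map<String, List<String>> fields) {}

  /** Convenience factory for a document with one single-valued field. */
  static Doc doc(String id, String field, String value) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    fields.put(field, new ArrayList<>(List.of(value)));
    return new Doc(id, fields);
  }

  /** Merges runs of consecutive documents with the same ID (assumed semantics). */
  static List<Doc> collapse(List<Doc> published) {
    List<Doc> out = new ArrayList<>();
    for (Doc next : published) {
      Doc last = out.isEmpty() ? null : out.get(out.size() - 1);
      if (last != null && last.id().equals(next.id())) {
        // Same ID as the previous document: append the values onto its fields.
        next.fields().forEach((name, values) ->
            last.fields().computeIfAbsent(name, k -> new ArrayList<>()).addAll(values));
      } else {
        // New ID: start a fresh document, copying so the inputs are not mutated.
        Map<String, List<String>> copy = new LinkedHashMap<>();
        next.fields().forEach((name, values) -> copy.put(name, new ArrayList<>(values)));
        out.add(new Doc(next.id(), copy));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Doc> result = collapse(List.of(
        doc("1", "color", "red"), doc("1", "color", "blue"), doc("2", "color", "green")));
    System.out.println(result.size());          // 2
    System.out.println(result.get(0).fields()); // {color=[red, blue]}
  }
}
```

This pattern is useful when a source emits one row per value, for example a database join that returns several rows for the same parent record.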
Specification and Validation
All connector implementations must declare a public static Spec SPEC that defines the connector's configuration properties. This Spec is accessed reflectively in the AbstractConnector constructor, and the provided Config is validated against it.
Example Spec Declaration
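As a rough, self-contained sketch of the pattern (a static SPEC field that a base-class constructor finds by reflection and validates against), consider the following. Everything here is an assumption made for illustration: the Spec record, BaseConnector, GreetingConnector, and the property names are hypothetical and do not reproduce Lucille's Spec API. A Map stands in for Config.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.Set;

final class SpecSketch {

  // Hypothetical stand-in for a Spec: just required and optional property names.
  record Spec(Set<String> required, Set<String> optional) {
    void validate(Map<String, Object> config) {
      for (String name : required) {
        if (!config.containsKey(name)) {
          throw new IllegalArgumentException("Missing required property: " + name);
        }
      }
      for (String name : config.keySet()) {
        if (!required.contains(name) && !optional.contains(name)) {
          throw new IllegalArgumentException("Unknown property: " + name);
        }
      }
    }
  }

  // Hypothetical base class: looks up the subclass's static SPEC field and validates against it.
  abstract static class BaseConnector {
    protected BaseConnector(Map<String, Object> config) {
      try {
        Field field = getClass().getDeclaredField("SPEC");
        ((Spec) field.get(null)).validate(config); // null receiver: SPEC is static
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(
            getClass().getName() + " must declare public static Spec SPEC", e);
      }
    }
  }

  // Example subclass with an illustrative declaration; the property names are made up.
  static class GreetingConnector extends BaseConnector {
    public static final Spec SPEC = new Spec(Set.of("greeting"), Set.of("repeat"));

    GreetingConnector(Map<String, Object> config) {
      super(config);
    }
  }

  public static void main(String[] args) {
    new GreetingConnector(Map.of("greeting", "hello")); // passes validation
    try {
      new GreetingConnector(Map.of("greeting", "hello", "typo", 1));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Unknown property: typo
    }
  }
}
```

Validating in the base constructor means a misconfigured connector fails at construction time, before any lifecycle method runs.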
Creating Connectors from Configuration
Connectors are instantiated reflectively using the static fromConfig() method:
Each connector class must provide a constructor that takes a single Config parameter.
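Reflective instantiation of this kind generally follows the pattern below: load the class by name, look up its one-argument constructor, and invoke it with the configuration. This is a sketch under assumptions, not the body of fromConfig(): a Map<String, String> stands in for Config, the keys "name" and "class" are hypothetical, and EchoConnector is an invented class.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

final class FromConfigSketch {

  // Hypothetical connector type whose only contract here is a one-argument constructor.
  public static class EchoConnector {
    final String name;

    public EchoConnector(Map<String, String> config) {
      this.name = config.get("name");
    }
  }

  /** Loads the class named in the config and invokes its single-argument constructor. */
  static Object instantiate(Map<String, String> config) {
    try {
      Class<?> clazz = Class.forName(config.get("class"));
      Constructor<?> constructor = clazz.getConstructor(Map.class);
      return constructor.newInstance(config);
    } catch (ReflectiveOperationException e) {
      // Covers a missing class, a missing constructor, and constructor failures.
      throw new IllegalStateException("Could not create connector from config", e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> config = Map.of(
        "name", "echo1",
        "class", EchoConnector.class.getName());
    EchoConnector connector = (EchoConnector) instantiate(config);
    System.out.println(connector.name); // echo1
  }
}
```

Because getConstructor() only finds public constructors, a class without a public single-argument constructor fails at lookup rather than at first use.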
Publishing Documents
Connectors receive a Publisher instance in their execute() method. Use this to publish documents:
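A minimal sketch of the publishing loop inside an execute()-style method might look like this. The Publisher interface and Doc record below are hypothetical single-purpose stand-ins defined only so the example compiles on its own; Lucille's real Publisher and Document types are richer.

```java
import java.util.ArrayList;
import java.util.List;

final class PublishSketch {

  // Hypothetical stand-ins; not Lucille's actual Document or Publisher types.
  record Doc(String id, String body) {}

  interface Publisher {
    void publish(Doc doc);
  }

  /** Reads each source line and publishes one document for it; returns the count published. */
  static int execute(List<String> sourceLines, Publisher publisher) {
    int published = 0;
    for (String line : sourceLines) {
      publisher.publish(new Doc("doc-" + published, line));
      published++;
    }
    return published;
  }

  public static void main(String[] args) {
    List<Doc> sink = new ArrayList<>();
    int count = execute(List.of("first", "second"), sink::add);
    System.out.println(count);              // 2
    System.out.println(sink.get(1).id());   // doc-1
  }
}
```

Passing the publisher in as a parameter keeps the connector free of any knowledge about where documents go next, which also makes it easy to test with an in-memory sink as shown in main.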
Built-in Connectors
Lucille provides several built-in connector implementations:
- FileConnector - Traverse local and cloud storage (S3, GCP, Azure)
- DatabaseConnector - Query relational databases via JDBC
- KafkaConnector - Read messages from Kafka topics
- SolrConnector - Query and index Solr collections
- RSSConnector - Read items from RSS/Atom feeds
- SequenceConnector - Generate empty documents for testing
Plugin Connectors
- ParquetConnector - Read Parquet files from local or cloud storage
Deprecated Connectors
The following connectors are deprecated. Use FileConnector with appropriate FileHandlers instead:
- CSVConnector (use FileConnector with CSVFileHandler)
- JSONConnector (use FileConnector with JSONFileHandler)
- XMLConnector (use FileConnector with XMLFileHandler)
Document ID Management
The createDocId(String id) method adds the configured docIdPrefix to IDs:
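The effect can be illustrated with a few lines of standalone Java. DocIdSketch is a hypothetical class written for this example; it assumes the prefix is simply concatenated in front of the ID, consistent with the "prepend" wording in the configuration section, and is not a copy of AbstractConnector.

```java
final class DocIdSketch {

  private final String docIdPrefix;

  DocIdSketch(String docIdPrefix) {
    // Mirrors the documented default: no prefix configured means an empty string.
    this.docIdPrefix = docIdPrefix == null ? "" : docIdPrefix;
  }

  /** Returns the ID with the configured prefix in front of it. */
  String createDocId(String id) {
    return docIdPrefix + id;
  }

  public static void main(String[] args) {
    System.out.println(new DocIdSketch("db1-").createDocId("42")); // db1-42
    System.out.println(new DocIdSketch(null).createDocId("42"));   // 42
  }
}
```

A distinct prefix per connector helps keep IDs from colliding when several sources feed the same index.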
Error Handling
Connectors should throw ConnectorException when errors occur:
close() is always called for cleanup, even if exceptions are thrown.
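One common way to follow this guidance is to catch low-level exceptions inside execute() and rethrow them wrapped, so the caller sees a single exception type with the original cause attached. In the sketch below, ConnectorException is a locally defined stand-in with an assumed (message, cause) constructor, and a List<String> stands in for the Publisher. Check the real class in Lucille for its actual constructors.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ErrorHandlingSketch {

  // Hypothetical stand-in; the real ConnectorException lives in Lucille and may differ.
  static class ConnectorException extends Exception {
    ConnectorException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Reads lines into the sink, translating low-level I/O failures into ConnectorException. */
  static void execute(Reader source, List<String> sink) throws ConnectorException {
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        sink.add(line);
      }
    } catch (IOException e) {
      // Keep the original exception as the cause so the stack trace is not lost.
      throw new ConnectorException("Failed to read from source", e);
    }
  }

  public static void main(String[] args) throws ConnectorException {
    List<String> sink = new ArrayList<>();
    execute(new StringReader("a\nb\n"), sink);
    System.out.println(sink); // [a, b]
  }
}
```

Resources opened inside execute() are released by the try-with-resources block here; anything held for the whole run is better released in close(), since that method runs regardless of failures.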
Next Steps
- FileConnector Reference - Complete reference for file and cloud storage ingestion
- DatabaseConnector Reference - JDBC-based database querying and ingestion