Quickstart Guide
This guide will get you up and running with Lucille in just a few minutes. We’ll walk through a simple example that reads a CSV file of songs and indexes them to Apache Solr.

This quickstart assumes you have Java 17+ and Maven installed. If not, see the Installation guide first.
Prerequisites
Before you begin, make sure you have:
- Java 17 or later installed
- Maven installed
- Apache Solr 8.x or later running on port 8983
- A Solr collection named quickstart created
Step 1: Clone and Build Lucille
Build the project
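Assuming the repository lives under the kmwtechnology GitHub organization (verify the clone URL against the project README), the steps look roughly like this:

```shell
# Clone the repository (URL is an assumption -- confirm in the project docs)
git clone https://github.com/kmwtechnology/lucille.git
cd lucille

# Build all modules; skipping tests speeds up the first build
mvn clean install -DskipTests
```

After the build completes, the example module under lucille-examples contains the configuration used in the next step.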
Step 2: Understand the Configuration
The example includes a configuration file that defines the entire ETL workflow. Let’s examine conf/simple-csv-solr-example.conf:
conf/simple-csv-solr-example.conf
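The file looks roughly like the sketch below. This is reconstructed from the options described in this guide; exact class names, key names, and values may differ in your checkout, so treat it as illustrative rather than a copy of the shipped file.

```hocon
connectors: [
  {
    # Connector that reads files from local paths
    class: "com.kmwllc.lucille.connector.FileConnector"
    name: "csv-connector"
    pipeline: "pipeline1"
    paths: ["conf/songs.csv"]
    # Empty block: use default CSV parsing settings
    fileHandlers: { csv: {} }
  }
]

pipelines: [
  {
    name: "pipeline1"
    # No stages: documents pass through unchanged
    stages: []
  }
]

indexer: {
  type: "Solr"
}

solr: {
  useCloudClient: true
  defaultCollection: "quickstart"
  url: "http://localhost:8983/solr"
}
```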
Lucille uses HOCON (Human-Optimized Config Object Notation) for configuration files. It’s a superset of JSON that’s easier to read and write.
Configuration Breakdown
Connectors Section
Connectors extract data from source systems. This example uses the FileConnector to read a CSV file:
- class: The Java class that implements the connector
- paths: List of files or directories to read
- name: Unique identifier for this connector
- pipeline: Which pipeline should process these documents
- fileHandlers.csv: Configuration for CSV file parsing (uses defaults)
Pipelines Section
Pipelines transform and enrich documents. This example has an empty pipeline (no transformations), but you can add stages to:
- Parse dates and numbers
- Extract text from documents
- Generate embeddings
- Query databases
- Apply business logic
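For instance, a date-parsing stage added to the pipeline might look like the fragment below. The stage class name and its parameters here are placeholders, not confirmed Lucille API; consult the stage reference documentation for real class names and settings.

```hocon
pipelines: [
  {
    name: "pipeline1"
    stages: [
      {
        # Placeholder class and field names, for illustration only --
        # check the Lucille stage documentation for the real ones.
        class: "com.kmwllc.lucille.stage.ParseDate"
        source: ["release_date"]
        dest: ["release_date_parsed"]
      }
    ]
  }
]
```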
Indexer Section
Indexers send processed documents to their destination. The type: "Solr" setting tells Lucille to use the Solr indexer.
Solr Section
Solr configuration specifies how to connect:
- useCloudClient: Use the SolrCloud client (works for standalone too)
- defaultCollection: Collection name to index into
- url: Solr server URL
Step 3: Review the Source Data
The example includes a CSV file with song data. Here’s a sample of conf/songs.csv:
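The rows below are made up to show the shape of the data (a header row of field names followed by one song per line); the actual file in the repository will contain different columns and values.

```csv
id,artist,title,genre
1,The Beatles,Here Comes the Sun,rock
2,Miles Davis,So What,jazz
3,Aretha Franklin,Respect,soul
```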
Step 4: Run the Example
Start Solr and create collection
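Assuming a standard Solr install and the example module built in Step 1, the setup and launch commands look along these lines. The Solr CLI commands are standard; the jar name, runner class, and -local flag on the last command are assumptions, so check the lucille-examples README for the exact invocation.

```shell
# Start Solr and create the quickstart collection
bin/solr start
bin/solr create -c quickstart

# From the example module directory, launch the ingest
# (jar name and main class are assumptions -- see the example's README)
java -Dconfig.file=conf/simple-csv-solr-example.conf \
     -cp target/lucille-simple-csv-solr-example-*.jar \
     com.kmwllc.lucille.core.Runner -local
```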
In a separate terminal, make sure Solr is running on port 8983 and that you have a collection named quickstart.

Step 5: Verify the Results
Query Solr
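A query like the following, using Solr’s standard select API, returns the number of indexed documents:

```shell
# Count all documents in the quickstart collection
curl "http://localhost:8983/solr/quickstart/select?q=*:*&rows=0"
```

The numFound value in the JSON response should match the number of rows in the CSV file.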
Check that documents were indexed, or visit the Solr Admin UI at http://localhost:8983/solr/#/quickstart/query and run a *:* query.

Expected Output
When you run the ingest, you should see log output confirming that the connector ran, documents were processed, and the run completed successfully.

Understanding the Flow
Here’s what happened when you ran the example:

Connector reads CSV

The FileConnector reads songs.csv and converts each row into a Lucille Document. Each CSV column becomes a document field.

Pipeline processes documents
Documents flow through the pipeline. Since the pipeline is empty, no transformations are applied.
Indexer sends to Solr
The SolrIndexer batches documents and sends them to the Solr collection via the Solr API.

Next Steps
Now that you’ve run your first Lucille example, here are some things to try:

Add Pipeline Stages
Modify the pipeline to add transformations like parsing dates or normalizing text.
Try Other Connectors
Explore examples for databases, S3, RSS feeds, and more in the lucille-examples directory.

Index to Other Engines
Change the indexer type to “Elasticsearch” or “OpenSearch” to try other search engines.
Run in Distributed Mode
Scale up by running Workers and Indexers as separate processes with Kafka.
Troubleshooting
Error: Connection refused to Solr
Make sure Solr is running on port 8983. If it is not running, start Solr with bin/solr start.

Error: Collection 'quickstart' not found
Create the collection:
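The collection can be created with the standard Solr CLI:

```shell
# Create the quickstart collection
bin/solr create -c quickstart
```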
No documents appear in Solr
Make sure you committed the documents:
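An explicit commit can be triggered through Solr’s standard update endpoint, which makes any pending documents visible to searches:

```shell
# Force a commit so newly indexed documents become searchable
curl "http://localhost:8983/solr/quickstart/update?commit=true"
```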
Java version error
Lucille requires Java 17 or later. Check your version with java -version. If needed, install Java 17+ and set JAVA_HOME.

For more detailed configuration options and advanced features, see the full documentation.