
Quickstart Guide

This guide will get you up and running with Lucille in just a few minutes. We’ll walk through a simple example that reads a CSV file of songs and indexes them to Apache Solr.
This quickstart assumes you have Java 17+ and Maven installed. If not, see the Installation guide first.

Prerequisites

Before you begin, make sure you have:
  • Java 17 or later installed
  • Maven installed
  • Apache Solr 8.x or later running on port 8983
  • A Solr collection named quickstart created
New to Solr? Follow the Apache Solr Quick Start to get Solr running locally.

Step 1: Clone and Build Lucille

1. Clone the repository

git clone https://github.com/kmwtechnology/lucille.git
cd lucille
2. Build the project

mvn clean install
This will compile Lucille and all its modules. The build may take a few minutes the first time as Maven downloads dependencies.
3. Navigate to the example

cd lucille-examples/lucille-simple-csv-solr-example

Step 2: Understand the Configuration

The example includes a configuration file that defines the entire ETL workflow. Let’s examine conf/simple-csv-solr-example.conf:
conf/simple-csv-solr-example.conf
# This example illustrates how Lucille can handle a simple use case 
# like indexing the contents of a CSV file into Solr

# CONNECTORS: Define data sources
connectors: [
  {
    class: "com.kmwllc.lucille.connector.FileConnector",
    paths: ["conf/songs.csv"],
    name: "connector1",
    pipeline: "pipeline1"
    fileHandlers: {
      csv: { }
    }
  }
]

# PIPELINES: Define transformation stages
pipelines: [
  {
    name: "pipeline1",
    stages: []  # Empty pipeline means no transformations
  }
]

# INDEXER: Define the destination
indexer {
  type: "Solr"
}

# SOLR: Configure connection details
solr {
  useCloudClient: true
  defaultCollection: "quickstart"
  url: ["http://localhost:8983/solr"]
}
Lucille uses HOCON (Human-Optimized Config Object Notation) for configuration files. It’s a superset of JSON that’s easier to read and write.
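To see what that buys you, here is the same solr block written both ways; the JSON form needs quoted keys and commas, while HOCON can omit them (a small illustration, not part of the example config):

```hocon
// Strict JSON style
"solr": { "defaultCollection": "quickstart" }

// Equivalent HOCON style: unquoted keys, no commas,
// and no colon before a nested object
solr {
  defaultCollection: quickstart
}
```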

Configuration Breakdown

Connectors extract data from source systems. This example uses the FileConnector to read a CSV file:
  • class: The Java class that implements the connector
  • paths: List of files or directories to read
  • name: Unique identifier for this connector
  • pipeline: Which pipeline should process these documents
  • fileHandlers.csv: Configuration for CSV file parsing (uses defaults)
Pipelines transform and enrich documents. This example has an empty pipeline (no transformations), but you can add stages to:
  • Parse dates and numbers
  • Extract text from documents
  • Generate embeddings
  • Query databases
  • Apply business logic
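As a sketch of the shape, a non-empty pipeline lists stage objects, each with a class plus its own settings. The stage class and its source/dest parameters below are hypothetical, shown only to illustrate the structure; check the stages shipped in lucille-core for real class names and options:

```hocon
pipelines: [
  {
    name: "pipeline1",
    stages: [
      {
        # Hypothetical stage shown for illustration only
        class: "com.kmwllc.lucille.stage.SomeStage",
        source: ["year released"],
        dest: ["year"]
      }
    ]
  }
]
```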
Indexers send processed documents to their destination. The type: "Solr" tells Lucille to use the Solr indexer.
Solr configuration specifies how to connect:
  • useCloudClient: Use Solr Cloud client (works for standalone too)
  • defaultCollection: Collection name to index into
  • url: Solr server URL
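If you would rather use the plain HTTP client than the cloud client, flipping useCloudClient should be the only change needed (a sketch based on the keys shown above; confirm against the Lucille documentation for your version):

```hocon
solr {
  useCloudClient: false
  defaultCollection: "quickstart"
  url: ["http://localhost:8983/solr"]
}
```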

Step 3: Review the Source Data

The example includes a CSV file with song data. Here’s a sample of conf/songs.csv:
title,artist,top genre,year released,added,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop,top year,artist type
STARSTRUKK (feat. Katy Perry),3OH!3,dance pop,2009,2022-02-17,140,81,61,-6,23,23,203,0,6,70,2010,Duo
My First Kiss (feat. Ke$ha),3OH!3,dance pop,2010,2022-02-17,138,89,68,-4,36,83,192,1,8,68,2010,Duo
I Need A Dollar,Aloe Blacc,pop soul,2010,2022-02-17,95,48,84,-7,9,96,243,20,3,72,2010,Solo
The CSV has 100 songs with various attributes like title, artist, genre, BPM, and more.
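To see how a row turns into field/value pairs, you can pair the header line with any data row. The two-column sample below is a stand-in (the real songs.csv has 17 columns), and the pairing only mirrors conceptually what the CSV file handler does:

```shell
# Build a tiny two-column sample (stand-in for conf/songs.csv)
printf 'title,artist\nI Need A Dollar,Aloe Blacc\n' > /tmp/sample.csv

# Pair each header name with the value from the data row,
# printing one "field=value" line per CSV column (uses bash's <(...))
paste -d '=' \
  <(head -n 1 /tmp/sample.csv | tr ',' '\n') \
  <(tail -n 1 /tmp/sample.csv | tr ',' '\n')
```

For the sample above this prints title=I Need A Dollar and artist=Aloe Blacc, one per line, which is essentially the document the connector would emit.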

Step 4: Run the Example

1. Build the example

Make sure you’re in the lucille-simple-csv-solr-example directory:
mvn clean install
2. Start Solr and create the collection

In a separate terminal, make sure Solr is running on port 8983 and you have a collection named quickstart:
# If using Solr standalone
bin/solr start
bin/solr create -c quickstart
3. Run the ingest

Execute the provided shell script:
./scripts/run_ingest.sh
This script runs Lucille with the configuration file:
java -Dconfig.file=conf/simple-csv-solr-example.conf \
  -cp 'target/lib/*' \
  com.kmwllc.lucille.core.Runner

Step 5: Verify the Results

1. Commit the documents

After the ingest completes, commit the documents to make them visible:
curl "http://localhost:8983/solr/quickstart/update?commit=true&openSearcher=true"
2. Query Solr

Check that documents were indexed:
curl "http://localhost:8983/solr/quickstart/select?q=*:*"
Or visit the Solr Admin UI at http://localhost:8983/solr/#/quickstart/query and run a *:* query.
3. Search for specific songs

Try searching for a specific artist:
curl "http://localhost:8983/solr/quickstart/select?q=artist:Rihanna"

Expected Output

When you run the ingest, you should see output similar to:
INFO  [main] Runner - Starting run with id a7b3c9d2-4e1f-4a8b-9c3d-2e5f8a9b1c4d
INFO  [main] Runner - Running connector connector1 feeding to pipeline pipeline1
INFO  [Worker-1] FileConnector - Processing file: conf/songs.csv
INFO  [Indexer-1] SolrIndexer - Indexed 100 documents
INFO  [main] Runner - Connector connector1 feeding to pipeline pipeline1 complete. Time: 2.34 secs.
INFO  [main] Runner - Run took 2.51 secs.
You should see 100 documents indexed to Solr.

Understanding the Flow

Here’s what happened when you ran the example:
1. Connector reads the CSV

The FileConnector reads songs.csv and converts each row into a Lucille Document. Each CSV column becomes a document field.
2. Pipeline processes documents

Documents flow through the pipeline. Since the pipeline is empty, no transformations are applied.
3. Indexer sends to Solr

The SolrIndexer batches documents and sends them to the Solr collection via the Solr API.
4. Completion

Lucille waits for all documents to be indexed, logs metrics, and exits.

Next Steps

Now that you’ve run your first Lucille example, here are some things to try:

Add Pipeline Stages

Modify the pipeline to add transformations like parsing dates or normalizing text.

Try Other Connectors

Explore examples for databases, S3, RSS feeds, and more in the lucille-examples directory.

Index to Other Engines

Change the indexer type to “Elasticsearch” or “OpenSearch” to try other search engines.

Run in Distributed Mode

Scale up by running Workers and Indexers as separate processes with Kafka.
Check out the other examples in lucille-examples/ to see more advanced use cases:
  • lucille-rss-example: Index RSS feeds
  • lucille-opensearch-ingest-example: Index to OpenSearch
  • lucille-vector-ingest-example: Generate embeddings and index to vector databases
  • lucille-distributed-example: Run in distributed mode with Kafka

Troubleshooting

Solr isn't reachable
Make sure Solr is running on port 8983:
curl http://localhost:8983/solr/
If it isn't running, start it: bin/solr start

The quickstart collection doesn't exist
Create the collection:
bin/solr create -c quickstart

Queries return zero documents
Make sure you committed the documents after the ingest:
curl "http://localhost:8983/solr/quickstart/update?commit=true&openSearcher=true"

The build or run fails with a Java version error
Lucille requires Java 17 or later. Check your version:
java -version
If needed, install Java 17+ and set JAVA_HOME.
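If the output of java -version is unfamiliar, the major version is the number before the first dot inside the quoted string. A small parsing sketch using a hard-coded sample line (substitute the real output of java -version 2>&1):

```shell
# Sample line as printed by a modern JDK; replace with real `java -version` output
line='openjdk version "17.0.9" 2023-10-17'

# The quoted field holds the full version; the part before the first dot is the major
ver=$(printf '%s\n' "$line" | awk -F '"' '{print $2}')
echo "major: ${ver%%.*}"
```

For the sample line this prints major: 17; anything 17 or above satisfies Lucille's requirement.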
For more detailed configuration options and advanced features, see the full documentation.