
Quickstart Guide

This guide will get you up and running with Lucille in just a few minutes. We’ll walk through a simple example that reads a CSV file of songs and indexes them to Apache Solr.
This quickstart assumes you have Java 17+ and Maven installed. If not, see the Installation guide first.

Prerequisites

Before you begin, make sure you have:
  • Java 17 or later installed
  • Maven installed
  • Apache Solr 8.x or later running on port 8983
  • A Solr collection named quickstart created
New to Solr? Follow the Apache Solr Quick Start to get Solr running locally.

Step 1: Clone and Build Lucille

1. Clone the repository

git clone https://github.com/kmwtechnology/lucille.git
cd lucille
2. Build the project

mvn clean install
This will compile Lucille and all its modules. The build may take a few minutes the first time as Maven downloads dependencies.
3. Navigate to the example

cd lucille-examples/lucille-simple-csv-solr-example

Step 2: Understand the Configuration

The example includes a configuration file that defines the entire ETL workflow. Let’s examine conf/simple-csv-solr-example.conf:
conf/simple-csv-solr-example.conf
# This example illustrates how Lucille can handle a simple use case 
# like indexing the contents of a CSV file into Solr

# CONNECTORS: Define data sources
connectors: [
  {
    class: "com.kmwllc.lucille.connector.FileConnector",
    paths: ["conf/songs.csv"],
    name: "connector1",
    pipeline: "pipeline1"
    fileHandlers: {
      csv: { }
    }
  }
]

# PIPELINES: Define transformation stages
pipelines: [
  {
    name: "pipeline1",
    stages: []  # Empty pipeline means no transformations
  }
]

# INDEXER: Define the destination
indexer {
  type: "Solr"
}

# SOLR: Configure connection details
solr {
  useCloudClient: true
  defaultCollection: "quickstart"
  url: ["http://localhost:8983/solr"]
}
Lucille uses HOCON (Human-Optimized Config Object Notation) for configuration files. It’s a superset of JSON that’s easier to read and write.
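To see what that buys you, here is the same solr block written both ways; the JSON form needs quoted keys and commas, while HOCON can omit them (a small illustration, not part of the example config):

```hocon
// Strict JSON style
"solr": { "defaultCollection": "quickstart" }

// Equivalent HOCON style: unquoted keys, no commas,
// and no colon before a nested object
solr {
  defaultCollection: quickstart
}
```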

Configuration Breakdown

Connectors extract data from source systems. This example uses the FileConnector to read a CSV file:
  • class: The Java class that implements the connector
  • paths: List of files or directories to read
  • name: Unique identifier for this connector
  • pipeline: Which pipeline should process these documents
  • fileHandlers.csv: Configuration for CSV file parsing (uses defaults)
Pipelines transform and enrich documents. This example has an empty pipeline (no transformations), but you can add stages to:
  • Parse dates and numbers
  • Extract text from documents
  • Generate embeddings
  • Query databases
  • Apply business logic
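As a sketch of the shape, a non-empty pipeline lists stage objects, each with a class plus its own settings. The stage class and its source/dest parameters below are hypothetical, shown only to illustrate the structure; check the stages shipped in lucille-core for real class names and options:

```hocon
pipelines: [
  {
    name: "pipeline1",
    stages: [
      {
        # Hypothetical stage shown for illustration only
        class: "com.kmwllc.lucille.stage.SomeStage",
        source: ["year released"],
        dest: ["year"]
      }
    ]
  }
]
```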
Indexers send processed documents to their destination. The type: "Solr" tells Lucille to use the Solr indexer.
Solr configuration specifies how to connect:
  • useCloudClient: Use Solr Cloud client (works for standalone too)
  • defaultCollection: Collection name to index into
  • url: Solr server URL
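If you would rather use the plain HTTP client than the cloud client, flipping useCloudClient should be the only change needed (a sketch based on the keys shown above; confirm against the Lucille documentation for your version):

```hocon
solr {
  useCloudClient: false
  defaultCollection: "quickstart"
  url: ["http://localhost:8983/solr"]
}
```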

Step 3: Review the Source Data

The example includes a CSV file with song data. Here’s a sample of conf/songs.csv:
title,artist,top genre,year released,added,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop,top year,artist type
STARSTRUKK (feat. Katy Perry),3OH!3,dance pop,2009,2022-02-17,140,81,61,-6,23,23,203,0,6,70,2010,Duo
My First Kiss (feat. Ke$ha),3OH!3,dance pop,2010,2022-02-17,138,89,68,-4,36,83,192,1,8,68,2010,Duo
I Need A Dollar,Aloe Blacc,pop soul,2010,2022-02-17,95,48,84,-7,9,96,243,20,3,72,2010,Solo
The CSV has 100 songs with various attributes like title, artist, genre, BPM, and more.
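To see how a row turns into field/value pairs, you can pair the header line with any data row. The two-column sample below is a stand-in (the real songs.csv has 17 columns), and the pairing only mirrors conceptually what the CSV file handler does:

```shell
# Build a tiny two-column sample (stand-in for conf/songs.csv)
printf 'title,artist\nI Need A Dollar,Aloe Blacc\n' > /tmp/sample.csv

# Pair each header name with the value from the data row,
# printing one "field=value" line per CSV column (uses bash's <(...))
paste -d '=' \
  <(head -n 1 /tmp/sample.csv | tr ',' '\n') \
  <(tail -n 1 /tmp/sample.csv | tr ',' '\n')
```

For the sample above this prints title=I Need A Dollar and artist=Aloe Blacc, one per line, which is essentially the document the connector would emit.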

Step 4: Run the Example

1. Build the example

Make sure you’re in the lucille-simple-csv-solr-example directory:
mvn clean install
2. Start Solr and create the collection

In a separate terminal, make sure Solr is running on port 8983 and you have a collection named quickstart:
# If using Solr standalone
bin/solr start
bin/solr create -c quickstart
3. Run the ingest

Execute the provided shell script:
./scripts/run_ingest.sh
This script runs Lucille with the configuration file:
java -Dconfig.file=conf/simple-csv-solr-example.conf \
  -cp 'target/lib/*' \
  com.kmwllc.lucille.core.Runner

Step 5: Verify the Results

1. Commit the documents

After the ingest completes, commit the documents to make them visible:
curl "http://localhost:8983/solr/quickstart/update?commit=true&openSearcher=true"
2. Query Solr

Check that documents were indexed:
curl "http://localhost:8983/solr/quickstart/select?q=*:*"
Or visit the Solr Admin UI at http://localhost:8983/solr/#/quickstart/query and run a *:* query.
3. Search for specific songs

Try searching for a specific artist:
curl "http://localhost:8983/solr/quickstart/select?q=artist:Rihanna"

Expected Output

When you run the ingest, you should see output similar to:
INFO  [main] Runner - Starting run with id a7b3c9d2-4e1f-4a8b-9c3d-2e5f8a9b1c4d
INFO  [main] Runner - Running connector connector1 feeding to pipeline pipeline1
INFO  [Worker-1] FileConnector - Processing file: conf/songs.csv
INFO  [Indexer-1] SolrIndexer - Indexed 100 documents
INFO  [main] Runner - Connector connector1 feeding to pipeline pipeline1 complete. Time: 2.34 secs.
INFO  [main] Runner - Run took 2.51 secs.
You should see 100 documents indexed to Solr.

Understanding the Flow

Here’s what happened when you ran the example:
1. Connector reads the CSV

The FileConnector reads songs.csv and converts each row into a Lucille Document. Each CSV column becomes a document field.
2. Pipeline processes documents

Documents flow through the pipeline. Since the pipeline is empty, no transformations are applied.
3. Indexer sends to Solr

The SolrIndexer batches documents and sends them to the Solr collection via the Solr API.
4. Completion

Lucille waits for all documents to be indexed, logs metrics, and exits.

Next Steps

Now that you’ve run your first Lucille example, here are some things to try:

Add Pipeline Stages

Modify the pipeline to add transformations like parsing dates or normalizing text.

Try Other Connectors

Explore examples for databases, S3, RSS feeds, and more in the lucille-examples directory.

Index to Other Engines

Change the indexer type to “Elasticsearch” or “OpenSearch” to try other search engines.

Run in Distributed Mode

Scale up by running Workers and Indexers as separate processes with Kafka.
Check out the other examples in lucille-examples/ to see more advanced use cases:
  • lucille-rss-example: Index RSS feeds
  • lucille-opensearch-ingest-example: Index to OpenSearch
  • lucille-vector-ingest-example: Generate embeddings and index to vector databases
  • lucille-distributed-example: Run in distributed mode with Kafka

Troubleshooting

Solr isn't reachable
Make sure Solr is running on port 8983:
curl http://localhost:8983/solr/
If it isn't running, start it: bin/solr start

The quickstart collection doesn't exist
Create the collection:
bin/solr create -c quickstart

Queries return zero documents
Make sure you committed the documents after the ingest:
curl "http://localhost:8983/solr/quickstart/update?commit=true&openSearcher=true"

The build or run fails with a Java version error
Lucille requires Java 17 or later. Check your version:
java -version
If needed, install Java 17+ and set JAVA_HOME.
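If the output of java -version is unfamiliar, the major version is the number before the first dot inside the quoted string. A small parsing sketch using a hard-coded sample line (substitute the real output of java -version 2>&1):

```shell
# Sample line as printed by a modern JDK; replace with real `java -version` output
line='openjdk version "17.0.9" 2023-10-17'

# The quoted field holds the full version; the part before the first dot is the major
ver=$(printf '%s\n' "$line" | awk -F '"' '{print $2}')
echo "major: ${ver%%.*}"
```

For the sample line this prints major: 17; anything 17 or above satisfies Lucille's requirement.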
For more detailed configuration options and advanced features, see the full documentation.