Overview
Local mode runs all Lucille components (Runner, Worker, Indexer) inside a single JVM process. This deployment mode is ideal for:
- Development and testing - Quick iteration without external dependencies
- Small-scale ingestion - Processing datasets that fit within single-machine resources
- Proof of concept - Evaluating Lucille before scaling to distributed mode
- Simple use cases - When throughput requirements don’t demand horizontal scaling
Local mode uses in-memory queues for inter-component communication. No external message broker is required.
Architecture
In local mode, the Runner launches Worker and Indexer threads within the same JVM:
java \
-Dconfig.file=path/to/application.conf \
-cp 'lucille-core/target/lucille.jar:lucille-core/target/lib/*' \
com.kmwllc.lucille.core.Runner
- -Dconfig.file - Path to your configuration file
- -cp - Classpath including Lucille JAR and dependencies
- com.kmwllc.lucille.core.Runner - Main class (no arguments = local mode)
Example log output:
25/10/31 13:40:21 6790d2e9-1079 INFO WorkerPool: 27017 docs processed.
One minute rate: 1787.10 docs/sec. Mean pipeline latency: 10.63 ms/doc.
25/10/31 13:40:22 6790d2e9-1079 INFO Indexer: 17016 docs indexed.
One minute rate: 455.07 docs/sec. Mean backend latency: 6.90 ms/doc.
Thread Configuration
Local mode creates these threads:
- Main Thread - Launches components and monitors completion
- Connector Thread - Reads source data and publishes documents
- Worker Thread(s) - Process documents through pipeline stages
- Indexer Thread - Batches and sends documents to destination
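The thread layout above, combined with the in-memory queues described in the overview, can be pictured with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name `LocalThreadsSketch`, the `__END__` sentinel, and the plain strings standing in for documents are hypothetical and are not Lucille's actual classes or internals. It only shows the general pattern of one connector thread, several worker threads, and one batching indexer thread linked by in-process queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: names are hypothetical and this is not Lucille's implementation.
public class LocalThreadsSketch {
  private static final String END = "__END__"; // sentinel marking the end of the stream

  public static List<List<String>> run(List<String> source, int workers, int batchSize)
      throws InterruptedException {
    BlockingQueue<String> toWorkers = new LinkedBlockingQueue<>();
    BlockingQueue<String> toIndexer = new LinkedBlockingQueue<>();
    List<List<String>> sentBatches = new ArrayList<>();

    // Connector thread: reads source data and publishes documents.
    Thread connector = new Thread(() -> {
      source.forEach(toWorkers::add);
      for (int i = 0; i < workers; i++) {
        toWorkers.add(END); // one sentinel per worker so every worker stops
      }
    }, "connector");

    // Worker threads: pass each document through a stand-in pipeline step.
    List<Thread> workerThreads = new ArrayList<>();
    for (int i = 0; i < workers; i++) {
      workerThreads.add(new Thread(() -> {
        try {
          for (String d = toWorkers.take(); !d.equals(END); d = toWorkers.take()) {
            toIndexer.add(d + ":processed");
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "worker-" + i));
    }

    // Indexer thread: groups documents into batches and "sends" them.
    Thread indexer = new Thread(() -> {
      List<String> batch = new ArrayList<>();
      try {
        for (String d = toIndexer.take(); !d.equals(END); d = toIndexer.take()) {
          batch.add(d);
          if (batch.size() == batchSize) {
            sentBatches.add(new ArrayList<>(batch));
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          sentBatches.add(batch); // flush the final partial batch
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "indexer");

    // Main thread: launches the components and waits for completion.
    connector.start();
    workerThreads.forEach(Thread::start);
    indexer.start();
    connector.join();
    for (Thread w : workerThreads) {
      w.join();
    }
    toIndexer.add(END); // all workers are done, so the indexer can finish
    indexer.join();
    return sentBatches;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run(List.of("a", "b", "c", "d", "e"), 2, 2));
  }
}
```

Because the workers run in parallel, the order of documents inside the batches can vary between runs; only the total count and the batch sizes are deterministic.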
Configuring Worker Threads
By default, Lucille creates one worker thread per CPU core. Override this in your config.
Use Cases
Development and Testing
Best For:
- Writing and debugging custom stages
- Testing pipeline configurations
- Validating connector behavior
- Integration tests in CI/CD
Small-Scale Production Workloads
Best For:
- Periodic batch jobs (under 1M documents)
- Non-time-critical ingestion
- Single-source ETL pipelines
- Resource-constrained environments
Limitations
Single Point of Failure
If the JVM crashes or the process is killed, all in-flight work is lost. There is no recovery mechanism.
Memory Constraints
All components share the same heap:
- In-memory queues hold documents between stages
- Large documents or deep queues can cause OutOfMemoryErrors
- Worker threads and indexer batches compete for heap space
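To make the heap pressure concrete, the sketch below does the back-of-envelope arithmetic: queued documents hold roughly queue depth times average document size. The class `HeapBudgetSketch`, its method names, and the idea of reserving a fixed fraction of the heap for queues are all assumptions for illustration, not Lucille settings or APIs.

```java
// Illustrative only: a hypothetical estimator, not part of Lucille.
public class HeapBudgetSketch {

  // Rough bytes held by queued documents: depth times average document size.
  public static long queuedBytes(int queueDepth, long avgDocBytes) {
    return Math.multiplyExact((long) queueDepth, avgDocBytes);
  }

  // True when the queued documents would exceed the given fraction of the heap.
  public static boolean exceedsBudget(
      int queueDepth, long avgDocBytes, long maxHeapBytes, double fraction) {
    return queuedBytes(queueDepth, avgDocBytes) > maxHeapBytes * fraction;
  }

  public static void main(String[] args) {
    long maxHeap = Runtime.getRuntime().maxMemory(); // the JVM's configured max heap
    // Assumed workload: 10,000 queued documents of about 50 KB each.
    long queued = queuedBytes(10_000, 50_000);
    System.out.println("queued bytes: " + queued + ", max heap: " + maxHeap);
    System.out.println("over 25% of heap: " + exceedsBudget(10_000, 50_000, maxHeap, 0.25));
  }
}
```

With the assumed numbers, 10,000 documents of 50 KB each hold about 500 MB, which already exceeds a quarter of a 1 GiB heap before worker and indexer allocations are counted.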
No Horizontal Scaling
You cannot add more machines to increase throughput. Performance is bounded by:
- Single-machine CPU cores (limits worker parallelism)
- Single-machine memory (limits queue depth and batch sizes)
- Single-machine network I/O (limits indexing throughput)
Limited Observability
Metrics are logged to console only. There is no:
- Centralized metrics collection
- Distributed tracing
- External monitoring integration
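Since the metrics only reach the console, one possible workaround is to scrape them from the captured log text. The sketch below is a hypothetical helper (`MetricsLineSketch` is not part of Lucille) and its regular expression is inferred solely from the sample log lines shown earlier on this page, so it may need adjusting if the real log format differs.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical log scraper, not part of Lucille.
public class MetricsLineSketch {

  // Inferred from the sample output, e.g.
  // "INFO WorkerPool: 27017 docs processed. One minute rate: 1787.10 docs/sec."
  private static final Pattern METRICS = Pattern.compile(
      "INFO (\\w+): (\\d+) docs \\w+\\.\\s+One minute rate: ([\\d.]+) docs/sec");

  // Simple value holder for one parsed metrics entry.
  public static final class Metrics {
    public final String component;
    public final long docs;
    public final double docsPerSec;

    Metrics(String component, long docs, double docsPerSec) {
      this.component = component;
      this.docs = docs;
      this.docsPerSec = docsPerSec;
    }
  }

  // Returns the first metrics entry found in the text, or empty if none matches.
  public static Optional<Metrics> parse(String logText) {
    Matcher m = METRICS.matcher(logText);
    if (!m.find()) {
      return Optional.empty();
    }
    return Optional.of(new Metrics(
        m.group(1), Long.parseLong(m.group(2)), Double.parseDouble(m.group(3))));
  }

  public static void main(String[] args) {
    String sample = "25/10/31 13:40:22 6790d2e9-1079 INFO Indexer: 17016 docs indexed.\n"
        + "One minute rate: 455.07 docs/sec. Mean backend latency: 6.90 ms/doc.";
    parse(sample).ifPresent(x ->
        System.out.println(x.component + " " + x.docs + " " + x.docsPerSec));
  }
}
```

The parsed values could then be forwarded to whatever monitoring system is in use; that forwarding step is left out because it depends entirely on the environment.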
Validation and Testing
Lucille provides a validation mode to check configurations before running. Always validate configurations in CI/CD pipelines to catch errors before deployment.
Graceful Shutdown
Local mode handles SIGINT (Ctrl+C) gracefully:
- Connector stops producing new documents
- Workers finish processing in-flight documents
- Indexer flushes final batch
- Connections are closed cleanly
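The shutdown order above can be sketched in Java using a JVM shutdown hook, which the JVM runs on SIGINT as well as on normal exit. This is an assumption-laden illustration: `GracefulShutdownSketch` and its methods are hypothetical, and whether Lucille itself relies on a shutdown hook is not stated here. The sketch only shows the stop, drain, flush, close sequence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only: names are hypothetical and this is not Lucille's implementation.
public class GracefulShutdownSketch {
  private final Queue<String> inFlight = new ArrayDeque<>(); // published, not yet processed
  private final List<String> batch = new ArrayList<>();      // processed, not yet sent
  private final List<String> indexed = new ArrayList<>();    // sent to the destination
  private boolean accepting = true;
  private boolean closed = false;

  // Connector side: refuses new documents once shutdown has begun.
  public synchronized boolean publish(String doc) {
    if (!accepting) {
      return false;
    }
    inFlight.add(doc);
    return true;
  }

  public synchronized void shutdown() {
    accepting = false;                 // 1. connector stops producing new documents
    while (!inFlight.isEmpty()) {      // 2. workers finish in-flight documents
      batch.add(inFlight.poll() + ":processed");
    }
    indexed.addAll(batch);             // 3. indexer flushes the final batch
    batch.clear();
    closed = true;                     // 4. connections are closed
  }

  public synchronized List<String> indexed() {
    return new ArrayList<>(indexed);
  }

  public synchronized boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) {
    GracefulShutdownSketch pipeline = new GracefulShutdownSketch();
    pipeline.publish("doc1");
    pipeline.publish("doc2");
    // Registered hooks run when the JVM receives SIGINT (Ctrl+C) or exits normally.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      pipeline.shutdown();
      System.out.println("indexed before exit: " + pipeline.indexed());
    }));
  }
}
```

A shutdown hook does not run when the process is killed with SIGKILL, which is consistent with the single-point-of-failure limitation described earlier.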
When to Use Local Mode
✅ Use Local Mode When:
- Developing and testing pipelines locally
- Processing small datasets (under 100K documents)
- Running one-off batch jobs
- Evaluating Lucille before production
- Constrained to single-machine deployment
- External dependencies (Kafka) are not available
Next Steps
- Distributed Mode - Scale to distributed deployment with Kafka for production workloads
- Production Best Practices - Learn monitoring, tuning, and troubleshooting for production systems