What are Plugins?
Lucille plugins extend the core framework with additional functionality. Plugins are separate Maven modules that provide specialized stages, indexers, and utilities for specific use cases like text extraction, OCR, and vector database integration.
Plugin Architecture
Plugins are organized as separate modules under lucille-plugins/. Each plugin module:
- Has its own dependencies and versioning
- Extends core Lucille base classes (Stage, Indexer, etc.)
- Provides configuration specifications via the Spec framework
- Includes tests and documentation
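Because each plugin ships as a separate module, the core can only use a plugin class if that module's JAR is on the runtime classpath. The sketch below shows one common way a framework resolves such classes by name via reflection. It assumes a configuration map whose "class" entry holds a fully qualified class name; SketchStage, EchoStage, and StageLoader are hypothetical stand-ins, not Lucille's actual loading code.

```java
import java.util.Map;

// Hypothetical stand-in for a core base class such as Stage; not Lucille's real API.
abstract class SketchStage {
  protected final Map<String, Object> config;

  protected SketchStage(Map<String, Object> config) {
    this.config = config;
  }
}

// Hypothetical "plugin" stage; in practice it would live in a separate module/JAR.
class EchoStage extends SketchStage {
  public EchoStage(Map<String, Object> config) {
    super(config);
  }
}

final class StageLoader {
  private StageLoader() {}

  // Loads the class named under the (assumed) "class" key and calls its
  // single-argument constructor. Class.forName only succeeds if the plugin's
  // classes are on the runtime classpath.
  static SketchStage fromConfig(Map<String, Object> config) throws ReflectiveOperationException {
    String className = (String) config.get("class");
    Class<?> clazz = Class.forName(className);
    if (!SketchStage.class.isAssignableFrom(clazz)) {
      throw new IllegalArgumentException(className + " does not extend SketchStage");
    }
    return (SketchStage) clazz.getConstructor(Map.class).newInstance(config);
  }
}
```

In this sketch, a missing plugin JAR surfaces as a ClassNotFoundException as soon as the stage is loaded, rather than as a failure on some later document.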
Available Plugins
Tika
Extract text and metadata from documents using Apache Tika
OCR
Optical character recognition for images and PDFs
Pinecone
Vector database indexer for embeddings
Weaviate
Object-based vector search engine
Using Plugins
Maven Dependency
Add the plugin as a dependency in your pom.xml.
Configuration
Reference plugin stages or indexers in your pipeline configuration.
Plugin Types
Stage Plugins
Extend the Stage class to transform documents:
- TextExtractor (Tika): Extract text from files
- ApplyOCR (OCR): Recognize text in images
- Named entity extraction: Identify entities in text
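A stage receives one document at a time and modifies it in place. The sketch below models that with hypothetical, simplified types: a document is a plain Map<String, Object>, and MiniStage and FieldLengthStage are illustrative names. Lucille's real Stage and Document classes have richer APIs; the returned iterator, used here for any extra child documents a stage might produce, is an assumption made for illustration rather than the actual signature.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified stage base class; not Lucille's real Stage API.
abstract class MiniStage {
  // Transforms the document in place and returns any extra (child) documents
  // it produces; most stages return an empty iterator.
  abstract Iterator<Map<String, Object>> processDocument(Map<String, Object> doc);
}

// Hypothetical stage: writes the character length of one text field into another.
class FieldLengthStage extends MiniStage {
  private final String source;
  private final String dest;

  FieldLengthStage(String source, String dest) {
    this.source = source;
    this.dest = dest;
  }

  @Override
  Iterator<Map<String, Object>> processDocument(Map<String, Object> doc) {
    Object value = doc.get(source);
    // Skip documents where the source field is absent or not text.
    if (value instanceof String) {
      doc.put(dest, ((String) value).length());
    }
    // This stage never produces child documents.
    return Collections.emptyIterator();
  }
}
```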
Indexer Plugins
Extend the Indexer class to send documents to external systems:
- PineconeIndexer: Vector database for embeddings
- WeaviateIndexer: Object-based vector search
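Indexers typically buffer documents and send them to the external system in batches. The sketch below uses the lifecycle method names listed on this page (validateConnection(), sendToIndex(), closeConnection()) on a hypothetical MiniIndexer base class. The batching logic, the batchSize parameter, and the InMemoryIndexer class are assumptions for illustration, and the in-memory list stands in for a vector database such as Pinecone or Weaviate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical indexer base class; Lucille's real Indexer API differs in its details.
abstract class MiniIndexer {
  private final int batchSize;
  private final List<Map<String, Object>> buffer = new ArrayList<>();

  MiniIndexer(int batchSize) {
    if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
    this.batchSize = batchSize;
  }

  abstract boolean validateConnection();

  abstract void sendToIndex(List<Map<String, Object>> batch) throws Exception;

  abstract void closeConnection();

  // Buffers a document and sends a full batch once batchSize is reached.
  void accept(Map<String, Object> doc) throws Exception {
    buffer.add(doc);
    if (buffer.size() >= batchSize) flush();
  }

  // Sends whatever is buffered, e.g. a final partial batch before shutdown.
  void flush() throws Exception {
    if (buffer.isEmpty()) return;
    sendToIndex(new ArrayList<>(buffer));
    // Clear only after a successful send so a failed batch stays buffered.
    buffer.clear();
  }
}

// Hypothetical indexer that "sends" batches to an in-memory list.
class InMemoryIndexer extends MiniIndexer {
  final List<List<Map<String, Object>>> sentBatches = new ArrayList<>();
  private boolean open = true;

  InMemoryIndexer(int batchSize) {
    super(batchSize);
  }

  @Override
  boolean validateConnection() {
    return open;
  }

  @Override
  void sendToIndex(List<Map<String, Object>> batch) {
    if (!open) throw new IllegalStateException("connection closed");
    sentBatches.add(batch);
  }

  @Override
  void closeConnection() {
    open = false;
  }
}
```

Keeping a failed batch in the buffer makes a retry possible; whether a real indexer retries, drops, or reports failed documents is a design choice this sketch does not settle.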
Utility Plugins
Provide helper functions and integrations:
- API clients: Connect to external services
- File handlers: Support for specialized file formats
- Data converters: Transform between formats
Creating Custom Plugins
Plugin Structure
Extend Base Classes
Define Configuration Spec
Use the Spec framework to declare configuration.
Write Tests
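A unit test for a stage usually checks three things: the transformation itself, rejection of bad configuration, and graceful handling of unexpected input. The sketch below is plain Java with a small check() helper so it runs without extra dependencies; a real plugin would more likely use a test framework such as JUnit. TrimFieldStage and its "field" property are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stage under test: trims whitespace from a configured field.
class TrimFieldStage {
  private final String field;

  TrimFieldStage(Map<String, Object> config) {
    Object f = config.get("field");
    // Invalid configuration is rejected in the constructor.
    if (!(f instanceof String) || ((String) f).isEmpty()) {
      throw new IllegalArgumentException("'field' is required");
    }
    this.field = (String) f;
  }

  void processDocument(Map<String, Object> doc) {
    Object v = doc.get(field);
    if (v instanceof String) doc.put(field, ((String) v).trim());
  }
}

// Plain-Java test methods covering behavior, config validation, and error handling.
final class TrimFieldStageTest {
  private TrimFieldStageTest() {}

  static void trimsConfiguredField() {
    Map<String, Object> config = new HashMap<>();
    config.put("field", "title");
    Map<String, Object> doc = new HashMap<>();
    doc.put("title", "  Lucille  ");
    new TrimFieldStage(config).processDocument(doc);
    check("Lucille".equals(doc.get("title")), "title should be trimmed");
  }

  static void rejectsMissingField() {
    try {
      new TrimFieldStage(new HashMap<>());
      check(false, "expected IllegalArgumentException");
    } catch (IllegalArgumentException expected) {
      // Configuration validation failed as intended.
    }
  }

  static void ignoresNonStringValues() {
    Map<String, Object> config = new HashMap<>();
    config.put("field", "title");
    Map<String, Object> doc = new HashMap<>();
    doc.put("title", 42);
    new TrimFieldStage(config).processDocument(doc);
    check(Integer.valueOf(42).equals(doc.get("title")), "non-string value should be left alone");
  }

  private static void check(boolean condition, String message) {
    if (!condition) throw new AssertionError(message);
  }
}
```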
Plugin Dependencies
Managing External Libraries
Plugins can declare their own dependencies in their module's pom.xml.
Avoiding Conflicts
- Use dependency management to align versions
- Exclude transitive dependencies when needed
- Verify that each plugin's dependencies do not conflict with the core or with other plugins
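When two modules pull in different versions of the same library, it helps to confirm which JAR a class was actually loaded from. The helper below uses the standard java.security.CodeSource API; ClassOrigin is a hypothetical name.

```java
import java.net.URL;
import java.security.CodeSource;

// Diagnostic helper for dependency conflicts: reports where a class was loaded from.
final class ClassOrigin {
  private ClassOrigin() {}

  static String whereLoaded(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    if (source == null) {
      // JDK classes loaded by the bootstrap class loader have no code source.
      return "(bootstrap)";
    }
    URL location = source.getLocation();
    // Some class loaders define classes without a location.
    return location == null ? "(unknown)" : location.toString();
  }
}
```

Pass it a class from a library you suspect is duplicated and log the result from inside the plugin. For a static view of resolved versions, mvn dependency:tree lists each module's transitive dependencies.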
Best Practices
Follow naming conventions
- Stage classes: *Stage, *Extractor, *Processor
- Indexers: *Indexer
- Utilities: *Utils, *Helper
- Package structure: com.kmwllc.lucille.<plugin>.<type>
Provide complete configuration specs
- Declare all required and optional parameters
- Include descriptions in Javadoc
- Validate configuration in the constructor
- Provide sensible defaults
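The checklist above can be sketched as a stage that validates its configuration in the constructor against a declared spec. Lucille's Spec framework has its own API, which this sketch does not reproduce: MiniSpec, ChunkTextStage, and the source and chunkSize properties are hypothetical. The pattern is what matters: declare required and optional properties once, reject missing or unknown keys, and fill in defaults before any document is processed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, minimal spec helper; not Lucille's Spec API.
final class MiniSpec {
  private final Set<String> required = new HashSet<>();
  private final Map<String, Object> defaults = new HashMap<>();

  MiniSpec requires(String... names) {
    for (String name : names) required.add(name);
    return this;
  }

  MiniSpec optional(String name, Object defaultValue) {
    defaults.put(name, defaultValue);
    return this;
  }

  // Fails fast on missing or unknown properties, then returns the config
  // merged with defaults for any optional properties that were omitted.
  Map<String, Object> validate(Map<String, Object> config) {
    for (String name : required) {
      if (!config.containsKey(name)) {
        throw new IllegalArgumentException("Missing required property: " + name);
      }
    }
    for (String name : config.keySet()) {
      if (!required.contains(name) && !defaults.containsKey(name)) {
        throw new IllegalArgumentException("Unknown property: " + name);
      }
    }
    Map<String, Object> merged = new HashMap<>(defaults);
    merged.putAll(config);
    return merged;
  }
}

// Hypothetical stage that validates its configuration in the constructor.
class ChunkTextStage {
  static final MiniSpec SPEC = new MiniSpec().requires("source").optional("chunkSize", 500);

  final String source;
  final int chunkSize;

  ChunkTextStage(Map<String, Object> rawConfig) {
    Map<String, Object> config = SPEC.validate(rawConfig);
    this.source = String.valueOf(config.get("source"));
    // Assumes a numeric value; a non-number raises ClassCastException.
    this.chunkSize = ((Number) config.get("chunkSize")).intValue();
    if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
  }
}
```

Rejecting unknown keys catches a typo such as chunkSze at startup, instead of letting the stage run silently with the default.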
Handle resources properly
- Use start() to initialize resources
- Use stop() to clean up
- Close connections and file handles
- Release references to large objects (such as caches) so they can be garbage-collected
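A minimal sketch of these resource guidelines, assuming hypothetical FakeClient and ClientBackedStage classes in place of a real network client and stage. Resources are created in start() rather than in the constructor, and stop() is written to be safe if start() never ran or if it is called twice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client standing in for a network connection or file handle.
class FakeClient implements AutoCloseable {
  boolean closed = false;

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical stage showing the start()/stop() pattern.
class ClientBackedStage {
  private FakeClient client; // created in start(), not in the constructor
  private Map<String, String> cache;

  void start() {
    client = new FakeClient();
    cache = new HashMap<>();
  }

  // Safe to call more than once, and safe if start() never ran.
  void stop() {
    if (client != null) {
      client.close();
      client = null;
    }
    // Drop the reference so a large cache can be garbage-collected.
    cache = null;
  }

  FakeClient client() {
    return client;
  }
}
```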
Write comprehensive tests
- Unit tests for each stage/indexer
- Integration tests with real dependencies
- Configuration validation tests
- Error handling tests
Document configuration
- Provide README with examples
- Document all configuration parameters
- Include usage examples
- Explain limitations and requirements
Plugin Lifecycle
Initialization
- Constructor: Parse and validate configuration
- start(): Initialize resources (clients, caches, files)
- validateConnection() (indexers): Verify connectivity
Execution
- processDocument() (stages): Transform documents
- sendToIndex() (indexers): Send batches to external systems
Cleanup
- stop(): Release resources and close connections
- closeConnection() (indexers): Close client connections
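The order of these calls can be sketched as a small single-threaded runner. The interfaces below reuse the method names from the lifecycle lists but are hypothetical simplifications (a real pipeline runs stages and indexers separately and sends many batches); LifecycleRunner and the Recording* classes exist only to make the call order visible. The nested finally blocks ensure that both stop() and closeConnection() run even if processing, indexing, or the other cleanup step throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified lifecycle interfaces; not Lucille's real classes.
interface LifecycleStage {
  void start() throws Exception;
  String processDocument(String doc) throws Exception;
  void stop() throws Exception;
}

interface LifecycleIndexer {
  boolean validateConnection();
  void sendToIndex(List<String> batch) throws Exception;
  void closeConnection();
}

final class LifecycleRunner {
  private LifecycleRunner() {}

  // Construction (and config validation) is assumed to happen before run().
  static void run(LifecycleStage stage, LifecycleIndexer indexer, List<String> docs) throws Exception {
    stage.start();
    try {
      if (!indexer.validateConnection()) {
        throw new IllegalStateException("indexer failed connection validation");
      }
      List<String> batch = new ArrayList<>();
      for (String doc : docs) {
        batch.add(stage.processDocument(doc));
      }
      indexer.sendToIndex(batch);
    } finally {
      try {
        stage.stop();
      } finally {
        indexer.closeConnection();
      }
    }
  }
}

// Demo implementations that record each call so the ordering is visible.
class RecordingStage implements LifecycleStage {
  final List<String> log;

  RecordingStage(List<String> log) { this.log = log; }

  public void start() { log.add("stage.start"); }

  public String processDocument(String doc) {
    log.add("stage.process:" + doc);
    return doc.toUpperCase();
  }

  public void stop() { log.add("stage.stop"); }
}

class RecordingIndexer implements LifecycleIndexer {
  final List<String> log;
  final boolean reachable;

  RecordingIndexer(List<String> log, boolean reachable) {
    this.log = log;
    this.reachable = reachable;
  }

  public boolean validateConnection() {
    log.add("indexer.validate");
    return reachable;
  }

  public void sendToIndex(List<String> batch) { log.add("indexer.send:" + batch); }

  public void closeConnection() { log.add("indexer.close"); }
}
```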
Publishing Plugins
Plugins can be:
- Internal: Part of the main Lucille repository
- External: Separate repositories with their own lifecycle
- Private: Company-specific plugins not published publicly
Distribution
- Maven Central (for open source)
- Private Maven repository
- JAR files with dependencies