Overview
The `DatabaseConnector` executes SQL SELECT statements against relational databases and publishes each result row as a Document. It supports result set joining, custom ID fields, column filtering, and pre/post SQL execution.
Class: `com.kmwllc.lucille.connector.jdbc.DatabaseConnector`
Extends: `AbstractConnector`
Key Features
- Execute SQL queries against any JDBC-compatible database
- Publish result rows as Lucille Documents
- Join multiple result sets with ordered queries
- Execute pre and post SQL statements (DDL, DML)
- Control fetch size for memory efficiency
- Ignore specific columns from indexing
- Connection retry logic with configurable backoff
- Support for child documents via SQL joins
Class Signature
Configuration Parameters
Required Parameters
JDBC driver class name. The driver JAR must be on the classpath. Common values:
- MySQL: `com.mysql.cj.jdbc.Driver`
- PostgreSQL: `org.postgresql.Driver`
- Oracle: `oracle.jdbc.OracleDriver`
- SQL Server: `com.microsoft.sqlserver.jdbc.SQLServerDriver`
- H2: `org.h2.Driver`
JDBC connection URL for the database. Examples:
- MySQL: `jdbc:mysql://localhost:3306/mydb`
- PostgreSQL: `jdbc:postgresql://localhost:5432/mydb`
- Oracle: `jdbc:oracle:thin:@localhost:1521:orcl`
- SQL Server: `jdbc:sqlserver://localhost:1433;databaseName=mydb`
- H2: `jdbc:h2:./data/testdb`
Database username for authentication.
Database password for authentication.
Primary SQL SELECT statement to execute. This query retrieves the main result set. Important: when using `otherSQLs` for joins, this query must be ordered by the join key.

Column name to use as the Document ID. This field from the result set becomes the Lucille Document's unique identifier. Examples: `"id"`, `"article_id"`, `"uuid"`
Optional Parameters
Number of rows to fetch at a time from the database. Controls memory usage and network traffic.
- MySQL: set to `Integer.MIN_VALUE` to enable streaming (prevents buffering the entire result set in memory)
- H2: streaming is not supported

Behavior varies by JDBC driver; many drivers default to 0 (fetch all rows at once).
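For instance, a sketch of enabling MySQL streaming (the literal below is the value of `Integer.MIN_VALUE`):

```conf
# -2147483648 == Integer.MIN_VALUE; tells the MySQL driver to stream rows
# one at a time instead of buffering the whole result set
fetchSize: -2147483648
```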
SQL statement to execute before the main query. It should not return a result set. Use cases: setup, creating temporary tables, setting session parameters.
SQL statement to execute after the main query completes. It should not return a result set. Use cases: cleanup, status updates, logging.
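Hypothetical pre/post statements (the table and column names in the SQL are illustrative):

```conf
preSQL: "UPDATE crawl_status SET state = 'RUNNING' WHERE job = 'articles'"
postSQL: "UPDATE crawl_status SET state = 'DONE' WHERE job = 'articles'"
```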
List of additional SQL queries for joining related data. Each query creates child documents. Requirements:
- Must be used with `otherJoinFields`
- All queries (the main `sql` and every entry in `otherSQLs`) must be ordered by their join key
- Only integer join keys are supported

Join field names for each query in `otherSQLs`. Required if `otherSQLs` is provided. The order must match the `otherSQLs` list; each field is the column name used to join with the primary result set.

List of column names to exclude from the published Documents. Column names are matched case-insensitively. Example: `["internal_id", "deleted_flag", "password_hash"]`

Number of times to retry connecting to the database if the initial connection fails.
Duration in milliseconds to wait between connection retry attempts.
Document Fields
For each row in the result set, the connector creates a Document with:
- ID: value from the `idField` column (with `docIdPrefix` applied)
- Fields: all result columns (except the ID field and ignored columns)
- Field names: lowercased column names
- Child documents: flattened rows from `otherSQLs` (if configured)
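For example (illustrative values): a result row `ID=7, TITLE='Intro', AUTHOR=NULL`, with `idField: "id"` and a `docIdPrefix` of `"db-"`, would be published roughly as:

```json
{
  "id": "db-7",
  "title": "Intro"
}
```

Column names are lowercased, the NULL `author` column is omitted, and the ID column is not duplicated as a regular field.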
Configuration Examples
Basic MySQL Query
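A minimal sketch, assuming the usual Lucille connector config shape; the connection parameter names (`driver`, `connectionString`, `jdbcUser`, `jdbcPassword`) and the table/column names are assumptions, so check the class for the actual keys:

```conf
connectors: [
  {
    class: "com.kmwllc.lucille.connector.jdbc.DatabaseConnector"
    name: "articles"
    # connection parameter names below are assumptions
    driver: "com.mysql.cj.jdbc.Driver"
    connectionString: "jdbc:mysql://localhost:3306/mydb"
    jdbcUser: "lucille"
    jdbcPassword: "secret"
    sql: "SELECT id, title, body FROM articles ORDER BY id"
    idField: "id"
  }
]
```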
PostgreSQL with Pre/Post SQL
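A sketch under the same assumed parameter names, using `preSQL` to build a temporary table and `postSQL` to drop it (SQL and schema are illustrative):

```conf
connectors: [
  {
    class: "com.kmwllc.lucille.connector.jdbc.DatabaseConnector"
    name: "recent-orders"
    driver: "org.postgresql.Driver"
    connectionString: "jdbc:postgresql://localhost:5432/mydb"
    jdbcUser: "lucille"
    jdbcPassword: "secret"
    preSQL: "CREATE TEMP TABLE recent AS SELECT * FROM orders WHERE created > now() - interval '1 day'"
    sql: "SELECT id, total FROM recent ORDER BY id"
    postSQL: "DROP TABLE recent"
    idField: "id"
  }
]
```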
SQL Server with Joins (Child Documents)
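A sketch with assumed table and column names; note that both the main `sql` and the `otherSQLs` entry are ordered by the integer join key:

```conf
connectors: [
  {
    class: "com.kmwllc.lucille.connector.jdbc.DatabaseConnector"
    name: "companies"
    driver: "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    connectionString: "jdbc:sqlserver://localhost:1433;databaseName=mydb"
    jdbcUser: "lucille"
    jdbcPassword: "secret"
    sql: "SELECT id, name FROM company ORDER BY id"
    idField: "id"
    otherSQLs: ["SELECT company_id, title FROM employee ORDER BY company_id"]
    otherJoinFields: ["company_id"]
  }
]
```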
Connection Retry Configuration
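A sketch of the retry settings described above, using the parameter names from this page:

```conf
# retry a failed initial connection 3 times, waiting 10 seconds between attempts
connectionRetries: 3
connectionRetryPause: 10000
```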
SQL Joins and Child Documents
The connector supports joining multiple result sets to create Documents with child documents.
How It Works
- Execute the main `sql` query
- Execute all `otherSQLs` queries in parallel
- For each main result row:
  - Create a parent Document
  - Find matching rows in the other result sets using the join keys
  - Create a child Document for each matching row
  - Add the children to the parent using `doc.addChild()`
Requirements
Example Document Structure
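As a sketch with assumed table and column names, given a configuration like:

```conf
sql: "SELECT id, name FROM company ORDER BY id"
idField: "id"
otherSQLs: ["SELECT company_id, title FROM employee ORDER BY company_id"]
otherJoinFields: ["company_id"]
```

a company row with two matching employee rows might be published roughly as follows (the children key and child-field layout shown are illustrative, not the exact serialized form):

```json
{
  "id": "1",
  "name": "Acme",
  "children": [
    { "company_id": 1, "title": "Engineer" },
    { "company_id": 1, "title": "Designer" }
  ]
}
```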
Data Type Handling
The connector uses `JDBCUtils.parseResultToDoc()` to convert SQL types to Lucille field values:
- NULL values: Not added to the Document
- Strings: Added as String fields
- Numbers: Added as appropriate numeric types (Integer, Long, Double, etc.)
- Dates/Timestamps: Converted to Instant
- Binary data: Added as byte arrays
- Unsupported types: Skipped with a warning
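As an illustration with assumed columns: a row containing `TITLE='Intro'`, `VIEWS=42`, `UPDATED='2024-01-15 10:30:00'`, and `SUBTITLE=NULL` would contribute fields roughly like this (the rendering of the Instant is illustrative):

```json
{
  "title": "Intro",
  "views": 42,
  "updated": "2024-01-15T10:30:00Z"
}
```

`subtitle` is absent because NULL values are not added to the Document.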
Performance Considerations
Fetch Size
The `fetchSize` parameter is critical for large result sets:
- MySQL: set `fetchSize` to `Integer.MIN_VALUE` (-2147483648) to enable streaming; this prevents buffering the entire result set in memory
- PostgreSQL: a positive `fetchSize` enables cursor-based fetching (the driver honors it only when autocommit is disabled)
- Oracle: the driver defaults to a fetch size of 10; increase it for large result sets
- H2: streaming is not supported
Connection Management
- Connections are created on demand and closed in `close()`
- Multiple connections are created when using `otherSQLs` (one per query)
- Use connection pooling at the database level for better resource management
Query Optimization
- Add appropriate indexes on columns used in WHERE clauses
- Use `ignoreColumns` to reduce data transfer
- Order by indexed columns when using joins
- Consider partitioning large queries across multiple connector instances
Error Handling
Connection Failures
The connector retries failed connection attempts based on `connectionRetries` and `connectionRetryPause`.
Query Failures
SQL exceptions during query execution immediately throw a `ConnectorException`. No retry logic is applied to query execution.
Data Conversion Errors
If a column cannot be converted to a Lucille field value, a warning is logged and that field is skipped. The Document is still published.
Lifecycle Methods
execute(Publisher publisher)
- Create the database connection (with retries)
- Execute `preSQL` if configured
- Execute the main `sql` query
- Execute all `otherSQLs` queries
- Iterate through the main result set:
  - Create a Document from the row
  - Match and attach child documents from the other result sets
  - Publish the Document
- Execute `postSQL` if configured
- Close result sets and statements (connections are closed in `close()`)
close()
Closes all database connections created during execution. Always called, even if exceptions occur.
Limitations
- Join keys must be integers (or comparable as integers)
- Only supports forward-only, read-only result sets
- No support for stored procedures with multiple result sets
- No support for batch inserts back to database
- Column names must not conflict with Lucille reserved fields
Next Steps
KafkaConnector
Read messages from Kafka topics
FileConnector
Traverse local and cloud storage