JDBC Connectors
The JDBC connector bridges the differences in SQL semantics and data types between Data Virtuality Server and a target RDBMS. Data Virtuality Server provides a range of specific connectors that target the most popular open-source and proprietary databases.
Type names:
Type name | Description | Specific features |
---|---|---|
jdbc-ansi | declares support for most SQL constructs supported by Data Virtuality Server, except for row limit/offset and EXCEPT/INTERSECT. | Translates source SQL into ANSI-compliant syntax. This connector should be used when another more specific type is not available. |
jdbc-simple | same as jdbc-ansi, except that it disables support for function, UNION, and aggregate pushdown | |
access | for use with Microsoft Access 2003 or later | |
bigquery | for use with Google BigQuery | |
clickhouse | for use with ClickHouse database | |
db2 | for use with DB2 8 or later. | |
derby | for use with Derby 10.1 or later | |
datavirtuality | for use with Data Virtuality Server 1.0.0 or later | |
exasol | for use with EXASOL database | |
excel-odbc | for use with Excel 2003 or later via the JDBC-ODBC bridge | |
greenplum | for use with the Greenplum database. | |
h2 | for use with H2 version 1.1 or later | |
hive | for use with the Hive database based on Hadoop | Hive is a data warehousing infrastructure built on top of Hadoop. Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware. Hive has limited support for data types: it supports integer variants, boolean, float, double, and string. It has no native support for time-based types, XML, or LOBs. These limitations are reflected in the connector capabilities. A view table can use these types, but the transformation has to specify the necessary conversions; in those cases, the evaluations are done in the Data Virtuality Server engine. Another limitation is that Hive only supports EQUI joins, so using any other join type on its source tables will result in inefficient queries. To write criteria based on partitioned columns, model them on the source table, but do not include them in the selected columns. |
hsql | for use with HSQLDB 1.7 or later | |
ingres | for use with Ingres 2006 or later | |
ingres93 | for use with Ingres 9.3 or later | |
intersystems-cache | for use with the InterSystems Caché object database (only its relational aspect) | |
informix | for use with any Informix version | |
kdb | for use with kdb+ | |
memsql | for use with MemSQL 5.0 or later and SingleStore (formerly MemSQL) | |
metamatrix | for use with MetaMatrix 5.5.0 or later | |
modeshape | for use with Modeshape 2.2.1 or later | The PATH, NAME, LOCALNODENAME, DEPTH, and SCORE functions should be accessed as pseudo-columns, e.g. "nt:base"."jcr:path". Data Virtuality Server UDFs (prefixed by JCR_) are available for CONTAINS, ISCHILDNODE, ISDESCENDENT, ISSAMENODE, REFERENCE - see JCRFunctions.xmi. If a selector name is needed in a JCR function, use the pseudo-column "jcr:path", e.g. JCR_ISCHILDNODE(foo.jcr_path, 'x/y') would become ISCHILDNODE(foo, 'x/y') in the ModeShape query. An additional pseudo-column "mode:properties" can be imported by setting the ModeShape JDBC connection property teiidsupport=true. The column "mode:properties" should be used by JCR_REFERENCE and other functions that expect a .* selector name, e.g. JCR_REFERENCE(nt_base.jcr_properties) would become REFERENCE("nt:base".*) in the ModeShape query. |
mysql / mysql5 | for use with MySQL version 4.x and 5 or later, respectively | The MySQL connectors expect the database or session to be using ANSI mode. If the database is not using ANSI mode, an initialization query should be used on the pool to set ANSI mode: SET SESSION sql_mode = 'ANSI' |
netezza | for use with any Netezza version | |
neo4j | for use with any Neo4j 3.x or later | |
oracle | for use with Oracle 9i or later | Sequences may be used with the Oracle connector: a sequence may be modelled as a table with the name in source set to DUAL and columns with the name in source set to <sequence name>.nextval or <sequence name>.currval. Oracle-specific translator properties: OracleSuppliedDriver indicates that the Oracle-supplied driver (typically prefixed by ojdbc) is being used. Defaults to true. Set to false when using DataDirect or other Oracle JDBC drivers. |
postgresql | for use with PostgreSQL 8.0 or later clients and 7.1 or later servers | |
redis | for use with Redis 2.8 or later | |
redshift | for use with Amazon Redshift | |
sqlserver | for use with SQL Server 2000 or later. A SQL Server JDBC driver version 2.0 or later (or compatible e.g. JTDS 1.2 or later) should be used. | SQL Server specific translator properties: JtdsDriver indicates that the open source JTDS driver is being used. Defaults to false. |
sybase | for use with Sybase version 12.5 or later and Sybase IQ | Sybase specific translator properties: JtdsDriver indicates that the open-source JTDS driver is being used. Defaults to false. |
teradata | for use with Teradata V2R5.1 or later | Teradata specific translator properties: supportsRandFunction enables pushing down to source for RAND() function. Defaults to false. |
Connector Configuration
Before issuing queries to the data source, we need to configure a connection and a data source using the SYSADMIN.createConnection() and SYSADMIN.createDatasource() procedures:
Example
call SYSADMIN.createConnection('somedb', '<type name>', 'db=somedb,user-name=user,password=pwd');
call SYSADMIN.createDatasource('somedb', '<type name>', null, null);
Connection Properties
Connection properties are described in the chapter Connection Configuration Interface, section "Predefined CLI templates".
Translator Properties Shared by All JDBC Connectors
Name | Description | Default |
---|---|---|
comparisonStringConversion | Sets a template to convert WHERE clauses on strings. | DBMS-specific; separate defaults exist for MS SQL Server, for MySQL and SingleStore (former MemSQL), and for all others |
DatabaseTimeZone | The time zone of the database. Used when fetching date, time, or timestamp values. | The system default time zone
DatabaseVersion | The specific database version. Used to further tune pushdown support. | Automatically detected by the server through a data source JDBC driver, if possible |
ForceQuotedIdentifiers | If true, all identifiers in the final query (generated by a translator to be sent directly to the target DBMS) will be quoted. | true
forbidNonMatchingNames | Only considered when importer.tableNamePattern is set. When set to false, allows creating tables with names that do not match the tableNamePattern; such tables remain accessible only until the server is restarted. | true
OrderByStringConversion | Sets a template to convert ORDER BY clauses on strings. | DBMS-specific; separate defaults exist for DB2 for AS/400, DB2 for zOS, MS SQL Server, MySQL and SingleStore (former MemSQL), PostgreSQL, and all others |
supportsConvertFromClobToString | True to indicate that the translator supports the CONVERT/CAST function from Clob to String data types. |
supportsNativeQueries | Forces a translator to issue the native() system procedure, which can be used to pass native queries directly to the underlying DBMS (see the usage example after this table). |
supportsOrderByAlias | If false, Data Virtuality Server does not generate aliases in the ORDER BY clause. True to indicate that the source supports aliases in the ORDER BY clause. |
supportsOrderByString | If false, Data Virtuality Server does not push down the ORDER BY clause if it contains at least one expression of string type. True to indicate that the source supports strings in the ORDER BY clause. |
TrimStrings | True to trim trailing whitespace from fixed-length character strings. Note that Data Virtuality Server only has a string or varchar type that treats trailing whitespace as meaningful. | false |
UseBindVariables | True to indicate that PreparedStatements should be used and that literal values in the source query should be replaced with bind variables. If false only LOB values will trigger the use of PreparedStatements. | true |
UseCommentsInSourceQuery | Embeds a /* comment */ leading comment with the session/request id in the source SQL query for informational purposes. | false
MaxPreparedInsertBatchSize | The max size of a prepared insert batch. | 2048 |
The names of the translator properties are case sensitive.
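When supportsNativeQueries is enabled, native queries can be passed directly to the underlying DBMS through the native() procedure. The following is a minimal sketch, not a definitive syntax reference: the data source name dwh, the table customers, and its columns are illustrative assumptions, and the ARRAYTABLE column list must match what the native query returns.
Example
call SYSADMIN.createConnection('dwh', 'postgresql', 'host=localhost,db=dwh,user-name=<USER>,password=<PASSWORD>');;
call SYSADMIN.createDatasource('dwh', 'postgresql', 'importer.tableTypes="TABLE,VIEW"', 'supportsNativeQueries=true');;
SELECT x.* FROM (CALL "dwh.native"('SELECT id, name FROM customers')) w, ARRAYTABLE(w.tuple COLUMNS id integer, name string) x;;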
Translator Properties Specific for DBMS
Name | DBMS | Description | Default |
---|---|---|---|
UseStreamingResults | MySQL, PostgreSQL | By default, the MySQL and PostgreSQL JDBC drivers try to load all results into memory. With big tables, this may cause performance problems and sometimes result in OutOfMemory errors. This option makes the JDBC driver use memory more efficiently and, in the case of MySQL, transfer large data sets more quickly. It is recommended to set this option to true if OutOfMemory problems are experienced when reading large datasets. However, such connections should be used only for reading data, since enabling streaming breaks normal transactional behaviour. | true
replaceNullCharsWith | PostgreSQL, Redshift, Greenplum, BigQuery | A string property. If set, the translator replaces all null characters in strings with the string specified as the property value before executing INSERT/UPDATE. An empty string may also be specified. | Single space
uploadMode | Redshift | Values: bulkInsertSingleThread, bulkInsertMultiThread. bulkInsertSingleThread: bulk inserts in a single thread. | |
Redshift, Snowflake | Value: s3Load. Uploads data as files to Amazon S3 storage. The translator properties "region", "keyId", and "secretKey" should be specified for this mode to work properly (see the example after this table) | |
Snowflake | Value: blob_azure. Uploads data as files to Azure Blob storage. The translator properties "storageAccountName", "storageAccountKey", and "sasToken" should be specified for this mode to work properly | |
Azure | Values: blob, standard | ||
SingleStore (former MemSQL), MySQL | Values: bulkInsertSingleThread, bulkInsertMultiThread, standard. bulkInsertSingleThread sends batches in a single thread | bulkInsertSingleThread |
MS SQL | Values: bulkInsertSingleThread, bulkInsertMultiThread, standard. bulkInsertSingleThread sends batches in a single thread | standard |
BigQuery | Values: CLOUDSTORAGE (default), STREAMING, INSERTS | CLOUDSTORAGE |
maxChunkSize | Redshift | Value: a number Sets the size limit of a temp file in bytes. | 16000 |
SingleStore (former MemSQL), MySQL, MS SQL | Value: a number Sets the approximate size in bytes for batches created. | SingleStore (former MemSQL): 1048576 MySQL: 1039597 | |
numberOfThreads | Redshift | Value: a number Specifies the maximum number of uploader threads. | 10 |
SingleStore (former MemSQL), MySQL, MS SQL | Value: a number. The number of threads for the bulkInsertMultiThread mode. | 3 | |
singleTransactionInsert | MS SQL | Value: boolean. When set to true, the update is committed only when all data is loaded | true
tempFolder | Redshift, Azure, Snowflake | Value: a path to a folder (relative or absolute). Specifies the folder to be used for creating temporary files instead of the system configured one. | |
useMSI | Azure | Value: boolean. For blob uploadMode only | false |
bucketName | Redshift, Snowflake | Value: a bucket name. Only for "s3Load"-uploadMode. | |
BigQuery | Value: a bucket name. Only for default (CLOUDSTORAGE) uploadMode. | ||
bucketPrefix | Redshift, Snowflake | Value: a bucket prefix. Only for "s3Load"-uploadMode. | |
BigQuery | Value: a bucket prefix. Only for default (CLOUDSTORAGE) uploadMode . | ||
createBucket | Redshift, Snowflake | Value: boolean. Specifies if the bucket set in bucketName parameter should be created if it does not exist | false |
region | Redshift, Snowflake | Value: a region. Only for "s3Load"-uploadMode. | |
keyId, secretKey | Redshift, Snowflake | Only for "s3Load"-uploadMode. You can view or set the values for keyId and secretKey via the AWS console -> S3 -> username in the upper right corner -> Security Credentials -> Access Keys (Access Key ID and Secret Access Key). | |
iamRole, awsAccountId | Redshift | Only for "s3Load"-uploadMode. Alternate (to keyId and secretKey) way to perform S3 authorization. | |
EncryptDataOnS3 | Redshift | Only for "s3Load"-uploadMode. Indicates whether the uploaded file should be encrypted. | false |
useDoubleSlashToEscapeRegex | Redshift | This property is used to change the default escaping behaviour in LIKE_REGEX expressions. | true
uploadZipped | Redshift | Enables compression of temporary files before uploading them to S3. May be disabled for environments with high network bandwidth in order to save some CPU power but this will increase the disk usage. | false |
truncateStrings | Redshift, Azure, PostgreSQL | If false, an exception is thrown when a string/clob value exceeds the storage size. If true, values which exceed the size are truncated. | Redshift: false Azure, PostgreSQL: true
varcharReserveAdditionalSpacePercent | Redshift, Snowflake, Vertica | As Redshift and Vertica measure varchar size in bytes, not chars, and store strings in UTF-8 encoding, a char may be 1 to 4 bytes long. You can specify the percentage by which the byte size will exceed the original char size. There is also a special value, 65535, which makes every varchar 65535 bytes long. | 0
acceptInvChars | Redshift | Value: any ASCII character except NULL. Only for "s3Load"-uploadMode. This property enables the loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters. If it is specified, the ACCEPTINVCHARS option will be passed to the S3 COPY command, and each invalid UTF-8 character will be replaced with a string of equal length consisting of the specified character. If acceptInvChars is not specified, an error will be thrown whenever an invalid UTF-8 character is encountered. | |
copyParams | Redshift | Arbitrary parameters to be passed to the COPY command when uploading data from S3. | null |
keepTempFiles | Redshift, Azure, Snowflake | Keep temporary files after uploading. | false - it can be enabled for debugging |
maxTableNameLength | MySQL, PostgreSQL, SQL Server, Oracle, Redshift | The maximum length of a table name. The default value relies on underlying DBMS. | MySQL: 64, PostgreSQL: 63, SQL Server: 128, Oracle: 30, Redshift: 127 |
maxColumnNameLength | MySQL, PostgreSQL, SQL Server, Oracle, Redshift | The maximum length of a column name. The default value relies on the underlying DBMS. Five chars of the defined maximum length are reserved for internal purposes and cannot be used for the column identifier. | MySQL: 64,
createStringsAsClobs | MySQL | When set to true, string columns in the target table are converted to clobs | false
stringClobsLengthThreshold | MySQL | The property sets a length threshold for strings to be converted to clobs when the 'createStringsAsClobs' is set to true. String columns with length less than the threshold would not be converted to clobs. | 64 |
webhdfsEndpoint | hive | Important: the hostname should be the node name of the Hadoop node which runs the WebHDFS service. It is not possible to use an IP address or an alternative DNS name. If the node name does not resolve on the Data Virtuality Server host, add the node name with the correct IP address to the hosts file on the Data Virtuality Server host. | |
webhdfsLocation | hive | Example: webhdfsLocation="/user/hive" | |
tableType | hive | Possible values: TEXTFILE, PARQUET | |
defaultEndpointsProtocol | Azure, Snowflake | Azure default endpoints protocol. Snowflake: for blob_azure uploadMode only | |
storageAccountName | Azure, Snowflake | Azure Storage Account name. Azure: for blob uploadMode only. Snowflake: for blob_azure uploadMode only | |
storageAccountKey | Azure, Snowflake | Azure Storage Account key. Azure: for blob uploadMode only. Snowflake: for blob_azure uploadMode only | |
sasToken | Snowflake | The SAS Token used for accessing the Blob Storage from Snowflake. For blob_azure uploadMode only | |
stringDelimiter | Azure | The string delimiter to be used in the text file | | |
deleteContainer | Azure | Delete the temporary container when it is not needed anymore | true - it can be disabled for debugging |
useDdl | BigQuery | Value: boolean. When set to true, creates and drops are performed using JDBC statements instead of the API | false
useToolkitReplace | Netezza | Turns on pushdown of the REPLACE function using TOOLKIT.SQLEXT.REPLACE from the Netezza SQL Extensions toolkit | false
dateAsTimestamp | Oracle | When set to true, Data Virtuality Server treats the Oracle date data type as the timestamp data type | false
castUuidToString | SQL Server | When set to true, the UNIQUEIDENTIFIER SQL Server data type is always cast to the CHAR(36) SQL Server data type. Otherwise, no implicit casting is applied to the UNIQUEIDENTIFIER data type. If set to false, use explicit casting of the UNIQUEIDENTIFIER data type to VARCHAR(36) when using it in window functions | true
The names of the DBMS specific translator properties are case sensitive.
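As an illustration of how the DBMS-specific properties above are passed, the following sketch creates a Redshift data source using the s3Load upload mode. It is a sketch only: the host, bucket name, region, and credentials are placeholder assumptions.
Example
call SYSADMIN.createConnection('rs', 'redshift', 'host=example.redshift.amazonaws.com,db=dev,user-name=<USER>,password=<PASSWORD>');;
call SYSADMIN.createDatasource('rs', 'redshift', 'importer.schemaPattern=public,importer.tableTypes="TABLE,VIEW"', 'uploadMode=s3Load,region=eu-central-1,bucketName=my-dv-bucket,createBucket=true,keyId=<ACCESS_KEY_ID>,secretKey=<SECRET_ACCESS_KEY>');;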
Data Source Properties Shared by All JDBC Connectors
Name | Description | Default |
---|---|---|
importer.autoCorrectColumnNames | Replaces any usage of . in a column name with _, as the period character is not supported by Data Virtuality Server in column names. | true
importer.defaultSchema | | empty |
importer.enableMetadataCache | Turns on metadata cache for a single data source even when the global option is turned off. | false |
importer.excludeProcedures | A case-insensitive regular expression that when matched against a fully qualified procedure name will exclude it from import. | empty |
importer.excludeSchemas | A comma-separated list of schemas (no % or ? wildcards allowed) to exclude from import. A schema specified in defaultSchema or schemaPattern will be imported despite being listed in excludeSchemas. | Oracle: a built-in list of Oracle system schemas; all others: empty |
importer.excludeTables | A case-insensitive regular expression that, when matched against a fully qualified table name, will exclude it from import. | empty |
importer.fetchSize | The fetch size assigned to a resultset on loading metadata. | no default value |
importer.importApproximateIndexes | True to import approximate index information. | true |
importer.importIndexes | True to import index/unique key/cardinality information. | false |
importer.importKeys | True to import primary and foreign keys. | false |
importer.importProcedures | True to import procedures and procedure columns. Overloaded procedures can be imported with the additional option importer.useProcedureSpecificName. Note that it is currently not possible to import procedures which use the same name for more than one parameter (e.g. the same name for an IN and an OUT parameter). Such procedures can be excluded from import with the importer.excludeProcedures parameter. | false
importer.loadMetadataWithJdbc | True to turn off all custom metadata load ways. | false |
importer.procedureNamePattern | Procedure(s) to import. If omitted, all procedures will be imported. % as a wildcard is allowed: importer.procedureNamePattern=foo% will e.g. import foo, foobar, etc. Works only in combination with importProcedures. | empty
importer.quoteNameInSource | False to direct Data Virtuality Server to create source queries using unquoted identifiers. | true |
importer.renameDuplicateColumns | True will rename duplicate columns caused by either mixed case collisions or autoCorrectColumnNames replacing . with _. A suffix _n where n is an integer will be added to make the name unique. | true |
importer.renameDuplicateTables | True will rename duplicate tables caused by mixed case collisions. A suffix _n where n is an integer will be added to make the name unique. | true |
importer.replaceSpecSymbsInColNames | True will replace all special symbols (any symbol not in the ^A-Za-z0-9_ sequence) with the "_" symbol in column names of tables. | false / true only for BigQuery
importer.schemaPattern | Schema(s) to import. If omitted or set to "", all schemas will be imported. % as a wildcard is allowed: importer.schemaPattern=foo% will e.g. import foo, foobar, etc. To specify several schema names and/or patterns, values should be comma-separated and enclosed in double quotes: importer.schemaPattern="schema1,schema2,pattern1%,pattern2%". For proper escaping of special characters depending on the type of data source, check the "Escaping Special Characters in Schema Names" section below or use wildcards instead: "[schema_name]" can be rewritten as "%schema%name%". | empty
importer.skipMetadataLoadOnStartup | True will allow skipping metadata loading on server startup if there's cached metadata for the data source. | false |
importer.tableNamePattern | Table(s) to import. If omitted, all tables will be imported. % as a wildcard is allowed: importer.tableNamePattern=foo% will e.g. import foo, foobar, etc. | empty
importer.tableTypes | Comma-separated list - without spaces - of table types to import. Depends on DBMS, which types are available. Usually use "TABLE,VIEW". Other typical types are "SYSTEM TABLE", "GLOBAL TEMPORARY", "LOCAL TEMPORARY", "ALIAS", "SYNONYM". | empty |
importer.useCatalogName | True will use any non-null/non-empty catalogue name as part of the name in source, e.g. "catalogue"."table"."column", and in the Data Virtuality Server runtime name if useFullSchemaName is true. false will not use the catalogue name in either the name in source or the Data Virtuality Server runtime name. Should be set to false for sources that do not fully support a catalogue concept, but return a non-null catalogue name in their metadata - such as HSQL. | true |
importer.useFullSchemaName | False will direct the importer to drop the source catalogue/schema from the Data Virtuality Server object name, so that the Data Virtuality Server fully qualified name will be in the form of <model name>.<table name>. Note that this may lead to objects with duplicate names when importing from multiple schemas, which results in an exception. | true
importer.useProcedureSpecificName | True will allow the import of overloaded procedures (which will normally result in a duplicate procedure error) by using the unique procedure specific name in Data Virtuality Server. This option will only work with JDBC 4.0 compatible drivers that report specific names. | false |
importer.widenUnsignedTypes | True to convert unsigned types to the next widest type. For example, SQL Server reports tinyint as an unsigned type. With this option enabled, tinyint would be imported as a short instead of a byte. | true |
Warning
The default import settings will crawl all available metadata. This import process is time-consuming and full metadata import is not needed in most situations. Most commonly you'll want to limit import by schemaPattern and tableTypes.
Example importer settings to only import tables and views from my-schema
importer.tableTypes="TABLE,VIEW",importer.schemaPattern=my-schema
Example
call createConnection('dso','oracle','host=localhost,db=XE,user-name=<USER>,password=<Password>') ;;
call createDatasource('dso','oracle','importer.schemaPattern=ELIGENT,importer.useFullSchemaName=False,importer.tableTypes="TABLE,VIEW",importer.replaceSpecSymbsInColNames=true','') ;;
Data Source Properties Specific for DBMS
Name | DBMS | Description | Default |
---|---|---|---|
importer.loadAllOracleTableComments | Oracle | Set to false in order to prevent importing of table comments during the data source creation. | true |
importer.loadAllOracleColumnComments | Oracle | Set to false in order to prevent importing of column comments during the data source creation. | true |
Escaping Special Characters in Schema Names
Characters which are forbidden for Oracle schema names and can't be escaped: ", /, &, {, }.
All other special characters can be used as-is or must be escaped, depending on the type of data source (see the example after the table):
Special character | Oracle | PostgreSQL | MS SQL Server | DB2 |
---|---|---|---|---|
" | forbidden | \" | \" | \" |
\ | \\\\ | \\\\\\ | \\\\\\ | \\\\\\ |
_ | /_ | \\\\_ | \\\\_ | \\\\_ |
, | \\, | \\, | \\, | \\, |
% | /% | \\\\% | \\\\% | \\\\% |
[ | [ | [ | [[] | [ |
' | '' | '' | '' | '''' |
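For example, to import a single PostgreSQL schema whose name contains a literal underscore, the underscore can be escaped as listed above; the schema name my_schema and the data source name pg are illustrative assumptions.
Example
call SYSADMIN.createDatasource('pg', 'postgresql', 'importer.schemaPattern="my\\\\_schema",importer.useFullSchemaName=false,importer.tableTypes="TABLE,VIEW"', null);;
Alternatively, as noted for importer.schemaPattern, wildcards can be used instead of escaping: importer.schemaPattern="my%schema".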
Usage
Usage of a JDBC source is straightforward. Using Data Virtuality Server SQL, the source may be queried as if the tables and procedures were local to the Data Virtuality Server system.
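For instance, once the Oracle data source dso from the example above has been created with importer.useFullSchemaName=False, its tables can be queried with regular Data Virtuality Server SQL; the table name customers is an illustrative assumption.
Example
SELECT * FROM dso.customers LIMIT 100;;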
useToolkitReplace Netezza translator property is available since v2.1.6
bucketPrefix Redshift, Snowflake translator property is available since v2.1.7
dateAsTimestamp Oracle translator property is available since v2.1.11
createBucket Redshift, Snowflake translator property is available since v2.1.15
bucketName and bucketPrefix BigQuery translator parameters are available since v2.1.16
uploadMode BigQuery translator parameter is available since v2.1.16
useDdl BigQuery translator parameter is available since v2.1.16
uploadMode, maxChunkSize, numberOfThreads MS SQL translator parameters are available since v2.1.19
singleTransactionInsert MS SQL translator parameter is available since v2.1.19
forbidNonMatchingNames JDBC translator parameter is available since v2.1.20
blob_azure upload mode and defaultEndpointsProtocol, storageAccountName, storageAccountKey, sasToken translator properties available for Snowflake since v2.1.20
kdb translator since v2.1.21
clickhouse translator since v2.1.21
stringClobsLengthThreshold MySQL translator property is available since v2.1.23
useMSI translator property for Azure Synapse Analytics (former SQL Data Warehouse) with Azure Blob Storage uploadMode and Managed Service Identity is available since v2.1.28
The default value of uploadZipped translator property changed to "true" for Redshift since v2.1.28
The default value of uploadZipped translator property changed to "false" for Redshift since v2.1.36
importer.loadAllOracleTableComments and importer.loadAllOracleColumnComments model properties for Oracle since v2.1.28
importer.enableMetadataCache model property is available since v2.1.28
castUuidToString SQL Server translator property is available since v2.1.35