The MongoDB Connector is used to read data from a MongoDB Server and make it available in the Data Virtuality Server in the relational form. In this way, users can query a MongoDB Server via SQL query language and also write complex queries that are directed to multiple different Data Sources, e.g. a single SQL query can join tables from MySQL and combine them with collections (that are represented as tables) coming from MongoDB.

Connector Configuration

Connect to MongoDB

Before issuing the SQL queries to the MongoDB Server, we need to represent Mongo's Metadata in relational format, i.e. represent non-relation collections as relation tables etc. This is done by the connecting to MongoDB data source using createConnection/createDatasource commands.

The complete list of connection properties can be found in chapter Connection Configuration Interface.

The MongoDB specific connection properties options and readMode, translator property queryOption model properties (all) are explained in the sections below.

Example

call "SYSADMIN.createConnection"(
    "name" => 'mongodb',
    "jbossCLITemplateName" => 'mongodb',
    "connectionOrResourceAdapterProperties" => 
		'server=localhost
		,port=27019
		,database=test
		,options="connectionsPerHost=51,description=test,alwaysUseMBeans=true"',
    "encryptedProperties" => null
) ;;

call "SYSADMIN.createDataSource"(
    "name" => 'mongodb',
    "translator" => 'mongodb',
    "modelProperties" => 
		'importer.useFullSchemaName=false
		,importer.TableTypes="TABLE,VIEW"
		,mongodb.maxColumnsNumber=200
		,mongodb.maxRowsNumber=100
		,mongodb.maxVarcharColumnWidth=256
		,mongodb.maxRecursionDepth=2
		,mongodb.fieldDelimiterChar="_"',
    "translatorProperties" => 'queryOption="queryoption_notimeout"',
    "encryptedModelProperties" => null,
    "encryptedTranslatorProperties" => null
) ;;
CODE

MongoDB Driver Options 

MongoDB driver options may be provided as a list of comma-separated properties via connection property options

Supported MongoDB driver options:

Option typeOption nameDefault value
Stringdescription
intminConnectionsPerHost0
intconnectionsPerHost100
intthreadsAllowedToBlockForConnectionMultiplier5
intserverSelectionTimeout30000
intmaxWaitTime120000
intmaxConnectionIdleTime0
intmaxConnectionLifeTime0
intconnectTimeout10000
intsocketTimeout0
booleansocketKeepAlivefalse
booleansslEnabledfalse
booleansslInvalidHostNameAllowedfalse
booleanalwaysUseMBeansfalse
intheartbeatFrequency10000
intminHeartbeatFrequency500
intheartbeatConnectTimeout20000
intheartbeatSocketTimeout20000
intlocalThreshold15
StringrequiredReplicaSetName
booleancursorFinalizerEnabledtrue

Important

The list of all possible options are supported dynamically by MongoDB Connector and can be extended in future versions of MongoDB driver automatically.

MongoDB readMode 

MongoDB readMode may be provided via connection property readMode.

The not mandatory parameter readMode specifies how MongoDB connection routes read operations to the members of a replica set (a cluster of MongoDB servers that implements master-slave replication and automated failover. MongoDB’s recommended replication strategy). By default, a connection directs its read operations to the primary member (the primary member is the current master instance, which receives all write operations) in a replica set. Because write operations are issued to the single primary, reading from the primary returns the latest version of a document. For a connection that does not require fully up-to-date data, you can improve read throughput or reduce latency by distributing some or all reads to secondary members of the replica set. MongoDB driver supports five read modes:

  • primary Default mode. All operations read from the current replica set primary.
  • primaryPreferred In most situations, operations read from the primary but if it is unavailable, operations read from secondary members.
  • secondary All operations read from the secondary members of the replica set.
  • secondaryPreferred In most situations, operations read from secondary members but if no secondary members are available, operations read from the primary.
  • nearest Operations read from the member of the replica set with the least network latency, irrespective of the member’s type.

Important

All read preference modes except primary may return stale data because secondaries replicate operations from the primary with some delay.

MongoDB Query Options

The MongoDB specific query options can be set via MongoDB translator property queryOption. 

These options are supported:

QUERYOPTION_TAILABLE
QUERYOPTION_SLAVEOK
QUERYOPTION_OPLOGREPLAY
QUERYOPTION_NOTIMEOUT
QUERYOPTION_AWAITDATA
QUERYOPTION_EXHAUST
QUERYOPTION_PARTIAL

For details about these options consider MongoDB documentation.

MongoDB Model Properties

MongoDB model properties may be provided via modelProperies in the createDatasource statement.

Importing Collection Fields

  • mongodb.maxColumnsNumber, mongodb.maxRowsNumber 

To represent a non-relation MongoDB collection as a relational table, we need to represent the semi-structured collection rows as structured relational rows. To do this completely, we need to check all the fields in all rows of a MongoDB's collection and to add them as columns to a relation table representing that collection. Since extracting MongoDB's collection metadata in this way could be a very costly operation on big collections, Data Virtuality Server iterates only through the maximal number of rows (defined in mongodb.maxRowsNumber) and adds at most maxColumnsNumber of columns to the resulting relational table.

In addition, the same-named collection field can have a different format in different rows. Data Virtuality Server first takes to declare the column type as it has found it first in the field. If it finds out that the field type has changed, it changes the resulting column type to the "bigger" type if needed.

In general, that a type widening goes from a Numeric type to String to CLOB.  Per default maxColumnsNumber = 200 and maxRowsNumber = 100. *

  • mongodb.maxVarcharColumnWidth 

States, which column length should be used for varchar columns, e.g. sets varchar(mongodb.maxVarcharColumnWidth). The default value is 256.

  • mongodb.overrideStructureByExample

On connecting or refreshing of MongoDB data source, Data Virtuality Server always reads the collections to determine their structure. Sometimes the collections are extremely large, but still need to be read completely, because the latest documents have the latest version of the schema. This operation can take a large amount of time and to decrease it we introduced the mongodb.overrideStructureByExample model property. This property can hold a potentially very large text, which represents a strict mode JSON document containing one or multiple comma-separated examples of the structure of MongoDB collections to be overridden. These documents will be read instead of the documents from the real collections to determine their structure.

Creation and usage of the current property explained in detail in the Usage section of this page.

Nested Objects

  • mongodb.maxRecursionDepth

Nested Objects in a MongoDB's collection row can be represented either as a JSON string contained in the table's column or them as additional fields in the resulting table. This parameter controls, how far inside the particular fields should be analyzed to represent their sub-fields as separate columns in the resulting table. The default value is 0 (output the complete JSON string with no additional columns)

  • mongodb.fieldDelimiterChar 

Additional columns generated for the nested objects have the name of their parent objects as a prefix inside. This parameter controls the character, that is used to separate this prefix from the field name itself. Default is '_'.

Usage

The MongoDB's collections are represented as relational tables in the schema named like the name given in createDatasource()

Select Query

To query these tables a normal SQL select query like the following can be used:

 select * from mongodb.table1
CODE

Stored Procedures

Alternatively, one can use the stored procedure named query to pass a native query inside. This stored procedure has the following parameters:

  • collection - the name of the collection to query 
  • query - the query in the native JSON format
  • columns - the output columns in the JSON format like 
 { '_id' : 1, 'name': 1 }
CODE

          or NULL if all columns should be outputted

  • result - the CLOB containing the results 
 call "mongodb".query('table1','{"city":"berlin"}',NULL)
CODE

XML Functions

In order to parse the JSON string values in a field, XML Functions

  • jsonToXml
  • xpathValue
  • xmlserialize

of the Data Virtuality Server can be used like following:

Examples

select xpathvalue( jsontoxml( 'test', a.result), '/test/test/[2]/firstname' ), xmlserialize( jsontoxml( 'test', a.result ) ), result FROM ( call "mongodb".query( 'table1', '{"city":"berlin"}', NULL) ) as a;
CODE
SELECT cast(xpathvalue(jsontoxml('root',"zipcodes"),'/root/root/[1]') as string) FROM "mongodb"."zip_history_log";
CODE
SELECT cast(jsontoxml('root',"zipcodes") as string) FROM "mongodb"."zip_history_log";
CODE
select * from (exec mongo.query('foo', '{ $query : {},$orderby : {} }','{_id:1,desription:1}')) w, xmltable('/root' passing jsontoxml('root',w.result)) t
CODE

Creation and Usage of the JSON Examples with the mongodb.overrideStructureByExample Property

In order to create JSON example for CollectionA in database test containing such documents:

{ "_id" : ObjectId("57ea8fc806f2c1c32bcc4a11"), "col1" : 10 } { "_id" : ObjectId("57ea8fd106f2c1c32bcc4a12"), "col1" : NumberLong(200) }
JS

next steps should be performed:

  1. Document with the desired structure for the whole collection should be obtained using mongoexport CLI tool or constructed manually using strict mode JSON syntax. For CollectionA the second document can be used, as a data type of "col1" is wider, than the first one. To do it with mongoexport one of the next commands can be used:

    mongoexport --db test --collection CollectionA --query "{'col1': {$type: 'long'}}" mongoexport --db test --collection CollectionA --query "{'col1': 200}"
    POWERSHELL

    Such JSON will be a result:

    {"_id":{"$oid":"57ea8fd106f2c1c32bcc4a12"},"col1":{"$numberLong":"200"}}
    JS
  2. Collection name should be added as a key to the JSON document obtained on the previous step:

    { "CollectionA": {"_id":{"$oid":"57ea8fd106f2c1c32bcc4a12"},"col1":{"$numberLong":"200"}} }
    JS
  3. All double quotes should be escaped with the backslash symbol:

    { \"CollectionA\": {\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}} }
    JS
  4. Resulting document now can be set as a value to the mongodb.overrideStructureByExample property:

    mongodb.overrideStructureByExample=" { \"CollectionA\": {\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}} } "
    JS

    Multiple examples should be separated by a comma:

    mongodb.overrideStructureByExample=" { \"CollectionA\": {\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}}, \"CollectionB\": {\"_id\":{\"$oid\":\"57dfffcf190889d653a5374b\"},\"col2\":{\"$date\":\"2012-12-19T06:01:17.171Z\"}} } "
    JS

The example above can be used to add the data source as follows:

call SYSADMIN.createConnection
(
"name" => 'mongodb',
"jbossCLITemplateName" => 'mongodb',
"connectionOrResourceAdapterProperties" => 'server=localhost,port=27017,database=test'
) ;;

call SYSADMIN.createDatasource
(
"name" => 'mongodb',
"translator" => 'mongodb',
"modelProperties" => 'mongodb.overrideStructureByExample="
	{
		\"CollectionA\": 
						{\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}},
		\"CollectionB\": 
						{\"_id\":{\"$oid\":\"57dfffcf190889d653a5374b\"},\"col2\":{\"$date\":\"2012-12-19T06:01:17.171Z\"}}
	}
"',
"translatorProperties" => ''
) ;;
SQL