CosmosDB-Spark Connector Library Conflict

This topic explains how to resolve a library conflict that occurs when you run applications that use the CosmosDB-Spark connector on Azure Databricks.

Affected versions

Databricks Runtime 4.0 and above (runtimes that include Spark 2.3).

Problem

Normally, if you add a Maven dependency to your Spark cluster, your application can use the required connector libraries. Currently, however, if you simply specify the CosmosDB-Spark connector's Maven coordinates as a cluster dependency, you get the following exception:

java.lang.NoClassDefFoundError: Could not initialize class com.microsoft.azure.cosmosdb.Document
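For reference, attaching the connector by coordinates looks like the following local sketch. The coordinates are inferred from the uber JAR name given in the Solution section, not taken from the original article, so verify them against Maven Central before use:

```shell
# Reproducing the failing setup with a local Spark 2.3 install
# (coordinates inferred from the connector JAR name; verify before use):
spark-shell --packages com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.2.2
```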

Cause

This occurs because Spark 2.3 ships with jackson-databind-2.6.7.1, whereas the CosmosDB-Spark connector requires jackson-databind-2.9.5. The runtime's older Jackson classes are loaded first, so the connector's static initializer references a field introduced in Jackson 2.9 that does not exist in 2.6.7.1, and at the executor level you observe the following exception:

java.lang.NoSuchFieldError: ALLOW_TRAILING_COMMA
at com.microsoft.azure.cosmosdb.internal.Utils.<clinit>(Utils.java:69)
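To confirm which Jackson version your runtime actually ships, you can list the runtime's JAR directory from a notebook %sh cell or an SSH session. The /databricks/jars path is an assumption about the cluster layout; verify it on your own cluster:

```shell
# List the jackson-databind JAR bundled with the Databricks runtime
# (/databricks/jars is assumed to be the runtime JAR directory):
ls /databricks/jars | grep -i jackson-databind
```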

Solution

To avoid this problem:

  1. Download the CosmosDB-Spark connector uber JAR directly: azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar. The uber JAR bundles its own dependencies, so its Jackson classes do not conflict with the version shipped in the runtime.
  2. Upload the downloaded JAR to your Azure Databricks workspace.
  3. Install the uploaded JAR on your Azure Databricks cluster.
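If you prefer to script the upload and install steps, a sketch using the legacy Databricks CLI might look like the following; the DBFS destination path and the cluster ID are placeholders, not values from the original article:

```shell
# Upload the uber JAR to DBFS (destination path is an example):
databricks fs cp azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar dbfs:/FileStore/jars/

# Attach it to a cluster (replace <cluster-id> with your cluster's ID):
databricks libraries install --cluster-id <cluster-id> \
  --jar dbfs:/FileStore/jars/azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar
```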

For more information, see Azure Cosmos DB.