How to Connect a Data Exfiltration Protection (DEP) Enabled Synapse Workspace to Azure Cosmos DB for NoSQL

This blog is going to be an interesting one — we’re diving into how to connect to a NoSQL resource from Azure Synapse pipelines using a Managed Virtual Network (VNet), and how to securely access Azure Cosmos DB for NoSQL via a private endpoint.

What makes this post exciting is that we’re not just talking about assigning the Synapse workspace’s managed identity to Cosmos DB with Contributor or Reader access and calling it a day.

Nope — we’re going deeper!

We’ll explore:

  • What is Data Exfiltration Protection (DEP)?

  • What’s really happening behind the scenes?

  • The difference between the control plane and data plane

  • And how all of this works in Azure Data Factory (ADF)


Alright, let’s get started! 

Here’s a simple diagram showing the overall architecture of a DEP-enabled Synapse workspace. In this setup, pipelines are created within Synapse, and the linked service connects to Azure Cosmos DB for NoSQL using a Managed VNet Integration Runtime via a private endpoint.

Fig1: Overall Data Flow Architecture from Synapse to Azure Cosmos DB For NoSQL

What is DEP?

Data Exfiltration Protection (DEP) is a setting you configure while creating a Synapse workspace. It appears in the "Network" settings, as shown in the picture below:

Fig2: Data Exfiltration Protection (Setup)
                                                               
As Fig2 shows, once you enable the Managed Virtual Network, the option "Allow outbound traffic only to approved targets" becomes available. Selecting "Yes" turns on DEP: the Synapse workspace runs inside a Managed VNet, and data can leave the workspace only for targets that have been explicitly approved, which in practice means Azure resources reached through approved managed private endpoints.
Data Exfiltration Protection prevents data from leaking out of Synapse to external services or unapproved Azure resources. It is mainly used to meet data security requirements and strict data movement policies.

Let's look at a few more concepts:

Control Plane

The control plane is the management layer responsible for creating, updating, and deleting resources, for managing RBAC role assignments, and for the configuration settings of each resource.
It handles orchestration and, logically, tells Azure "how and what to do".
For example, in ADF, creating a data factory along with objects such as pipelines and triggers all happens in the control plane.

Data Plane

The data plane is the layer responsible for the actual data movement from source to destination. Examples include querying a cloud database or a Synapse warehouse, and moving data from a source to a destination as configured in a copy activity.

Use Case

Both planes can be illustrated with one use case. Say you create a Synapse workspace. An ARM template is generated for the resource and, after authentication, sent to the resource provider, which deploys it. All of that happens in the control plane. However, any actual data movement from a source to a destination, whether configured through a copy activity, another activity in Synapse pipelines, or any other Azure migration tool, happens in the data plane.
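To make the split concrete, here is a minimal sketch of where each kind of request is addressed for a Cosmos DB account in the public Azure cloud. Control plane requests go to Azure Resource Manager at management.azure.com, while data plane requests go to the account's own endpoint. The helper function names are made up for illustration, and "<subscription-id>", "first" and "testingdep" are placeholder values taken from this walkthrough.

```shell
#!/usr/bin/env bash
# Sketch: where control plane and data plane requests are addressed.
# Function names and sample values are illustrative only.

# Control plane: the ARM resource ID of a Cosmos DB account.
cosmos_resource_id() {
  local subscription="$1" resource_group="$2" account="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DocumentDB/databaseAccounts/%s\n' \
    "$subscription" "$resource_group" "$account"
}

# Control plane: requests about that resource are sent to Azure Resource Manager.
control_plane_url() {
  printf 'https://management.azure.com%s\n' "$(cosmos_resource_id "$@")"
}

# Data plane: requests for databases, containers and items go to the account endpoint
# (public Azure cloud assumed).
data_plane_url() {
  local account="$1"
  printf 'https://%s.documents.azure.com:443/\n' "$account"
}

control_plane_url "<subscription-id>" "first" "testingdep"
data_plane_url "testingdep"
```

The two printed addresses are the two doors a Synapse linked service has to get through, and each door checks its own set of permissions.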


Now, back on track: let's see how to create a DEP-enabled Synapse workspace and achieve a successful "test connection" on a linked service that connects to a container in Azure Cosmos DB for NoSQL.


Please follow the steps below:

Step 1: Create an "Azure Cosmos DB For NoSQL" Resource

To create an Azure Cosmos DB for NoSQL account:

Search for Cosmos DB in the Azure portal. Click "Create" and choose Azure Cosmos DB for NoSQL. Then configure the name, capacity mode, region, and other settings for the resource, as shown in Fig3.

Fig3: Initial Configuration of Azure Cosmos DB For NoSQL

From Fig3 we can see that our Azure Cosmos DB for NoSQL account is named "testingdep" and that we chose the "Serverless" capacity mode, because we want to be billed for what we consume rather than on an hourly basis.

Fig4: Network Settings For Azure Cosmos DB For NoSQL

Review & Create the resource.
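If you prefer the command line to the portal, the same account could be created roughly as follows. This is a sketch under assumptions: the function name is made up, the region is a placeholder you must supply, and the network settings from Fig4 are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: create a serverless Azure Cosmos DB for NoSQL account with the Azure CLI.
# The function name is illustrative; the region argument is your own choice.
create_cosmos_account() {
  local resource_group="$1" account="$2" region="$3"
  az cosmosdb create \
    --resource-group "$resource_group" \
    --name "$account" \
    --locations regionName="$region" failoverPriority=0 \
    --capabilities EnableServerless
}
```

After `az login`, a call such as `create_cosmos_account "first" "testingdep" "<region>"` would create the account used in this post. NoSQL is the default API for `az cosmosdb create`, so no extra flag is needed to select it.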

Step 2: Create a Synapse Workspace (DEP Enabled)

The Synapse workspace we are going to create is named "testsynapsedep". In the Security section, provide the login and password that you want to set up for authentication (see Fig5).

Fig5: Security Section While Creating Synapse Workspace

While creating the Synapse workspace, make sure that in the network settings "Managed Virtual Network" is enabled and "Allow Outbound Traffic only to approved targets" is set to "Enabled".

Disable public access so that the Synapse workspace can only be reached through private endpoints. Please refer to Fig6.

Fig6: Network Section For Synapse Workspace(Creation)

Once the workspace has been created this way, you can confirm whether DEP is enabled in the "Overview" section of the resource.
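The same check can be sketched with the Azure CLI. The function name is made up, and the JMESPath property path is an assumption based on how the workspace resource is described in ARM (managedVirtualNetworkSettings.preventDataExfiltration); if your CLI version shapes the output differently, inspect the full output of `az synapse workspace show` first.

```shell
#!/usr/bin/env bash
# Sketch: read the DEP flag of a Synapse workspace.
# Assumption: the CLI output exposes managedVirtualNetworkSettings.preventDataExfiltration.
check_dep_enabled() {
  local resource_group="$1" workspace="$2"
  az synapse workspace show \
    --resource-group "$resource_group" \
    --name "$workspace" \
    --query "managedVirtualNetworkSettings.preventDataExfiltration" \
    --output tsv
}
```

For example, `check_dep_enabled "<resource-group>" "testsynapsedep"` is expected to print true for the workspace created above.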

Fig7: Final Synapse Workspace View

From here onwards, we need permissions at both levels (control plane and data plane). When DEP is enabled, key-based authentication is locked, and connecting to Azure Cosmos DB for NoSQL is only possible via a managed identity or a user-based identity. Hence, the Managed VNet IR that connects to Azure Cosmos DB for NoSQL through the private endpoint authenticates with the system-assigned managed identity. A role such as "Cosmos DB Account Reader" only grants access at the control plane, which checks whether the account and its resources exist in the specified region. To access the metadata of a container inside the Cosmos DB for NoSQL account, we need a separate permission at the data plane.

Please refer to the image below for an overall picture of the request flow:

Fig8: Overall Request Flow From Synapse to Cosmos DB For NoSQL

Fig8 can be summarized in the following points:

  • The moment you run a test connection on the linked service for Azure Cosmos DB for NoSQL using a managed identity and the Managed VNet IR, a request goes to Azure Active Directory (AAD, now Microsoft Entra ID). It travels over the public internet to AAD's public endpoint. This step is marked (1) in Fig8.

  • AAD checks whether the Synapse managed identity is a valid identity. If authentication succeeds, AAD returns an access token that contains the ID of the managed identity and the resource the token is intended for, called the audience (for the data plane, the Cosmos DB for NoSQL account). Separate access tokens are issued for the control plane and the data plane.

  • Synapse sends the control plane access token to Azure Resource Manager (ARM), which checks whether the managed identity named in the token has access to resource-level metadata, such as permission to list the Cosmos DB account and its databases and containers. This step is marked (2) in Fig8.

  • The data plane access token goes to the Cosmos DB for NoSQL private endpoint over Private Link. Cosmos DB for NoSQL checks whether the system-assigned managed identity presented in the token has permissions on container metadata, which sits at the "data level" and is handled in the data plane. This step is marked (3) in Fig8.

  • If both services (ARM and Cosmos DB for NoSQL) find that the roles assigned to the Synapse system-assigned managed identity grant adequate access in the control plane and data plane respectively, the test connection of the linked service shows "successful". If the authorization check fails in either plane, the test connection fails.
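The "one token per plane" idea from the points above can be tried out from the Azure CLI. Note the assumptions: these commands request tokens for whoever is signed in to the CLI, not for the Synapse managed identity, so they only illustrate that the two planes use different audiences. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: request one token per plane to see that the audiences differ.
# These tokens belong to the signed-in CLI user, not to the Synapse managed identity.

# Control plane token: the audience is Azure Resource Manager.
get_control_plane_token() {
  az account get-access-token \
    --resource "https://management.azure.com/" \
    --query accessToken --output tsv
}

# Data plane token: the audience is the Cosmos DB account endpoint.
get_data_plane_token() {
  local account="$1"
  az account get-access-token \
    --resource "https://${account}.documents.azure.com" \
    --query accessToken --output tsv
}
```

A token obtained for one audience is rejected by the other service, which is why a role that satisfies ARM says nothing about what Cosmos DB will allow on the data plane.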
Now that we understand what happens under the hood, let's continue with the steps and see it in practice.

Step 3: Create a Container in Cosmos DB For NoSQL

Recall that in Step 1 we created a Cosmos DB for NoSQL account named "testingdep". Now we need to create a database and, inside it, a container. Go to the "Data Explorer" section of the "testingdep" account and click "New Container". You will be asked for a database name and a container name. For this demonstration I used "testdb" as the database name, "testcontainer" as the container name, and "/id" as the partition key.

To summarize, we now have the following configuration for Azure Cosmos DB for NoSQL:

1. Account Name: testingdep
2. Database name: testdb
3. Container name: testcontainer
4. Partition Key: /id

Please see the image below for reference:
Fig9: Setting Configuration For Container
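As an alternative to Data Explorer, the database and container could be created from the Azure CLI. This is a minimal sketch, assuming the account already exists and that you are allowed to create resources in it; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create the database and container used in this walkthrough.
# The function name is illustrative.
create_database_and_container() {
  local resource_group="$1" account="$2" database="$3" container="$4" partition_key="$5"

  # Create the database first, then the container inside it.
  az cosmosdb sql database create \
    --resource-group "$resource_group" \
    --account-name "$account" \
    --name "$database" &&
  az cosmosdb sql container create \
    --resource-group "$resource_group" \
    --account-name "$account" \
    --database-name "$database" \
    --name "$container" \
    --partition-key-path "$partition_key"
}
```

With the values from this post, the call would be `create_database_and_container "first" "testingdep" "testdb" "testcontainer" "/id"`. These are control plane operations, so they go through ARM rather than the account's data endpoint.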

Step 4: Assign an RBAC Role to the Synapse Managed Identity for Control Plane Access to Cosmos DB

Now let's assign the "Cosmos DB Account Reader" role to the Synapse managed identity on the Cosmos DB account. This is control plane access: the role lets the managed identity list resource-level metadata such as databases and containers, but it does not grant data-level access, so the identity still cannot read container metadata through the data plane.

To find the managed identity object ID, go to the Synapse workspace ("testsynapsedep" in my case, as created in Step 2). In the Overview section you will see "Managed Object Identity", which is the managed identity of your Synapse workspace. Please refer to Fig10 below:

Fig10: Managed Identity For Synapse

Follow the steps below to assign the role to this managed identity on the Cosmos DB for NoSQL account.
Steps:

  1. Go to the Azure portal and search for the Cosmos DB for NoSQL account (testingdep), then go to the IAM section. Click "Add role assignment" and search for the "Cosmos DB Account Reader" role.
  2. In the Members section, select "Managed Identity". In the dropdown menu you will see "Synapse Workspace". Select it, then pick your workspace. Finally, click "Review + assign".
The result should look something like Fig11:

Fig11: Role Assignment -Cosmos DB Account Reader Role to Managed Identity(Synapse)
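The portal steps above can also be sketched as a single Azure CLI function. It looks up the workspace's managed identity object ID and assigns the built-in role, whose full name in Azure RBAC is "Cosmos DB Account Reader Role". The function name is made up for illustration, and the third argument is the full ARM resource ID of your Cosmos DB account.

```shell
#!/usr/bin/env bash
# Sketch: grant the Synapse managed identity control plane read access on the account.
# The function name is illustrative.
assign_account_reader() {
  local synapse_resource_group="$1" workspace="$2" cosmos_account_id="$3"
  local principal_id

  # Object ID of the workspace's system-assigned managed identity.
  principal_id="$(az synapse workspace show \
    --resource-group "$synapse_resource_group" \
    --name "$workspace" \
    --query identity.principalId --output tsv)"

  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Cosmos DB Account Reader Role" \
    --scope "$cosmos_account_id"
}
```

This assignment lives in Azure RBAC, which ARM evaluates. It is the control plane half of the permissions and does not include any Cosmos DB data actions.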


Now that the role has been assigned to the Synapse managed identity on the Cosmos DB for NoSQL account, let's check whether we can create a linked service for it.

Step 5: Create a Linked Service For Cosmos DB For NoSQL

Go to the Azure portal and search for Synapse. Open "Synapse Studio", go to the linked services section, and create a linked service for Cosmos DB for NoSQL.

We have to configure a few things before we can run the test connection. As soon as you click "Azure Cosmos DB For NoSQL" in the linked service list, you will see the following screen:

Fig12: Interactive Authoring Enabling For Test Connection

Click the pencil icon shown in Fig12 to get the option to enable interactive authoring. This is required because the test connection cannot be performed without it.

Once "Interactive Authoring" is enabled, choose "System Assigned Managed Identity" as the authentication type. Then we need to create a "Managed Private Endpoint" of type "SQL", as shown in the image below:

Fig13: Configuration Settings For Linked Service

In Fig13 you can see the "Create new" option for type "SQL". Click it. Because we have only created a database and have not enabled the analytical store in Cosmos DB for NoSQL, we only need a managed private endpoint of type SQL.

Please refer to the image below:

Fig14: Configuration For Managed Private Endpoint

As you can see in Fig14, we give the private endpoint a name, and the target resource ID shows our Cosmos DB for NoSQL account name at the end ("/testingdep"). Finally, click "Create" to create the managed private endpoint.
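For reference, a managed private endpoint can also be created from the Azure CLI using a small JSON definition file. This is a sketch under assumptions: the function names and the file path are made up, the endpoint name is your own choice, and "Sql" is the private link group ID used for the NoSQL API of Cosmos DB.

```shell
#!/usr/bin/env bash
# Sketch: define and create a Synapse managed private endpoint for Cosmos DB for NoSQL.
# Function names are illustrative.

# Write the endpoint definition: the target resource and the private link group.
write_mpe_definition() {
  local cosmos_account_id="$1" file="$2"
  printf '{\n  "privateLinkResourceId": "%s",\n  "groupId": "Sql"\n}\n' \
    "$cosmos_account_id" > "$file"
}

# Create the managed private endpoint in the workspace's Managed VNet.
create_managed_private_endpoint() {
  local workspace="$1" endpoint_name="$2" file="$3"
  az synapse managed-private-endpoints create \
    --workspace-name "$workspace" \
    --pe-name "$endpoint_name" \
    --file @"$file"
}
```

A typical sequence would be `write_mpe_definition "<cosmos-account-resource-id>" mpe.json` followed by `create_managed_private_endpoint "testsynapsedep" "<endpoint-name>" mpe.json`. The endpoint stays in a pending state until it is approved on the Cosmos DB side.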

If you face this error:-

Fig15: Error while creating Managed Private Endpoint

Then click "Cancel". Go to the Synapse workspace in the Azure portal and open the Networking section. Set "Public Network Access To Workspace Endpoints" to enabled and add your client IP address.

Fig16: Network Setting Changes In Synapse


Now try creating the managed private endpoint again. This time it should succeed. The result will look like the image below:

Fig17: Final Settings Linked Service(Synapse)

Once the managed private endpoint is created, click the pencil icon as shown in Fig16. You will see a link called "Manage Approvals In Azure Portal" that leads to the target resource. Click it, and you will be redirected to the Cosmos DB for NoSQL account "testingdep". In the "Private Endpoint" section you will see the private endpoint you created in Synapse. Select it and click "Approve".

Fig18: Private Endpoint Approval In Cosmos DB for NoSQL
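The approval can also be sketched with the generic private endpoint connection commands of the Azure CLI. The function names and the description text are made up, and the JMESPath filter assumes the connection state is reported under properties.privateLinkServiceConnectionState.status; check the unfiltered output if the filter returns nothing.

```shell
#!/usr/bin/env bash
# Sketch: find pending private endpoint connections on the Cosmos DB account and approve one.
# Function names are illustrative; the query path is an assumption about the output shape.

list_pending_connections() {
  local resource_group="$1" account="$2"
  az network private-endpoint-connection list \
    --resource-group "$resource_group" \
    --name "$account" \
    --type Microsoft.DocumentDB/databaseAccounts \
    --query "[?properties.privateLinkServiceConnectionState.status=='Pending'].id" \
    --output tsv
}

approve_connection() {
  local connection_id="$1"
  az network private-endpoint-connection approve \
    --id "$connection_id" \
    --description "Approved for Synapse managed private endpoint"
}
```

You would run `list_pending_connections "first" "testingdep"`, confirm that the listed connection is the one created from Synapse, and then pass its ID to `approve_connection`.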

Once the entire setup is ready, try the test connection. At this point it will fail with the following error:

Fig19: Test Connection Failure For Linked Service

The test connection fails with the error shown in Fig19: the Synapse managed identity does not have permission to perform the action "Microsoft.DocumentDB/databaseAccounts/readMetadata" on the resource.

This error is expected, because we have not yet given the Synapse managed identity access at the data plane. So far we have only granted control plane access through the "Cosmos DB Account Reader" role, which allows access to metadata at the resource level and not at the "data" level.

In other words, a DEP-enabled Synapse workspace needs permissions at both the control plane and the data plane.

Step 6: Provide Permission to the Synapse Managed Identity at the Data Plane

The commands below come from the Azure documentation on data plane access control.

Open the Azure CLI and paste the following command template. Make sure to replace the placeholders with values from your Azure tenant.

Code:

az cosmosdb sql role definition list --resource-group "<name-of-existing-resource-group>" --account-name "<name-of-existing-nosql-account>"

In my case, the NoSQL account is named "testingdep" and the resource group is "first".

The output of this command lists the available roles. We will use the "Cosmos DB Built-in Data Contributor" role, which has the following id and name (excerpt of the output):

    "id": "/subscriptions/XXXXXXXXXXXXXXXX/resourceGroups/first/providers/Microsoft.DocumentDB/databaseAccounts/testingdep/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002",
    "name": "00000000-0000-0000-0000-000000000002",
    "permissions": [
      {
        "dataActions": [
          "Microsoft.DocumentDB/databaseAccounts/readMetadata",
          "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/*",
          "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/*"
        ]

Note: Copy the id into a notepad and save it. You will paste it as the role definition ID in a later command.

Fig20: Role Definition In Cosmos DB For NoSQL

Next, run the following command:

az cosmosdb show --resource-group "<name-of-existing-resource-group>" --name "<name-of-existing-nosql-account>" --query "{id:id}"

Again, just replace the placeholders for the resource group and the NoSQL account. In my case they are "first" and "testingdep" respectively. Leave {id:id} as it is.

Fig21: Unique Identifier For Cosmos DB For NoSQL


Note: Copy it into a notepad and save it. You will paste it as the --scope value in the next command.

Now, run the following command:

az cosmosdb sql role assignment create --resource-group "<name-of-existing-resource-group>" --account-name "<name-of-existing-nosql-account>" --role-definition-id "<id-of-new-role-definition>" --principal-id "<id-of-existing-identity>" --scope "/subscriptions/XXXXXXXXX/resourceGroups/first/providers/Microsoft.DocumentDB/databaseAccounts/testingdep"


In the command above, replace <id-of-new-role-definition> with the role definition id from the output shown in Fig20. It looks like this:

/subscriptions/XXXXXXXXXX/resourceGroups/first/providers/Microsoft.DocumentDB/databaseAccounts/testingdep/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002


For <id-of-existing-identity> (the --principal-id value), go to the "Overview" section of your Synapse workspace and copy the managed identity object ID.

Finally, for --scope, paste the id you copied from the output of the previous Azure CLI command (Fig21). It looks like this:

"/subscriptions/XXXXXXXX/resourceGroups/first/providers/Microsoft.DocumentDB/databaseAccounts/testingdep"


The complete command will look like the following:

Fig22: Role Creation For Managed Identity In Cosmos DB For NoSQL
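For readers who cannot see the screenshot, the pieces described above can be put together in one function, sketched here. It derives both the scope and the role definition ID from the account's resource ID, using the built-in role ID ending in 0002 from the earlier output. The function name is made up for illustration, and the principal ID is the managed identity object ID of your Synapse workspace.

```shell
#!/usr/bin/env bash
# Sketch: assign the Cosmos DB Built-in Data Contributor role at account scope.
# The function name is illustrative.
assign_data_contributor() {
  local resource_group="$1" account="$2" principal_id="$3"
  local account_id

  # Full resource ID of the account; used as the scope and as the role ID prefix.
  account_id="$(az cosmosdb show \
    --resource-group "$resource_group" \
    --name "$account" \
    --query id --output tsv)"

  az cosmosdb sql role assignment create \
    --resource-group "$resource_group" \
    --account-name "$account" \
    --role-definition-id "${account_id}/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002" \
    --principal-id "$principal_id" \
    --scope "$account_id"
}
```

With the values from this post, the call would be `assign_data_contributor "first" "testingdep" "<managed-identity-object-id>"`. Scoping at the account level covers every database and container in the account; a narrower scope is possible but is not covered here.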

To verify that the "Built-in Data Contributor" role has been assigned to the Synapse managed identity at the data plane, run the following command in the Azure CLI. It lists the role assignments in the Azure Cosmos DB for NoSQL account:

az cosmosdb sql role assignment list --resource-group "<name-of-existing-resource-group>" --account-name "<name-of-existing-nosql-account>"
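If the account has several role assignments, the list can be narrowed to the Synapse managed identity with a JMESPath filter. This is a small sketch with a made-up function name; it prints only the role definition IDs assigned to the given principal.

```shell
#!/usr/bin/env bash
# Sketch: show which data plane role definitions a given principal holds on the account.
# The function name is illustrative.
list_roles_for_principal() {
  local resource_group="$1" account="$2" principal_id="$3"
  az cosmosdb sql role assignment list \
    --resource-group "$resource_group" \
    --account-name "$account" \
    --query "[?principalId=='${principal_id}'].roleDefinitionId" \
    --output tsv
}
```

For the setup in this post, the result should include a role definition ID ending in 00000000-0000-0000-0000-000000000002.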

Step 7: Test the connection in Linked Service

We have now granted permissions at both the control plane ("Cosmos DB Account Reader" role) and the data plane ("Built-in Data Contributor" role), so let's run the test connection for the linked service we configured.

Fig23: Test Connection Successful For Linked Service

Conclusion

When the Data Exfiltration Protection network setting is enabled on a Synapse workspace, getting the test connection to succeed becomes tricky. It can get really messy when we don't know what exactly happens behind the scenes. The Synapse managed identity needs permissions at both the control plane and the data plane, and we saw what happens when either one is missing.
I hope you liked this blog. Thank you. Happy reading!


- MANAN CHOUDHARY