How to Set Up an SFTP Server and Seamlessly Connect It to Azure Data Factory


 Introduction 

This blog walks through setting up an SFTP server on a Linux virtual machine and then connecting to it successfully from a linked service in Azure Data Factory (ADF).

Prerequisites:-

1. You have already created a Linux machine from the Azure Portal.
2. You have already created a data factory.

Important Note:-
Throughout the demonstration (setting up the SFTP server and connecting it to ADF), port 22 must be allowed as an inbound rule in the NSG (network security group) configured for the Linux VM.


The flow of the blog is as follows: we will first set up a new user named 'sftpuser' and restrict its permissions so that it is the only user used for authentication to the SFTP server. Then we will connect to the SFTP server via an ADF linked service.

Unless explicitly stated otherwise in a step, all commands are to be executed by the user account created during the Linux machine deployment.


Step-By-Step Process:-

1. Log in to your virtual machine

I am using Git Bash; you can use any other tool as well, such as PuTTY. The commands from this step onwards are Git Bash commands:-

$ ssh {user_name_you_provided_while_creating_vm}@{public_ip_address_of_your_vm}

For example: $ ssh manan@74.XXX.XXX.176
Fig1: Login to your linux vm

At the bottom of Fig1 we can see that our user name is "manan" and the name of the virtual machine is "abcd21".

2. Install SFTP Server & Verify

Run the following commands to install the OpenSSH server, which provides the SFTP service, on the Linux machine and check that it is running:-

$ sudo apt install openssh-server -y
$ sudo systemctl status ssh

If the server has been set up correctly, the status shows "active (running)", as in the image below:-

Fig2: SFTP Server Status Check
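As a complement to the status check, the sketch below confirms that the SSH daemon configuration enables the SFTP subsystem. The function name has_sftp_subsystem is hypothetical, and the default path /etc/ssh/sshd_config is an assumption about your distribution.

```shell
# Hypothetical helper: succeed if the given sshd config file contains an
# active (uncommented) "Subsystem sftp ..." line.
# The default path below is an assumption; pass another file as argument 1.
has_sftp_subsystem() {
  config_file="${1:-/etc/ssh/sshd_config}"
  grep -Eiq '^[[:space:]]*Subsystem[[:space:]]+sftp[[:space:]]' "$config_file"
}

# Example use on the VM:
#   has_sftp_subsystem && echo "SFTP subsystem is enabled"
```

This only inspects the main config file; it does not follow Include directives, so treat a negative result as a hint rather than proof.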


3. Create an SFTP User

Run the following command, provide a password for the user when prompted, and keep pressing "Enter" for the remaining prompts until you get back to the command line.

$ sudo adduser sftpuser (Paste this code as it is)

We are doing this step so that only this dedicated user is used to connect from ADF (Azure Data Factory) and transfer files over SFTP.

Now, copy and paste the command below to create a directory called ".ssh":

$ sudo mkdir -p /home/sftpuser/.ssh (Paste this code as it is)

This creates the .ssh directory, in which we will create a file called "authorized_keys". That file will contain the public key used for authentication when testing the connection in the ADF linked service.

4. Generate Key Pairs Using RSA algorithm

Note: The commands in this step should be run on your local machine and not on the Linux machine.

In recent Linux versions, the default SSH key algorithm is Ed25519, but for SSH public key authentication we will use RSA because it is widely supported and the private key can easily be converted to PEM format. Keeping that in mind, we will generate an RSA key pair for SSH public key authentication.

To generate the public & private keys, go to your local machine and, using Git Bash only, run the following commands. They generate the key pair in the .ssh folder of your local machine user, at this file path: C:\Users\{your_user_name_of_local_machine}\.ssh

$ ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa

$ ssh-keygen -p -m PEM -f ~/.ssh/id_rsa (If it asks whether you want to set a passphrase, just press Enter to skip it)

The first command generates the RSA public & private keys in the .ssh directory on the local C drive, and the second command converts the private key into PEM format, which is mandatory for connecting to the SFTP linked service in ADF.

You can check the contents of the keys with the following commands:-

$ cd C:/Users/{your_user_name_of_local_machine}/.ssh
$ ls -l
$ cat id_rsa.pub (for the public key)
$ cat id_rsa (for the private key)


The public key and the PEM-format private key should look like the ones in the image below:-

Fig3: The Desired Format For Public and Private Keys

Always make sure that your private key starts with a "-----BEGIN RSA/DSA PRIVATE KEY-----" line and ends with the matching "-----END RSA/DSA PRIVATE KEY-----" line (which is the PEM format). This requirement can also be found in the documentation:-

https://learn.microsoft.com/en-us/azure/data-factory/connector-sftp?tabs=data-factory
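The header check described above can also be scripted. The following is a minimal sketch, assuming the key sits at ~/.ssh/id_rsa as in this step; the function name is_pem_private_key is hypothetical.

```shell
# Hypothetical helper: succeed only if the first line of the key file is a
# PEM header for an RSA or DSA private key. A key still in the newer OpenSSH
# format starts with "-----BEGIN OPENSSH PRIVATE KEY-----" and is rejected.
is_pem_private_key() {
  # Strip a possible Windows carriage return before comparing.
  first_line="$(head -n 1 "$1" | tr -d '\r')"
  case "$first_line" in
    "-----BEGIN RSA PRIVATE KEY-----"|"-----BEGIN DSA PRIVATE KEY-----")
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example use in Git Bash:
#   is_pem_private_key ~/.ssh/id_rsa || ssh-keygen -p -m PEM -f ~/.ssh/id_rsa
```

The check only looks at the header line, so it tells you the container format and nothing about whether the key itself is valid.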

5. Paste the public key into the SFTP user's directory

Log back in to your Linux VM with the user credentials you provided while creating the virtual machine, then run the following command:

$ sudo nano /home/sftpuser/.ssh/authorized_keys (Then put the public key in it.)

The above command opens a file editor on the Linux screen. Paste the public key that was printed by the "cat id_rsa.pub" command (shown in Fig3) into the editor, then press "Ctrl+X", type "Y" (to save) and press "Enter".
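If you prefer not to paste into an editor, the same result can be scripted. This is only a sketch: the function name append_public_key is hypothetical, and it assumes the public key file has already been copied to the VM and that you run it in a shell with write access to the target file (for example a root shell).

```shell
# Hypothetical helper: append the first line of a public key file to an
# authorized_keys file unless that exact line is already present.
# Arguments: 1 = public key file, 2 = authorized_keys file.
append_public_key() {
  key_line="$(head -n 1 "$1")"
  touch "$2"
  # -x matches whole lines, -F treats the key as a fixed string.
  if ! grep -qxF -- "$key_line" "$2"; then
    printf '%s\n' "$key_line" >> "$2"
  fi
}

# Example use (paths are assumptions):
#   append_public_key /tmp/id_rsa.pub /home/sftpuser/.ssh/authorized_keys
```

Running it twice leaves a single copy of the key, which avoids the duplicate entries that repeated pasting can create.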


6. Provide File & Directory Level Permissions To Ensure Security

Run the following commands:-

$ sudo chown -R sftpuser:sftpuser /home/sftpuser/.ssh
$ sudo chmod 700 /home/sftpuser/.ssh
$ sudo chmod 600 /home/sftpuser/.ssh/authorized_keys
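To confirm that the permissions took effect, a small check like the one below can help. It is a sketch that assumes GNU stat (the -c option, as found on typical Linux VMs); the function name check_ssh_perms is hypothetical.

```shell
# Hypothetical helper: succeed if the given .ssh directory has mode 700 and
# its authorized_keys file has mode 600. Relies on GNU stat's -c option.
check_ssh_perms() {
  [ "$(stat -c '%a' "$1")" = "700" ] &&
    [ "$(stat -c '%a' "$1/authorized_keys")" = "600" ]
}

# Example use on the VM, from a shell that can read the directory
# (for example a root shell):
#   check_ssh_perms /home/sftpuser/.ssh && echo "permissions look correct"
```

Modes matter here because the SSH daemon, in its default strict mode, ignores an authorized_keys file that other users can write to.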


7. Change the configuration settings to allow permissions for SSH public key authentication

Run the following command:-
$ sudo nano /etc/ssh/sshd_config

Now uncomment the following lines by removing the leading "#":-

#PubkeyAuthentication yes
#AuthorizedKeysFile     .ssh/authorized_keys .ssh/authorized_keys2

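Now add the following line at the end of the file. Recent OpenSSH releases no longer accept the ssh-rsa signature algorithm by default, so this line re-enables it for our RSA key:-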

PubkeyAcceptedKeyTypes +ssh-rsa

Fig4: SSH Config File


Fig5: Added Line In SSH Config File



Finally, press "Ctrl+X", type "Y" and press "Enter" to save the file.

It's important to restart the SSH service so that the changes take effect. This can be done using the following command:-

$ sudo systemctl restart ssh
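Before or after restarting, it can be useful to confirm which values are actually active in the config file. The sketch below prints the first uncommented value of a directive; the function name sshd_directive is hypothetical, and it deliberately ignores Include files and Match blocks.

```shell
# Hypothetical helper: print the value of the first active (uncommented)
# occurrence of a directive in an sshd config file.
# Arguments: 1 = directive name (case-insensitive), 2 = config file
# (defaults to /etc/ssh/sshd_config, which is an assumption).
sshd_directive() {
  awk -v key="$1" '
    tolower($1) == tolower(key) {
      $1 = ""                 # drop the keyword, keep the value(s)
      sub(/^[ \t]+/, "")      # trim the separator left behind
      print
      exit
    }
  ' "${2:-/etc/ssh/sshd_config}"
}

# Example use on the VM:
#   sshd_directive PubkeyAuthentication      # expected to print: yes
#   sshd_directive PubkeyAcceptedKeyTypes    # expected to print: +ssh-rsa
```

Running "sudo sshd -t" before the restart is also worthwhile, since it checks the config file for syntax errors without touching the running service.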



8. Test the connection in the ADF linked service

Go to the data factory and create a new linked service. Search for SFTP and click "Create". Fill in the fields as follows:

Host: Provide the public IP address of your Linux VM.
SSH Host Key Validation: Disabled
Authentication Type: SSH Public Key Authentication
Username: sftpuser

Now, to provide the private key, navigate on your local machine to C:\Users\{your_user_name_of_local_machine}\.ssh\id_rsa and open the file in Notepad.

Copy the content, Base64-encode it (for example with base64encode), and paste the encoded text into the "Private Content" option of the SFTP linked service you are creating.
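The Base64 encoding can also be done locally, which avoids pasting a private key into a third-party tool. A minimal sketch, assuming the base64 utility that ships with Git Bash and most Linux systems; the function name encode_key_base64 is hypothetical.

```shell
# Hypothetical helper: print the Base64 encoding of a file as one line.
# Removing newlines makes the output identical whether or not the local
# base64 tool wraps long lines.
encode_key_base64() {
  base64 < "$1" | tr -d '\n'
}

# Example use in Git Bash:
#   encode_key_base64 ~/.ssh/id_rsa
```

On Windows, piping the output to clip (encode_key_base64 ~/.ssh/id_rsa | clip) should place it on the clipboard, ready to paste into the linked service.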

Finally, click on "Test connection".
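For reference, the same settings can be expressed as a linked service JSON definition. The sketch below prints one from the shell. The property names are given as I recall them from the Microsoft SFTP connector documentation linked in step 4, so verify them there before use; the name "SftpLinkedService" and the function name print_sftp_linked_service are placeholders.

```shell
# Hypothetical helper: print an SFTP linked service definition as JSON.
# Arguments: 1 = host, 2 = user name, 3 = Base64-encoded private key.
# Property names are an assumption based on the connector documentation.
print_sftp_linked_service() {
  cat <<EOF
{
  "name": "SftpLinkedService",
  "properties": {
    "type": "Sftp",
    "typeProperties": {
      "host": "$1",
      "port": 22,
      "skipHostKeyValidation": true,
      "authenticationType": "SshPublicKey",
      "userName": "$2",
      "privateKeyContent": {
        "type": "SecureString",
        "value": "$3"
      }
    }
  }
}
EOF
}

# Example use (values are placeholders):
#   print_sftp_linked_service "{public_ip_address_of_your_vm}" sftpuser "<base64 key>"
```

Here "skipHostKeyValidation": true corresponds to leaving SSH Host Key Validation disabled, as in the field list above.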




Fig6: Test Connection In ADF


Additional Changes For SSH Host Key Validation

This section is optional, but anyone who wants the server to be validated before the public key is authenticated can follow the steps below:-

Enable the SSH Host Key Validation option in the SFTP linked service and enter any random value in the SSH host key fingerprint option. Testing the connection will then fail with an error message that contains the real fingerprint:

Fig7: Error Message On Host Key Validation

The error message makes it clear that the server's host key uses the Ed25519 algorithm. Hence we have to go back to the Linux machine, signed in with the user created while deploying the machine, and modify the config file using the following command:-

$ sudo nano /etc/ssh/sshd_config

Once the file opens in the editor, uncomment the following line:-
#HostKey /etc/ssh/ssh_host_ed25519_key

Fig8: Modifications In SSH Config File Inside Linux Vm

Then press "Ctrl+X", type "Y" and press "Enter".

Finally, run this command:-

$ sudo systemctl restart ssh

Let's now head back to the SFTP linked service:-

From Fig7 you can see that the value after "real finger-print is" in the error message is the actual value we must enter for SSH host key validation. Copy the error message into Notepad and, from there, copy the key 'ssh-ed25519 321a:31.....'. Then paste it into the SSH host key fingerprint option and test the connection.
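It is also possible to cross-check the fingerprint on the server itself rather than trusting the error message alone. The sketch below extracts the colon-separated hex digest from a line of "ssh-keygen -l -E md5" output. That the ADF fingerprint is an MD5 digest is an assumption on my part, and the function name md5_hex_from_keygen_line is hypothetical.

```shell
# Hypothetical helper: pull the colon-separated hex digest out of one line of
# `ssh-keygen -l -E md5 -f <public key file>` output, which has the shape
#   <bits> MD5:<hex pairs> <comment> (<key type>)
md5_hex_from_keygen_line() {
  printf '%s\n' "$1" | awk '{ sub(/^MD5:/, "", $2); print $2 }'
}

# Example use on the VM (the host key path matches the HostKey line above):
#   md5_hex_from_keygen_line "$(ssh-keygen -l -E md5 -f /etc/ssh/ssh_host_ed25519_key.pub)"
```

If the digest printed on the server matches the hex part of the value reported by ADF, you can be reasonably confident that ADF reached the intended machine.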


Fig9: Test Connection With SSH host validation enabled


Conclusion-

The steps described in this blog have been tried and tested with multiple permutations & combinations, because I know how hard it is to debug issues while setting up an SFTP server and testing the connection in an SFTP linked service.

I hope you will be able to follow the steps with minimal resistance, and I assure you that by following the steps described above, you will be able to achieve a successful test connection from the ADF linked service to the SFTP server.

Happy Reading!



