Tuesday, May 27, 2014

Downloading Files from SFTP


Normally, if we want to download FTP files from SSIS, we have FTP task through which it can be easily done. But, if we have to download files from SFTP(Secured FTP), then the regular FTP files don’t work

We need a third-party tool to connect to SFTP. Normally winscp is used which is a Free Ware.

Usual Development Scenario:

Normally while coding, we open SFTP using winscp to check if all files are there. For the first time, when we open SFTP, it will prompt a message whether to cache it’ key. When we submit YES button, from then on it goes on smoothly.

Caching this host key is user specific. So when the SSIS package which connects to SFTP using this winscp, will be the same credential which had opened winscp manually. So there won’t be any issues while developing the ETL and the running the package using BIDS.

Problem:

Normally there will be an error when this SSIS which connects to SFTP is run as SQL job

Reason

Usually SQL job will run the job using a specific ID like “Service Account”. For this SQL Service Account, the host key for SFTP was not cached. So there will be an error as below

The server's host key was not found in the cache.

Along with this error, usually if debug we get even the host key which was missing

-----------------------------------------------------------------------------------------------

The server's host key was not found in the cache.
You have no guarantee that the server is the computer you think it is. 
The server's dss key fingerprint is:  ssh-dss 1024 c1:70:cb:3b:27:16:10:ed:a5:c8:e5:ba:8f:1b:ec:ad 
If you trust this host, press Yes.
To connect without adding host key to the cache, press No.
To abandon the connection press Cancel. 
Continue connecting and add host key to the cache?  (Y)es, (N)o, C(a)ncel, (C)opy Key: Cancel  Host key wasn't verified!  Authentication failed.

 

Solution

There are 2 solutions for this

1.       Include the hostkey while connecting to SFTP as below

ConnectString = "open " + FtpType.ToLower() + "://" + Username + ":" + Password + "@" + Host + ":" + Port + " -hostkey=" + """ssh-dss 1024 c1:70:cb:3b:27:16:10:ed:a5:c8:e5:ba:8f:1b:ec:ad"""


2.       Other option is to login to the system using SQL Service Account and connect to SFTP and cache the host key. This can be done only by the Windows IT team.

 

 

 

 

 

Process all files, even when single file is corrupt


Usually in ETL, we get requirements to process all files from a particular folder and load them to database. If one of the file is corrupt, we will be asked to move the file to Reject Folder and send an email. But continue to process other files from the source folder.




Default SSIS behavior is such that, even if a single error occurs, the package execution is stopped.
So, to continue even if a single file is failed to process, below steps needs to be followed:
Go to Event Handler Section for that Data Flow Task
Go to Variables Section and select system variables. Select the variable Propogate and set is FALSE
Exception:
Some times for excel file, we may have to do some additional settings along with the above Propogate
Either of the below approach will work fine
·         Set the property, validate metadata at the source to false
·         Set the MaxErrorCount in ForEachLoop to 1000

ScriptTask does not open in Debug Mode when BreakPoint is applied


Some Times in SSIS, we apply break points in script tasks to debug. But to our surprise, we won’t be able to see the script tasks in debug mode.

If we face such issues, we need to go into the solution properties, select “Debugging” and check Run64BitRuntime to True