Thursday, August 7, 2014

Loop Multiple Kind of Files using Regular Expression


In SSIS, we normally process multiple files in a folder using ForEach Loop Container. This container has a wildcard filter, where we can specify the type of files to be processed.

But there can be cases, where folder may contain multiple kinds of files and we have to process 2 or 3 kinds of files from them. During such cases, single wildcard filter won’t help us out.

Example for the above scenario:

Some time back we had a requirement, to pull 6 set of FTP files daily. All these files were in zipped format.

All zipped files (except one), were in “.csv” format and only one file was in “.txt” format.

Now, directly if we unzip and process files using ForEach, it will try to load the data even from zipped files to a table. But this will throw an error. To resolve this we have 3 options:

1.       Move .zip files to some other folder using another ForEach

2.       Create 2 separate ForEach, one for .csv and another for .txt

3.       Next option is to ignore the .zip files and process only .txt and .csv files

 Here is a step-by-step process on how the third option can be achieved

Suppose, we had to process all the files which had a name “SSIS” in it and all were in “.csv” format:

Inside ForEach Loop, under Files section we would have given *SSIS*.csv, something as below:



 

But now along with .csv file we need to process”.txt” file which has a word SSIS in its filename.

i.e., we need to process both “.txt” and “.csv” files which has a word SSIS in its filename

To loop such multiple kinds of files, we can use regular expressions. Detail on how it can be used in SSIS is given below

Create 2 variables

1.       RegExFilter: Data Type String; Value = ^.*\.(txt|csv)$

2.       PassesRegexFilter: Data Type Boolean;

 

Using above 2 variables, we will check if the file which will be processed is a csv or txt file using a Script Task.

Drag a Script task inside ForEachLoop container and add the below code( C# code )

/*

   Microsoft SQL Server Integration Services Script Task

   Write scripts using Microsoft Visual C# 2008.

   The ScriptMain is the entry point class of the script.

*/

 

using System;

using System.Data;

using Microsoft.SqlServer.Dts.Runtime;

using System.Windows.Forms;

using System.Text.RegularExpressions;   // Added

using System.IO;                        // Added

 

 

namespace ST_8102847e64a74fb5ac4b523cfa570381.csproj

{

    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]

    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    {

 

        #region VSTA generated code

        enum ScriptResults

        {

            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,

            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure

        };

        #endregion

 

        /*

            The execution engine calls this method when the task executes.

            To access the object model, use the Dts property. Connections, variables, events,

            and logging features are available as members of the Dts property as shown in the following examples.

 

            To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;

            To post a log entry, call Dts.Log("This is my log text", 999, null);

            To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

 

            To use the connections collection use something like the following:

            ConnectionManager cm = Dts.Connections.Add("OLEDB");

            cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

 

            Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

           

            To open Help, press F1.

      */

 

 

        public void Main()

        {

            // Get the filename from the complete filepath

            String FileName = Path.GetFileName(Dts.Variables["User::ImportFile"].Value.ToString());

 

            // Create a regex object with the pattern from the SSIS variable

            Regex FileFilterRegex = new Regex(Dts.Variables["User::RegExFilter"].Value.ToString());

 

            // Check if it is match and return that value (boolean) to the SSIS variable

            Dts.Variables["User::PassesRegexFilter"].Value = FileFilterRegex.IsMatch(FileName);

 

            Dts.TaskResult = (int)ScriptResults.Success;

 

        }

    }

}

This code will check if the file is txt /csv. If yes then the variable PassesRegexFilter  is set to true.

After this task, add other logic to process the file. Prior to these task logics, add a script task and connect them using an expression @[User::PassesRegexFilter] == True as below

 

The ForEach Loop looks something as below



 

Reference:



 

Tuesday, May 27, 2014

Downloading Files from SFTP


Normally, if we want to download FTP files from SSIS, we have FTP task through which it can be easily done. But, if we have to download files from SFTP(Secured FTP), then the regular FTP files don’t work

We need a third-party tool to connect to SFTP. Normally winscp is used which is a Free Ware.

Usual Development Scenario:

Normally while coding, we open SFTP using winscp to check if all files are there. For the first time, when we open SFTP, it will prompt a message whether to cache it’ key. When we submit YES button, from then on it goes on smoothly.

Caching this host key is user specific. So when the SSIS package which connects to SFTP using this winscp, will be the same credential which had opened winscp manually. So there won’t be any issues while developing the ETL and the running the package using BIDS.

Problem:

Normally there will be an error when this SSIS which connects to SFTP is run as SQL job

Reason

Usually SQL job will run the job using a specific ID like “Service Account”. For this SQL Service Account, the host key for SFTP was not cached. So there will be an error as below

The server's host key was not found in the cache.

Along with this error, usually if debug we get even the host key which was missing

-----------------------------------------------------------------------------------------------

The server's host key was not found in the cache.
You have no guarantee that the server is the computer you think it is. 
The server's dss key fingerprint is:  ssh-dss 1024 c1:70:cb:3b:27:16:10:ed:a5:c8:e5:ba:8f:1b:ec:ad 
If you trust this host, press Yes.
To connect without adding host key to the cache, press No.
To abandon the connection press Cancel. 
Continue connecting and add host key to the cache?  (Y)es, (N)o, C(a)ncel, (C)opy Key: Cancel  Host key wasn't verified!  Authentication failed.

 

Solution

There are 2 solutions for this

1.       Include the hostkey while connecting to SFTP as below

ConnectString = "open " + FtpType.ToLower() + "://" + Username + ":" + Password + "@" + Host + ":" + Port + " -hostkey=" + """ssh-dss 1024 c1:70:cb:3b:27:16:10:ed:a5:c8:e5:ba:8f:1b:ec:ad"""


2.       Other option is to login to the system using SQL Service Account and connect to SFTP and cache the host key. This can be done only by the Windows IT team.

 

 

 

 

 

Process all files, even when single file is corrupt


Usually in ETL, we get requirements to process all files from a particular folder and load them to database. If one of the file is corrupt, we will be asked to move the file to Reject Folder and send an email. But continue to process other files from the source folder.




Default SSIS behavior is such that, even if a single error occurs, the package execution is stopped.
So, to continue even if a single file is failed to process, below steps needs to be followed:
Go to Event Handler Section for that Data Flow Task
Go to Variables Section and select system variables. Select the variable Propogate and set is FALSE
Exception:
Some times for excel file, we may have to do some additional settings along with the above Propogate
Either of the below approach will work fine
·         Set the property, validate metadata at the source to false
·         Set the MaxErrorCount in ForEachLoop to 1000

ScriptTask does not open in Debug Mode when BreakPoint is applied


Some Times in SSIS, we apply break points in script tasks to debug. But to our surprise, we won’t be able to see the script tasks in debug mode.

If we face such issues, we need to go into the solution properties, select “Debugging” and check Run64BitRuntime to True

Friday, April 5, 2013

Changing the font color/background color of a row dynamically while sending mails from SQL Server

Normally if we like to track our data from the DB on a daily basis, either we generate some reports using SSRS and schedule the reports to specific mail ID, or we write some queries and send the summary data as a report in a mail using sp_send_dbmail in an HTML format.

If there are some specfic values which we are tracking on a daily basis, and if we want to highlight the particular row with specific color based on some conditions, it is easy to set the color dynamically in SSRS.

To achieve the same using sp_send_dbmail , we have to write SQL queries which generate such kind of XMLs.

Consider the below example. Suppose we have a table which has data as below

place
Temperature
Sirsi
30
Bangalore
35
Hyderabad
45
Mumbai
41


We have to highlight the data for which the temperature is more than 40 degrees. We can highlight either by making it as bold/changing background color/changing font color

Dynamic Background Color Change

To dynamically change background color we need xml as given below:
<tr bgcolor = "red">
       <td>Hyderabad<td>
       <td>45</td></tr>
<tr bgcolor = "white">
        <td>Sirsi<td>
        <td>30</td></tr>

Note: This is just a sample xml

We can generate such XML using below query
SELECT
         CASE WHEN temperature > 40 THEN '#F78181' 
                    ELSE 'white'
             END AS [@bgcolor],
         td = place, '',
         td = temperature, ''
FROM test_fontcolor
FOR XML PATH('tr')



To send an email use the below format

declare @tableHTML nvarchar(max)
SET @tableHTML =
      N'<H1>Test Data</H1>' +
      N'<H4>TEST_sub_header</H4>' +
      N'<H4>TEST_header_1</H4>' +
      N'<table border="1">' +
      N'<tr><th>place</th>
      <th>Temperature</th>' +
      N'</tr>' +
      CAST ( (
        SELECT
         CASE WHEN temperature > 40 THEN '#F78181' 
                    ELSE 'white'
             END AS [@bgcolor],
         td = place, '',
         td = temperature, ''
        FROM test_fontcolor
        FOR XML PATH('tr')
      ) AS NVARCHAR(MAX) ) +
      N'</table>' ;
   
EXEC msdb.dbo.sp_send_dbmail
        --@profile_name='WP Store BI Notifications',
     @recipients='v-raheg@microsoft.com',
     @from_address = 'storenote@microsoft.com',
     @subject = 'Test Background Color',
     @body = @tableHTML,
     @body_format = 'HTML' ;

The output of would be  as below:

Test Data
TEST_sub_header
TEST_header_1
place
Temperature
Sirsi
30
Bangalore
35
Hyderabad
45
Mumbai
41




Dynamaic Font Color Change

To change the color of the font dynamically the XML should look like as below:
<tr>
   <font color = "red">
         <td> Hyderabad </td>
         <td> 45 </td>
   </font>
</tr>
<tr>
   <font color = "black">
         <td> Sirsi</td>
         <td> 30 </td>
   </font>
</tr>


SQL query to generate XML as above would look like :

SELECT
      CASE WHEN Temperature>40 THEN 'red'
            ELSE 'black'
      END  AS "font/@color",
      place  as "font/td",'',
      CASE WHEN Temperature>40 THEN 'red'
            ELSE 'black'
      END  AS "font/@color",
    Temperature as "font/td"
FROM
      test_fontcolor
ORDER BY place
FOR XML PATH('tr')

NOTE: Above query does not generate XML exactly like above. For every td tag there would be one font tag. It will generate like:
<tr>
   <font color = "red">
         <td> Hyderabad </td>
   </font>
  <font color = "red">
         <td> 45 </td>
   </font>
</tr>

To send an email use the below code:

declare @tableHTML nvarchar(max)
SET @tableHTML =
                N'<H1>Test Data</H1>' +
                N'<H4>TEST_sub_header</H4>' +
                N'<H4>TEST_header_1</H4>' +
                N'<table border="1">' +
                N'<tr><th>place</th><th>Temperature</th>' +
                N'</tr>' +
                CAST ( ( SELECT
                                          CASE WHEN Temperature>40 THEN 'red'
                                                ELSE 'black'
                                          END  AS "font/@color",
                                          place  as "font/td",'',
                                          CASE WHEN Temperature>40 THEN 'red'
                                                ELSE 'black'
                                          END  AS "font/@color",
                                          Temperature as "font/td"
                                    FROM
                                          test_fontcolor
                                    ORDER BY place
                                    FOR XML PATH('tr')
                ) AS NVARCHAR(MAX) ) +
                N'</table>' ;  
                               
EXEC msdb.dbo.sp_send_dbmail
     --@profile_name='WP Store BI Notifications',
       @recipients='v-raheg@microsoft.com',
      @from_address = 'storenote@microsoft.com',
       @subject = 'Test Font Color',
       @body = @tableHTML,
       @body_format = 'HTML' ;

The out put would be :

Test Data
TEST_sub_header
TEST_header_1
place
Temperature
Sirsi
30
Bangalore
35
Hyderabad
45
Mumbai
41