Thursday, August 7, 2014

Loop Multiple Kind of Files using Regular Expression


In SSIS, we normally process multiple files in a folder using ForEach Loop Container. This container has a wildcard filter, where we can specify the type of files to be processed.

But there can be cases, where folder may contain multiple kinds of files and we have to process 2 or 3 kinds of files from them. During such cases, single wildcard filter won’t help us out.

Example for the above scenario:

Some time back we had a requirement, to pull 6 set of FTP files daily. All these files were in zipped format.

All zipped files (except one), were in “.csv” format and only one file was in “.txt” format.

Now, directly if we unzip and process files using ForEach, it will try to load the data even from zipped files to a table. But this will throw an error. To resolve this we have 3 options:

1.       Move .zip files to some other folder using another ForEach

2.       Create 2 separate ForEach, one for .csv and another for .txt

3.       Next option is to ignore the .zip files and process only .txt and .csv files

 Here is a step-by-step process on how the third option can be achieved

Suppose, we had to process all the files which had a name “SSIS” in it and all were in “.csv” format:

Inside ForEach Loop, under Files section we would have given *SSIS*.csv, something as below:



 

But now along with .csv file we need to process”.txt” file which has a word SSIS in its filename.

i.e., we need to process both “.txt” and “.csv” files which has a word SSIS in its filename

To loop such multiple kinds of files, we can use regular expressions. Detail on how it can be used in SSIS is given below

Create 2 variables

1.       RegExFilter: Data Type String; Value = ^.*\.(txt|csv)$

2.       PassesRegexFilter: Data Type Boolean;

 

Using above 2 variables, we will check if the file which will be processed is a csv or txt file using a Script Task.

Drag a Script task inside ForEachLoop container and add the below code( C# code )

/*

   Microsoft SQL Server Integration Services Script Task

   Write scripts using Microsoft Visual C# 2008.

   The ScriptMain is the entry point class of the script.

*/

 

using System;

using System.Data;

using Microsoft.SqlServer.Dts.Runtime;

using System.Windows.Forms;

using System.Text.RegularExpressions;   // Added

using System.IO;                        // Added

 

 

namespace ST_8102847e64a74fb5ac4b523cfa570381.csproj

{

    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]

    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    {

 

        #region VSTA generated code

        enum ScriptResults

        {

            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,

            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure

        };

        #endregion

 

        /*

            The execution engine calls this method when the task executes.

            To access the object model, use the Dts property. Connections, variables, events,

            and logging features are available as members of the Dts property as shown in the following examples.

 

            To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;

            To post a log entry, call Dts.Log("This is my log text", 999, null);

            To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

 

            To use the connections collection use something like the following:

            ConnectionManager cm = Dts.Connections.Add("OLEDB");

            cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

 

            Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

           

            To open Help, press F1.

      */

 

 

        public void Main()

        {

            // Get the filename from the complete filepath

            String FileName = Path.GetFileName(Dts.Variables["User::ImportFile"].Value.ToString());

 

            // Create a regex object with the pattern from the SSIS variable

            Regex FileFilterRegex = new Regex(Dts.Variables["User::RegExFilter"].Value.ToString());

 

            // Check if it is match and return that value (boolean) to the SSIS variable

            Dts.Variables["User::PassesRegexFilter"].Value = FileFilterRegex.IsMatch(FileName);

 

            Dts.TaskResult = (int)ScriptResults.Success;

 

        }

    }

}

This code will check if the file is txt /csv. If yes then the variable PassesRegexFilter  is set to true.

After this task, add other logic to process the file. Prior to these task logics, add a script task and connect them using an expression @[User::PassesRegexFilter] == True as below

 

The ForEach Loop looks something as below



 

Reference: