SSIS - Validate File Name, File Header and Footer Information against Definition Table

Scenario:

Create a SSIS package that will load csv files to different tables but before loading we want to validate the Header and read the Footer information. If Header does not match with our header definition that we have saved in table then we need to send an email and also update the wrong header value in our table for further investigation. If Header matches with our definition then we want to read the Footer row that has number of records. We need to compare this with actual number or rows in the file.

File Format

One Intro line
HeaderLine with | or , delimited
Details Row
Footer Cnt

Solution:


Step 1: 
Lets create a sample file that we can use for our package , FileName =Test_File_20131108.txt. We will getting the file with same name , the only part will change is date in file name.

Sample File TestFile_20131109 created at 12:00:00
ID|Name|Address|Phone
1|Aamir|ABC Address|5054141969
2|Raza| XYZ Address|505000000
RowCnt 5

Step 2: 
Lets create a table(dbo.FileValidation) where we can save the filename without datetime, File extension, folder path and Header information.

CREATE TABLE [dbo].[FileValidation](
[FileValidationID] [int] IDENTITY(1,1) NOT NULL,
[FileName] [varchar](100) NULL,
[Header] [varchar](4000) NULL,
[FolderPath] [varchar](200) NULL,
[FileExtension] CHAR(3)
)
Insert the record for one first file

INSERT INTO Dbo.FileValidation (FileName,Header,FolderPath,FileExtension)
 Values ('Test_File_','ID|Name|Address|Phone','C:\Users\ashahzad\Desktop\SourceFile','txt')




Create  [dbo].[FileLoadInformaiton] table so we can save the information at end of  validation either success or failure.

CREATE TABLE [dbo].[FileLoadInformaiton](
[Id] [int] IDENTITY(1,1) NOT NULL,
[FileValidationID] [int] NULL,
[FileName] [varchar](100) NULL,
[LoadStatus] [bit] NULL,
[RowCnt] [int] NULL,
[FileHeader] [varchar](500) NULL,
[ErrorMessage] [varchar](500) NULL,
[LoadDateTime] [datetime] NULL
)

Step 3:
Lets read the folder path from the Table so we can use for our Foreach loop to read files from Source Folder. Save the Folder path in VarFolderPath Variable.




Step 4:
We will be using the For each loop to read all the files from given folder one at a time. Lets use this VarFolderPath in For each Loop as shown below






we need to save File name with extension in some variable( VarFileName). Lets Click on Variable Mapping on Left Side of Foreach Loop Editor.


By pressing Ok we are done with configuration of Foreach loop container.

File Name Validation :

First of all we want to validate the name of file against our validation table before evening we validate the Header row with definition in our table. 
To validate the file name , we will be using Execute SQL Task inside the foreach loop.

Configure Execute Sql task as shown below


Use this code in Expressions
"DECLARE @FileName VARCHAR(100)
SET @FileName='"+ @[User::VarFileName] +"'
IF EXISTS (SELECT 1
           FROM   [dbo].[FileValidation]
           WHERE  filename = Substring(@FileName, 1, Len(@FileName) - 12))
  BEGIN
      SELECT 1 AS FileExistsFlg,
             Header
      FROM   [dbo].[FileValidation]
      WHERE  filename = Substring(@FileName, 1, Len(@FileName) - 12)
  END
IF NOT EXISTS (SELECT 1
               FROM   [dbo].[FileValidation]
               WHERE  filename = Substring(@FileName, 1, Len(@FileName) - 12))
  BEGIN
      SELECT 0  AS FileExistsFlg,
             '' AS Header
  END "



Lets Map the values to the VarFileExistsFlg and VarHeader variables.



Now if VarFileExistsFlg=0 then we want to send an email or move the file to archive folder. use precedence constraints to do this as shown below


When the VarFileExistsFlg=1 then we should be moving to SCR_Validate Header Script Task.



Header Validation : In progress

Step 1 : 
Create Variables as shown below so we can use these variables in Script task for validation and return values from Script task.


Step 2: 
Bring Script Task to Control flow Pane inside For-each Loop container and provide the readonly variables as well ReadWriteVariables as shown


Click on Edit Script and then paste code as shown below. I have added code that is shown in RED

/*
   Microsoft SQL Server Integration Services Script Task
   Write scripts using Microsoft Visual C# 2008.
   The ScriptMain is the entry point class of the script.
*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;

namespace ST_dd0444829b574914af253673e7059135.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        /*
The execution engine calls this method when the task executes.
To access the object model, use the Dts property. Connections, variables, events,
and logging features are available as members of the Dts property as shown in the following examples.

To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
To post a log entry, call Dts.Log("This is my log text", 999, null);
To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

To use the connections collection use something like the following:
ConnectionManager cm = Dts.Connections.Add("OLEDB");
cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

To open Help, press F1.
*/

        public void Main()
        {
            string FilFullPath="";
            string HeaderRow = "";
            string FooterRow = "";
            Int32 RowCnt = 0;
            string RowCntFromFooter = "";

            FilFullPath = Dts.Variables["User::VarFolderPath"].Value.ToString() + "\\" + Dts.Variables["User::VarFileName"].Value.ToString();

           //Check if file path is correct
            // MessageBox.Show(FilFullPath);

            try
            {
                string[] lines = System.IO.File.ReadAllLines(FilFullPath);

                // Display the file contents by using a foreach loop.
               
                foreach (string line in lines)
                {
                    // Use a tab to indent each line of the file.
                   // MessageBox.Show(line);
                    RowCnt += 1;

                    if (RowCnt == 2)
                    {
                        HeaderRow = line;
                    }

                }

                FooterRow = lines[RowCnt - 1].ToString();
                RowCntFromFooter = FooterRow.Replace("RowCnt","").Trim();
                
                //Used only for Debugging...
                //MessageBox.Show("Row Cnt from File Foooter::" + RowCntFromFooter);
                //MessageBox.Show("Row Cnt from File ::" + RowCnt.ToString());
                //MessageBox.Show("HEader Row from File::" + HeaderRow);
                //MessageBox.Show(" Footer Row::" + lines[RowCnt - 1].ToString());
                //MessageBox.Show("Header from Definition table" + Dts.Variables["User::VarHeader"].Value.ToString());

                //Validation
                if (Dts.Variables["User::VarHeader"].Value.ToString() != HeaderRow)
                {
                    Dts.Variables["User::VarErrorMessage"].Value = "Header Does not Match";

                    if (RowCnt != Convert.ToInt32(RowCntFromFooter))
                    {
                        Dts.Variables["User::VarErrorMessage"].Value = "Header does not match as well Footer Row Count(" + RowCntFromFooter + ") does not match with total file Row count(" + RowCnt.ToString()+")";
                    }

                    Dts.Variables["User::VarValidationFlg"].Value = 0;
                    Dts.Variables["User::VarFileHeader"].Value = HeaderRow;
                    
                }

                if (Dts.Variables["User::VarHeader"].Value.ToString() == HeaderRow && RowCnt == Convert.ToInt32(RowCntFromFooter))
                {
                    Dts.Variables["User::VarErrorMessage"].Value = "Header Row and Footer Count matches";
                    Dts.Variables["User::VarValidationFlg"].Value = 1;
                    Dts.Variables["User::VarFileRowCnt"].Value = RowCnt;
                }

                               
            }
            catch (Exception e)
            {
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
            // TODO: Add your code here
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}


Step 3:
 Bring two Execute SQL Task and name them SQL1_Successful Validation Entry and SQL2_Invalid Validation Entry.

Configure Precedence Constraints for both of the Execute SQL Tasks as shown


Step 4: 
We will be using SQL1_Successful Validation Entry ( Execute SQL task) and SQL2_Invalid Validation Entry( Execute SQL Task) to insert information in our dbo.FileLoadInformation table.

Double Click on SQL1_Successful Validation Entry and Choose the data Source and then configure as shown in below Fig.

Use this code for Expression
"Insert into dbo.FileLoadInformaiton(FileValidationID,FileName,LoadStatus,RowCnt,FileHeader,LoadDateTime)"+
"Select (Select FileValidationID from dbo.FileValidation WHERE filename = Substring('"+@VarFileName+"' ,1,LEN('"+@VarFileName+"')-12)) AS FileValidationId,'"+ @[User::VarFileName]+"' AS FileName,1 as LoadStatus,"
+ (DT_STR,50,1252)@[User::VarFileRowCnt] +" AS RowCnt,'"+ @[User::VarHeader] +"' AS FileHeader,getdate() AS LoadDateTime"



Now you need to configure SQL2_Invalid Validation Entry( Execute SQL Task). follow the same steps you did above but use this code

"Insert into dbo.FileLoadInformaiton(FileValidationID,FileName,LoadStatus,RowCnt,FileHeader,LoadDateTime,ErrorMessage)"+
"Select (Select FileValidationID from dbo.FileValidation WHERE filename = Substring('"+@VarFileName+"' ,1,LEN('"+@VarFileName+"')-12)) AS FileValidationId,'"+ @[User::VarFileName]+"' AS FileName,0 as LoadStatus,"
+ (DT_STR,50,1252)@[User::VarFileRowCnt] +" AS RowCnt,'"+ @[User::VarFileHeader] +"' AS FileHeader,getdate() AS LoadDateTime,'"+ @[User::VarErrorMessage]+"' AS ErrorMessage"

Final OutPut : 

Run your package few times , change the header information in the file, change the row count in footer etc to check everything working fine.
Use the Select query to check the information loaded into dbo.FileLoadInformation table


Complete Package 

Complete package looks like this. This is just a sample package and I might have missed so many things. The goal of this package was to give any idea how File Validation can be done before even loading the file and send notifications or Audit information in Table.

1 comment: