Wednesday, February 25, 2009

Passing variable to Child package from Parent package in SSIS

(APPLICABLE ONLY IN SSIS 2005, Use Parent Package Configuration for SSIS 2008)

Very frequent Question which looks so complicated to programmers.

Actually passing a variable value to a child package is very trivial task. We can pass on the value by configuring parent variable in package configuration but there is an easy way of achieve this and the fact lies beneath the fundamental principle of Variable Scope.

If you call a Child package then it is like a container itself and all the variables defined in above hierarchy will be accessible in the Child package.

Let me show this with an example in which I will declare a variable "ParentVar" in my parent package and call a Child package which will access "ParentVar" and display in a msgbox.

1) Parent: Create Parent Package and declare a variable "ParentVar"

2) Child: Create a Child package and use a script task and define readonly variable as ParentVar

3) Child: Now in Script you can use ParentVar like any other I am using to display it in a msgbox. I would suggest to create another child package variable and assign Parent package variable value to it and use it in the child package variable through out the package.

4) Child: Whole Child package will look like

5) Parent: Now in parent package call the child package through Execute Package task.
The Parent package will look like

6) Result: Execute Parent will in turn call child package and it will display msgbox

Here was a simple method to use parent package variable in Child package.

Monday, February 23, 2009

SSIS Interview Questions

Common search for new SSIS programmer looking for change is what questions to expect on SSIS. Based on the interviews I take on SSIS, I will list down my favorites and expected questions on SSIS.

Q1 Explain architecture of SSIS?

SSIS architecture consists of four key parts:
a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.

Q2 How would you do Logging in SSIS?
Logging Configuration provides an inbuilt feature which can log the detail of various events like onError, onWarning etc to the various options say a flat file, SqlServer table, XML or SQL Profiler.

Q3 How would you do Error Handling?
A SSIS package could mainly have two types of errors
a) Procedure Error: Can be handled in Control flow through the precedence control and redirecting the execution flow.
b) Data Error: is handled in DATA FLOW TASK buy redirecting the data flow using Error Output of a component.

Q4 How to pass property value at Run time? How do you implement Package Configuration?
A property value like connection string for a Connection Manager can be passed to the pkg using package configurations.Package Configuration provides different options like XML File, Environment Variables, SQL Server Table, Registry Value or Parent package variable.

Q5 How would you deploy a SSIS Package on production?

 A) Through Manifest
1. Create deployment utility by setting its propery as true .
2. It will be created in the bin folder of the solution as soon as package is build.
3. Copy all the files in the utility and use manifest file to deply it on the Prod.

B) Using DtsExec.exe utility
C)Import Package directly in MSDB from SSMS by logging in Integration Services.

Q6 Difference between DTS and SSIS?
Every thing except both are product of Microsoft :-).

Q7 What are new features in SSIS 2008?
explained in other post

Q8 How would you pass a variable value to Child Package?
too big to fit here so had a write other post

Q9 What is Execution Tree?
Execution trees demonstrate how package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation and each new tree may also give you an additional worker thread.

Q10 What are the points to keep in mind for performance improvement of the package?

Q11 You may get a question stating a scenario and then asking you how would you create a package for that e.g. How would you configure a data flow task so that it can transfer data to different table based on the city name in a source table column?

Q13 Difference between Unionall and Merge Join?
a) Merge transformation can accept only two inputs whereas Union all can take more than two inputs

b) Data has to be sorted before Merge Transformation whereas Union all doesn't have any condition like that.

Q14 May get question regarding what X transformation do?Lookup, fuzzy lookup, fuzzy grouping transformation are my favorites.
For you.

Q15 How would you restart package from previous failure point?What are Checkpoints and how can we implement in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run.

Q16 Where are SSIS package stored in the SQL Server?
MSDB.sysdtspackages90 stores the actual content and ssydtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog do the supporting roles.

Q17 How would you schedule a SSIS packages?
Using SQL Server Agent. Read about Scheduling a job on Sql server Agent

Q18 Difference between asynchronous and synchronous transformations?
Asynchronous transformation have different Input and Output buffers and it is up to the component designer in an Async component to provide a column structure to the output buffer and hook up the data from the input.

Q19 How to achieve parallelism in SSIS?
Parallelism is achieved using MaxConcurrentExecutable property of the package. Its default is -1 and is calculated as number of processors + 2.

-More questions added-Sept 2011
Q20 How do you do incremental load?

Fastest way to do incremental load is by using Timestamp column in source table and then storing last ETL timestamp, In ETL process pick all the rows having Timestamp greater than the stored Timestamp so as to pick only new and updated records

Q21 How to handle Late Arriving Dimension or Early Arriving Facts.

Late arriving dimensions sometime get unavoidable 'coz delay or error in Dimension ETL or may be due to logic of ETL. To handle Late Arriving facts, we can create dummy Dimension with natural/business key and keep rest of the attributes as null or default.  And as soon as Actual dimension arrives, the dummy dimension is updated with Type 1 change. These are also known as Inferred Dimensions.

Thursday, February 19, 2009

Selecting Excel source range in SSIS

Many times I have been asked by SSIS programmers that they face difficulties when they need to select some specified columns and rows or they need to skip first 2 /3 rows in the excel sheet.

Below I will demonstrate how we can select a specified range in an Excel Sheet.

A) Let me take below sheet as an example and select data after first 2 rows

B) Now in Data Flow task, select an Excel source
Configure Data Access Mode as SQL Command
and in SQL Command Text give query as

SELECT * FROM [Sheet1$ range]
SELECT * FROM [Sheet1$A2:B10]

C) Preview

This way user can select any range or any specific cell value in an Excel File

Tuesday, February 03, 2009

Matching in Lookup Component in SSIS

Many SSIS programmers get confused with the comparison results of Lookup transformation and feels baffled ;-| . Lookup transformation in SSIS is used to lookup for a reference in the lookup table where it performs a equi-join and returns the referenced column.

Let me explain the comparison/matching behavior of lookup transformation: It matches data differently when in cached (default) and non-cached (Enable memory Restriction) mode. In cached mode comparison will be case sensitive where as in non-cached mode comparison will be based on collation level of column being matched.

Mode :: Behavior
Cached (default) :: Case sensitive
Non-cached (Enable memory restrictions) :: Collation level of column

It basically depends where the comparision is done. In cached mode SSIS reterives data from the source and comparision is done at the client on the byte level which is CASE SENSITIVE. Whereas in Non-Cache mode ( Enable memory Restriction) comparision is done at database and is based on the collation level of the column being matched.

That's the behavior of Lookup Transformation in SSIS.