Monday, February 23, 2009

SSIS Interview Questions

A common search by SSIS programmers looking for a change of job is which questions to expect in an SSIS interview. Based on the SSIS interviews I conduct, I will list my favorite and most commonly expected questions.


Q1 Explain architecture of SSIS?

SSIS architecture consists of four key parts:
a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration Services object model: includes the managed API for accessing Integration Services tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: the runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, plus any custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.



Q2 How would you do Logging in SSIS?
SSIS has a built-in logging feature that records the details of events such as OnError and OnWarning. The log providers can write to a text file, a SQL Server table, an XML file, SQL Server Profiler, or the Windows Event Log.

Q3 How would you do Error Handling?
An SSIS package can mainly have two types of errors:
a) Procedure error: handled in the control flow through precedence constraints that redirect the execution flow.
b) Data error: handled in the Data Flow task by redirecting rows through the Error Output of a component.
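
The data-error case can be pictured with a small Python sketch. This is not SSIS code; the function name, column names, and sample rows are made up for illustration. It only mimics what happens when a component's Error Output is set to redirect rows instead of failing the component.

```python
def convert_rows(rows):
    """Split rows into a good output and an error output (hypothetical helper)."""
    good, errors = [], []
    for row in rows:
        try:
            # The "transformation": convert the amount column to a number.
            good.append({"id": row["id"], "amount": float(row["amount"])})
        except (ValueError, TypeError) as exc:
            # Redirect the bad row with an error description instead of failing.
            errors.append({**row, "error": str(exc)})
    return good, errors


source = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "abc"},   # not a number
    {"id": 3, "amount": None},    # missing value
]
good, errors = convert_rows(source)
```

The good rows continue down the main path, while the error rows can be written to a separate table or file for later inspection.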


Q4 How do you pass a property value at run time? How do you implement Package Configuration?
A property value, such as the connection string of a Connection Manager, can be passed to the package using package configurations. Package Configuration provides different options: XML file, environment variable, SQL Server table, registry value, or parent package variable.

Q5 How would you deploy an SSIS package to production?

 A) Through a manifest
1. Create the deployment utility by setting the project's CreateDeploymentUtility property to True.
2. The utility is created in the bin\Deployment folder of the project as soon as the project is built.
3. Copy all the files in that folder and use the manifest file to deploy the package to production.

B) Using the dtutil.exe command-line utility (dtexec.exe runs packages; it does not deploy them)
C) Import the package directly into MSDB from SSMS by connecting to Integration Services.

Q6 Difference between DTS and SSIS?
Everything, except that both are Microsoft products :-).

Q7 What are new features in SSIS 2008?
Explained in another post:
http://sqlserversolutions.blogspot.com/2009/01/new-improvementfeatures-in-ssis-2008.html

Q8 How would you pass a variable value to Child Package?
Too big to fit here, so I wrote another post:
http://sqlserversolutions.blogspot.com/2009/02/passing-variable-to-child-package-from.html


Q9 What is Execution Tree?
Execution trees show how a package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created, such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation, and each new tree may also give you an additional worker thread.


Q10 What are the points to keep in mind for performance improvement of the package?
http://technet.microsoft.com/en-us/library/cc966529.aspx

Q11 You may be given a scenario and asked how you would build a package for it, e.g. how would you configure a Data Flow task so that it transfers data to different tables based on the city name in a source table column?
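
One common answer to this scenario is the Conditional Split transformation, with one output per city and a default output for everything else. The Python sketch below only illustrates the routing logic; the function name, column name, and city values are assumptions made for the example.

```python
from collections import defaultdict


def conditional_split(rows, known_cities, default_output="Other"):
    """Route each row to an output named after its city (hypothetical helper)."""
    outputs = defaultdict(list)
    for row in rows:
        city = row["city"]
        # Rows that match no condition go to the default output.
        target = city if city in known_cities else default_output
        outputs[target].append(row)
    return dict(outputs)


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 3, "city": "Pune"},
    {"id": 4, "city": "Oslo"},
]
outputs = conditional_split(rows, known_cities={"Pune", "Delhi"})
```

In a package, each output would then be connected to its own destination table.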


Q13 Difference between Union All and Merge?
a) The Merge transformation accepts only two inputs, whereas Union All can take more than two inputs.

b) Data has to be sorted before the Merge transformation, whereas Union All has no such condition.
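
A rough Python analogy of the two behaviours, using only the standard library. Note one difference from SSIS: `heapq.merge` happily accepts more than two inputs, while the Merge transformation is limited to two; the sample lists are made up.

```python
import heapq
from itertools import chain

a = [1, 4, 9]      # sorted
b = [2, 3, 10]     # sorted
c = [7, 5]         # unsorted, which is fine for Union All

# Union All style: any number of inputs, no sorting needed, no output order guaranteed.
union_all = list(chain(a, b, c))

# Merge style: both inputs must already be sorted; the output stays sorted.
merged = list(heapq.merge(a, b))
```

Here `union_all` simply appends the inputs one after another, while `merged` interleaves the two sorted inputs into one sorted output.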


Q14 You may get a question about what a particular transformation does. Lookup, Fuzzy Lookup, and Fuzzy Grouping are my favorites.
These are left for you to answer.

Q15 How would you restart a package from the previous point of failure? What are checkpoints and how can we implement them in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run. Checkpoints are configured through the package properties CheckpointFileName, CheckpointUsage, and SaveCheckpoints, together with FailPackageOnFailure on the tasks that should act as restart points.
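
The restart behaviour can be sketched in Python as follows. This is only an illustration of the idea, not how SSIS implements it; the JSON file format, the function name, and the task names are assumptions.

```python
import json
import os
import tempfile


def run_package(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping those already checkpointed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    executed = []
    for name, task in tasks:
        if name in done:
            continue  # completed in an earlier, failed run
        task()  # an exception here leaves the checkpoint file in place
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # successful run: delete the checkpoint file
    return executed


state = {"fail": True}


def load():
    if state["fail"]:
        raise RuntimeError("simulated failure")


path = os.path.join(tempfile.mkdtemp(), "package.chk")
tasks = [("extract", lambda: None), ("load", load), ("cleanup", lambda: None)]

try:
    run_package(tasks, path)       # first run fails in "load"
except RuntimeError:
    pass

state["fail"] = False
second_run = run_package(tasks, path)  # resumes at "load"
```

On the second run the "extract" step is skipped because the checkpoint file recorded it as complete.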

Q16 Where are SSIS packages stored in SQL Server?
In SQL Server 2005, msdb.dbo.sysdtspackages90 stores the actual package content, while sysdtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog play supporting roles. (SQL Server 2008 stores packages in msdb.dbo.sysssispackages.)


Q17 How would you schedule an SSIS package?
Using SQL Server Agent. Read about scheduling a job in SQL Server Agent.

Q18 Difference between asynchronous and synchronous transformations?
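A synchronous transformation reuses the input buffer for its output: each row is processed as it arrives and leaves in the same buffer. An asynchronous transformation has separate input and output buffers, and it is up to the designer of the asynchronous component to provide a column structure for the output buffer and to copy the data across from the input.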
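
A loose Python analogy, with made-up function and column names: the first function behaves like a synchronous transformation (one output row per input row, same row object), the second like an asynchronous one (it reads all input and emits new rows with a different column structure).

```python
def derived_column(rows):
    """Synchronous style: each input row yields exactly one output row, in place."""
    for row in rows:
        row["total"] = row["qty"] * row["price"]  # adds a column to the same row
        yield row


def aggregate_by_city(rows):
    """Asynchronous style: consumes all input, then emits rows with a new shape."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["total"]
    return [{"city": city, "city_total": total} for city, total in totals.items()]


source = [
    {"city": "Pune", "qty": 2, "price": 5},
    {"city": "Delhi", "qty": 1, "price": 7},
    {"city": "Pune", "qty": 3, "price": 1},
]
summary = aggregate_by_city(derived_column(source))
```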

Q19 How to achieve parallelism in SSIS?
Parallelism is achieved using the MaxConcurrentExecutables property of the package. Its default is -1, which means the number of processors plus 2.
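
The rule for the default value can be written out as a small Python sketch. The function names are hypothetical and the thread pool is only a stand-in for the SSIS runtime; in a real package the tasks also need to be free of precedence constraints between them to run side by side.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def effective_max_concurrent(setting=-1, processors=None):
    """Resolve a MaxConcurrentExecutables-style setting: -1 means processors + 2."""
    if processors is None:
        processors = os.cpu_count() or 1
    return processors + 2 if setting == -1 else setting


def run_in_parallel(tasks, setting=-1):
    """Run independent tasks with at most the resolved number of workers."""
    with ThreadPoolExecutor(max_workers=effective_max_concurrent(setting)) as pool:
        return list(pool.map(lambda task: task(), tasks))


results = run_in_parallel([
    lambda: "load_customers",
    lambda: "load_orders",
    lambda: "load_products",
])
```

For example, on a 4-processor machine the default of -1 resolves to 6 concurrent executables.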

-More questions added-Sept 2011
Q20 How do you do incremental load?

The fastest way to do an incremental load is to use a timestamp column in the source table and to store the timestamp of the last ETL run. In the ETL process, pick all the rows with a timestamp greater than the stored one, so that only new and updated records are picked up.
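
A minimal sketch of this pattern in Python with an in-memory SQLite database. It assumes the "timestamp" is a last-modified datetime column; the table names, column names, and sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT);
    CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT);
    CREATE TABLE etl_watermark (last_loaded TEXT);
    INSERT INTO etl_watermark VALUES ('1900-01-01 00:00:00');
""")


def incremental_load(conn):
    """Copy only rows changed since the stored watermark, then advance it."""
    (last_loaded,) = conn.execute("SELECT last_loaded FROM etl_watermark").fetchone()
    rows = conn.execute(
        "SELECT id, name, modified_at FROM source "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_loaded,),
    ).fetchall()
    for row in rows:
        # Insert new rows and overwrite updated ones.
        conn.execute(
            "INSERT OR REPLACE INTO target (id, name, modified_at) VALUES (?, ?, ?)",
            row,
        )
    if rows:
        conn.execute("UPDATE etl_watermark SET last_loaded = ?", (rows[-1][2],))
    conn.commit()
    return len(rows)


conn.execute("INSERT INTO source VALUES (1, 'Ann', '2011-09-01 10:00:00')")
conn.execute("INSERT INTO source VALUES (2, 'Bob', '2011-09-01 11:00:00')")
first = incremental_load(conn)

conn.execute(
    "UPDATE source SET name = 'Bobby', modified_at = '2011-09-02 09:00:00' WHERE id = 2"
)
conn.execute("INSERT INTO source VALUES (3, 'Cy', '2011-09-02 09:30:00')")
second = incremental_load(conn)
```

The second run picks up only the updated row and the new row; the unchanged row is not read again.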

Q21 How do you handle late-arriving dimensions or early-arriving facts?

Late-arriving dimensions are sometimes unavoidable because of a delay or error in the dimension ETL, or because of the ETL logic itself. To handle early-arriving facts, we can create a dummy dimension row with the natural/business key and keep the rest of the attributes as null or default. As soon as the actual dimension record arrives, the dummy row is updated with a Type 1 change. These rows are also known as inferred members (or inferred dimensions).
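
The idea can be sketched in Python with a dictionary standing in for the dimension table. The function names, the customer dimension, and the key values are assumptions made for the example.

```python
from itertools import count

_surrogate_keys = count(1)
dim_customer = {}  # business key -> dimension row


def lookup_or_infer(business_key):
    """Fact load: return the surrogate key, creating a dummy row if needed."""
    row = dim_customer.get(business_key)
    if row is None:
        row = {
            "sk": next(_surrogate_keys),
            "business_key": business_key,
            "name": None,        # attributes unknown for now
            "inferred": True,
        }
        dim_customer[business_key] = row
    return row["sk"]


def load_dimension_row(business_key, name):
    """Dimension load: overwrite an inferred row in place (Type 1) or insert."""
    row = dim_customer.get(business_key)
    if row is None:
        dim_customer[business_key] = {
            "sk": next(_surrogate_keys),
            "business_key": business_key,
            "name": name,
            "inferred": False,
        }
    else:
        row.update(name=name, inferred=False)  # surrogate key is preserved


fact_sk = lookup_or_infer("C042")        # fact arrives before its dimension
load_dimension_row("C042", "Acme Ltd")   # the real dimension record arrives later
```

Because the surrogate key does not change, the fact rows loaded earlier keep pointing at the right dimension row.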


29 comments:

Anonymous said...

Thanks!! Its gud..

uma said...

very nice Thanks

sathish said...

its really nice article

sri said...

gud

Sridhar said...

Hey Rahul,

Thanks for such a nice Questions, can i have your email id[yahoo preferable] as i need few dobuts to be cleared in SSIS?

Thanks,
-Sastry

Rahul Kumar said...

@Sastry
mailrahul15@yahoo.com
I am online on Logtorahul@gmail.com for most the hours in the day.

BIRAMS said...

very nice. do you have any other interview question in ssis or ssas/ssrs

Anonymous said...

HI Rahul,

Your questions are too good.Thanks a lot.I have an interview tomorrow..have to see how it goes.

Rahul Kumar said...

All the best!!

SUSHANT said...

@ rahul

I have a question, i wanted to take the output from a sql table to an excel file, so i used SSIS, i created oledb source and wrote the sql command in it for fetching data.
I took its output to a excel destination.
but when i run this, it doesnt show any error but the excel file doesnt get populated with the results.
Help me
Post ur answer on
sushantkumar1984@gmail.com

Thanks,
SUSHANT

Anonymous said...

hi
i have 100 records
i have to split it into 10 excel sheet each sheet 10 records
how can i do it

Rahul Kumar said...

Answer to last question in chain
"i have 100 records.i have to split it into 10 excel sheet each sheet 10 records.how can i do it"
Answer:
1. If the requirement is more or less static, attach a Conditional Split to the source, divide the records into 10 streams depending on some key, and send each stream to an Excel destination.

2. If you want to make it dynamic, then proceed with these hints:
Declare a CounterVariable, use a For Loop, put a DFT inside the For Loop, select records from the source based on the CounterVariable, hook it to one Excel destination, and in the connection manager of the Excel destination use the CounterVariable to select the sheet.

Anonymous said...

HI Rahul,
This is Malli; recently i faced diff q's in Capgemini.
Q) on what basisc scd will work?
Q) table architecture for Lookup?
Q) who will delopy the package; weather developer or Team Lead or some other people?

Rahul Kumar said...

@Malli,
1. The SCD (Slowly Changing Dimension) transformation works according to the SCD type (1, 2 or 3) in use. Based on the SCD type and the changes in attributes, the SCD transformation provides the following outputs:
a. Changing attributes b. Fixed attributes c. Historical attributes d. Inferred members e. New f. Unchanged output

2. The only requirement is that the tables used in the lookup should have a common joining key between them.

3. Nothing is fixed; it depends on the organization, but usually there is a code promoter who is responsible for deploying code to production.

Hope I have answered your questions. Let me know if you have any more doubts. Thanks

RRK said...

Rohidas Kumbharkar

Good nice article,

sasi said...

Thank you its very nice

Anshu said...

Very helpful, thanks a ton :)

Anonymous said...

Could you please post some scenario/real time probs of SSIS..

Thanks

Rahul Kumar said...

Sure.. I will add them in couple of days.. lemme get some time off..

Anonymous said...

yeah please do post some scenarios and real time issues

Anonymous said...

Thanks Rahul ,nice set of question .

Anonymous said...

very nice rahullllllllllllll

dinesh said...

DTS:-
1) Limited error handling
2) Message boxes archieve scripts
3) Limited set of transformations
4) No business functionality

SSIS:-
1)Complex and Powerful error handling
2)Message boxes is not scripting
3)good no of Transformations
4)It have complex Functionality

Anonymous said...

sdfasdfasdf

Anonymous said...

it's a miracle

Purna Ch. Sasmal said...

In excel sheet, records information started from 15th row and there is mixed data in sheet in column. How can I fetch data from excel sheet?

Anonymous said...

May get question regarding what X transformation do?Lookup, fuzzy lookup, fuzzy grouping transformation are my favorites.

Shall i knw wat answer is corret for this

Raja Sekhar said...

Fuzzy Grouping is used to group data based on a column like name, where we can have different spellings or phonetics for a name. We need to group them to get the names corrected. This is primarily used for data cleansing and is not 100% accurate; we need to tune it for the best fit in our case.

Lookup is used to find an exact match.
Fuzzy Lookup, on the other hand, is used like the Lookup transformation; the only difference is that instead of an exact match it looks for similar data, such as a spelling error or phonetic difference.

Anonymous said...

Thanks..for the questoions .please add some more scenario based questions.
