So while working with a client recently I was going some performance tuning on SSIS for them. While SSIS isn’t exactly my normal workload, the problem wasn’t really in SSIS, but just normal SQL Statements being called from SSIS.
While attacking this problem, the first thing that I did was look into the SSIS package and see what I had gotten myself into. There were hundreds of objects within SSIS, most of them T-SQL Scripts of Data Pump Tasks. So simply going through them one by one wasn’t going to be all that helpful. The client didn’t have any logging to SQL Server or to a text file setup, but they did have the SSIS package setup to log to the Windows event log.
Opening the Application log greated me with one event for the start of each task, and one for the completion of each task. Not exactly the most useful information ever, but better than nothing, so I’ll take what I can get. The first thing that I did was export the Application Event log to a CSV file and download it to my desktop so that I could do the processing on my local machine. The CSV file has a few hundred thousand lines, so I needed to figure out what I had via scripting. The first problem that I had was that for each SSIS event there were multiple lines within the csv when opened in Excel which you can see below.
Thankfully this made the rows pretty unique looking, so an Excel Macro was able to clean this up pretty easily. Before I got started I added some column headers into the excel sheet. The default columns from Event Viewer are Date, Time, Source, Severity, Category, EventID, User, Computer, Description. This covered columns A through I. After that I put columns J through P as being Operator, SourceName, SourceID, Execution, StartTime, EndTime and DataCode as these are the specific items that SSIS is logging that you can see above. Then it was time to bust out a little VBA and flatten out all this data. The macro shown below looped through the data and flattened everything out so that all the data was on a single line. Then is went through and removed all the white lines. This macro took about 3 hours to run.
Sub CleanUpData()
Dim Row As Long
Dim DataRow As Long
Row = 32768
Do While Range("A" & Row).Value = ""
If Range("B" & Row).Value = "" Then
DataRow = Row
Range("A" & Row).Select
Else
If Left(Range("A" & Row).Value, 4) = " Mes" Then
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 4) = " Ope" Then
Range("J" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 9) = " Source N" Then
Range("K" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 10) = " Source ID" Then
Range("L" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 4) = " Exe" Then
Range("M" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 4) = " Sta" Then
Range("N" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 4) = " End" Then
Range("O" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
If Left(Range("A" & Row).Value, 4) = " Dat" Then
Range("P" & DataRow).Value = Range("A" & Row).Value
Range("A" & Row).Value = ""
End If
End If
Row = Row + 1
Loop
Do While Row 2
If Range(“A” & Row).Value = “” Then
Range(Row & “:” & Row).Select
Selection.Delete
End If
Row = Row – 1
Loop
End Sub
Once that was done I needed to set the start times for each event on the same line as the end time for each event. That way I wasn’t scrolling through looking for stuff while trying to do my data analysis. This I was also able to do with a little Macro. In this case I started on row 14 as that was the first entry that had SSIS data. You may need to change the EndRow=14 line to what ever line your data starts on. This puts the start time and the end time into Columns R and S, but only on the line which is the OnPostExecute line.
Sub FindRunTimes()
Dim EndRow As Long
Dim StartRow As Long
Dim SourceName As String
EndRow = 14
Do While Range("A" & EndRow).Value = ""
If Range("I" & EndRow).Value = " Event Name: OnPostExecute" Then
StartRow = EndRow + 1
Do Until Range("K" & EndRow).Value = Range("K" & StartRow).Value And Range("I" & StartRow).Value = " Event Name: OnPreExecute"
If Range("I" & StartRow).Value = "" Then
MsgBox "Ran out of rows to process for ending row " & EndRow
Exit Sub
End If
StartRow = StartRow + 1
Loop
Range("S" & EndRow).Value = Range("B" & EndRow).Value
Range("R" & EndRow).Value = Range("B" & StartRow).Value
'Range("T" & EndRow).Value = DateDiff("mi", Range("A" & StartRow).Value & " " & Range("B" & StartRow).Value, Range("A" & EndRow).Value & " " & Range("B" & EndRow).Value)
Range("R" & EndRow).Select
End If
EndRow = EndRow + 1
Loop
End Sub
Once that Macro was finished I needed to figure out which of the statements was taking to long to run. Putting a formula into the column Q marked the rows that I needed to look into a little more. In this case I was using anything with a run time of 20 minutes or longer as needing to be looked at.
=IF(HOUR(R595)=HOUR(S595)*60+MINUTE(S595),"", "X"), IF(HOUR(R595)*60+MINUTE(R595)+19>=HOUR(S595)*60+MINUTE(S595)+1440,"", "X"))
Once that formula was in there all the rows that were SSIS objects that took a long time to run had an X next to them. Now I still needed to ignore the rows which were containers, but those were pretty easy to figure out from the names of the objects as most of them had container in them (in the SourceName column which was column K of my Worksheet). At this point I was able to move back to the SSIS package and start digging through the package finding the objects which had long runtimes and getting those looked at.
The whole reason I went through this process instead of just adding logging to the SSIS package was for a few different reasons.
- I didn’t want to make ANY changes to production if I didn’t have to. This SSIS package loads a production data warehouse which the business uses to run their company, so making changes to it wasn’t something I wanted to try.
- If I had simply put logging into place I would have had to have waited another day to being looking at the problems as the ETL process only runs once a day.
- The SSIS package takes 8+ hours to run, so sitting around waiting for it to run wasn’t exactly the best use of my time, or the clients money.
If you get stuck in this same sort of problem, hopefully this approach will help you out.
Denny