Create SQL Scripts for Data in a table

On occasion I want to get data from a table and into a script – particularly when I want to create a deployment script for standing data that needs to be migrated between servers. Rather than hand-craft it this last time I’ve created the script below which, when given the name of a table, generates a union’d list of select statements that represent the data. With a slight modification this list can then be selected into a temporary table then joined back to the original in an update/insert to create a script that automatically maintains the data.

There are some caveats to this script though :

  • You have to delete the first ‘union’ in the resulting SQL.
  • It doesn’t handle dates correctly and potentially some other data types – I did write this in a hurry!

1 declare @tablename nvarchar(max) 2 set @tablename = 'testtable' 3 4 if not exists ( select [TABLE_NAME] from INFORMATION_SCHEMA.TABLES where [TABLE_NAME] = @tablename ) 5 begin 6 declare @message nvarchar(max) 7 set @message = 'The table ' + @tablename + ' does not exist' 8 raiserror ( @message, 18, 0 ); 9 return; 10 end 11 12 declare @sql nvarchar(max) 13 set @sql = '' 14 15 declare @newline nvarchar(2) 16 set @newline = char(13) + char(10) 17 18 select @sql = @sql 19 + case when ORDINAL_POSITION > 1 then ' + '', '' + ' else '' end 20 + case when IS_NULLABLE = 'YES' then 'isnull(' else '' end 21 + case when NUMERIC_SCALE is not null or DATA_TYPE = 'bit' then 'convert(nvarchar, ' else '' end 22 + case when CHARACTER_MAXIMUM_LENGTH is not null then ' '''''''' + ' else '' end 23 + '[' + isc.COLUMN_NAME + ']' 24 + case when CHARACTER_MAXIMUM_LENGTH is not null then ' + ''''''''' else '' end 25 + case when NUMERIC_SCALE is not null or DATA_TYPE = 'bit' then ')' else '' end 26 + case when IS_NULLABLE = 'YES' then ', ''null'')' else '' end 27 + ' + '' as [' + COLUMN_NAME + ']' + '''' 28 + @newline 29 from INFORMATION_SCHEMA.COLUMNS isc 30 where 31 isc.TABLE_NAME = @tablename 32 order by 33 ORDINAL_POSITION 34 35 set @sql = 'select ''union select '' + ' + @sql + ' from [' + @tablename + ']' 36 exec sp_executesql @sql

Randomising data in a table using T-sql

I needed to obfuscate some data for a demo server but rather than re-invent all the data I simply wanted to randomly swap data so associations could not be made between a store and pay rates for example. After a bit of head scratching I came to the conclusion that I simply needed to be able to correlate one set of data with a random set. The simplest way I thought of was to use a row_number() to correlate one list of data with another where one set of row numbers were ordered by a primary key whilst the other used a little trick of ordering by newid() which is recalculated for each row and therefore randomly distributes the data. Here’s the code :

select *
--update s1 set s1.StoreName = s2.StoreName, s1.StoreAddress1 = s2.StoreAddress1, s1.County = s2.County, s1.Town = s2.Town, s1.PostCode = s2.PostCode, s1.Telephone = s2.Telephone
from (select row_number() over(order by StoreID) as RowNumber, StoreId, StoreName, StoreAddress1, County, Town, PostCode, Telephone from Stores s1) as s1 
join (select row_number() over(order by newid()) as RowNumber, StoreId, StoreName, StoreAddress1, County, Town, PostCode, Telephone from Stores s2) as s2 on s1.RowNumber = s2.RowNumber 

Grouping Multiple Consecutive Days using T-sql

I have a timesheet table that contains one entry per day per employee and against each entry is a reason flag which indicates whether the entry is for a holiday or sickness etc. I’ve written a SQL Server report that generates a list of the holidays that drills-down several levels to the actual holidays but I don’t want to list individual days but rather the holiday periods as a date from/to and numbers of days. With some inspirational thinking from Rob Farley I’ve created must the same query as his to merge the consecutive days and provide a total. Here’s the initial version :

set nocount on

declare @Holidays as table (
    PayrollNumber int not null,
    DateFrom datetime not null,
    DateTo datetime not null,
    NoOfDays int not null
)

insert into @Holidays values(14, '2010-02-08', '2010-02-08', 1.00)
insert into @Holidays values(14, '2010-02-09', '2010-02-09', 1.00)
insert into @Holidays values(14, '2010-02-10', '2010-02-10', 1.00)
insert into @Holidays values(14, '2010-02-11', '2010-02-11', 1.00)
insert into @Holidays values(14, '2010-02-12', '2010-02-12', 1.00)
insert into @Holidays values(14, '2010-03-18', '2010-03-18', 1.00)
insert into @Holidays values(14, '2010-03-19', '2010-03-19', 1.00)
insert into @Holidays values(15, '2010-05-05', '2010-05-05', 1.00)
insert into @Holidays values(15, '2010-05-06', '2010-05-06', 1.00)
insert into @Holidays values(15, '2010-05-07', '2010-05-07', 1.00)
insert into @Holidays values(15, '2010-06-10', '2010-05-10', 1.00)
insert into @Holidays values(15, '2010-06-11', '2010-05-11', 1.00)

select 
    PayrollNumber, 
    min(DateFrom) as DateFrom, 
    max(DateTo) as DateTo, 
    sum(NoOfDays) as DayCount
from (
    select 
        *, 
        row_number() over (partition by PayrollNumber order by DateFrom) 
            - cast(DateFrom  as int) as ConsecutiveOffset
    from @Holidays
) x
group by PayrollNumber, ConsecutiveOffset
order by PayrollNumber, DateFrom

This produces the following result set :

PayrollNumber DateFrom                DateTo                  DayCount
------------- ----------------------- ----------------------- -----------
14            2010-02-08 00:00:00.000 2010-02-12 00:00:00.000 5
14            2010-03-14 00:00:00.000 2010-03-17 00:00:00.000 4
14            2010-03-18 00:00:00.000 2010-03-19 00:00:00.000 2
15            2010-05-05 00:00:00.000 2010-05-07 00:00:00.000 3
15            2010-06-05 00:00:00.000 2010-05-09 00:00:00.000 5
15            2010-06-10 00:00:00.000 2010-05-11 00:00:00.000 2

Which is almost perfect. However, there is a slight complexity in the system I’m working on. Holidays are initially planned and appear in a different table. When the timesheet for today is generated the planned holiday will be copied to the timesheet where it can be altered if reality is slightly different. This means that when I generate a holiday report for the current holiday year I have to merge planned and actual records – the problem is actuals are booked as date ranges. However, the code above has a pleasant side effect and already merges single days appearing consecutively before a date range holiday. Since timesheets can never appear after planned holiday I don’t need to worry about merging single days that appear after a date range. For example, this sql :

insert into @Holidays values(14, '2010-02-08', '2010-02-08', 1.00)
insert into @Holidays values(14, '2010-02-09', '2010-02-09', 1.00)
insert into @Holidays values(14, '2010-02-10', '2010-02-10', 1.00)
insert into @Holidays values(14, '2010-02-11', '2010-02-11', 1.00)
insert into @Holidays values(14, '2010-02-12', '2010-02-12', 1.00)
insert into @Holidays values(14, '2010-03-14', '2010-03-14', 1.00)
-- Here's a date range that came from planned holidays and therefore should not be merged
insert into @Holidays values(14, '2010-03-15', '2010-03-17', 4.00)
insert into @Holidays values(14, '2010-03-18', '2010-03-18', 1.00)
insert into @Holidays values(14, '2010-03-19', '2010-03-19', 1.00)
insert into @Holidays values(15, '2010-05-05', '2010-05-05', 1.00)
insert into @Holidays values(15, '2010-05-06', '2010-05-06', 1.00)
insert into @Holidays values(15, '2010-05-07', '2010-05-07', 1.00)
-- Here's a date range that came from planned holidays and therefore should not be merged
insert into @Holidays values(15, '2010-06-05', '2010-05-09', 5.00)
insert into @Holidays values(15, '2010-06-10', '2010-05-10', 1.00)
insert into @Holidays values(15, '2010-06-11', '2010-05-11', 1.00)

produces this result :

PayrollNumber DateFrom                DateTo                  DayCount
------------- ----------------------- ----------------------- -----------
14            2010-02-08 00:00:00.000 2010-02-12 00:00:00.000 5
14            2010-03-14 00:00:00.000 2010-03-17 00:00:00.000 5
14            2010-03-18 00:00:00.000 2010-03-19 00:00:00.000 2
15            2010-05-05 00:00:00.000 2010-05-07 00:00:00.000 3
15            2010-06-05 00:00:00.000 2010-05-09 00:00:00.000 5
15            2010-06-10 00:00:00.000 2010-05-11 00:00:00.000 2

Where the date range 2010-03-15 to 2010-03-17 has been correctly merge with 2010-03-14 but not with 2010-03-18.

Persisting Object Graphs/Collections to one or more SQL Server Tables in one Stored Procedure Call

One of the methods I’ve used on a number of occasions to solve performance issues in an application is to reduce a number of calls to the database into one call. Using the classic order header and order detail example, rather than looping through each order and call the database for each item, I would prefer to pass the entire set of order detail to the database and persist it to the database with one T-Sql call. Taking it a step further, and possibly unnecessarily in this example, it would also be possible to pass the order header in the same call.

Now before I continue with the solution below which works in SQL Server 2005 and above, there is a new feature introduced in SQL Server 2008 allowing for table valued parameters to be added to stored procedures which accomplish similar solution.

A Solution

In my particular data object I have a method, SaveDetail, that is passed a graph of objects which is then converted into XML.

   public static void SaveDetail(ScheduleDto schedule)
   {
       StoredProcedureDAL.LM_Schedule_SaveXML(ConvertToXml(schedule).ToString(SaveOptions.DisableFormatting));
   }

   private static XElement ConvertToXml(ScheduleDto schedule)
   {
       return
           new XElement("Schedule",
               new XAttribute("Date", schedule.Date),
               from jobArea in schedule.JobAreas
               from employee in jobArea.Employees
               select new XElement("Emp",
                   new XAttribute("No", employee.Number),
                   employee.Schedule.Select((item, index) =>
                       new XElement("W", // WorkTpe
                           new XAttribute("N", item),
                           new XAttribute("T", ConvertFloatTo24HrTimeString(index * schedule.StepSize)))
                       )
               )
           );
   }

The call to StoredProcedureDAL.LM_Schedule_SaveXML is not shown here and is automatically generated from the stored procedure definitions in the database. The ScheduleDto is a hierarchical object model which, in part, is converted to xml by the ConvertToXml method using Linq to Xml. Notice that a lot of the element names are very short and attributes are used heavily since the xml will be much shorter and it’s important to reduce the size of the data being posted to the database server.

The stored procedure that is called above is shown below although it isn’t complete but highlights the details required to ‘shred’ the xml into a recordset. The key to ‘shredding’ the data is taking the xml typed parameter and using the nodes method to return relational data. The parameter to nodes is XQuery and it’s best to drill down to the deepest elements in the XML since it’s easier to use parent paths to access data higher in the hierarchy… and that neatly leads on to accessing the data which is shown in the select below. The T(c) syntax provisions Table and Columns names which are then used in the Select. The T.c.value method accepts another XQuery (and here I’ve used parent paths, attributes and indexes) and the type to convert the xml into. I used to join directly to @xml.nodes() but as the comment in the code indicates, I’ve found it’s quicker to use an inner query or, if you prefer, a CTE – I’m afraid I’ve not tried to get to the bottom of why this is so slow.

CREATE PROCEDURE [dbo].[LM_Schedule_SaveXML]
    @xml XML
AS
begin
    insert into [LM_ScheduleDetails]( [Date], [EmployeeID], [Time], [WorkType] )
    select
        @date,
        e.EmployeeId,
        x.WorkTypeTime,
        spwt.Value
    from (
        -- Fetch the data from the xml in the inner select. This seems to
        -- be significantly quicker than trying to do everything in one select.    
        select
            T.c.value('../@No[1]', 'int') as EmployeeNumber,
            T.c.value('@T[1]', 'datetime') as WorkTypeTime,
            T.c.value('@N[1]', 'nvarchar(1)') as WorkTypeName
        from @xml.nodes('/Schedule/Emp/W') as T(c)
    ) x
    join LM_Employees e on e.PayrollNo = x.EmployeeNumber
    join LM_ShiftPatternsWorkTypes spwt on spwt.Name = x.WorkTypeName
end
GO

And that’s it. In the particular scenario I used this very effectively to provide near instant saving of data for something that would have taken a lot, lot longer to save with multiple calls. I like this solution too since the technology needed to facilitate this is simple to understand and very compact and therefore quick and easy to maintain.

Calling SQL Server Stored Procedure Synonym fails with parameter error

I’ve just tried to call a stored procedure in another database via a synonym but received an error ‘Procedure or function ‘LM_Employees_HasNewStoreTempsInStore’ expect parameter ‘@OrganisationLevelValueID’, which was not supplied.

Here’s how I defined the synonym and called it :

CREATE SYNONYM [dbo].[LM_Employees_HasNewStoreTempsInStore] 
FOR [PHENDRY].[LabourManager_DEV].[dbo].[LM_Employees_HasNewStoreTempsInStore]
GO

exec [dbo].[LM_Employees_HasNewStoreTempsInStore] 1119
GO

As you can see, I’m definitely passing the parameter!!

The problem appears to be how the synonym is defined when the two database exist on the same server. The machine name, here ‘[PHENDRY].’ can be removed completely and everything works fine!

Numbering XML Elements in T-SQL

I had a problem where I wanted to convert a chunk of XML passed to a T-SQL stored procedure into a relational model but at the same time number the elements. I used row_number() but that required an order by clause and I actually wanted to maintain the original order of the elements and not apply any other ordering. Initially I tried adding order by 1 to try and fool it into apply ‘no order’ but that didn’t work, instead I replaced the 1 with rand(). Because rand() is not recomputed for each row it effectively orders by a single number which is the equivalent of order by 1. Here’s the code :

declare @xml xml
set @xml = '
<root>
    <item>first</item>
    <item>second</item>
    <item>third</item>
    <item>fourth</item>
</root>'

select 
    T.c.value('.', 'nvarchar(25)') as ItemName,
    row_number() over (order by rand()) as ItemIndex
from @xml.nodes('/root/item') as T(c)

Which returns :

image

The use of the rand() function is a little strange but the code below should highlight what it’s doing :

select rand(), newid(), checksum(newid()) 
from master..spt_values 
where [type] = 'P'

Which returns :

image

As you can see the rand() returns the same number for every row since it is computed at the start of the query execution. However, newid() is recomputed for every row and by wrapping it with checksum() you can also use it to calculate random numbers for every row.

Dynamic Order By in T-SQL

I’ve been looking at some code that looked like this:

CREATE PROCEDURE [dbo]., 
    @DraftStatus int,
    @Surname nvarchar(50),
    @PayrollNo nvarchar(50),
    @StartRow int,
    @PageSize int,
    @OrderBy varchar(50),
    @OrgLevelValueID int = null
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(4000)
    set @sql = 'SELECT * FROM '+
            '(SELECT *, ROW_NUMBER() OVER (ORDER BY ' + @OrderBy + ') AS RowNum '+
            'FROM LMv_EmployeeDrafts '+
            'WHERE DraftStatus = ' + CONVERT(nvarchar(10), @DraftStatus) +
                ' AND Surname LIKE ''%'+@Surname+'%'' '+
                ' AND CONVERT(nvarchar(10), PayrollNo) LIKE ''%'+@PayrollNo+'%'') AS tmp '+
        'WHERE RowNum-1 BETWEEN ' + CONVERT(nvarchar(10), @StartRow)+
            ' AND (' + CONVERT(nvarchar(10), @StartRow) + ' + ' + CONVERT(nvarchar(10), @PageSize) + ') - 1'
        --IF NOT(@OrgLevelValueID IS NULL) SET @sql = @sql + ' AND OrganisationLevelValueID = ' + CONVERT(varchar(max), @OrgLevelValueID)
        --IF NOT (@OrgLevelValueID IS NULL) SET @sql = @sql + ' AND (' + CONVERT(varchar(max), @OrgLevelValueID) + ' IN ' +
        SET @sql = @sql + ' AND (' + CONVERT(varchar(max), @OrgLevelValueID) + ' IN ' +
        '(SELECT OrgLvlID FROM RMR_GetPreOrgLvlValues(OrganisationLevelValueID)))'

    EXEC sp_executesql @sql
    PRINT @sql
END

Eeeek… not the most user-friendly (and I’ve certainly written dynamic SQL like this before) but it was written like this to allow for different page ordering. There’s also a bug in this code where the ROW_NUMBER() needs to be conditionally partitioned before being filtered.

The solution isn’t necessarily the prettiest either. However, it’s now not a sql string being built up so the execution planner can pre-compile it and it is easier to read! The one disadvantage is if a new order by is introduced the SQL needs to be updated – however, I think this is a reasonable exchange:

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY
            CASE WHEN @OrgLevelValueID is not null THEN OrganisationLevelValueID END
        ORDER BY 
            CASE WHEN @OrderBy = 'DateAdded Desc' THEN DateAdded END DESC,
            CASE WHEN @OrderBy = 'Title Desc' THEN Title END DESC,
            CASE WHEN @OrderBy = 'Forename Desc' THEN Forename END DESC,
            CASE WHEN @OrderBy = 'Surname Desc' THEN Surname END DESC,
            CASE WHEN @OrderBy = 'PayrollNo Desc' THEN PayrollNo END DESC,
            CASE WHEN @OrderBy = 'DateAdded' THEN DateAdded END,
            CASE WHEN @OrderBy = 'Title' THEN Title END,
            CASE WHEN @OrderBy = 'Forename' THEN Forename END,
            CASE WHEN @OrderBy = 'Surname' THEN Surname END,
            CASE WHEN @OrderBy = 'PayrollNo' THEN PayrollNo END        
        ) AS RowNum  
    FROM LMv_EmployeeDrafts  
    WHERE 
        DraftStatus = @DraftStatus
        AND Surname LIKE '%' + @Surname + '%'
        AND PayrollNo LIKE '%' + @PayrollNo + '%' 
) AS tmp  
WHERE 
    RowNum - 1 BETWEEN @StartRow AND @StartRow + @PageSize - 1 
    AND (@OrgLevelValueID is null OR @OrgLevelValueID 
        IN (SELECT OrgLvlID FROM RMR_GetPreOrgLvlValues(OrganisationLevelValueID)))

A couple of final caveats:

  • I’ve not checked the final query in the execution planner to make sure it’s efficient!
  • The surname and payroll filtering are being performed using a prefixed ‘%’ which means a table scan – pretty inefficient!

Generating a list of numbers in T-SQL

I wanted to create a list of numbers so I could cross join them to another set of data to generate some test data quickly. There are ways to generate numbers using a CTE (Common Table Expression) or if you’re in a real hurry you can make us of a fairly undocumented table called master..spt_values. Seeing as this table isn’t well documented don’t go using it for production code since it might change but for quick one-off tasks it seems fine.

The following SQL creates a list of numbers from 0 to 2047 :

select [number]
from master..spt_values
where type = 'P'

Importing from Excel into SQL Server

I was going through a mountain of pain trying to get an import working from an Excel spreadsheet working into SQL Server 2005 but I finally managed it and here’s how…

Here’s the SQL I’m using (slightly abridged!) :

set transaction isolation level read committed

declare @sql nvarchar(1024)
set @sql = 'select * into ##Temp from opendatasource(''Microsoft.Jet.OLEDB.4.0'', ''Extended Properties=Excel 8.0;Data Source=' + @ExcelFileToImport + ''')...' + @ExcelWorkSheet
exec sp_executesql @sql

Points to note :

  • I’m using dynamic sql because the opendatasource() doesn’t like being passed variables… so be sure to validate the contents of @ExcelFileToImport and @ExcelWorkSheet.
  • Because I was forced to use dynamic sql I had to use a global temporary table. I wanted to load the data into a temporary table so that the columns were generated on-the-fly and I could check the correct columns were supplied by examining tempdb.sys.columns. It therefore had to be a global temporary because a standard one would have been thrown away after sp_executesql() had finished! For my situation the global temporary table is ok… but you’ll have to consider your own situation and whether concurrency allows.
  • I kept getting the error below on the subsequent re-tries of uploading but worked around this by adding ‘set transaction isolation level read committed’ in the sproc.

    The requested operation could not be performed because OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)" does not support the required transaction interface.

I was having trouble executing within a transaction but with the changes made above the sql could be called with the following :

using (TransactionScope scope = new TransactionScope())
{
    UploadExcelIntoDatabase(importTicket, importFilename);
    scope.Complete();
}

Conditional Aggregate Totals in SQL

I’ve been working on a SQL Server Reporting Services report which had to display several counts on each row. Each row represented a project and each project could have several tasks that were either ordinary tasks or milestones and could also be escalated. The report need to show something like this :

Project Name Milestone Counts Task Count Escalated Task Count
1st Week 2nd Week 3rd Week >=4th Week Total 1st Week 2nd Week 3rd Week >=4th Week Total

As you can see a task initially has to break down into two major colours of milestone and ordinary task counts but then is broken down into counts in the 1st, 2nd, 3rd and greater than 4th week totals including the total itself.

Initially this seemed tricky when thinking in terms of using count() but there’s an incredibly simple technique that solves this but which isn’t immediately obvious. The Sum() aggregate function can contain an expression and in this example that expression could compare the task date against late week dates and return either a 1 or 0 :

select ProjectName,
   sum(case when IsMilestone = 1 
      and TaskDate < @LateDate 
      and Taskdate >@OneWeekLateDate               
      then 1 else 0 end) as Week1LateMilestoneCount
from Project
group by ProjectName

I’ve only included one example there but I think it’s enough to get the idea.