Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

I didn’t really get the SQL Server 2008 CLR until today.

By a strange quirk of fate I have not really had to do too much with CLR functions in SQL Server until I started hanging out on Twitter.  Then I saw this question..

“How do I remove non-numeric characters from a VARCHAR?”

Easy I thought until I realised that T-SQL isn’t really that good with string functions. Anyway I have cracked it but it isn’t going to be fast against a huge table, because I have a loop inside a scalar function ..

ALTER FUNCTION production.ufnChartoNumeric
@String nvarchar(50)
    DECLARE @CleanNumber int   — the final product
    DECLARE @CleanString nvarchar(50) =” — the inputstring with only numbers in
    DECLARE @StringLength int = len(@String)+1 — the length of the string +1
    DECLARE @Loop int = 1 — Loop counter
    — If the string is already a number we can assign it to the result and finish
    IF isNumeric(@String) = 1
        SET @CleanString = @String
    — If itsn’t a number we need to loop through the string a character at a time
    — and test if each character is a number.
    — if it is append it to our clean string if it isn’t move on

        WHILE @Loop < @StringLength  — loop for each character in the string
        IF isNumeric(Substring(@String,@Loop,1)) = 1 SET @CleanString += Substring(@String,@Loop,1)
        SET @Loop += 1 
   — if there’s a decimal point and you want to keep it, (but which one if there are 2!)
    — you’ll need more code to detect it
    — and maybe return a float, but the first test will keep it
    — so for this example I want to eliminate it.

    SET @CleanString = Replace(@CleanString, ‘-‘,”)
    SET @CleanString = Replace(@CleanString, ‘+’,”)
    SET @CleanString = Replace(@CleanString,’.’,”)
    SET @CleanNumber = Cast(@CleanString as int)
    RETURN @CleanNumber

As you can see I also had to deal with decimal points and  plus and minus signs as these could occur many times in a string. BTW there are little bits of T-SQL 2008 in here such as ‘+=’ and DECLARE .. = .. which you’ll need to change for older versions.

It does at least work, (I tested it against a few random string and against production.product.productnumber in AdventureWorks), but the only way I can think of to improve the speed is to rewrite it as a CLR which is much better at handling in row string manipulation. 

But I leave that for another day as I am not sure what this is exactly needed for, and often something that works now is better than something fast tomorrow.