Database Programming: Denis’ Prime Number Challenge


Denis the SQL Menace‘ read Wednesday night’s post on “the million GUID march” and offered up a challenge of his own:



How about the next challenge is to return all 78498 prime numbers between 1 and 1000000?


What an interesting idea!  Crazy, in a sense (I’ll get to that in a second), like most interesting ideas, but it’s an idea I’m just crazy enough to explore for a couple of evenings.  In the process of that exploration I’ve so far learned about:



  • calculated columns, and when to persist them (in general, apply the PERSISTED directive to a calculated column only if you need to create an index on it)
  • INSTEAD OF INSERT triggers, a cool new SQL Server 2005 feature
  • I re-used Shaun’s Common Table Expression, which I still think is groovy.  I’m using it even though it’s not the fastest syntax available because I want to keep this code small from a physical resource/transaction log perspective, and I believe Shuan’s code best accomplishes that goal.
  • I wrote the first GROUP BY.. HAVING I’ve written in about three years.

Do NOT Try This At Home


Before we even start, I’d be remiss if I didn’t strongly state that SQL Server is exactly the wrong tool to use if you want to solve this issue in the real world.  I’m not a C# guy, but several reliable sources tell me that there’s a prime number function in most nGL languages.  Even if there’s not, all of the database access in the test is simply about performing the calculations necessary to determine primacy as quickly as possible.


This very easily could and absolutely should be written independent of the database tier.  This is a purely academic exercise (at least until I get two orders of magnitude performance better than I’ve got now!); to implement this code in its current state in any production environment would be highly irresponsible.


I’m also not a mathemetician, so it’s possible that there’s a far more elegant method that I could employ.  If you’ve got one, Denis and I would love to hear it.


Finally, in the words of David Letterman, “Please remember, ladies and gentlemen: this is only an exhibition.  No wagering.”


The results I’m sharing should be considered preliminary in that I’ve yet to run the job to completion (that’s a polite way of saying, “it’s still running.”).  This will take more than a day to run on my laptop (HP nc8430; 2gB RAM).


The performance scale is actually a little distressing (the further it gets into the problem, the slower it goes).  I’ll keep tweaking it (and I’ve got an alternative to the cursor I’m working on as well), but I wanted to get this up on the blog, because I’ve seen enough of the results to know that it will be correct when (if J) it finishes.  I reserve the right to kill the run (seven hours in as of this writing) and restart.


On With The Show


We’re going to apply the lessons learned in the million GUID march, which taught us that we should avoid procedural code wherever possible.  I’ve already gotten this down from where it would’ve probably taken ten days to run to the current 1-2 days (estimated; I’ll update here when the results are complete).  


At any rate, the first thing we’ve got to do is build a database.  The CREATE DATABASE statement that SQLMS scripted out for me is pretty verbose; the full text is in a comment to this post here.  Basically what we’ve got here is a 3gB database with a 5gB log.  If you’ve got multiple spindles, I’d use them were I you:


CREATE DATABASE [PrimeNumberTest] ON  PRIMARY (


    NAME        = N‘PrimeNumberTest’,


    FILENAME    = N‘C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\PrimeNumberTest.mdf’,


    SIZE        = 3GB,


    MAXSIZE     = UNLIMITED,


    FILEGROWTH  = 1GB )


 LOG ON (


    NAME        = N‘PrimeNumberTest_log’,


    FILENAME    = N‘C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\PrimeNumberTest_log.ldf’,


    SIZE        = 5GB,


    MAXSIZE     = 2048GB,


    FILEGROWTH  = 1GB )


Once the database is created, we’re ready to do the work.  This is a pretty long script; I’ve interspersed comments throughout the indicate what my thought process is.


This is a multiple-batch solution which might lend itself to folding into a UDF to determine the primacy of a particular number (my next project, and probably of more actual utility than the current actual exercise).


I made several attempts to duplicatie Shaun’s approach, but because we have a different task here — we have to essentially do math on half of a 1M x 1M Cartesian product, rather than simply generate 1M rows to drive an INSERT — memory and disk space simply became an issue.  So, while we can be set-oriented in a lot of this work, due to hardware limitations I’m going to have to get procedural at one point in the process (if any of you reading this have big enough iron to test the Cartesian product method, please let me know!).


Without further ado, here’s the cursor-based script.  It’s extensively commented; the comments should serve as an extension of the narrative of this post.


It’s really late and I’m going to tweak the cursor alternative a little more (it’s looking like a dry well in its current state, but you never know..) and then let whichever alternative strikes me as currently more performant run overnight.  I’ll report back on my progress on Monday at the latest.


Thanks for your interest!


     -wp


The script:


USE PrimeNumberTest


GO


 


  log a status message


SELECT GETDATE() AS [Starting..]


SET NOCOUNT ON


GO


 


  remove the three tables we’re going to use


  if they’re present from a previous run


IF EXISTS (SELECT 1 FROM sys.objects WHERE name = ‘SetBuilderDecimal’)


    DROP TABLE SetBuilderDecimal


GO


IF EXISTS (SELECT 1 FROM sys.objects WHERE name = ‘Candidate’)


    DROP TABLE Candidate


GO


IF EXISTS (SELECT 1 FROM sys.objects WHERE NAME = ‘ComparisonMatrix’)


    DROP TABLE ComparisonMatrix


GO


 


  create the table to hold the set of values between 1 and 1M


CREATE TABLE SetBuilderDecimal (


    Id DECIMAL(12,2) PRIMARY KEY


)


go


 


  create the table to hold the set of values to test from primacy


  (we are going to remove some low-hanging fruit for performance reasons,


      so we need a distinct table for this purpose) 


CREATE TABLE Candidate (


    CandidateValue  DECIMAL(12,2)


)   


GO


 


  create the table where the calculations will be performed


      Base is the number we’re testing for primacy


      Factor is the number we’re dividing it by


      Quotient is the result of the division


      IsQuotientInteger is a bit flag indicating whether the Quotient is an integer


CREATE TABLE ComparisonMatrix (


    Base DECIMAL(12,2),


    Factor DECIMAL(12,2),


    Quotient AS Base/Factor,


    IsQuotientInteger AS CASE


        WHEN Base/Factor = CAST(Base/Factor AS int) THEN CAST(1 AS int)


        ELSE CAST(0 AS int)


    END           PERSISTED


)   


 


  build indexes on the ComparisonMatrix table to improve perfomance


CREATE UNIQUE CLUSTERED INDEX ComparisonMatrix_PK ON ComparisonMatrix(Base, Factor)


CREATE INDEX ComparisonMatrix_ID1 ON ComparisonMatrix(IsQuotientInteger)


GO


 


    build an INSTEAD OF INSERT trigger so we only save the integer factors


      THIS IS PURELY A PERFORMANCE ISSUE;


      as discussed below 


  this syntax is new in SQL Server 2005; the BOL reference may be found here:


  ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/0001f651-fcfa-401b-baf2-0ec6b1671116.htm



  A BRIEF DISCUSSION:


  what we’re doing here is tuning our INSERT statement for the table CompressionMatrix below to ignore


      the transactions which do not result in an integer quotient.  This will make the commiting of


      these INSERT transactions orders of magnitude faster, since only a small fraction of the volume


      will result in integer quotients


CREATE TRIGGER IOI_ComparisonMatrix


ON    ComparisonMatrix


INSTEAD OF INSERT


AS


INSERT      ComparisonMatrix (


            Base,


            Factor


)


SELECT      Base,


            Factor


FROM  inserted


WHERE IsQuotientInteger = 1


GO


 


  log a status message


SELECT GETDATE() AS [Tables Created..]


GO


 


  using Shaun’s syntax (because it doesn’t have a database footprint),


  populate SetBuilder


;WITH digits(i) AS (


    SELECT 1.0 AS I UNION ALL SELECT 2.0 UNION ALL SELECT 3.0 UNION ALL


    SELECT 4.0 UNION ALL SELECT 5.0 UNION ALL SELECT 6.0 UNION ALL


    SELECT 7.0 UNION ALL SELECT 8.0 UNION ALL SELECT 9.0 UNION ALL


    SELECT 0.0


),


      generate 1M rows each with a unique row number, i


sequence(i) AS (


    SELECT d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)


      FROM digits AS d1,


           digits AS d2,


           digits AS d3,


           digits AS d4,


           digits AS d5,


           digits AS d6


)


INSERT  SetBuilderDecimal (Id)


SELECT  i+1 — CTE gives us 0-999999; we want 1-1000000


FROM    sequence


 


  populate Candidate


INSERT  Candidate (CandidateValue)


SELECT  Id


FROM    SetBuilderDecimal


GO


 


  now delete the low-hanging fruit from Candidate


    WE ARE ONLY DONG THIS FOR PERFORMANCE; the algorhytm can handle them


    NOTE that these fifteen DELETE statements, which encompass the first 15 prime numbers and


      thier multiples, removes 86.13% of the candidate values


DELETE  Candidate WHERE CandidateValue/2 = CAST(CandidateValue/2 AS int) AND CandidateValue > 2


DELETE  Candidate WHERE CandidateValue/3 = CAST(CandidateValue/3 AS int) AND CandidateValue > 3


DELETE  Candidate WHERE CandidateValue/5 = CAST(CandidateValue/5 AS int) AND CandidateValue > 5


DELETE  Candidate WHERE CandidateValue/7 = CAST(CandidateValue/7 AS int) AND CandidateValue > 7


DELETE  Candidate WHERE CandidateValue/11 = CAST(CandidateValue/11 AS int) AND CandidateValue > 11


DELETE  Candidate WHERE CandidateValue/13 = CAST(CandidateValue/13 AS int) AND CandidateValue > 13


DELETE  Candidate WHERE CandidateValue/17 = CAST(CandidateValue/17 AS int) AND CandidateValue > 17


DELETE  Candidate WHERE CandidateValue/19 = CAST(CandidateValue/19 AS int) AND CandidateValue > 19


DELETE  Candidate WHERE CandidateValue/23 = CAST(CandidateValue/23 AS int) AND CandidateValue > 23


DELETE  Candidate WHERE CandidateValue/29 = CAST(CandidateValue/29 AS int) AND CandidateValue > 29


DELETE  Candidate WHERE CandidateValue/31 = CAST(CandidateValue/31 AS int) AND CandidateValue > 31


DELETE  Candidate WHERE CandidateValue/37 = CAST(CandidateValue/37 AS int) AND CandidateValue > 37


DELETE  Candidate WHERE CandidateValue/41 = CAST(CandidateValue/41 AS int) AND CandidateValue > 41


DELETE  Candidate WHERE CandidateValue/43 = CAST(CandidateValue/43 AS int) AND CandidateValue > 43


DELETE  Candidate WHERE CandidateValue/47 = CAST(CandidateValue/47 AS int) AND CandidateValue > 47


 


  uncomment this SELECT statement to confirm that we have cut 1M candidates down to 138,760


  SELECT COUNT(*) FROM Candidate


 


GO


  log a status message


SELECT GETDATE() AS [Tables Populated..]


GO


 


  keep some house for what comes..


DECLARE @CandidateValue DECIMAL(12,2),


        @Counter DECIMAL (12,2),


        @IntCandidateValue INT


 


  initalize a counter


SET     @Counter = 1


 


  declare a cursor to traverse the Candidate table


DECLARE TraverseCandidateTable CURSOR FAST_FORWARD READ_ONLY FOR


SELECT  CandidateValue,


        CAST(CandidateValue AS INT)


FROM    Candidate


ORDER BY CandidateValue


 


  open the cursor and get the first record


OPEN    TraverseCandidateTable


 


FETCH NEXT


FROM  TraverseCandidateTable


INTO  @CandidateValue,


      @IntCandidateValue


 


  log a status message


SELECT GETDATE() AS [Cursor Loop Starts..]


 


  open a transaction


      (careful transaction control is necessary to avoid rampant growth of the transaction logs)


BEGIN TRAN


 


  start a loop which will run as long as the cursor finds a record


WHILE @@FETCH_STATUS = 0


BEGIN


 


  insert the full range of possible factors for the candidate value


      into the comparison matrix


      (use the SetBuilderDecimal table to aid this work)


    INSERT  ComparisonMatrix (


        Base,


        Factor


    )


    SELECT  @CandidateValue,


            Id


    FROM    SetBuilderDecimal


    WHERE   Id <= @CandidateValue


 


  log a database commit anf start a new transaction


  if it’s been 200 transactions since we did so


    IF  @Counter/200.0 = CAST(@Counter/200 AS int)


    BEGIN


        COMMIT TRAN


 


            CHECKPOINT


 


  log a status message if it’s been 1000 transactions since we did so


        IF  @Counter/1000.0 = CAST(@Counter/1000 AS int)


            SELECT GETDATE() AS [One Thousand More..], @Counter AS [Current Count..], @CandidateValue AS [Current Candidate..]


 


        BEGIN TRAN


    END


 


  get the next candidate value


    FETCH NEXT


    FROM  TraverseCandidateTable


    INTO  @CandidateValue,


          @IntCandidateValue


 


  increment the final transaction


    SET   @Counter = @Counter + 1


END


 


  we’re out of the loop;


  we still need to commit the final transaction


COMMIT TRAN


 


  clost and deallocate the cursor


CLOSE TraverseCandidateTable


 


DEALLOCATE TraverseCandidateTable


 


  log a status message


SELECT GETDATE() AS [Cursor Loop Complete..]


 


  generate the list of prime numbers


      if the SUM(IsQuotientInteger)=2 (where IsQuotientInteger=1 when a particular member of the set


      is a precise integer factor of the candidate value), then that candidate is a prime number,


      because IsQuotientInteger will equal 1 for 1 and the candidate itself.  All candidates with


      additional factors will thus be filtered out by the HAVING cluase.


SELECT  Base


FROM    ComparisonMatrix


GROUP BY Base


HAVING SUM(IsQuotientInteger) = 2


ORDER BY Base
GO

Comments (11)

  1. Anonymous says:

    Well, folks, there’s a new sheriff in town..&amp;nbsp; this is a continuation of a conversation here and…

  2. Anonymous says:

    No sane person would even consider using SQL Server to construct a list of prime numbers. So just to…

  3. Anonymous says:

    One more PDA-incompatible post: lessons learned from "discredited" prime number syntax

  4. Anonymous says:

    Our blogging site is going to go dark in an hour or so for a software upgrade, so I won’t be able to…

  5. Anonymous says:

    No less a figure than Louis Davidson has weighed in on the Prime Number Challenge . I left a comment

  6. Anonymous says:

    Denis’ Prime Number Challenge just won’t die.&amp;nbsp; I think this topic has spurred more dialog than any&amp;nbsp;other…

  7. Anonymous says:

    One more PDA-incompatible post: lessons learned from &quot;discredited&quot; prime number syntax

  8. Ward Pond says:

    CREATE DATABASE [PrimeNumberTest] ON  PRIMARY (

       NAME        = N’PrimeNumberTest’,

       FILENAME    = N’C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLDATAPrimeNumberTest.mdf’,

       SIZE        = 3GB,

       MAXSIZE     = UNLIMITED,

       FILEGROWTH  = 1GB )

    LOG ON (

       NAME        = N’PrimeNumberTest_log’,

       FILENAME    = N’C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLDATAPrimeNumberTest_log.ldf’,

       SIZE        = 5GB,

       MAXSIZE     = 2048GB,

       FILEGROWTH  = 1GB )

    GO

    EXEC dbo.sp_dbcmptlevel @dbname=N’PrimeNumberTest’, @new_cmptlevel=90

    GO

    IF (1 = FULLTEXTSERVICEPROPERTY(‘IsFullTextInstalled’))

    BEGIN

       EXEC [PrimeNumberTest].[dbo].[sp_fulltext_database] @action = ‘disable’

    END

    GO

    ALTER DATABASE [PrimeNumberTest] SET ANSI_NULL_DEFAULT OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET ANSI_NULLS OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET ANSI_PADDING OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET ANSI_WARNINGS OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET ARITHABORT OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET AUTO_CLOSE OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET AUTO_CREATE_STATISTICS ON

    GO

    ALTER DATABASE [PrimeNumberTest] SET AUTO_SHRINK OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET AUTO_UPDATE_STATISTICS ON

    GO

    ALTER DATABASE [PrimeNumberTest] SET CURSOR_CLOSE_ON_COMMIT OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET CURSOR_DEFAULT  GLOBAL

    GO

    ALTER DATABASE [PrimeNumberTest] SET CONCAT_NULL_YIELDS_NULL OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET NUMERIC_ROUNDABORT OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET QUOTED_IDENTIFIER OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET RECURSIVE_TRIGGERS OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET  ENABLE_BROKER

    GO

    ALTER DATABASE [PrimeNumberTest] SET AUTO_UPDATE_STATISTICS_ASYNC OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET DATE_CORRELATION_OPTIMIZATION OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET TRUSTWORTHY OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET ALLOW_SNAPSHOT_ISOLATION OFF

    GO

    ALTER DATABASE [PrimeNumberTest] SET PARAMETERIZATION SIMPLE

    GO

    ALTER DATABASE [PrimeNumberTest] SET  READ_WRITE

    GO

    ALTER DATABASE [PrimeNumberTest] SET RECOVERY FULL

    GO

    ALTER DATABASE [PrimeNumberTest] SET  MULTI_USER

    GO

    ALTER DATABASE [PrimeNumberTest] SET PAGE_VERIFY CHECKSUM  

    GO

    ALTER DATABASE [PrimeNumberTest] SET DB_CHAINING OFF

    GO

  9. Denis The SQL Menace says:

    Well here is a Sieve of Eratosthenes version, runs in 6 seconds on 2nd run, first run will be around 16 seconds on my laptop (SQL 2000)

    SET NOCOUNT ON

    DECLARE @i INT

    — Create a 10-digit table

    DECLARE @D TABLE (N INT)

    INSERT INTO @D (N)

    SELECT 0 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4

    INSERT INTO @D (N)

    SELECT N+5 FROM @D

    — build a small sieve table between 2 and 1000

    DECLARE @T TABLE (N INT)

    INSERT INTO @T( N )

    SELECT 1+A.N+10*(B.N+10*C.N)

    FROM @D A, @D B, @D C

    DELETE FROM @T WHERE N = 1

    SET @I = 2

    WHILE @I <= SQRT(1000) BEGIN     DELETE FROM @T WHERE N % @I = 0 AND N > @I

       SET @I = @I + 1

    END

    — Create large table between 1001 and 1000000

    SELECT A+10*(B+10*(C+10*(D+10*(E+ 10*F)))) AS N

    INTO #P

    FROM

    (    SELECT A.N AS A, B.N AS B, C.N AS C, D.N AS D, E.N AS E, F.N AS F

       FROM @D A, @D B, @D C, @D D, @D E, @D F

       WHERE A.N in (1, 3, 7, 9)  — Not divisible by 2 or 5

    ) bNum

    WHERE (A+B+C+D+E+F) % 3 <> 0 — Or 3

       AND (A+3*B+2*C-D-3*E-2*F) % 7 <> 0 — Or 7

       AND (B-A+D-C+F-E) % 11 <> 0 — Or 11

       and D|E|F <> 0 — Don’t include the first 1000 numbers,

    –we already have these in the small sieve table

    UNION ALL SELECT 1000000

    — sieve the big table with smaller one

    SELECT @I = 2

    WHILE @I IS NOT NULL

    BEGIN

       DELETE FROM #P WHERE N% @I = 0

       SELECT @I = MIN(N) FROM @T WHERE N > @I

    END

    — add primes up to 1000

    INSERT INTO #P SELECT N FROM @T

    — Here are the results

    –78498 rows

    SELECT  * FROM #P ORDER BY 1

    drop table #P

    go

  10. Denis The SQL Menace says:

    Let me know how many seconds this one takes?

    SET NOCOUNT ON

    DECLARE @i INT

    — Create a 10-digit table

    DECLARE @D TABLE (N INT)

    INSERT INTO @D (N)

    SELECT 0 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4

    INSERT INTO @D (N)

    SELECT N+5 FROM @D

    — build a small sieve table between 2 and 1000

    DECLARE @T TABLE (N INT)

    INSERT INTO @T( N )

    SELECT 1+A.N+10*(B.N+10*C.N)

    FROM @D A, @D B, @D C

    DELETE FROM @T WHERE N = 1

    SET @I = 2

    WHILE @I <= SQRT(1000) BEGIN     DELETE FROM @T WHERE N % @I = 0 AND N > @I

       SET @I = @I + 1

    END

    — Create large table between 1001 and 1000000

    SELECT A+10*(B+10*(C+10*(D+10*(E+ 10*F)))) AS N

    INTO #P

    FROM

    (    SELECT A.N AS A, B.N AS B, C.N AS C, D.N AS D, E.N AS E, F.N AS F

       FROM @D A, @D B, @D C, @D D, @D E, @D F

       WHERE A.N in (1, 3, 7, 9)  — Not divisible by 2 or 5

    ) blah

    WHERE (A+B+C+D+E+F) % 3 <> 0 — Or 3

       AND (A+3*B+2*C-D-3*E-2*F) % 7 <> 0 — Or 7

       AND (B-A+D-C+F-E) % 11 <> 0 — Or 11

       AND D|E|F <> 0 — Don’t include the first 1000 numbers,

    –we already have these in the small sieve table

    UNION ALL SELECT 1000000

    — sieve the big table with smaller one

    SELECT @I = 2

    WHILE @I IS NOT NULL

    BEGIN

       DELETE FROM #P WHERE N% @I = 0

       SELECT @I = MIN(N) FROM @T WHERE N > @I

    END

    — add primes up to 1000

    INSERT INTO #P SELECT N FROM @T

    — Here are the results

    –78498 rows

    SELECT  * FROM #P ORDER BY 1

    drop table #P

    go

  11. Hi…Think simple …here code for find Prime Number in sql server : declare @a int declare @b int select @a = 10 select @b = 10000 while @a < @b BEGIN If ( (convert(float, @a/2.0) <> (@a/2)) AND (convert(float, @a/3.0) <> @a/3) AND (convert(float, @a/5.0)
    <> @a/5) AND (convert(float, @a/7.0) <> @a/7)) print convert(varchar(5),@a) + ‘ is a Prime Number’ Set @a = @a + 1 END