Pond's Seventh Law Inspires a Question: Elegance Serving Randomness

Marc left a great question on a Pond's Laws post from July of 2006:

Hi,

I have a flashcard system that randomly pulls a word from the database. I also have a testing module that allows a user to test him or her self on words. For each word the user answers incorrectly it is flagged.

I want to have an option in the flashcard system that is a checkbox that says "Test on words answered incorrectly more frequently" when the checkbox is checked the words answered incorrectly will appear more frequently. Any idea how to simulate the "frequently". Remember that the flashcard grabs a word randomly too. Thanks, Marc

Great question, Marc. I think I can name this tune in two tables and a randomization function. Here's where a little elegance will serve us well from a functional perspective -- no Laws are being violated!

First let's build a table for your flashcards..

create table FlashCard (
      FlashCardId int identity(1,1),
      FlashCardText nvarchar(200)
      )

Hopefully this makes sense.. each flashcard has an associated integer key.

Then let's build a table that we'll use to decide which card to flash; we'll call it the Candidate table. It'll have an id of its own as well as a foreign key to the FlashCard table..

create table Candidate (
      CandidateId int identity(1,1),
      FlashCardId int
      )

Every time you add a record to the FlashCard table, you also add a record to the Candidate table. Every time the user misses a word, add another record (or two, or three.. however many you need to make the missed words appear with the desired frequency) to the Candidate table, and run your SELECT RAND (SQL2K/SQL2K5/SQL2K8) until it matches a CandidateId value in Candidate, and use the foreign key back to FlashCard to retrieve the word. With more Candidate records for more frequently missed words, those words will be more likely to be selected.

This approach strikes me as a very elegant, yet simple, highly functional, mostly set-based solution to Marc's challenge. Properly indexed and coded, it should perform well with tens of millions of flashcards. Given the number of randomly generated numbers this approach will throw away, your code may actually run faster with a greater number of Candidates; there's likely a slick set-based way to restrict the output of the RAND function that I haven't thought of yet. If it comes to me I'll post an update.

I hope this adequately answers your question, Marc; please let me know if there's anything more you need.. or if you've got a better mousetrap (this last, of course, goes for anyone..!

Thanks for your comment!

     -wp