There are a lot of interesting data sets out there on the “intarwebs”. One that I came across a few weeks ago was The CMU Pronouncing Dictionary. You can download the 0.7a version from here.
Part of the dictionary looks like this:
SILBAUGH S IH1 L B AO2
SILBER S IH1 L B ER0
SILBERBERG S IH1 L B ER0 B ER0 G
SILBERG S IH1 L B ER0 G
SILBERGELD S IH1 L B ER0 G EH2 L D
SILBERMAN S IH1 L B ER0 M AH0 N
SILBERNAGEL S IH1 L B ER0 N AH0 G AH0 L
SILBERNER S IH0 L B ER1 N ER0
SILBERNER'S S IH0 L B ER1 N ER0 Z
SILBERSTEIN S IH1 L B ER0 S T IY2 N
SILBERSTEIN(1) S IH1 L B ER0 S T AY2 N
The dictionary defines 39 phonemes and three stress levels, numbered 0 to 2 (no stress, primary stress and secondary stress). See the CMU page linked above for details. The dictionary includes some 130,000-odd words, names and symbols.
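To make the line format concrete, here is a minimal Python sketch for reading it. It assumes each line holds a word followed by whitespace-separated phonemes, that lines starting with `;;;` are comments, and that alternate pronunciations carry a numbered suffix like the `(1)` on `SILBERSTEIN(1)` above. Only vowels carry a stress digit, which is what `split_phoneme` relies on. The names `parse_entry`, `split_phoneme` and `load_dictionary` are my own, not anything shipped with the dictionary.

```python
import re
from collections import defaultdict

# Alternate pronunciations end in a numbered suffix, e.g. "SILBERSTEIN(1)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_entry(line):
    """Split one dictionary line into (word, [phonemes]).

    Returns None for blank lines and for lines starting with ";;;",
    which this sketch assumes are comments.
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phonemes = line.split()
    # Fold variants onto the base word so all pronunciations share one key.
    return VARIANT_SUFFIX.sub("", word), phonemes


def split_phoneme(phoneme):
    """Split a phoneme into (symbol, stress); stress is None for consonants."""
    if phoneme[-1].isdigit():
        return phoneme[:-1], int(phoneme[-1])
    return phoneme, None


def load_dictionary(lines):
    """Map each word to a list of its pronunciations (lists of phonemes)."""
    prons = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            word, phonemes = entry
            prons[word].append(phonemes)
    return dict(prons)
```

`load_dictionary` accepts any iterable of lines, so an open file handle works. The encoding is up to your download; latin-1 is a safe guess since it decodes any byte.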
Devise an algorithm to tell whether any two words in the dictionary rhyme.
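One common working definition of a perfect rhyme is that two pronunciations are identical from the last stressed vowel to the end. The sketch below assumes that definition. It treats both primary (1) and secondary (2) stress as "stressed", falls back to the last vowel for entries with no stressed vowel, ignores stress digits when comparing, and says a word does not rhyme with itself. The names `rhyme_part` and `rhymes`, and the `prons` mapping (word to a list of phoneme lists), are hypothetical.

```python
def rhyme_part(phonemes):
    """Return the tail of a pronunciation that must match for a perfect rhyme.

    Assumed definition: the tail starts at the last vowel with primary (1) or
    secondary (2) stress, or at the last vowel if none is stressed. Only vowels
    carry a stress digit in CMUdict, so that digit is how vowels are spotted.
    Stress digits are dropped so that, e.g., AY1 and AY2 compare equal.
    """
    vowels = [i for i, p in enumerate(phonemes) if p[-1].isdigit()]
    stressed = [i for i in vowels if phonemes[i][-1] in "12"]
    if stressed:
        start = stressed[-1]
    elif vowels:
        start = vowels[-1]
    else:
        start = 0  # no vowel at all: compare the whole pronunciation
    return tuple(p.rstrip("012") for p in phonemes[start:])


def rhymes(word_a, word_b, prons):
    """True if any pronunciation of word_a rhymes with any of word_b.

    `prons` maps an uppercase word to a list of pronunciations (lists of
    phonemes). Unknown words never rhyme, and a word doesn't rhyme with itself.
    """
    a, b = word_a.upper(), word_b.upper()
    if a == b or a not in prons or b not in prons:
        return False
    endings_a = {rhyme_part(p) for p in prons[a]}
    return any(rhyme_part(p) in endings_a for p in prons[b])
```

For the sample entries, this gives `("AY", "N")` for the `SILBERSTEIN(1)` pronunciation and `("ER", "N", "ER")` for `SILBERNER`. One gap worth knowing about: words that also share the consonant before the tail (identical rhymes) pass this test too. Rejecting them would need an extra check on that consonant.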
Once you have your algorithm, you should be able to answer some simple ancillary questions about the data set:
Can you devise an algorithm to spot near rhymes?
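There is no single accepted definition of a near (or slant) rhyme, so this is only one option: an assonance-style test. It keeps the vowels of the rhyming tail and ignores consonants entirely. Under this test, CAT (K AE1 T) and BACK (B AE1 K) count as near rhymes. The tail is assumed to start at the last vowel with stress 1 or 2, else the last vowel. The helper names are hypothetical.

```python
def rhyme_start(phonemes):
    """Index where the rhyming tail begins.

    Assumption: the last vowel with stress 1 or 2, else the last vowel,
    else 0 for vowelless entries. Vowels are the phonemes with a stress digit.
    """
    vowels = [i for i, p in enumerate(phonemes) if p[-1].isdigit()]
    stressed = [i for i in vowels if phonemes[i][-1] in "12"]
    if stressed:
        return stressed[-1]
    return vowels[-1] if vowels else 0


def near_rhyme_key(phonemes):
    """Vowels of the rhyming tail with stress digits removed; consonants ignored."""
    tail = phonemes[rhyme_start(phonemes):]
    return tuple(p.rstrip("012") for p in tail if p[-1].isdigit())


def near_rhymes(word_a, word_b, prons):
    """True if some pronunciations of the two words share a near-rhyme key.

    `prons` maps an uppercase word to a list of phoneme lists. Vowelless
    entries (empty key) never near-rhyme, nor does a word with itself.
    """
    a, b = word_a.upper(), word_b.upper()
    if a == b or a not in prons or b not in prons:
        return False
    keys_a = {near_rhyme_key(p) for p in prons[a]} - {()}
    return any(near_rhyme_key(p) in keys_a for p in prons[b])
```

Because the key throws away consonants, this test is much looser than an exact match on the tail. A middle ground would be to let consonants differ only within a class, for example voiced/unvoiced pairs such as T and D.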
If you use this new algorithm, how does that affect the answers to the ancillary questions?
Bonus question: How many rhyme endings are there in the dictionary?
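One way to approach the bonus question is to group every pronunciation by its rhyming tail and count the groups. This is a hedged sketch. It assumes a rhyme ending runs from the last vowel with stress 1 or 2 (else the last vowel) to the end, with stress digits dropped, and that `;;;` lines are comments. It is not obvious whether the question means every distinct ending or only endings that at least two words share, so `count_endings` takes a `min_words` threshold. `rhyme_index` and `count_endings` are hypothetical names.

```python
import re
from collections import defaultdict

# Alternate pronunciations end in a numbered suffix, e.g. "SILBERSTEIN(1)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def rhyme_part(phonemes):
    """Tail from the last vowel with stress 1 or 2 (else the last vowel), stress digits dropped."""
    vowels = [i for i, p in enumerate(phonemes) if p[-1].isdigit()]
    stressed = [i for i in vowels if phonemes[i][-1] in "12"]
    start = stressed[-1] if stressed else (vowels[-1] if vowels else 0)
    return tuple(p.rstrip("012") for p in phonemes[start:])


def rhyme_index(lines):
    """Map each rhyme ending to the set of distinct words that have it."""
    index = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # assumed comment marker
            continue
        word, *phonemes = line.split()
        index[rhyme_part(phonemes)].add(VARIANT_SUFFIX.sub("", word))
    return index


def count_endings(index, min_words=1):
    """Number of endings shared by at least `min_words` distinct words."""
    return sum(1 for words in index.values() if len(words) >= min_words)
```

To run it on the real data, pass `rhyme_index` an open handle on your downloaded file. The filename and encoding (latin-1 is a safe guess) depend on your local copy. A word with several pronunciations can contribute to more than one ending, as SILBERSTEIN does with IY N and AY N.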