Rhymes

Created 24th September, 2009 07:25 (UTC), last edited 24th September, 2009 07:35 (UTC)

There are a lot of interesting data sets out there on the “intarwebs”. One that I came across a few weeks ago is the CMU Pronouncing Dictionary. You can download the 0.7a version from here.

Part of the dictionary looks like this:

SILBAUGH  S IH1 L B AO2
SILBER  S IH1 L B ER0
SILBERBERG  S IH1 L B ER0 B ER0 G
SILBERG  S IH1 L B ER0 G
SILBERGELD  S IH1 L B ER0 G EH2 L D
SILBERMAN  S IH1 L B ER0 M AH0 N
SILBERNAGEL  S IH1 L B ER0 N AH0 G AH0 L
SILBERNER  S IH0 L B ER1 N ER0
SILBERNER'S  S IH0 L B ER1 N ER0 Z
SILBERSTEIN  S IH1 L B ER0 S T IY2 N
SILBERSTEIN(1)  S IH1 L B ER0 S T AY2 N

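There are 39 phonemes defined and three stress levels (0 for no stress, 1 for primary stress and 2 for secondary stress), written as a digit on the end of each vowel. A suffix like (1) on a word marks an alternative pronunciation. See the CMU page linked above for details. The dictionary includes some 130,000-odd words, names and symbols.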
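As a starting point, here is a minimal sketch of how one line of the dictionary might be parsed. The function names `parse_entry` and `split_stress` are made up for illustration, and the code assumes that comment lines in the file start with `;;;` and that fields are separated by whitespace, as in the excerpt above.

```python
import re


def parse_entry(line):
    """Split one dictionary line into (word, variant, phonemes).

    Returns None for blank lines and for comment lines (assumed to
    start with ';;;').
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    head, *phonemes = line.split()
    # An optional trailing "(n)" marks an alternative pronunciation.
    match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", head)
    word = match.group(1)
    variant = int(match.group(2)) if match.group(2) else 0
    return word, variant, phonemes


def split_stress(phoneme):
    """Separate a phoneme from its stress digit, e.g. 'IH1' -> ('IH', 1)."""
    if phoneme[-1].isdigit():
        return phoneme[:-1], int(phoneme[-1])
    return phoneme, None  # consonants carry no stress marker


print(parse_entry("SILBERSTEIN(1)  S IH1 L B ER0 S T AY2 N"))
```

Keeping the variant number separate from the word makes it easier to decide later whether alternative pronunciations should count as separate words.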

Devise an algorithm to tell whether any two words in the dictionary rhyme.
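One possible starting point, and only one of several reasonable definitions, is to say that two words rhyme when their pronunciations match from the last primary-stressed vowel to the end. The sketch below assumes that definition. The names `rhyme_key` and `rhymes` are hypothetical, and the fallback to secondary or unstressed vowels is a design choice of this sketch, not something the dictionary prescribes.

```python
def rhyme_key(phonemes):
    """Return the tail of the pronunciation used to compare rhymes.

    The tail starts at the last vowel carrying primary stress (digit 1).
    If there is none, fall back to the last secondary-stressed vowel,
    then the last unstressed vowel, then the whole pronunciation.
    """
    for stress in ("1", "2", "0"):
        positions = [i for i, p in enumerate(phonemes) if p.endswith(stress)]
        if positions:
            return tuple(phonemes[positions[-1]:])
    return tuple(phonemes)


def rhymes(a, b):
    """True if the two phoneme lists share the same rhyme key."""
    return rhyme_key(a) == rhyme_key(b)


silber = "S IH1 L B ER0".split()
silberg = "S IH1 L B ER0 G".split()
print(rhyme_key(silber))        # ('IH1', 'L', 'B', 'ER0')
print(rhymes(silber, silberg))  # False: the final G breaks the match
```

Note that under this definition a word also "rhymes" with any word that ends in exactly the same sounds, including homophones; whether those should count is part of the puzzle.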

Once you have your algorithm then you should be able to answer some simple ancillary questions about the data set:

  1. What is the most common rhyme?
  2. How many words don't rhyme with any other words?
  3. How many words rhyme with exactly one other word?
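Whatever rhyme test you settle on, if it can be expressed as a key per word (so that two words rhyme exactly when their keys are equal), the three questions reduce to counting group sizes. A rough sketch follows; `summarise` and the shape of `key_by_word` are assumptions made for illustration, and the toy data is invented rather than taken from the dictionary.

```python
from collections import Counter


def summarise(key_by_word):
    """Answer the ancillary questions from a {word: rhyme key} mapping."""
    group_sizes = Counter(key_by_word.values())
    if not group_sizes:
        return {"most_common": None, "no_rhyme": 0, "one_rhyme": 0, "endings": 0}
    return {
        # (key, number of words sharing it) for the largest group
        "most_common": group_sizes.most_common(1)[0],
        # words alone in their group rhyme with nothing else
        "no_rhyme": sum(1 for n in group_sizes.values() if n == 1),
        # each group of exactly two contributes two such words
        "one_rhyme": sum(2 for n in group_sizes.values() if n == 2),
        # number of distinct keys
        "endings": len(group_sizes),
    }


toy = {
    "A": ("AE1", "T"), "B": ("AE1", "T"), "C": ("AE1", "T"),
    "D": ("AO1", "G"), "E": ("AO1", "G"),
    "F": ("IH1", "L", "B", "ER0"),
}
print(summarise(toy))
```

You still have to decide how to treat alternative pronunciations such as SILBERSTEIN(1): counted as separate entries they will inflate the totals.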

Can you devise an algorithm to spot near rhymes?
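There are many ways to loosen the test. One simple option, offered here only as a sketch, is to compare vowel sounds alone (assonance): take the tail from the last primary-stressed vowel, drop the consonants and ignore the stress digits. The name `near_key` is hypothetical, and this is just one of several plausible notions of a near rhyme.

```python
def near_key(phonemes):
    """Vowel-only key for a loose, assonance-style near rhyme.

    Starts at the last primary-stressed vowel (or at the beginning of
    the word if there is none), keeps only the vowels, and strips the
    stress digits.
    """
    start = 0
    for i, p in enumerate(phonemes):
        if p.endswith("1"):
            start = i
    # Only vowels end in a stress digit, so this also filters out consonants.
    return tuple(p[:-1] for p in phonemes[start:] if p[-1] in "012")


silber = "S IH1 L B ER0".split()
silberg = "S IH1 L B ER0 G".split()
silberman = "S IH1 L B ER0 M AH0 N".split()
print(near_key(silber) == near_key(silberg))    # True: both give ('IH', 'ER')
print(near_key(silber) == near_key(silberman))  # False: extra AH vowel
```

A key this loose will merge a lot of groups, so expect the counts from the earlier questions to shift noticeably.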

If you use this new algorithm how does that affect the answers to the ancillary questions?

Bonus question: How many rhyme endings are there in the dictionary?

