thesaurus

Thesaurus computations

Today I started computing the relations between groups. It’s been chugging along on my local machine for about 8 hours now. I’m 1% complete and have a table with 5 million entries (210 MB).

If a word is shared between two groups, it forms an edge between those two groups. I’m storing that edge as the two group ids and the “overlap” value (the number of words shared between those two groups).

My algorithm iterates through every word, finds the groups that the word is a part of, and creates an edge between each pair of those groups, computing overlap as it goes. I expect the rate of new edges to fall off as duplicate edges are thrown out, but we’ll see.
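Here is a minimal in-memory sketch of that idea in Python. It is not the code I’m actually running: the name `group_overlaps` and the toy data are made up for illustration, and it assumes the word-to-groups mapping fits in memory rather than living in a database table. Each pair of group ids is stored in sorted order, so a duplicate edge just bumps the overlap count instead of creating a new entry.

```python
from collections import Counter
from itertools import combinations


def group_overlaps(word_groups):
    """Count shared words for every pair of groups.

    word_groups maps each word to the set of group ids it belongs to
    (a hypothetical structure, chosen for this sketch).
    """
    overlap = Counter()
    for groups in word_groups.values():
        # A word adds 1 to the overlap of every pair of groups it appears in.
        # Sorting gives each edge one canonical key, so (2, 1) never
        # shows up separately from (1, 2).
        for a, b in combinations(sorted(groups), 2):
            overlap[(a, b)] += 1
    return overlap


# Toy data: four words spread over three groups.
word_groups = {
    "cool": {1, 2, 3},
    "chilly": {1, 2},
    "groovy": {3},
    "nifty": {2, 3},
}
edges = group_overlaps(word_groups)
for (a, b), shared in sorted(edges.items()):
    print(a, b, shared)
```

A word that belongs to k groups produces k(k-1)/2 pairs, so words that sit in many groups dominate the running time.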

More thoughts on an interesting thesaurus

My associate, Rebecca, and I have been starting to think critically about Panlexicon.com, the unique, tag-cloud based thesaurus I’ve written about previously. We’re hoping to put some more time and effort into the project and in the process, learn some more about what’s happening with the language and the underlying structure of the thesaurus taxonomy.

Panlexicon.com - Thesaurus Visualization

The thesaurus data we’re working with is the Moby Thesaurus from the Project Gutenberg library of free electronic texts. Like many thesauruses, it’s structured in an interesting way. Every word is assigned to one or more groups based on its general meaning or idea. Each group has a keyword, also known as a headword, that is a general encapsulation of that idea. This is why, in Roget’s for example, you must first look up a word in the index to acquire its keywords. Each group has only one keyword, but a keyword can exist in other groups (but as an ordinary word). …read more
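To make that structure concrete, here is a rough Python sketch. It assumes the data arrives as one group per line, comma-separated, with the headword first; that layout, the function name `build_index`, and the sample lines are all assumptions for illustration, not a description of Panlexicon’s actual loader.

```python
from collections import defaultdict


def build_index(lines):
    """Build headword -> members and word -> headwords lookups.

    Assumes each line is one group: "headword,word,word,...".
    """
    groups = {}
    word_to_groups = defaultdict(set)
    for line in lines:
        entries = [w.strip() for w in line.split(",") if w.strip()]
        if not entries:
            continue  # skip blank lines
        headword, members = entries[0], entries[1:]
        groups[headword] = set(members)
        # Every entry, including the headword itself, belongs to this group.
        for word in entries:
            word_to_groups[word].add(headword)
    return groups, word_to_groups


# Made-up sample: "calm" is a headword in one group and an
# ordinary word in the other.
sample = [
    "cool,chilly,calm,nifty",
    "calm,cool,placid",
]
groups, word_to_groups = build_index(sample)
print(sorted(word_to_groups["calm"]))
```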

Introducing Panlexicon.com


I’m very proud to be officially launching Panlexicon.com: a unique thesaurus. Using intuitive “tag clouds” to represent synonyms, Panlexicon makes discovering the word you want quick, easy and exploratory.

Panlexicon’s current functions allow you to:

  • Perform a lookup on a single word and receive a weighted cloud of synonyms.
  • View synonyms that overlap across multiple words, either by entering the words manually or by clicking on words already in the cloud to further refine your search.

For example, performing a search on “cool” provides a wide variety of synonyms, from “chilly” to “unimpassioned” to “groovy”. Refining the search using both “cool” and “nifty” narrows the results to more specific synonyms.

Because the typeface size varies with relevance, as in a tag cloud, the most relevant terms pop out at you, allowing you to quickly scan through large lists of words. Also, because the algorithm is a little fuzzy, you may run across related words that provide better context.
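One simple way to get weights like these is sketched below in Python. This is an illustrative guess, not Panlexicon’s actual relevance algorithm: it assumes a word’s weight is the number of groups it shares with every query word, and the function name `synonym_cloud` and the toy groups are invented for the example.

```python
from collections import Counter


def synonym_cloud(groups, query_words):
    """Weight each candidate by how many groups it shares with the query.

    groups maps a group id to the set of words in that group.
    Only groups containing every query word are counted (an assumption).
    """
    query = set(query_words)
    weights = Counter()
    for members in groups.values():
        if query <= members:
            # Count every other word in a matching group once.
            weights.update(members - query)
    return weights


# Toy groups, made up for this sketch.
groups = {
    "g1": {"cool", "chilly", "cold"},
    "g2": {"cool", "nifty", "groovy"},
    "g3": {"cool", "nifty", "groovy", "keen"},
}
single = synonym_cloud(groups, ["cool"])
refined = synonym_cloud(groups, ["cool", "nifty"])
print(single.most_common())
print(refined.most_common())
```

In this toy data, adding “nifty” to the query drops the cold-weather group entirely, which is the kind of narrowing the refined search is meant to give. A larger weight would map to a larger typeface in the cloud.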

Panlexicon was developed jointly with Rebecca, who originally proposed the project, did much of the research on thesauri, and helped develop the word relevance algorithms. …read more