Search

Stemming in Zend Search Lucene

Tuesday, February 19. 2008

One of the most difficult parts of making a search engine, whether it is a small search for a single website or something as large as a web search engine, is tuning it to return as many relevant results as possible for a user's query. With Lucene and Zend_Search_Lucene, the best way to tune this is the correct use of analyzers, both when indexing and when querying an index.

Analyzers prepare and normalize text to be indexed and are also used to parse queries. For example, unless you want your search engine to be case sensitive, you would want your analyzer to have a filter that lowercases everything as it goes into the index. That same analyzer needs to be applied to the query so that its terms are lowercased too and can match the terms in the index. That way a search for "query" will match "Query", "qUery", "query", and so on, and a search for "Query" will match the same.
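To make the idea concrete, here is a minimal sketch in Python rather than Zend_Search_Lucene itself. The names analyze, build_index and search are made up for illustration and are not part of any Lucene API; the only point is that the same lowercasing analyzer runs on both the indexed text and the query.

```python
import re
from collections import defaultdict

def analyze(text):
    """Hypothetical analyzer: split into alphanumeric tokens, then lowercase each."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(documents):
    """Build a tiny inverted index mapping each analyzed term to document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Analyze the query with the SAME analyzer, then return the ids of
    documents that contain every query term."""
    terms = analyze(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Made-up sample documents, for illustration only.
documents = {1: "Parsing a Query", 2: "qUery syntax", 3: "Indexing text"}
index = build_index(documents)

print(search(index, "query"))
print(search(index, "Query"))
```

Both searches return the same documents, because "Query" and "query" become the same term before they ever reach the index. If the query skipped the analyzer, "Query" would match nothing, since the index only holds lowercased terms.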

Now we come to the subject of stemming. Stemming is the act of reducing related words to the same "stem". For example, "nationalize", "nationalization" and "nations" should all stem to "nation". Why is this important? Say you are searching for "iPods". If you aren't making use of stemming, your search engine will match only "ipods", but not the singular "ipod". By making use of stemming you will get more relevant results. Stemming doesn't actually have to return a term that is spelled correctly. For example, "ponies" and "pony" both stem to "poni". The important thing is that all related words stem to the same thing, spelled right or not.
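Here is a rough sketch of the idea in Python. Note that toy_stem is a made-up suffix stripper with three rules that I am assuming for illustration; it is not the Porter algorithm and not what Zend_Search_Lucene ships with. A real stemmer has many more rules (this one will not reduce "nationalization" to "nation"), but it is enough to show plurals and singulars landing on the same term.

```python
def toy_stem(word):
    """Hypothetical toy stemmer: lowercase, then strip a few plural-style suffixes."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"   # ponies -> poni
    if word.endswith("y") and len(word) > 2:
        return word[:-1] + "i"   # pony -> poni
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]         # ipods -> ipod
    return word

def analyze(text):
    """Hypothetical analyzer: split on whitespace and stem every token."""
    return [toy_stem(token) for token in text.split()]

# Terms as they would be stored in the index (made-up sample text).
indexed_terms = set(analyze("New iPod cases and ponies"))

# Query terms go through the same analyzer before being looked up.
for query in ["ipods", "pony", "guitars"]:
    stem = toy_stem(query)
    print(query, "->", stem, "| match:", stem in indexed_terms)
```

The stem "poni" is not a real word, and it doesn't need to be. It only has to be the same string for "pony" and "ponies" on both the indexing side and the query side.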



Codeangel 0.5

Sunday, February 10. 2008

Well, it's been a long time since I've done an update, mostly because my interests have drifted to video game making, AI, and guitars. I'll probably have posts relating to those soon. For now, though, I finally finished a big update!

Codeangel 0.5 has the following new features:

You probably won't notice much in my next planned release: I'm splitting the admin and the main part of the site into modules. I'm also going to make a better article manager, and then add published/unpublished article support. After I get done with that I will implement skin support using Zend_Layout. The big thing you'll notice is a new look for codeangel.org!
