CodeAngel.org
Faith in Knowledge.
Stemming in Zend Search Lucene
One of the most difficult parts of making a search engine, whether it is a small search for a single website or something as large as a web search engine, is tuning it to return as many relevant results for a user's query as possible. The best way to tune this with Lucene and Zend_Search_Lucene is the correct use of analyzers for both indexing and querying an index.
Analyzers prepare and normalize text to be indexed and are also used to parse queries. For example, unless you want your search engine to be case sensitive, you would want your analyzer to have a filter that lowercases everything as it goes into the index; that same analyzer needs to be applied to the query so that its lowercased terms match the terms in the index. That way a search for "query" will match "Query", "qUery", "query", and so on, and a search for "Query" will match the same.
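To make the idea concrete, here is a minimal sketch in plain PHP, not Zend_Search_Lucene's actual analyzer classes. The function names analyze(), buildIndex() and search() are hypothetical, made up for this illustration. The point it shows is that the same normalization runs on both the indexed text and the query.

```php
<?php
// Hypothetical analyzer: lowercase the text, then split it into word terms.
function analyze(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Toy inverted index: term => list of document ids containing that term.
function buildIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        foreach (array_unique(analyze($text)) as $term) {
            $index[$term][] = $id;
        }
    }
    return $index;
}

// The query goes through the very same analyze() before the lookup.
function search(array $index, string $query): array
{
    $hits = [];
    foreach (analyze($query) as $term) {
        foreach ($index[$term] ?? [] as $id) {
            $hits[$id] = true;
        }
    }
    return array_keys($hits);
}

$index = buildIndex([
    1 => 'Parsing a Query',
    2 => 'qUery tuning',
    3 => 'Unrelated text',
]);
```

With this index, search($index, 'QUERY') and search($index, 'query') both return documents 1 and 2. If analyze() were skipped on the query side, the uppercase search would find nothing.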
Now we come to the subject of stemming. Stemming is the act of reducing related words to the same "stem". For example, "nationalize", "nationalization", and "nations" should all stem to "nation". Why is this important? Say you are searching for "iPods": without stemming, your search engine will match only "ipods", but not the singular "ipod". By making use of stemming you will get more relevant results. A stem doesn't actually have to be a correctly spelled word. For example, "ponies" and "pony" both stem to "poni". The important thing is that all related words stem to the same thing, spelled right or not.
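Here is a toy sketch of the idea in plain PHP. It is not a full Porter stemmer: it only handles plural endings and a trailing "y", loosely modelled on steps 1a and 1c of the Porter algorithm, and it uses a simplified vowel check. The names toyStem() and containsVowel() are hypothetical.

```php
<?php
// Simplified vowel check: only a, e, i, o, u count as vowels here.
function containsVowel(string $s): bool
{
    return preg_match('/[aeiou]/', $s) === 1;
}

function toyStem(string $word): string
{
    $w = strtolower($word);

    // Plural endings, first matching rule wins.
    if (substr($w, -4) === 'sses') {
        $w = substr($w, 0, -2);            // caresses -> caress
    } elseif (substr($w, -3) === 'ies') {
        $w = substr($w, 0, -2);            // ponies -> poni
    } elseif (substr($w, -2) === 'ss') {
        // Leave a double "s" alone: caress -> caress
    } elseif (substr($w, -1) === 's') {
        $w = substr($w, 0, -1);            // ipods -> ipod
    }

    // A trailing "y" becomes "i" when the rest of the word has a vowel.
    if (substr($w, -1) === 'y' && containsVowel(substr($w, 0, -1))) {
        $w = substr($w, 0, -1) . 'i';      // pony -> poni
    }

    return $w;
}
```

Both toyStem('ponies') and toyStem('pony') give "poni", and both toyStem('Ipods') and toyStem('ipod') give "ipod", so either form of a word lands on the same index term. Like the lowercase filter, the stemmer has to be applied to both the indexed text and the query.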
Codeangel 0.5
Well, it's been a long time since I've done an update, mostly because my interests have drifted to video game making, AI, and guitars. I'll probably have posts soon relating to those. For now, though, I finally finished a big update!
Codeangel 0.5 has the following new features:
- Trackback and Pingback support (both receiving and sending).
- Feeds are reimplemented. Also notice that the syndication symbol under each article links to its comment feed.
- Reimplementation of the search engine. Articles are searched with better granularity, and comments are now indexed as well. I also made a Porter Stemmer, so one should get more relevant results. I'll post more about this Porter Stemmer and release it soon.
You probably won't notice much in my next planned release: splitting my admin and the main part of the site into modules. I'm also going to make a better article manager, then add published/unpublished article support. After I get done with that I will implement skin support, built on Zend_Layout. The big thing you'll notice is a new look for codeangel.org!
Codeangel 0.4
Codeangel is now running on version 0.4!
- Now Powered By Zend Framework 1.0.1!
- Improved Security Auditing and logging
- 404 and 500 error pages
- Caching: the front page is now twice as fast!
- Other refactoring using more features of Zend Framework, and code cleanup for faster execution.
In Codeangel 0.5 the user will finally notice some feature updates, including linkback support, comment subscription, and a better searching experience.
Custom Zend Log Format: Security Logging
The default file logging format for the Zend_Log file writer is as follows:
%timestamp% %priorityName% (%priority%): %message%
This is fine for error logging. For other sorts of logging, like security auditing, we need more, such as the visitor's IP address and hostname. In other cases you might want to log things like the request in which the error occurred. This is very easy to do with Zend_Log; however, it really isn't documented, and I've found people doing weird things like extending Zend_Log to achieve it. Let's look at how to do this right.
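As a rough illustration of how a %placeholder% format can carry extra fields, here is a self-contained sketch in plain PHP. It does not use Zend_Log itself. The function formatLogLine(), the extra %ip% and %hostname% placeholders, and the sample values are all assumptions made up for this example. In Zend_Log the likely route is to attach the extra values with setEventItem() and pass a custom format string to Zend_Log_Formatter_Simple, but check the API of your framework version before relying on that.

```php
<?php
// Replace each %name% placeholder with the matching value from the event.
function formatLogLine(string $format, array $event): string
{
    $line = $format;
    foreach ($event as $name => $value) {
        $line = str_replace('%' . $name . '%', (string) $value, $line);
    }
    return $line;
}

// The default format, extended with two hypothetical security fields.
$format = '%timestamp% %priorityName% (%priority%) [%ip% %hostname%]: %message%';

// Sample event. In a real request the IP would come from something
// like $_SERVER['REMOTE_ADDR'].
$event = [
    'timestamp'    => '2007-09-01T12:00:00+00:00',
    'priorityName' => 'WARN',
    'priority'     => 4,
    'ip'           => '192.0.2.10',
    'hostname'     => 'client.example.org',
    'message'      => 'Failed login for user admin',
];

$line = formatLogLine($format, $event);
```

The design point is that the log format is just data: adding a field means adding a placeholder and a value, with no need to subclass the logger.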
Misconceptions about Exceptions
I was surprised the other day when someone came into ##php on freenode and asked for a strange feature after he hit a wall designing his app with exceptions. He wanted something like the following:
//NOTE THIS IS NOT POSSIBLE, IT'S BULLCRAP!!
try {
    throw new Exception("throw an exception");
} catch (Exception $e) {
    //do something with a thrown exception.
} else {
    //do something if there wasn't an exception thrown.
}
That's right, he wanted an 'else' for a try/catch block. My mind was blown; surely he was using exceptions wrongly. After some inquiring into why he wanted such a ridiculous feature, I found out he was indeed misusing exceptions. He was using them to control his code's execution flow, driving his application with recoverable errors. He was quite defensive about it too! I decided that if there weren't enough blog articles about correct exception usage, there needed to be one more...
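For what it's worth, even where a try/catch is warranted, no 'else' is needed: code that should run only when nothing was thrown simply goes inside the try block, after the call that might throw. The sketch below shows this with two hypothetical functions, parsePort() and describePort(), invented for the illustration.

```php
<?php
// Hypothetical helper that throws on invalid input.
function parsePort(string $value): int
{
    if (!ctype_digit($value) || (int) $value < 1 || (int) $value > 65535) {
        throw new InvalidArgumentException("Invalid port: $value");
    }
    return (int) $value;
}

function describePort(string $value): string
{
    try {
        $port = parsePort($value);
        // This is the "else" part: it is only reached when
        // parsePort() did not throw.
        return "listening on port $port";
    } catch (InvalidArgumentException $e) {
        // Only reached when parsePort() threw.
        return 'error: ' . $e->getMessage();
    }
}
```

Here describePort('8080') returns "listening on port 8080" and describePort('http') returns "error: Invalid port: http". One caveat: anything else in the try block that throws an InvalidArgumentException will also land in the catch, so keep the block short.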
Codeangel 0.3
Well this was a fast release! I promise 0.4 will be a lot longer wait, and probably the experience will inspire me to write more regular articles. Changes for 0.3 include:
- Added links for digg, furl, stumbleupon, del.icio.us, technorati, etc... on each article page.
- Changed code highlighting class from PEAR Text_Highlighter to GeSHi
- Refactored the models a little bit with better error reporting. Not completely the way I want them yet, mostly due to the limitations of the older Zend Framework I'm working on.
- Fixed various small bugs
0.4 will be a big change, upgrading to Zend Framework 1.0. Because of the large differences between 1.0 and 0.8, a lot of code will be refactored (for the better).
Codeangel 0.2
I am calling this release 0.2. It's mainly bug fixes and a few feature additions I have been putting off. The changes can be summarized:
- Changed homegrown JS to the jQuery library
- Fixed a bug with md5 files on the download page (thanks to David for pointing it out)
- Completely changed my article linking method; the old way was a failed experiment.
- Added warnings for when my posts aren't valid XML.
- Truncated lists of tags and archives in the sidebar.
Plans for 0.3 include adding links for Digg, Technorati, del.icio.us and whatnot for the articles, and replacing the code highlighting lib with GeSHi. In 0.5 I plan on refactoring the app to work with Zend Framework 1.0 and to include things like caching and logging. I also hope to replace some of my home-grown components with Zend Framework's components.
Zend Framework 1.0 is released
The Zend Framework 1.0 has been released! I have been keeping an eye on/involved in this project since I first heard about it almost 2 years ago. My involvement started out learning the framework and providing feedback. Lately I have joined in the cause to help out with documentation and will continue to do so from now on.
This very blog has been written in Zend Framework. It runs on 0.8, and I need to upgrade and refactor it, but I don't have time with my current projects. It will happen soon though!
If you're a web application developer, you really have to try Zend Framework out. It has many libraries for fast application development, but allows a much greater degree of control than most other frameworks, making it more enterprise friendly.
The Zend Framework is more of a collection of libraries ready for your use. It also gives you plenty of abstract classes and interfaces to add on to or change the framework to meet your needs. The most notable components of the Zend Framework are its MVC implementation, a PHP port of Lucene, and a standalone PDF class. And that's not all!
So give the Zend Framework a look and try it out for your next project! A word before you judge too early: one of the missing components that we couldn't get into the 1.0 release was a partial view renderer. If you see this as a loss, don't worry, it'll most likely be out by the next release.
Custom Zend Framework Router
You can tweak Zend Framework's Router_Route to meet almost all of your routing needs. But what if you want something beyond what that package can offer? You can make your routing dreams come true by writing your own custom route; all you need to do is implement Zend_Controller_Router_Route_Interface (that's a mouthful).
Case in point: I needed a website that could have an arbitrary number of hierarchical categories. I wanted my URI path to reflect the full path to that category. For example:
http://www.example.org/category/clothing/hats/dress/fedora
This URL would be handled by the category controller. Each category is a child of the category to its left, with clothing being the topmost parent.
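The matching logic for such a path can be sketched in plain PHP as below. This is only an illustration under assumptions, not the route class itself: matchCategoryPath() is a hypothetical function, the parameter names 'categories' and 'category' and the 'index' action are made up, and a bare /category with no children is treated as a non-match. In a real custom route this logic would sit in the match() method of a class implementing Zend_Controller_Router_Route_Interface, which returns an array of parameters on success or false otherwise.

```php
<?php
// Returns route parameters for paths under /category, or false if the
// path does not belong to this route.
function matchCategoryPath(string $path)
{
    // Split on slashes, ignoring leading, trailing and doubled ones.
    $segments = preg_split('#/+#', trim($path, '/'), -1, PREG_SPLIT_NO_EMPTY);

    if (count($segments) < 2 || $segments[0] !== 'category') {
        return false;
    }

    // Everything after "category" is the hierarchy, topmost parent first.
    $categories = array_map('urldecode', array_slice($segments, 1));

    return [
        'controller' => 'category',
        'action'     => 'index',
        'categories' => $categories,
        'category'   => end($categories),   // the leaf being requested
    ];
}

$params = matchCategoryPath('/category/clothing/hats/dress/fedora');
```

For the example URL, $params['categories'] holds clothing, hats, dress, fedora in that order and $params['category'] is "fedora". The controller can then walk the list to verify that each category really is a child of the one before it.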
Adding APR support for Tomcat 6
Tomcat 6 has some neat new features. The ones I am eyeing the most are its advanced IO features, like the Comet Servlet Interface. In an upcoming article I'll explain the Comet Servlet Interface some more, plus show a neat proof of concept!
In order to get these advanced IO features, one has to enable APR listener support. APR is the Apache Portable Runtime, which gives developers an API whose behavior is consistent across many platforms. This runtime is the magic that makes the latest versions of Apache's httpd so darn portable. It's also needed for some advanced features in Tomcat!