A clustering engine that will search the Web with www.etools.ch and automatically organize the results into thematic categories

December 1, 2007 | 11:28 pm

What Company is Offering:
A clustering engine that will search the Web with www.etools.ch and automatically organize the results into thematic categories.
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize (cluster) search results into thematic categories. Carrot2 provides an architecture for acquiring search results from various sources (Yahoo API, Google API, MSN Search API, eTools Meta Search, Alexa Web Search, PubMed, OpenSearch, Lucene index, SOLR), clustering the results and visualising the clusters. Currently, five clustering algorithms are available, each suited to different kinds of document clustering tasks.

Thanks to its flexible architecture, high quality and a friendly BSD-like license, Carrot2 has been successfully used in a number of commercial and research applications and has resulted in a number of interesting publications.
How It Works:

Carrot2 can add clustering of search results to an existing search engine. If you do not have one yet, you can use an Open Source project called Nutch to crawl your website. Nutch has a Carrot2-based search clustering plugin, so you get crawling, searching and clustering in one package.
How can I integrate Carrot2 with my Web site or software?

The integration depends on what infrastructure is already available in your project. Carrot2 requires a feed of documents (search results), so typically you’ll need a search engine that crawls your site. Such an engine can be local to your Web site (proprietary intranet solutions, or search engines built on top of Nutch or ht://dig), but it can also be a global search engine with searches restricted to your domain (Google, Yahoo).
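To make the idea of a "feed of documents" more concrete, here is a minimal Python sketch of what goes in and what comes out. It is only an illustration: the sample results, the stopword list and the function `cluster_by_shared_term` are all made up for this post, and the grouping rule (a term shared by at least two results becomes a category label) is a toy stand-in, not one of Carrot2's clustering algorithms.

```python
import re
from collections import defaultdict

# Hypothetical search results; in practice these come from your search engine.
RESULTS = [
    {"title": "Jaguar cars", "snippet": "Luxury cars from Jaguar", "url": "http://example.com/0"},
    {"title": "Jaguar the wild cat", "snippet": "The jaguar is a cat native to the Americas", "url": "http://example.com/1"},
    {"title": "Used Jaguar cars", "snippet": "Buy used cars online", "url": "http://example.com/2"},
    {"title": "Jaguar cat habitat", "snippet": "Where the big cat lives", "url": "http://example.com/3"},
]

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "is", "to", "from", "of", "and", "in", "where"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cluster_by_shared_term(results, query, min_size=2):
    """Toy grouping: every non-query term shared by at least `min_size`
    results becomes a category label mapping to the result indices."""
    query_terms = set(tokenize(query))
    term_to_docs = defaultdict(set)
    for idx, result in enumerate(results):
        terms = set(tokenize(result["title"] + " " + result["snippet"]))
        for term in terms - query_terms:
            term_to_docs[term].add(idx)
    return {
        term: sorted(ids)
        for term, ids in term_to_docs.items()
        if len(ids) >= min_size
    }


if __name__ == "__main__":
    for label, ids in sorted(cluster_by_shared_term(RESULTS, "jaguar").items()):
        print(label, [RESULTS[i]["title"] for i in ids])
```

The point is the shape of the data: each result is a title, a snippet and a URL, and the output is a set of labelled groups that a page can render as categories next to the plain result list.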

Once a search engine is available, the integration depends on the technology your site or software uses for rendering the user interface (or, more accurately, for implementing application logic). Software written in Java can use Carrot2 directly, as shown in the end-to-end example code (JavaDoc). Sites written in Perl, PHP, .NET and other languages can use the Carrot2 Document Clustering Server; see the dedicated FAQ for more details. Finally, in some cases you might want to re-use and customize (through XSLT) parts of Carrot2’s web application (located in the carrot2/applications/carrot2-demo-webapp folder of the source repository), for example to visualize clusters.
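For sites not written in Java, the first practical step is usually packaging your search results in a form a separate clustering service can read. The Python sketch below builds a small XML document from a list of results. Treat it as an assumption-laden illustration: the helper `results_to_xml` and the element names (`searchresult`, `query`, `document`, `title`, `url`, `snippet`) are chosen for this example, so check the Document Clustering Server documentation for the input format it actually expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical results, e.g. as returned by your site's own search engine.
SAMPLE_RESULTS = [
    {"title": "Jaguar cars", "url": "http://example.com/0", "snippet": "Luxury cars from Jaguar"},
    {"title": "Jaguar the wild cat", "url": "http://example.com/1", "snippet": "A cat native to the Americas"},
]


def results_to_xml(query, results):
    """Serialize a query and its results to an XML string.

    Element and attribute names are assumptions made for illustration.
    """
    root = ET.Element("searchresult")
    ET.SubElement(root, "query").text = query
    for idx, result in enumerate(results):
        doc = ET.SubElement(root, "document", id=str(idx))
        ET.SubElement(doc, "title").text = result["title"]
        ET.SubElement(doc, "url").text = result["url"]
        ET.SubElement(doc, "snippet").text = result["snippet"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(results_to_xml("jaguar", SAMPLE_RESULTS))
```

A PHP, Perl or .NET site would do the same thing with its own XML library and then send the payload to the clustering server over HTTP.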
More at: http://demo.carrot2.org/
