Lemur Project

The Lemur Project is a collaboration between the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst and the Language Technologies Institute at Carnegie Mellon University. It develops the Lemur Toolkit, an open-source (BSD license) software framework for building language modeling and information retrieval software, and the Indri search engine. The toolkit is used to develop search engines, text analysis tools, browser toolbars, and data resources in the field of information retrieval (IR).

Lemur is written in C and C++ and is distributed as source code with a makefile. The source code can be modified to develop new libraries. It runs on several operating systems, including UNIX (Linux and Solaris) and Windows XP.

Features

Lemur supports the following features:

  • Indexing:
    • English, Chinese, and Arabic text
    • Word stemming
    • Stop words
    • Tokenization
    • Passage and incremental indexing
  • Retrieval:
    • Ad hoc retrieval (TF-IDF and InQuery)
    • Passage and cross-lingual retrieval
    • Language modeling
    • Query model updating
    • Two-stage smoothing
    • Relevance feedback
    • Structured query language
    • Wildcard term matching
  • Distributed IR:
    • Query-based sampling
    • Database-based ranking (CORI)
    • Results merging
  • Document clustering
  • Summarization
  • Simple text processing
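Two-stage smoothing, listed among the retrieval features above, first smooths a document's language model with a Dirichlet prior and then interpolates it with a background model: p(w|d) = (1 − λ) · (c(w;d) + μ·p(w|C)) / (|d| + μ) + λ·p(w|C). The Python sketch below ranks documents by query likelihood under that scheme. It is a standalone illustration of the technique, not the Lemur Toolkit's code or API. The parameter values (`mu`, `lam`), the use of the collection model as the background model, and the add-one floor on collection probabilities are assumptions made for this example.

```python
import math
from collections import Counter


def two_stage_prob(word, doc_counts, doc_len, coll_prob, mu=2000.0, lam=0.1):
    """p(w|d) under two-stage smoothing.

    Stage 1: Dirichlet-prior smoothing with parameter mu.
    Stage 2: Jelinek-Mercer interpolation with weight lam.
    Assumption: the background model p(w|U) is the collection model p(w|C).
    """
    p_c = coll_prob(word)
    dirichlet = (doc_counts.get(word, 0) + mu * p_c) / (doc_len + mu)
    return (1.0 - lam) * dirichlet + lam * p_c


def query_likelihood(query_terms, doc_terms, coll_counts, coll_len, mu=2000.0, lam=0.1):
    """Score = sum over query tokens of log p(w|d); higher is better."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    vocab_size = len(coll_counts)

    def coll_prob(w):
        # Add-one estimate so unseen terms never get probability 0 (an assumption of this sketch).
        return (coll_counts.get(w, 0) + 1) / (coll_len + vocab_size + 1)

    return sum(
        math.log(two_stage_prob(w, doc_counts, doc_len, coll_prob, mu, lam))
        for w in query_terms
    )


# Toy collection; mu is set small because the documents are tiny.
docs = {
    "d1": "lemur toolkit language modeling information retrieval".split(),
    "d2": "indri search engine structured query language".split(),
    "d3": "cooking recipes with fresh vegetables".split(),
}
coll_counts = Counter(t for terms in docs.values() for t in terms)
coll_len = sum(coll_counts.values())
query = "language modeling".split()

ranked = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], coll_counts, coll_len, mu=10.0, lam=0.1),
    reverse=True,
)
print(ranked)  # ['d1', 'd2', 'd3']
```

The document containing both query terms ranks first; the Dirichlet stage adapts the amount of smoothing to document length, while the interpolation stage accounts for query words that are common in the collection as a whole.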

Components

The Lemur Project has the following components:

  • Lemur Toolkit
  • Indri
  • Galago
  • Lemur Query Log Toolbar
  • ClueWeb09 Dataset

Latest Version

At the time of writing, the latest version of the Indri search engine is 5.8.

The final version of the Lemur Toolkit is 4.12.

Indri Search Engine

The Indri search engine is one of the components of the Lemur Project and is also open source. It provides its own query language and lets researchers index data and structure documents using simple command-line instructions. Indri can be adapted to a variety of applications, and it can be distributed across a cluster of nodes for higher performance. It can handle large document collections and can parse formats such as HTML and XML.
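Indexing turns raw text into an inverted index that maps each term to the documents, and the positions within them, where it occurs; stored positions are what allow phrase and proximity matching at query time. The Python sketch below builds a tiny positional index over two toy documents. The stopword list, tokenizer, and suffix-stripping stemmer are hypothetical simplifications chosen for brevity, not Indri's own components, and Indri's real on-disk index is far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}.

    Positions count only the tokens that survive stopword removal.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        position = 0
        for token in tokenize(text):
            if token in STOPWORDS:
                continue
            index[stem(token)].setdefault(doc_id, []).append(position)
            position += 1
    return dict(index)


docs = {
    "d1": "Indexing the documents of a collection",
    "d2": "Search engines index documents and queries",
}
index = build_index(docs)
print(index["index"])     # {'d1': [0], 'd2': [2]}
print(index["document"])  # {'d1': [1], 'd2': [3]}
```

Stemming maps "indexing" and "index" to the same entry, and stopwords such as "the" never reach the index.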

The Indri API can be used from programming and scripting languages such as C++, Java, C#, and PHP.

Features of Indri Search Engine

  • Can make use of multiple document representations
  • Explicit term weighting
  • Robust query language
  • Formally well-grounded
  • Highly effective
  • Can be efficiently implemented
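Explicit term weighting surfaces in Indri's query language through operators such as `#combine` and `#weight`. The Python sketch below assumes the usual description of these operators: a geometric mean of their arguments' beliefs, and a weighted geometric mean in which each belief is raised to w_i / Σw. It only illustrates that arithmetic. The belief values are made up, `combine` and `weight` are hypothetical helpers rather than part of the Indri API, and real Indri derives term beliefs from smoothed language models over its index.

```python
import math


def combine(beliefs):
    """Sketch of #combine: the geometric mean of the child beliefs."""
    return math.exp(sum(math.log(b) for b in beliefs) / len(beliefs))


def weight(weighted_beliefs):
    """Sketch of #weight: a weighted geometric mean.

    weighted_beliefs is a list of (weight, belief) pairs; each belief is
    raised to weight / sum(weights).
    """
    total = sum(w for w, _ in weighted_beliefs)
    return math.exp(sum(w * math.log(b) for w, b in weighted_beliefs) / total)


# Made-up beliefs for two query terms in one document.
b_lemur, b_project = 0.2, 0.05

# Illustrates #combine(lemur project) for these beliefs.
print(combine([b_lemur, b_project]))  # ≈ 0.1
# Illustrates #weight(3.0 lemur 1.0 project) for these beliefs.
print(weight([(3.0, b_lemur), (1.0, b_project)]))  # ≈ 0.1414
```

Computing in log space, as here, avoids underflow when many small probabilities are multiplied, and a `#weight` with equal weights reduces to `#combine`.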

See also

  • List of information retrieval libraries

External links

  • The Lemur Project website