Saturday 26 January 2013

Solr is a popular, fast, open-source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr scales through distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.



Features

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON or binary over HTTP. You query it via HTTP GET and receive XML, JSON, or binary results.

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces - XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Scalability - Efficient Replication to other Solr Search Servers
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture
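
As a rough illustration of the "index over HTTP, query over HTTP GET" flow described above, here is a minimal Python sketch that only builds the requests and does not send them. The host, port 8983, the core name `collection1`, and the document fields are assumptions for the example, not something this post specifies; adjust them to match your own install.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed location of a local Solr core; change to match your setup.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_update_request(docs, base=SOLR_BASE):
    """Build (but do not send) a POST that indexes a list of JSON documents."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        base + "/update?commit=true",  # commit=true makes the docs searchable right away
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_url(query, rows=10, base=SOLR_BASE):
    """Build the GET URL for a query that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return base + "/select?" + params
```

To actually talk to a running server you would pass the request or URL to `urllib.request.urlopen`; that step is left out so the sketch works without Solr installed.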

Architecture

Solr Uses the Lucene Search Library and Extends it!

  • A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
  • Powerful Extensions to the Lucene Query Language
  • Faceted Search and Filtering
  • Geospatial Search
  • Advanced, Configurable Text Analysis
  • Highly Configurable and User Extensible Caching
  • Performance Optimizations
  • External Configuration via XML
  • An Administration Interface
  • Monitorable Logging
  • Fast Incremental Updates and Index Replication
  • Highly Scalable Distributed Search with a Sharded Index across Multiple Hosts
  • JSON, XML, CSV/delimited-text, and binary update formats
  • Easy ways to pull in data from databases and XML files from local disk and HTTP sources
  • Rich Document Parsing and Indexing (PDF, Word, HTML, etc.) using Apache Tika
  • Apache UIMA integration for configurable metadata extraction
  • Multiple search indices
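
To make "Faceted Search and Filtering" from the list above a little more concrete, here is a small sketch of how such a query string might be assembled. `facet`, `facet.field`, and `fq` are standard Solr request parameters; the field names in the usage note (`cat`, `manu`, `inStock`) are purely hypothetical and depend on your schema.

```python
from urllib.parse import urlencode


def build_facet_params(query, facet_fields, filters=()):
    """Return a query string that asks Solr for facet counts plus filter queries."""
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    # One facet.field entry per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the result set without affecting relevance scoring.
    params += [("fq", fq) for fq in filters]
    return urlencode(params)
```

For example, `build_facet_params("ipod", ["cat", "manu"], ["inStock:true"])` yields a string you can append to a core's `/select?` URL.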
