Saturday, 26 January 2013

Apache Solr - Introduction

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

Features

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON or binary over HTTP. You query it via HTTP GET and receive XML, JSON, or binary results.

Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML,JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Scalability - Efficient Replication to other Solr Search Servers
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture

Architecture

Solr Uses the Lucene Search Library and Extends it!

A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
Powerful Extensions to the Lucene Query Language
Faceted Search and Filtering
Geospatial Search
Advanced, Configurable Text Analysis
Highly Configurable and User Extensible Caching
Performance Optimizations
External Configuration via XML
An Administration Interface
Monitorable Logging
Fast Incremental Updates and Index Replication
Highly Scalable Distributed search with sharded index across multiple hosts
JSON, XML, CSV/delimited-text, and binary update formats
Easy ways to pull in data from databases and XML files from local disk and HTTP sources
Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
Apache UIMA integration for configurable metadata extraction
Multiple search indices

Categories: apache, document, indexing, j2ee, java, jee, search engine, solr, solrj

Sunil Gulabani

Software Engineer (Java) | Author

Saturday, 26 January 2013

Apache Solr - Introduction

Features

Architecture

Solr Uses the Lucene Search Library and Extends it!

0 comments:

Post a Comment

About Me

Posts

Popular Posts

Visitors

Total Pageviews

Alexa Site Stats

Friend's Blog

Labels