Sphinx Search ~ Sunil Gulabani

Saturday, 26 January 2013

Sphinx Search

Sphinx is a free-software full-text search engine designed with indexing database content in mind. By design, Sphinx integrates cleanly with SQL databases.
Sphinx can be used in one of the following ways:

As a stand-alone server (like other DBMSs), which can communicate with other DBMSs in two ways:

using the native protocols of MySQL, MariaDB and PostgreSQL;
using ODBC with ODBC-compliant DBMSs.

As a storage engine for MySQL and its forks, called SphinxSE. MariaDB is distributed with this storage engine.

If Sphinx runs as a stand-alone server, an application can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl and Ruby. All of them are distributed along with Sphinx.

Other data sources can be indexed by piping them to the indexer in a custom XML format (xmlpipe2). Sphinx is distributed under the terms of the GNU General Public License version 2 or a proprietary license.
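
To make the XML pipe idea concrete, here is a minimal Java sketch that writes a document stream in the shape Sphinx's documentation describes for its xmlpipe2 source: a docset, a schema declaring the full-text fields, then one element per document. The class name, the field names ("title", "content") and the sample document are hypothetical, and attributes are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of an xmlpipe2-style document stream.
// Class name, field names and sample data are hypothetical.
final class XmlPipe2Sketch {

    // Escape the five XML special characters so field text stays well-formed.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Build a docset: a schema listing the fields, then one element per document.
    // Field names are assumed to be valid XML element names.
    static String buildDocset(Map<Long, Map<String, String>> docs, String... fields) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
        xml.append("<sphinx:docset>\n");
        xml.append("<sphinx:schema>\n");
        for (String field : fields) {
            xml.append("<sphinx:field name=\"").append(field).append("\"/>\n");
        }
        xml.append("</sphinx:schema>\n");
        for (Map.Entry<Long, Map<String, String>> doc : docs.entrySet()) {
            xml.append("<sphinx:document id=\"").append(doc.getKey()).append("\">\n");
            for (String field : fields) {
                String value = doc.getValue().getOrDefault(field, "");
                xml.append('<').append(field).append('>')
                   .append(escapeXml(value))
                   .append("</").append(field).append(">\n");
            }
            xml.append("</sphinx:document>\n");
        }
        xml.append("</sphinx:docset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<Long, Map<String, String>> docs = new LinkedHashMap<>();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Sphinx & SQL");
        doc.put("content", "Indexing <database> content");
        docs.put(1L, doc);
        // The indexer would read this stream from the command configured for the source.
        System.out.print(buildDocset(docs, "title", "content"));
    }
}
```

A program like this would be the command whose standard output the indexer reads, so anything that can print this XML can act as a data source.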

Features

Batch and incremental (soft real-time) full-text indexing.
Support for non-text attributes (scalars, strings, sets).
Direct indexing of SQL databases. Native support for MySQL, MariaDB, PostgreSQL, MSSQL, plus ODBC connectivity.
XML documents indexing support.
Distributed searching support out of the box.
Integration via access APIs.
SQL-like syntax support via MySQL protocol (since 0.9.9).
Full-text searching syntax.
Database-like result set processing.
Relevance ranking utilizing additional factors besides standard BM25.
Text processing support for SBCS and UTF-8 encodings, stopwords, indexing of selected words without position data ("hitless" words), stemming, word forms, tokenizing exceptions, and "blended characters" (dual-indexing as both a real character and a word separator).
Support for user-defined functions (UDFs) (since 2.0.1).
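
As an illustration of the SQL-like syntax listed above, the following Java sketch composes a full-text SELECT statement as a plain string. The class name, method names and the index name "articles" are hypothetical. The sketch only builds the statement; sending it would need a MySQL-protocol client or JDBC driver pointed at the Sphinx server, which is not shown. It escapes only the string literal, not full-text query operators inside the search text.

```java
// Sketch: composing a SphinxQL-style SELECT with a MATCH() clause.
// Names here are hypothetical; no connection to a server is made.
final class SphinxQlSketch {

    // Escape backslashes and single quotes, then wrap the text in single quotes.
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Build a paged full-text query against one index.
    static String buildSearch(String index, String query, int offset, int limit) {
        // Index names cannot be parameterized, so accept only plain identifiers.
        if (!index.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected index name: " + index);
        }
        if (offset < 0 || limit <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and limit > 0");
        }
        return "SELECT * FROM " + index
             + " WHERE MATCH(" + quote(query) + ")"
             + " LIMIT " + offset + ", " + limit;
    }

    public static void main(String[] args) {
        System.out.println(buildSearch("articles", "search engine", 0, 10));
    }
}
```

Validating the index name and quoting the search text separately keeps user input out of the statement structure, which matters as much here as with any SQL database.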

Performance and scalability

Indexing speed of up to 10-15 MB/sec per core and HDD.
Searching speed of over 500 queries/sec against a 1,000,000-document collection on a 2-core desktop system with 2 GB of RAM.
The biggest known installation using Sphinx, Boardreader.com, indexes 16 billion documents.
The busiest known installation, Craigslist, is rumored to serve over 200,000,000 queries/day.
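
To put these figures in perspective, here is a small back-of-envelope sketch in Java. It converts the quoted daily query volume into an average per-second rate and estimates indexing time from the quoted throughput. The class name, the 1,024 MB corpus size and the single-core setup are illustrative assumptions, and real load is rarely spread evenly over a day.

```java
// Back-of-envelope arithmetic on the figures quoted above.
// Corpus size and core count are assumptions chosen for illustration.
final class SphinxCapacitySketch {

    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // Average queries per second for a given daily volume (ignores peaks).
    static double averageQps(long queriesPerDay) {
        return queriesPerDay / SECONDS_PER_DAY;
    }

    // Seconds needed to index a corpus at a given per-core throughput.
    static double indexingSeconds(double corpusMb, double mbPerSecPerCore, int cores) {
        return corpusMb / (mbPerSecPerCore * cores);
    }

    public static void main(String[] args) {
        System.out.println("Average QPS: " + averageQps(200_000_000L));
        System.out.println("Indexing 1,024 MB at 10 MB/sec: " + indexingSeconds(1024, 10, 1) + " s");
        System.out.println("Indexing 1,024 MB at 15 MB/sec: " + indexingSeconds(1024, 15, 1) + " s");
    }
}
```

At 200,000,000 queries/day the average works out to roughly 2,315 queries/sec, and a 1,024 MB corpus on one core would take between about 68 and 102 seconds at the quoted 10-15 MB/sec.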

Categories: j2ee, java, jee, search engine, sphinx
