Saturday, 26 January 2013

1) Open Terminal and first become root user:

sudo su

2) Now we need to download the sphinxsearch 2.0.5 package from the Sphinx Search site.

wget http://sphinxsearch.com/files/sphinxsearch_2.0.5-release-0ubuntu11~precise2_amd64.deb
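The package above is the 64-bit (amd64) build for Ubuntu 12.04 (Precise). If you are not sure your server matches, you can check before downloading:

# confirm the architecture and release match the package you are about to download
dpkg --print-architecture
lsb_release -a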

3) Sphinx Search 2.0.5 requires a dependency package called libpq5 that you might not have on your server yet, so run the following to install the dependency and then the deb package:

# install dependency  
apt-get install libpq5  
 
# install deb package  
dpkg -i sphinxsearch_2.0.5-release-0ubuntu11~precise2_amd64.deb
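
If you want to double check that the package installed cleanly, an optional sanity check could look like this:

# optional: confirm the installed version and that the binaries are on the PATH
dpkg -s sphinxsearch | grep -i '^version'
which indexer searchd search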

NOTE: Now we have Sphinx Search 2.0.5 installed on our Ubuntu server and we are ready to index some of our MySQL data. Here are the programs we will use to get the job done (don't run these commands, they are listed just for reference):

/usr/bin/searchd  
/usr/bin/indexer  
/usr/bin/search

For example purposes, we will use the following folder structure:

/home/indianic/web/sphinx/etc
/home/indianic/web/logs

4) The layout above includes an etc directory inside the sphinx directory, which is where we store our config file:

cat /home/indianic/web/sphinx/etc/sphinx.conf
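
If these directories do not exist yet on your server, create them now (the paths are just this example's layout, so adjust them to your own setup):

mkdir -p /home/indianic/web/sphinx/etc /home/indianic/web/logs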

Below is the configuration file that Sphinx Search will use to index the data we want indexed. Our goal is full-text search, so the configuration below shows what such a setup might look like.

    source indianiccom  
    {  
       type                            = mysql      
       sql_host                        = localhost  
       sql_user                        = root  
       sql_pass                        = root  
       sql_db                          = mysqldatabase_sphinx  
       sql_port                        = 3306  
     
       # ranged fetch: documents are pulled in batches of sql_range_step ids
       sql_query_range = SELECT MIN(id), MAX(id) FROM posts
       sql_range_step  = 128  
       sql_query       = SELECT id, created, modified, title, content, tags, short_description, author_id FROM posts WHERE id>=$start AND id<=$end  
    }  
     
    index indianiccom {  
       source = indianiccom  
       # prefix for the generated index files (sphinx.spa, sphinx.spd, ...)
       path = /home/indianic/web/sphinx/sphinx
       morphology = stem_en  
       min_word_len = 3  
       min_prefix_len = 0  
    }  
     
    searchd {  
       compat_sphinxql_magics = 0  
       port = 3313  
       log = /home/indianic/web/logs/searchd.log  
       query_log = /home/indianic/web/logs/query.log  
       pid_file = /home/indianic/web/logs/searchd.pid  
       max_matches = 10000  
    }
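
For reference, the sql_query above assumes a posts table roughly like the sketch below. This is only an illustration of the columns the query expects; your real schema will differ:

# illustrative only: a posts table with the columns referenced in sql_query
mysql -u root -p mysqldatabase_sphinx -e "CREATE TABLE IF NOT EXISTS posts (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created DATETIME,
  modified DATETIME,
  title VARCHAR(255),
  content TEXT,
  tags VARCHAR(255),
  short_description TEXT,
  author_id INT UNSIGNED
);"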

5) Now that our configuration file is ready and spells out exactly what we want sphinxsearch to index, we can move on to the actual indexing. Still as the root user, run the following to index your data:

/usr/bin/indexer --config /home/indianic/web/sphinx/etc/sphinx.conf --all
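
If the indexer run succeeds, the index files should appear next to the path prefix set in the config (typically files such as sphinx.spd, sphinx.spi and sphinx.sph):

ls -l /home/indianic/web/sphinx/sphinx.*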

6) The data returned by the MySQL query in the config file is now indexed. Next we can set up sphinxsearch to start automatically at boot, in case the server is rebooted or restarted unexpectedly. As root, open /etc/rc.local:

nano /etc/rc.local

7) Add the searchd command right before the last line (exit 0) so that the file looks like this:

#!/bin/sh -e  
#  
# rc.local  
#  
# This script is executed at the end of each multiuser runlevel.  
# Make sure that the script will "exit 0" on success or any other  
# value on error.  
#  
# In order to enable or disable this script just change the execution  
# bits.  
#  
# By default this script does nothing.  
 
/usr/bin/searchd --config /home/indianic/web/sphinx/etc/sphinx.conf

exit 0

8) Make sure to replace indianic with your own directory. The line we added starts the search daemon at boot using the configuration file from earlier.
Now let's test the installation by starting the sphinxsearch daemon manually. Run the following to start searchd:

/usr/bin/searchd --config /home/indianic/web/sphinx/etc/sphinx.conf
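
To confirm the daemon actually started and is listening on the port from the config (3313 in this example), a quick optional check could be:

# optional: verify searchd is running and listening on port 3313
ps aux | grep [s]earchd
netstat -nlt | grep 3313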

9) Once searchd is running, we can test whether the index works by searching it:

/usr/bin/search -c /home/indianic/web/sphinx/etc/sphinx.conf <search key>

For example:

/usr/bin/search -c /home/indianic/web/sphinx/etc/sphinx.conf mysql

10) The result should look similar to the output below. Your results will naturally differ because your data is different, but this shows what the output is supposed to look like.

Sphinx 2.0.5-id64-release (r3308)  
Copyright (c) 2001-2012, Andrew Aksyonoff  
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)  
 
using config file '/home/indianic/web/sphinx/etc/sphinx.conf'...  
index 'indianiccom': query 'mysql ': returned 15 matches of 15 total in 0.024 sec  
 
displaying matches:  
1. document=120, weight=4660  
2. document=125, weight=4655  
3. document=6, weight=4645  
4. document=115, weight=4645  
5. document=100, weight=4634  
6. document=93, weight=3660  
7. document=99, weight=3645  
8. document=60, weight=2609  
9. document=118, weight=1645  
10. document=105, weight=1634  
11. document=10, weight=1624  
12. document=7, weight=1609  
13. document=117, weight=1609  
14. document=119, weight=1609  
15. document=108, weight=1579  
 
words:  
1. 'mysql': 15 documents, 82 hits

11) Whenever you want to re-index updated MySQL data (while searchd is running), run the following command:

/usr/bin/indexer --rotate --config /home/indianic/web/sphinx/etc/sphinx.conf --all
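
The --rotate flag makes the running searchd pick up the freshly rebuilt index without stopping searches. If you want re-indexing to happen automatically, one common approach is a cron entry such as the following (the 15-minute schedule is just an example):

# example crontab entry: re-index every 15 minutes (searchd must be running for --rotate to work)
*/15 * * * * /usr/bin/indexer --rotate --config /home/indianic/web/sphinx/etc/sphinx.conf --all >/dev/null 2>&1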

12) Using the index from Java:

a) Go to the directory where the Sphinx Java API is installed and build the sphinxapi.jar file:

cd /usr/share/sphinxsearch/api/java
make

The above "make" command will create the sphinxapi.jar file.
   
b) Create a project in Eclipse and add sphinxapi.jar to the classpath.
c) Create a new class SphinxMain.java as follows:

package com.demo;

import java.util.Date;

import org.sphx.api.SphinxClient;
import org.sphx.api.SphinxException;
import org.sphx.api.SphinxMatch;
import org.sphx.api.SphinxResult;
import org.sphx.api.SphinxWordInfo;
public class SphinxMain {
   // client talks to searchd on the host/port configured in sphinx.conf
   public SphinxClient client = new SphinxClient();
   // "*" means search all indexes defined in the config
   public String index = "*";
   
   public SphinxMain() {
       try {
           client.SetServer("localhost", 3313);
       } catch (SphinxException e) {
           e.printStackTrace();
       }
   }
   
   public static void main(String[] args) {
       SphinxMain object = new SphinxMain();
       object.searchValue("sunil");
       object.searchValue("tag");
       object.searchValue("asdasdasdasd");
      }
   public void searchValue(String query)
   {
       System.out.println("***********************************************");
       System.out.println("Search Key: " + query);
       System.out.println("***********************************************");
       try {
           SphinxResult res = client.Query(query, index);
           if ( res==null )
           {
               System.err.println ( "Error: " + client.GetLastError() );
               System.exit ( 1 );
           }else
           {
               printResult(query, res);
           }
       } catch (SphinxException e) {
           e.printStackTrace();
       }        
   }
   
   public void printResult(String query,SphinxResult res)
   {
       /* print me out */
       System.out.println ( "Query '" + query + "' retrieved " + res.total + " of " + res.totalFound + " matches in " + res.time + " sec." );
       System.out.println ( "Query stats:" );
       for ( int i=0; i<res.words.length; i++ )
       {
           SphinxWordInfo wordInfo = res.words[i];
           System.out.println ( "\t'" + wordInfo.word + "' found " + wordInfo.hits + " times in " + wordInfo.docs + " documents" );
       }

       System.out.println ( "\nMatches:" );
       for ( int i=0; i<res.matches.length; i++ )
       {
           SphinxMatch info = res.matches[i];
           System.out.print ( (i+1) + ". id=" + info.docId + ", weight=" + info.weight );
           if ( res.attrNames==null || res.attrTypes==null )
               continue;

           for ( int a=0; a<res.attrNames.length; a++ )
           {
               System.out.print ( ", " + res.attrNames[a] + "=" );

               if ( res.attrTypes[a]==SphinxClient.SPH_ATTR_MULTI || res.attrTypes[a]==SphinxClient.SPH_ATTR_MULTI64 )
               {
                   System.out.print ( "(" );
                   long[] attrM = (long[]) info.attrValues.get(a);
                   if ( attrM!=null )
                       for ( int j=0; j<attrM.length; j++ )
                   {
                       if ( j!=0 )
                           System.out.print ( "," );
                       System.out.print ( attrM[j] );
                   }
                   System.out.print ( ")" );

               } else
               {
                   switch ( res.attrTypes[a] )
                   {
                       case SphinxClient.SPH_ATTR_INTEGER:
                       case SphinxClient.SPH_ATTR_ORDINAL:
                       case SphinxClient.SPH_ATTR_FLOAT:
                       case SphinxClient.SPH_ATTR_BIGINT:
                       case SphinxClient.SPH_ATTR_STRING:
                           /* ints, longs, floats, strings.. print as is */
                           System.out.print ( info.attrValues.get(a) );
                           break;

                       case SphinxClient.SPH_ATTR_TIMESTAMP:
                           Long iStamp = (Long) info.attrValues.get(a);
                           Date date = new Date ( iStamp.longValue()*1000 );
                           System.out.print ( date.toString() );
                           break;

                       default:
                           System.out.print ( "(unknown-attr-type=" + res.attrTypes[a] + ")" );
                   }
               }
           }

           System.out.println();
       }
   }
   
}
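
If you prefer to build and run this class outside Eclipse, the equivalent command-line steps might look like this (using the jar built in step 12a):

# compile and run from the shell instead of Eclipse
javac -cp /usr/share/sphinxsearch/api/java/sphinxapi.jar -d . SphinxMain.java
java -cp /usr/share/sphinxsearch/api/java/sphinxapi.jar:. com.demo.SphinxMain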




About Sphinx


Sphinx is a free software search engine designed with indexing database content in mind. By design, it integrates gracefully with SQL databases.
Sphinx can be used in one of the following ways:
  • as a stand-alone server (just like other DBMS's);
  • in combination with other DBMS's:
    • using the native protocols of MySQL, MariaDB and PostgreSQL;
    • using ODBC with ODBC-compliant DBMS's;
    • using a storage engine for MySQL and its forks, called SphinxSE (MariaDB ships with that storage engine).

If Sphinx is executed as a stand-alone server, it is possible to use SphinxAPI to connect an application to it. Official implementations of the API are available for PHP, Java, Perl and Ruby languages. They all are distributed along with Sphinx.

Other data sources can be indexed via a pipe in a custom XML format. Sphinx itself is distributed under the terms of the GNU General Public License version 2 or a proprietary license.


Features


  • Batch and incremental (soft real-time) full-text indexing.
  • Support for non-text attributes (scalars, strings, sets).
  • Direct indexing of SQL databases. Native support for MySQL, MariaDB, PostgreSQL, MSSQL, plus ODBC connectivity.
  • XML documents indexing support.
  • Distributed searching support out of the box.
  • Integration via access APIs.
  • SQL-like syntax support via the MySQL protocol (since 0.9.9).
  • Full-text searching syntax.
  • Database-like result set processing.
  • Relevance ranking utilizing additional factors besides standard BM25.
  • Text processing support for SBCS and UTF-8 encodings, stopwords, indexing of words known not to appear in the database ("hitless"), stemming, word forms, tokenizing exceptions, and "blended characters" (dual-indexing as both a real character and a word separator).
  • Supports UDF (since 2.0.1).


Performance and scalability


  • Indexing speed of up to 10-15 MB/sec per core and HDD.
  • Searching speed of over 500 queries/sec against a 1,000,000-document collection on a 2-core desktop system with 2 GB of RAM.
  • The biggest known installation using Sphinx, Boardreader.com, indexes 16 billion documents.
  • The busiest known installation, Craigslist, is rumored to serve over 200,000,000 queries/day.
