Saturday 26 January 2013

Example A: Simple two shard cluster




This example simply creates a cluster consisting of two Solr servers representing two different shards of a collection.



Step 1: Download Solr 4-Beta or greater: http://lucene.apache.org/solr/downloads.html

Step 2: Since we'll need two Solr servers for this example, simply make a copy of the example directory for the second server -- making sure you don't have any data already indexed.

rm -r example/solr/collection1/data/*
cp -r example example2

Step 3: This command starts up a Solr server and bootstraps a new solr cluster.

cd example
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar


  • -DzkRun causes an embedded zookeeper server to be run as part of this Solr server.
  • -Dbootstrap_confdir=./solr/collection1/conf Since we don't yet have a config in zookeeper, this parameter causes the local configuration directory ./solr/collection1/conf to be uploaded as the "myconf" config. The name "myconf" is taken from the "collection.configName" param in the next bullet.
  • -Dcollection.configName=myconf sets the config to use for the new collection. Omitting this param will cause the config name to default to "configuration1".
  • -DnumShards=2 the number of logical partitions we plan on splitting the index into.

Verify: Browse to http://localhost:8983/solr/#/~cloud to see the state of the cluster (the zookeeper distributed filesystem).  You can see from the zookeeper browser that the Solr configuration files were uploaded under "myconf", and that a new document collection called "collection1" was created. Under collection1 is a list of shards, the pieces that make up the complete collection.
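
The same information the cloud UI renders can be pulled as raw JSON from Solr's zookeeper info servlet, which is handy for scripting checks (a sketch; the servlet path below is what Solr 4.x uses and may differ in other versions):

# Dump the cluster state znode that backs the cloud UI
curl "http://localhost:8983/solr/zookeeper?detail=true&path=/clusterstate.json"
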
Step 4: Now start the second server, pointing it at the cluster - it will automatically be assigned to shard2 because we don't explicitly set the shard id:

cd example2
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar


  • -Djetty.port=7574 is just one way to tell the Jetty servlet container to use a different port.
  • -DzkHost=localhost:9983 points to the Zookeeper ensemble containing the cluster state. In this example we're running a single Zookeeper server embedded in the first Solr server. By default, an embedded Zookeeper server runs at the Solr port plus 1000, so 9983.

Verify: If you refresh the zookeeper browser, you should now see both shard1 and shard2 in collection1. View http://localhost:8983/solr/#/~cloud.
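
You can also ask the new server directly which cores it is hosting via the CoreAdmin API (a sketch; wt=json and indent=true are optional and just make the output easier to read):

# List the cores hosted by the second server and their status
curl "http://localhost:7574/solr/admin/cores?action=STATUS&wt=json&indent=true"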

Step 5: Next, index some documents:

cd exampledocs
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar ipod_video.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar mem.xml
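
If you prefer plain HTTP to post.jar, the same update handler also accepts JSON documents; a minimal sketch (the id and name fields exist in the example schema, and the document values here are made up):

# Add one more document as JSON and commit immediately
curl "http://localhost:8983/solr/collection1/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[{"id":"test-doc-1","name":"a test document"}]'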


Verify: And now, a request to either server results in a distributed search that covers the entire collection:
http://localhost:8983/solr/collection1/select?q=*:*
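
To see how the distributed request is assembled, you can add a couple of stock query parameters (a sketch; shards.info reports per-shard hit counts, and distrib=false restricts the query to the local core only):

# Distributed query with per-shard information
curl "http://localhost:8983/solr/collection1/select?q=*:*&shards.info=true&wt=json&indent=true"
# Query only the core hosted on this server (no distributed fan-out)
curl "http://localhost:8983/solr/collection1/select?q=*:*&distrib=false&wt=json&indent=true"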



Example B: Simple two shard cluster with shard replicas




This example builds on the previous one by creating another copy of shard1 and shard2. Extra shard copies can be used for high availability and fault tolerance, or simply for increasing the query capacity of the cluster.




Step 1: First, run through the previous example so we already have two shards and some documents indexed into each. Then simply make a copy of those two servers:

cp -r example exampleB
cp -r example2 example2B


Step 2: Then start the two new servers on different ports, each in its own window:

cd exampleB
java -Djetty.port=8900 -DzkHost=localhost:9983 -jar start.jar

cd example2B
java -Djetty.port=7500 -DzkHost=localhost:9983 -jar start.jar


Verify: Refresh the zookeeper browser page (the Solr Zookeeper Admin UI) and verify that 4 Solr nodes are up, and that each shard is present on 2 nodes.
Because we told Solr that we want two logical shards, instances 3 and 4 are automatically assigned as replicas of instances 1 and 2.

Step 3: Now send a query to any of the servers to query the cluster:
http://localhost:7500/solr/collection1/select?q=*:*
Note: To demonstrate failover for high availability, go ahead and kill any one of the Solr servers (just press CTRL-C in the window running the server) and send another query request to any of the remaining servers that are up.
To demonstrate gracefully degraded behavior, kill all but one of the Solr servers (again, CTRL-C in the window running the server) and send another query request to the remaining server. By default this will return 0 documents, because the query can no longer reach a replica of every shard. To return just the documents that are available in the shards that are still alive, add the query parameter shards.tolerant=true, as shown below.
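
For example, assuming the server on port 7500 is the one still running, a tolerant query might look like this (a sketch; wt=json and indent=true are just for readable output):

curl "http://localhost:7500/solr/collection1/select?q=*:*&shards.tolerant=true&wt=json&indent=true"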

SolrCloud uses leaders and an overseer as an implementation detail. This means that some shards/replicas will play special roles. You don't need to worry about whether the instance you kill is a leader or the cluster overseer - if you happen to kill one of these, automatic failover will choose a new leader or a new overseer transparently to the user, and it will seamlessly take over the respective job. Any Solr instance can be promoted to one of these roles.
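
If you're curious which nodes currently hold these roles, you can peek at the election znodes through the same zookeeper info servlet the Admin UI uses (a hedged sketch; the znode paths below follow the Solr 4.x layout and may differ in other versions):

# Overseer election state
curl "http://localhost:8983/solr/zookeeper?detail=true&path=/overseer_elect/leader"
# Leader of shard1 in collection1
curl "http://localhost:8983/solr/zookeeper?detail=true&path=/collections/collection1/leaders/shard1"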



Example C: Two shard cluster with shard replicas and zookeeper ensemble




The problem with example B is that while there are enough Solr servers to survive any one of them crashing, there is only one zookeeper server that contains the state of the cluster. If that zookeeper server crashes, distributed queries will still work since the solr servers remember the state of the cluster last reported by zookeeper. The problem is that no new servers or clients will be able to discover the cluster state, and no changes to the cluster state will be possible.
Running multiple zookeeper servers in concert (a zookeeper ensemble) allows for high availability of the zookeeper service. Every zookeeper server needs to know about every other zookeeper server in the ensemble, and a majority of servers are needed to provide service. For example, a zookeeper ensemble of 3 servers allows any one to fail with the remaining 2 constituting a majority to continue providing service. 5 zookeeper servers are needed to allow for the failure of up to 2 servers at a time.




For production, it's recommended that you run an external zookeeper ensemble rather than having Solr run embedded zookeeper servers. You can read more about setting up a zookeeper ensemble in the ZooKeeper documentation. For this example, we'll use the embedded servers for simplicity.
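
For reference, an external ensemble is driven by a zoo.cfg on each zookeeper machine; a minimal sketch might look like this (hostnames zk1-zk3 and the dataDir are placeholders, and each machine also needs a matching myid file in its dataDir):

# zoo.cfg - minimal three-server ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

Solr would then be pointed at it with -DzkHost=zk1:2181,zk2:2181,zk3:2181 instead of running -DzkRun.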

Step 1: First, stop all 4 servers and then clean up the zookeeper data directories for a fresh start.

rm -r example*/solr/zoo_data

We will be running the servers again at ports 8983, 7574, 8900, and 7500. The default is to run an embedded zookeeper server at hostPort+1000, so if we run an embedded zookeeper on the first three servers, the ensemble address will be localhost:9983,localhost:8574,localhost:9900.

As a convenience, we'll have the first server upload the solr config to the cluster. You will notice that it blocks until you have actually started the second server. This is because zookeeper needs a quorum of servers before it can operate.

cd example
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2 -jar start.jar


cd example2
java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar


cd exampleB
java -Djetty.port=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar


cd example2B
java -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar


Now since we are running three embedded zookeeper servers as an ensemble, everything can keep working even if a server is lost. To demonstrate this, kill the exampleB server by pressing CTRL-C in its window and then browse to the Solr Zookeeper Admin UI to verify that the zookeeper service still works.
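
You can also check the surviving embedded zookeeper servers directly with ZooKeeper's standard four-letter commands (a sketch; requires the nc utility, and assumes exampleB was the server you killed, taking its embedded zookeeper on port 9900 with it):

echo ruok | nc localhost 9983   # should answer "imok"
echo stat | nc localhost 8574   # "Mode:" shows whether this node is the leader or a follower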