_posts/2014-06-25-scaling_accumulo_with_multi_volume.html (229 lines of code) (raw):
---
layout: post
status: PUBLISHED
published: true
title: Scaling Accumulo With Multi-Volume Support
id: d626db91-3a12-4ed4-b43d-c05ed80f6c80
date: '2014-06-25 01:10:31 -0400'
categories: accumulo
tags: []
permalink: accumulo/entry/scaling_accumulo_with_multi_volume
---
<p> This post was moved <a href="https://accumulo.apache.org/blog/2014/06/25/scaling-accumulo-with-multivolume-support.html" title="Updated location">to the Accumulo project site</a>. </p>
<p class="western">MapReduce is a commonly used approach to<br />
querying or analyzing large amounts of data. Typically MapReduce jobs<br />
are created using using some set of files in HDFS to produce a<br />
result. When new files come in, they get added to the set, and the<br />
job gets run again. A common Accumulo approach to this scenario is to<br />
load all of the data into a single instance of Accumulo.</p>
<p class="western"> A single instance of Accumulo can scale quite<br />
largely[1,2] to accommodate high levels of ingest and query. The manner<br />
in which ingest is performed typically depends on latency<br />
requirements. When the desired latency is small, inserts are<br />
performed directly into Accumulo. When the desired latency is allowed<br />
to be large, then a bulk style of ingest[3] can be used. There are<br />
other factors to consider as well, but they are outside the scope of<br />
this article.</p>
<p class="western"> On large clusters using the bulk style of ingest<br />
input files are typically batched into MapReduce jobs to create a set<br />
of output RFiles for import into Accumulo. The number of files per<br />
job is typically determined by the required latency and the number of<br />
MapReduce tasks that the cluster can complete in the given<br />
time-frame. The resulting RFiles, when imported into Accumulo, are<br />
added to the list of files for their associated tablets. Depending on<br />
the configuration this will cause Accumulo to major compact these<br />
tablets. If the configuration is tweaked to allow more files per<br />
tablet, to reduce the major compactions, then more files need to be<br />
opened at query time when performing scans on the tablet. Note that<br />
no single node is burdened by the file management; but, the number of<br />
file operations in aggregate is very large. If each server has<br />
several hundred tablets, and there are a thousand tablet servers, and<br />
each tablet compacts some files every few imports, we easily have<br />
50,000 file operations (create, allocate a block, rename and delete)<br />
every ingest cycle.</p>
<p class="western"> In addition to the NameNode operations caused by<br />
bulk ingest, other Accumulo processes (e.g. master, gc) require<br />
interaction with the NameNode. Single processes, like the garbage<br />
collector, can be starved of responses from the NameNode as the NameNode is<br />
limited on the number of concurrent operations. It is not unusual for<br />
an operator's request for “<font face="Courier 10 Pitch, MS Mincho">hadoop<br />
fs -ls /accumulo</font>” to take a minute before returning results<br />
during the peak file-management periods. In particular, the file<br />
garbage collector can fall behind, not finishing a cycle of<br />
unreferenced file removal before the next ingest cycle creates a new<br />
batch of files to be deleted.</p>
<p class="western"> The Hadoop community addressed the NameNode<br />
bottleneck issue with HDFS federation[4] which allows a datanode to<br />
serve up blocks for multiple namenodes. Additionally, ViewFS allows<br />
clients to communicate with multiple namenodes through the use of a<br />
client-side mount table. This functionality was insufficient for<br />
Accumulo in the 1.6.0 release as ViewFS works at a directory level; as an example, /dirA is mapped to<br />
one NameNode and /dirB is mapped to another, and Accumulo uses a<br />
single HDFS directory for its storage.</p>
<p class="western"> Multi-Volume support (MVS), included in 1.6.0,<br />
includes the changes that allow Accumulo to work across multiple HDFS<br />
clusters (called volumes in Accumulo) while continuing to use a<br />
single HDFS directory. A new property, instance.volumes, can be<br />
configured with multiple HDFS nameservices and Accumulo will use them<br />
all to balance out NameNode operations. The nameservices configured<br />
in instance.volumes may optionally use the High Availability NameNode feature as it is transparent<br />
to Accumulo. With MVS you have two options to horizontally scale your<br />
Accumulo instance. You can use an HDFS cluster with Federation and<br />
multiple NameNodes or you can use separate HDFS clusters.</p>
<p class="western"> By default Accumulo will perform round-robin file<br />
allocation for each tablet, spreading the files across the different<br />
volumes. The file balancer is pluggable, allowing for custom<br />
implementations. For example, if you don't use Federation and use<br />
multiple HDFS clusters, you may want to allocate all files for a<br />
particular table to one volume.</p>
<p class="western"> Comments in the JIRA[5] regarding backups could<br />
lead to follow-on work. With the inclusion of snapshots in HDFS, you<br />
could easily envision an application that quiesces the database or<br />
some set of tables, flushes their entries from memory, and snapshots<br />
their directories. These snapshots could then be copied to another<br />
HDFS instance either for an on-disk backup, or bulk-imported into<br />
another instance of Accumulo for testing or some other use.</p>
<p class="western"> The example configuration below shows how to<br />
set up Accumulo with HA NameNodes and Federation, as it is likely the<br />
most complex. We had to reference several web sites, one of the HDFS<br />
mailing lists, and the source code to find all of the configuration<br />
parameters that were needed. The configuration below includes two<br />
sets of HA namenodes, each set servicing an HDFS nameservice in a<br />
single HDFS cluster. In the example below, nameserviceA is serviced<br />
by name nodes 1 and 2, and nameserviceB is serviced by name nodes 3<br />
and 4.</p>
<p class="western">[1]<br />
<font color="#000080"><span lang="zxx"><u><a class="western" href="http://ieeexplore.ieee.org/zpl/login.jsp?arnumber=6597155">http://ieeexplore.ieee.org/zpl/login.jsp?arnumber=6597155</a></u><span style="text-decoration: none;"><span style="color: #000000;"></span></span></span></font></p>
<p class="western"><font color="#000080"><span lang="zxx"><span style="text-decoration: none;"><span style="color: #000000;">[2</span>]<br />
</span></span></font><font color="#000080"><span lang="zxx"><u><a class="western" href="http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf">http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf</a></u></span></font></p></p>
<p class="western">[3]<br />
<font color="#000080"><span lang="zxx"><u><a class="western" href="http://accumulo.apache.org/1.5/examples/bulkIngest.html">http://accumulo.apache.org/1.<font color="#000080">6</font>/examples/bulkIngest.html</a></u></span></font></p></p>
<p class="western">[4]<br />
<font color="#000080"><span lang="zxx"><u><a class="western" href="https://issues.apache.org/jira/browse/HDFS-1052">https://issues.apache.org/jira/browse/HDFS-1052</a></u></span></font></p></p>
<p class="western">[5]<br />
<font color="#000080"><span lang="zxx"><u><a class="western" href="https://issues.apache.org/jira/browse/ACCUMULO-118">https://issues.apache.org/jira/browse/ACCUMULO-118</a></u></span></font></p></p>
<p class="western"><em>- By Dave Marion and Eric Newton</em></p>
<h2>core-site.xml:<br />
</h2>
<pre class="text-body-indent-western"> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"><name>fs.defaultFS</name></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><value>viewfs:///</value></font>
<font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>fs.viewfs.mounttable.default.link./nameserviceA</name></font>
<font style="font-size: 11pt" size="2"> <value>hdfs://nameserviceA</value></font>
<font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> <name>fs.viewfs.mounttable.default.link./nameserviceB</name></font><font style="font-size: 11pt" size="2">
</font><font size="2"> </font><font style="font-size: 11pt" size="2"><value>hdfs://nameserviceB</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> <name>fs.viewfs.mounttable.default.link./nameserviceA/accumulo/instance_id</name></font><font style="font-size: 11pt" size="2">
</font><font size="2"> </font><font style="font-size: 11pt" size="2"><value>hdfs://nameserviceA/accumulo/instance_id</value></font>
<font style="font-size: 11pt" size="2"> <description>Workaround for ACCUMULO-2719</description></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> <name>dfs.ha.fencing.methods</name></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> <value>sshfence(hdfs:22)</font>
<font style="font-size: 11pt" size="2"> shell(/bin/true)</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.ha.fencing.ssh.private-key-files</name></font><font style="font-size: 11pt" size="2">
</font><font size="2"> </font><font style="font-size: 11pt" size="2"><value><PRIVATE_KEY_LOCATION></value></font>
<font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.ha.fencing.ssh.connect-timeout</name></font>
<font style="font-size: 11pt" size="2"> <value>30000</value></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> <name>ha.zookeeper.quorum</name></font>
<font style="font-size: 11pt" size="2"> <value>zkHost1:2181,zkHost2:2181,zkHost3:2181</value></font>
<font style="font-size: 11pt" size="2"> </property></font><font style="font-size: 11pt" size="2">
</font>
</pre>
<h2>hdfs-site.xml:<br />
</h2>
<pre class="text-body-indent-western"><font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.nameservices</name></font>
<font style="font-size: 11pt" size="2"> <value>nameserviceA,nameserviceB</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font><font style="font-size: 11pt" size="2">
</font><font style="font-size: 11pt" size="2"> <name>dfs.ha.namenodes.nameserviceA</name></font>
<font style="font-size: 11pt" size="2"> <value>nn1,nn2</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.ha.namenodes.nameserviceB</name></font><font style="font-size: 11pt" size="2"></font>
<font style="font-size: 11pt" size="2"> <value>nn3,nn4</value></font>
<font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.rpc-address.nameserviceA.nn1</name></font><font style="font-size: 11pt" size="2"></font>
<font style="font-size: 11pt" size="2"> <value>host1:8020</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font>
<font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.rpc-address.nameserviceA.nn2</name></font>
<font style="font-size: 11pt" size="2"> <value>host2:8020</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.http-address.nameserviceA.nn1</name></font>
<font style="font-size: 11pt" size="2"> <value>host1:50070</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.http-address.nameserviceA.nn2</name></font>
<font style="font-size: 11pt" size="2"> <value>host2:50070</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.rpc-address.nameserviceB.nn3</name></font>
<font style="font-size: 11pt" size="2"> <value>host3:8020</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.rpc-address.nameserviceB.nn4</name></font>
<font style="font-size: 11pt" size="2"> <value>host4:8020</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.http-address.nameserviceB.nn3</name></font>
<font style="font-size: 11pt" size="2"> <value>host3:50070</value></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"></property></font><font style="font-size: 11pt" size="2">
</font> <font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.http-address.nameserviceB.nn4</name></font>
<font style="font-size: 11pt" size="2"> <value>host4:50070</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.shared.edits.dir.nameserviceA.nn1</name></font>
<font style="font-size: 11pt" size="2"> <value>qjournal://jHost1:8485;jHost2:8485;jHost3:8485/nameserviceA</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.shared.edits.dir.nameserviceA.nn2</name></font>
<font style="font-size: 11pt" size="2"> <value>qjournal://jHost1:8485;jHost2:8485;jHost3:8485/nameserviceA</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.shared.edits.dir.nameserviceB.nn3</name></font>
<font style="font-size: 11pt" size="2"> <value>qjournal://jHost1:8485;jHost2:8485;jHost3:8485/nameserviceB</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.namenode.shared.edits.dir.nameserviceB.nn4</name></font>
<font style="font-size: 11pt" size="2"> <value>qjournal://jHost1:8485;jHost2:8485;jHost3:8485/nameserviceB</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.client.failover.proxy.provider.nameserviceA</name></font>
<font style="font-size: 11pt" size="2"> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.client.failover.proxy.provider.nameserviceB</name></font>
<font style="font-size: 11pt" size="2"> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.ha.automatic-failover.enabled.nameserviceA</name></font>
<font style="font-size: 11pt" size="2"> <value>true</value></font>
<font style="font-size: 11pt" size="2"> </property></font>
<font style="font-size: 11pt" size="2"><property></font>
<font style="font-size: 11pt" size="2"> <name>dfs.ha.automatic-failover.enabled.nameserviceB</name></font>
<font style="font-size: 11pt" size="2"> <value>true</value></font>
<font style="font-size: 11pt" size="2"> </property></font><font style="font-size: 11pt" size="2">
</font>
</pre>
<pre></pre>
<h2 class="text-body-indent-western">accumulo-site.xml:</h2>
<pre class="text-body-indent-western"><font style="font-size: 11pt" size="2"> <property></font>
<font style="font-size: 11pt" size="2"> <name>instance.volumes</name></font>
<font style="font-size: 11pt" size="2"> <value>hdfs://nameserviceA/accumulo,hdfs://nameserviceB/accumulo</value></font>
<font style="font-size: 11pt" size="2"> </property></font></pre>