2016/06/24/multi-master-1-0-0.html (246 lines of code) (raw):

<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> <meta name="author" content="Cloudera" /> <title>Apache Kudu - Master fault tolerance in Kudu 1.0</title> <!-- Bootstrap core CSS --> <link rel="stylesheet" href="/css/bootstrap.min.css"/> <!-- Custom styles for this template --> <link href="/css/kudu.css" rel="stylesheet"/> <link href="/css/asciidoc.css" rel="stylesheet"/> <link rel="shortcut icon" href="/img/logo-favicon.ico" /> <link rel="stylesheet" href="/css/font-awesome.min.css" /> <link rel="alternate" type="application/atom+xml" title="RSS Feed for Apache Kudu blog" href="/feed.xml" /> </head> <body> <div class="kudu-site container-fluid"> <!-- Static navbar --> <nav class="navbar navbar-default"> <div class="container-fluid"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="logo" href="/"><img src="/img/apachekudu_logo_0716_80px.png" srcset="/img/apachekudu_logo_0716_80px.png 1x, /img/apachekudu_logo_0716_160px.png 2x" alt="Apache Kudu"/></a> </div> <div id="navbar" class="collapse navbar-collapse"> <ul class="nav navbar-nav navbar-right"> <li > <a href="/">Home</a> </li> <li > <a href="/overview.html">Overview</a> </li> <li > <a href="/docs/">Documentation</a> </li> <li > <a href="/releases/">Releases</a> </li> <li class="active"> <a href="/blog/">Blog</a> </li> <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here that doesn't also appear elsewhere on the site. --> <li class="dropdown"> <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> <ul class="dropdown-menu"> <li class="dropdown-header">GET IN TOUCH</li> <li><a class="icon email" href="/community.html">Mailing Lists</a></li> <li><a class="icon slack" href="https://join.slack.com/t/getkudu/shared_invite/zt-244b4zvki-hB1q9IbAk6CqHNMZHvUALA">Slack Channel</a></li> <li role="separator" class="divider"></li> <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> <li><a href="/committers.html">Project Committers</a></li> <li><a href="/ecosystem.html">Ecosystem</a></li> <!--<li><a href="/roadmap.html">Roadmap</a></li>--> <li><a href="/community.html#contributions">How to Contribute</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">DEVELOPER RESOURCES</li> <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">SOCIAL MEDIA</li> <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li> <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> </ul> </li> <li > <a href="/faq.html">FAQ</a> </li> </ul><!-- /.nav --> </div><!-- /#navbar --> </div><!-- /.container-fluid --> </nav> <div class="row header"> <div class="col-lg-12"> <h2><a href="/blog">Apache Kudu Blog</a></h2> </div> </div> <div class="row-fluid"> <div class="col-lg-9"> <article> <header> <h1 class="entry-title">Master fault tolerance in Kudu 1.0</h1> <p class="meta">Posted 24 Jun 2016 by Adar Dembo</p> </header> <div class="entry-content"> <p>This blog post describes how the 1.0 release of Apache Kudu (incubating) will support fault tolerance for the Kudu master, finally eliminating Kudu’s last single point of failure.</p> <!--more--> <p>As those of you who follow this blog know by now, replication is a signature feature in Kudu. Replication is used to provide fault tolerance for all loaded data. By implementing the Raft consensus protocol, Kudu guarantees that a tablet replicated <strong>2N+1</strong> times can tolerate up to <strong>N</strong> failures.</p> <p>What you may not know is that Kudu replicates its metadata, too. That is, the Kudu master stores all table and tablet metadata in a single “master” tablet. As a regular Kudu tablet itself, this master tablet may be replicated with Raft. As such, the Kudu master is a special kind of tablet server whose primary job is to host a single replica of the master tablet.</p> <p>When we launched Kudu’s first beta, support for replicated masters had been implemented but was too fragile to be anything but experimental. One of our goals for Kudu’s 1.0 release is to improve replicated master support so that it can be safely enabled in production clusters.</p> <h1 id="how-master-replication-works">How master replication works</h1> <p>To use replicated masters, a Kudu operator must deploy some number of Kudu masters, providing the hostname and port number of each master in the group via the <code class="language-plaintext highlighter-rouge">--master_address</code> command line option. For example, each master in a three-node deployment should be started with <code class="language-plaintext highlighter-rouge">--master_address=&lt;host1:port1&gt;,&lt;host2:port2&gt;&lt;host3:port3&gt;</code>. In Raft parlance, this group of masters is known as a <em>Raft configuration</em>.</p> <p>At startup, a Raft configuration of masters will hold a leader election and elect one master as the leader. The leader master is responsible for servicing both tablet server heartbeats as well as client requests. The remaining masters are followers: they participate in Raft consensus and replicate writes sent by the leader, but are otherwise idle. Any client requests they receive are rejected. Likewise, all tablet server heartbeats they receive are ignored. If the leader master ever dies or steps down, the remaining replicas hold an election to determine the new leader.</p> <p>All persistent master metadata is stored in the single replicated “master” tablet. Every row in this tablet represents either a table or a tablet. Table records include unique table identifiers, the table’s schema, and other bits of information. Tablet records include a unique identifier, the tablet’s Raft configuration, and other information.</p> <p>What master metadata is replicated?</p> <ol> <li>Table and tablet existence, via <strong>CreateTable()</strong> and <strong>DeleteTable()</strong>. Every new tablet record also includes an initial Raft configuration.</li> <li>Schema changes, via <strong>AlterTable()</strong> and tablet server heartbeats.</li> <li>Tablet server Raft configuration changes, via tablet server heartbeats. These include both the list of Raft peers (may have changed due to under-replication) as well as the current leader (may have changed due to an election).</li> </ol> <p>Scanning the master tablet to service every heartbeat or client request would be slow, so the leader master caches all master metadata in memory. The caches are only updated after a metadata change is successfully replicated; in this way they are always consistent with the on-disk tablet. When a new leader master is elected, it scans the entire master tablet and uses the metadata to rebuild its in-memory caches.</p> <h1 id="communication-with-replicated-masters">Communication with replicated masters</h1> <p>All tablet servers start up with location information for the entire master Raft configuration and will periodically heartbeat to every master. Similarly, clients are also configured with the locations of all masters. Unlike tablet servers, they always communicate with the leader master as follower masters will reject client requests. To do this, clients must determine which master is the leader before sending the first request as well as whenever any request fails with a <code class="language-plaintext highlighter-rouge">NOT_THE_LEADER</code> error.</p> <h1 id="remaining-work-for-kudu-10">Remaining work for Kudu 1.0</h1> <p><a href="https://issues.apache.org/jira/browse/KUDU-422">KUDU-422</a> tracks the remaining master replication work. The guts of this feature have been implemented as far back as early 2015; the remaining work has been focused on fixing bugs that manifest only under specific conditions. For example, we’ve observed failures in DDL operations (e.g. <strong>CreateTable()</strong>) that only materialize upon the completion of a master leader election. These failures highlight some of the gaps in our testing regimen: we need a robust stress test that repeatedly performs such operations while holding master leader elections.</p> <p>That said, there is one remaining work item of larger scope: there’s no mechanism with which to perform a Raft configuration change for replicated masters. Such a mechanism would have multiple uses:</p> <ol> <li>Migrating from a single-node master deployment to a fully replicated three-node (or five-node) deployment.</li> <li>Replacing a failed master with a new one.</li> </ol> <p>This is being tracked by <a href="https://issues.apache.org/jira/browse/KUDU-1474">KUDU-1474</a>, and there’s been <a href="http://gerrit.cloudera.org:8080/3393">some discussion</a> around a design, but nothing has been implemented yet. Stay tuned!</p> </div> </article> </div> <div class="col-lg-3 recent-posts"> <h3>Recent posts</h3> <ul> <li> <a href="/2024/11/13/apache-kudu-1-17-1-release.html">Apache Kudu 1.17.1 Released</a> </li> <li> <a href="/2024/03/07/introducing-auto-incrementing-column.html">Introducing Auto-incrementing Column in Kudu</a> </li> <li> <a href="/2023/09/07/apache-kudu-1-17-0-released.html">Apache Kudu 1.17.0 Released</a> </li> <li> <a href="/2022/06/17/apache-kudu-1-16-0-released.html">Apache Kudu 1.16.0 Released</a> </li> <li> <a href="/2021/06/22/apache-kudu-1-15-0-released.html">Apache Kudu 1.15.0 Released</a> </li> <li> <a href="/2021/01/28/apache-kudu-1-14-0-release.html">Apache Kudu 1.14.0 Released</a> </li> <li> <a href="/2021/01/15/bloom-filter-predicate.html">Optimized joins & filtering with Bloom filter predicate in Kudu</a> </li> <li> <a href="/2020/09/21/apache-kudu-1-13-0-release.html">Apache Kudu 1.13.0 released</a> </li> <li> <a href="/2020/08/11/fine-grained-authz-ranger.html">Fine-Grained Authorization with Apache Kudu and Apache Ranger</a> </li> <li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li> <li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li> <li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li> <li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li> <li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li> <li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li> </ul> </div> </div> <footer class="footer"> <div class="row"> <div class="col-md-9"> <p class="small"> Copyright &copy; 2023 The Apache Software Foundation. </p> <p class="small"> Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. </p> </div> <div class="col-md-3"> <a class="pull-right" href="https://www.apache.org/events/current-event.html"> <img src="https://www.apache.org/events/current-event-234x60.png"/> </a> </div> </div> </footer> </div> <script src="/js/jquery.min.js"></script> <script> // Try to detect touch-screen devices. Note: Many laptops have touch screens. $(document).ready(function() { if ("ontouchstart" in document.documentElement) { $(document.documentElement).addClass("touch"); } else { $(document.documentElement).addClass("no-touch"); } }); </script> <script src="/js/bootstrap.min.js"></script> <script src="/js/anchor.js"></script> <script> anchors.options = { placement: 'right', visible: 'touch', }; anchors.add(); </script> </body> </html>