<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> <meta name="author" content="Cloudera" /> <title>Apache Kudu - Predicate Improvements in Kudu 0.8</title> <!-- Bootstrap core CSS --> <link rel="stylesheet" href="/css/bootstrap.min.css"/> <!-- Custom styles for this template --> <link href="/css/kudu.css" rel="stylesheet"/> <link href="/css/asciidoc.css" rel="stylesheet"/> <link rel="shortcut icon" href="/img/logo-favicon.ico" /> <link rel="stylesheet" href="/css/font-awesome.min.css" /> <link rel="alternate" type="application/atom+xml" title="RSS Feed for Apache Kudu blog" href="/feed.xml" /> </head> <body> <div class="kudu-site container-fluid"> <!-- Static navbar --> <nav class="navbar navbar-default"> <div class="container-fluid"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="logo" href="/"><img src="/img/apachekudu_logo_0716_80px.png" srcset="/img/apachekudu_logo_0716_80px.png 1x, /img/apachekudu_logo_0716_160px.png 2x" alt="Apache Kudu"/></a> </div> <div id="navbar" class="collapse navbar-collapse"> <ul class="nav navbar-nav navbar-right"> <li > <a href="/">Home</a> </li> <li > <a href="/overview.html">Overview</a> </li> <li > <a href="/docs/">Documentation</a> </li> <li > <a href="/releases/">Releases</a> </li> <li class="active"> <a href="/blog/">Blog</a> 
</li> <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here that doesn't also appear elsewhere on the site. --> <li class="dropdown"> <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> <ul class="dropdown-menu"> <li class="dropdown-header">GET IN TOUCH</li> <li><a class="icon email" href="/community.html">Mailing Lists</a></li> <li><a class="icon slack" href="https://join.slack.com/t/getkudu/shared_invite/zt-244b4zvki-hB1q9IbAk6CqHNMZHvUALA">Slack Channel</a></li> <li role="separator" class="divider"></li> <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> <li><a href="/committers.html">Project Committers</a></li> <li><a href="/ecosystem.html">Ecosystem</a></li> <!--<li><a href="/roadmap.html">Roadmap</a></li>--> <li><a href="/community.html#contributions">How to Contribute</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">DEVELOPER RESOURCES</li> <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">SOCIAL MEDIA</li> <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li> <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> <li><a 
href="https://www.apache.org/licenses/" target="_blank">License</a></li> </ul> </li> <li > <a href="/faq.html">FAQ</a> </li> </ul><!-- /.nav --> </div><!-- /#navbar --> </div><!-- /.container-fluid --> </nav> <div class="row header"> <div class="col-lg-12"> <h2><a href="/blog">Apache Kudu Blog</a></h2> </div> </div> <div class="row-fluid"> <div class="col-lg-9"> <article> <header> <h1 class="entry-title">Predicate Improvements in Kudu 0.8</h1> <p class="meta">Posted 19 Apr 2016 by Dan Burkert</p> </header> <div class="entry-content"> <p>The recently released Kudu version 0.8 ships with a host of new improvements to scan predicates. Performance and usability have been improved, especially for tables taking advantage of <a href="http://kudu.apache.org/docs/schema_design.html#data-distribution">advanced partitioning options</a>.</p> <!--more--> <h2 id="scan-optimizations-in-the-server-and-c-client">Scan Optimizations in the Server and C++ Client</h2> <p>The server and C++ client have gotten more sophisticated in how they handle and optimize scan constraints. Constraints specified in the predicates and lower and upper bound primary keys are better unified, resulting in more predicates being pushed into primary key bounds, which can turn full table scans with predicates into much more efficient bounded scans.</p> <p>Additionally, the server and C++ client now recognize more opportunities to prune entire tablets during scans. 
For example, given the following schema and query, Kudu can now skip scanning 15 of the 16 tablets in the table:</p> <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- create a table with 16 tablets</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">users</span> <span class="p">(</span><span class="n">id</span> <span class="n">INT64</span><span class="p">,</span> <span class="n">name</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">address</span> <span class="n">STRING</span><span class="p">)</span>
<span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span><span class="p">;</span>

<span class="c1">-- scan over a single tablet</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span> <span class="k">FROM</span> <span class="n">users</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">876932</span><span class="p">;</span></code></pre></figure> <p>For a deeper look at the newly implemented scan and partition pruning optimizations, see the associated <a href="https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/scan-optimization-partition-pruning.md">design document</a>. These optimizations will eventually be incorporated into the Java client as well; until then, the server still applies them to scans initiated by Java clients. 
If you would like to help with this effort, let us know on the <a href="https://issues.apache.org/jira/browse/KUDU-1065">JIRA issue</a>.</p> <h2 id="redesigned-predicate-api-in-the-java-client">Redesigned Predicate API in the Java Client</h2> <p>The Java client has a new way to express scan predicates: the <a href="http://kudu.apache.org/apidocs/org/kududb/client/KuduPredicate.html"><code class="language-plaintext highlighter-rouge">KuduPredicate</code></a>. The API matches the corresponding C++ API more closely and adds support for exclusive, as well as inclusive, range predicates. The existing <a href="http://kudu.apache.org/apidocs/org/kududb/client/ColumnRangePredicate.html"><code class="language-plaintext highlighter-rouge">ColumnRangePredicate</code></a> API has been deprecated and will be removed soon. The following example shows the transition from the old API to the new one:</p> <figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">ColumnSchema</span> <span class="n">myIntColumnSchema</span> <span class="o">=</span> <span class="o">...;</span>
<span class="nc">KuduScanner</span><span class="o">.</span><span class="na">KuduScannerBuilder</span> <span class="n">scannerBuilder</span> <span class="o">=</span> <span class="o">...;</span>

<span class="c1">// Old predicate API (deprecated): inclusive lower bound of 20</span>
<span class="nc">ColumnRangePredicate</span> <span class="n">predicate</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ColumnRangePredicate</span><span class="o">(</span><span class="n">myIntColumnSchema</span><span class="o">);</span>
<span class="n">predicate</span><span class="o">.</span><span class="na">setLowerBound</span><span class="o">(</span><span class="mi">20</span><span class="o">);</span>
<span class="n">scannerBuilder</span><span class="o">.</span><span class="na">addColumnRangePredicate</span><span class="o">(</span><span class="n">predicate</span><span class="o">);</span>

<span class="c1">// New predicate API: the equivalent predicate, column &gt;= 20</span>
<span class="n">scannerBuilder</span><span class="o">.</span><span class="na">addPredicate</span><span class="o">(</span>
    <span class="nc">KuduPredicate</span><span class="o">.</span><span class="na">newComparisonPredicate</span><span class="o">(</span><span class="n">myIntColumnSchema</span><span class="o">,</span> <span class="nc">ComparisonOp</span><span class="o">.</span><span class="na">GREATER_EQUAL</span><span class="o">,</span> <span class="mi">20</span><span class="o">));</span></code></pre></figure> <h2 id="under-the-covers-changes">Under the Covers Changes</h2> <p>The scan optimizations in the server and C++ client, and the new <code class="language-plaintext highlighter-rouge">KuduPredicate</code> API in the Java client, are made possible by an overhaul of how predicates are handled internally. A new protobuf message type, <a href="https://github.com/apache/incubator-kudu/blob/master/src/kudu/common/common.proto#L273"><code class="language-plaintext highlighter-rouge">ColumnPredicatePB</code></a>, has been introduced; it will allow more column predicate types to be added in the future. 
If you are interested in contributing to Kudu but don’t know where to start, consider adding a new predicate type; for example, the <code class="language-plaintext highlighter-rouge">IS NULL</code>, <code class="language-plaintext highlighter-rouge">IS NOT NULL</code>, <code class="language-plaintext highlighter-rouge">IN</code>, and <code class="language-plaintext highlighter-rouge">LIKE</code> predicate types are not yet implemented.</p> </div> </article> </div> <div class="col-lg-3 recent-posts"> <h3>Recent posts</h3> <ul> <li> <a href="/2024/11/13/apache-kudu-1-17-1-release.html">Apache Kudu 1.17.1 Released</a> </li> <li> <a href="/2024/03/07/introducing-auto-incrementing-column.html">Introducing Auto-incrementing Column in Kudu</a> </li> <li> <a href="/2023/09/07/apache-kudu-1-17-0-released.html">Apache Kudu 1.17.0 Released</a> </li> <li> <a href="/2022/06/17/apache-kudu-1-16-0-released.html">Apache Kudu 1.16.0 Released</a> </li> <li> <a href="/2021/06/22/apache-kudu-1-15-0-released.html">Apache Kudu 1.15.0 Released</a> </li> <li> <a href="/2021/01/28/apache-kudu-1-14-0-release.html">Apache Kudu 1.14.0 Released</a> </li> <li> <a href="/2021/01/15/bloom-filter-predicate.html">Optimized joins & filtering with Bloom filter predicate in Kudu</a> </li> <li> <a href="/2020/09/21/apache-kudu-1-13-0-release.html">Apache Kudu 1.13.0 released</a> </li> <li> <a href="/2020/08/11/fine-grained-authz-ranger.html">Fine-Grained Authorization with Apache Kudu and Apache Ranger</a> </li> <li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li> <li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li> <li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li> <li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li> <li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 
Released</a> </li> <li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li> </ul> </div> </div> <footer class="footer"> <div class="row"> <div class="col-md-9"> <p class="small"> Copyright &copy; 2023 The Apache Software Foundation. </p> <p class="small"> Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. </p> </div> <div class="col-md-3"> <a class="pull-right" href="https://www.apache.org/events/current-event.html"> <img src="https://www.apache.org/events/current-event-234x60.png"/> </a> </div> </div> </footer> </div> <script src="/js/jquery.min.js"></script> <script> // Try to detect touch-screen devices. Note: Many laptops have touch screens. $(document).ready(function() { if ("ontouchstart" in document.documentElement) { $(document.documentElement).addClass("touch"); } else { $(document.documentElement).addClass("no-touch"); } }); </script> <script src="/js/bootstrap.min.js"></script> <script src="/js/anchor.js"></script> <script> anchors.options = { placement: 'right', visible: 'touch', }; anchors.add(); </script> </body> </html>