2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html (264 lines of code) (raw):

<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> <meta name="author" content="Cloudera" /> <title>Apache Kudu - Fine-Grained Authorization with Apache Kudu and Impala</title> <!-- Bootstrap core CSS --> <link rel="stylesheet" href="/css/bootstrap.min.css"/> <!-- Custom styles for this template --> <link href="/css/kudu.css" rel="stylesheet"/> <link href="/css/asciidoc.css" rel="stylesheet"/> <link rel="shortcut icon" href="/img/logo-favicon.ico" /> <link rel="stylesheet" href="/css/font-awesome.min.css" /> <link rel="alternate" type="application/atom+xml" title="RSS Feed for Apache Kudu blog" href="/feed.xml" /> </head> <body> <div class="kudu-site container-fluid"> <!-- Static navbar --> <nav class="navbar navbar-default"> <div class="container-fluid"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="logo" href="/"><img src="/img/apachekudu_logo_0716_80px.png" srcset="/img/apachekudu_logo_0716_80px.png 1x, /img/apachekudu_logo_0716_160px.png 2x" alt="Apache Kudu"/></a> </div> <div id="navbar" class="collapse navbar-collapse"> <ul class="nav navbar-nav navbar-right"> <li > <a href="/">Home</a> </li> <li > <a href="/overview.html">Overview</a> </li> <li > <a href="/docs/">Documentation</a> </li> <li > <a href="/releases/">Releases</a> </li> <li class="active"> <a href="/blog/">Blog</a> </li> <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here that doesn't also appear elsewhere on the site. --> <li class="dropdown"> <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> <ul class="dropdown-menu"> <li class="dropdown-header">GET IN TOUCH</li> <li><a class="icon email" href="/community.html">Mailing Lists</a></li> <li><a class="icon slack" href="https://join.slack.com/t/getkudu/shared_invite/zt-244b4zvki-hB1q9IbAk6CqHNMZHvUALA">Slack Channel</a></li> <li role="separator" class="divider"></li> <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> <li><a href="/committers.html">Project Committers</a></li> <li><a href="/ecosystem.html">Ecosystem</a></li> <!--<li><a href="/roadmap.html">Roadmap</a></li>--> <li><a href="/community.html#contributions">How to Contribute</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">DEVELOPER RESOURCES</li> <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">SOCIAL MEDIA</li> <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li> <li role="separator" class="divider"></li> <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li> <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> </ul> </li> <li > <a href="/faq.html">FAQ</a> </li> </ul><!-- /.nav --> </div><!-- /#navbar --> </div><!-- /.container-fluid --> </nav> <div class="row header"> <div class="col-lg-12"> <h2><a href="/blog">Apache Kudu Blog</a></h2> </div> </div> <div class="row-fluid"> <div class="col-lg-9"> <article> <header> <h1 class="entry-title">Fine-Grained Authorization with Apache Kudu and Impala</h1> <p class="meta">Posted 22 Apr 2019 by Grant Henke</p> </header> <div class="entry-content"> <p>Note: This is a cross-post from the Cloudera Engineering Blog <a href="https://blog.cloudera.com/blog/2019/04/fine-grained-authorization-with-apache-kudu-and-impala/">Fine-Grained Authorization with Apache Kudu and Impala</a></p> <p>Apache Impala supports fine-grained authorization via Apache Sentry on all of the tables it manages including Apache Kudu tables. Given Impala is a very common way to access the data stored in Kudu, this capability allows users deploying Impala and Kudu to fully secure the Kudu data in multi-tenant clusters even though Kudu does not yet have native fine-grained authorization of its own. This solution works because Kudu natively supports coarse-grained (all or nothing) authorization which enables blocking all access to Kudu directly except for the impala user and an optional whitelist of other trusted users. This post will describe how to use Apache Impala’s fine-grained authorization support along with Apache Kudu’s coarse-grained authorization to achieve a secure multi-tenant deployment.</p> <!--more--> <h2 id="sample-workflow">Sample Workflow</h2> <p>The examples in this post enable a workflow that uses Apache Spark to ingest data directly into Kudu and Impala to run analytic queries on that data. The Spark job, run as the <code class="language-plaintext highlighter-rouge">etl_service</code> user, is permitted to access the Kudu data via coarse-grained authorization. Even though this gives access to all the data in Kudu, the <code class="language-plaintext highlighter-rouge">etl_service</code> user is only used for scheduled jobs or by an administrator. All queries on the data, from a wide array of users, will use Impala and leverage Impala’s fine-grained authorization. Impala’s <a href="https://impala.apache.org/docs/build/html/topics/impala_grant.html"><code class="language-plaintext highlighter-rouge">GRANT</code> statements</a> allow you to flexibly control the privileges on the Kudu storage tables. Impala’s fine-grained privileges along with support for <a href="https://impala.apache.org/docs/build/html/topics/impala_select.html"><code class="language-plaintext highlighter-rouge">SELECT</code></a>, <a href="https://impala.apache.org/docs/build/html/topics/impala_insert.html"><code class="language-plaintext highlighter-rouge">INSERT</code></a>, <a href="https://impala.apache.org/docs/build/html/topics/impala_update.html"><code class="language-plaintext highlighter-rouge">UPDATE</code></a>, <a href="https://impala.apache.org/docs/build/html/topics/impala_upsert.html"><code class="language-plaintext highlighter-rouge">UPSERT</code></a>, and <a href="https://impala.apache.org/docs/build/html/topics/impala_delete.html"><code class="language-plaintext highlighter-rouge">DELETE</code></a> statements, allow you to finely control who can read and write data to your Kudu tables while using Impala. Below is a diagram showing the workflow described:</p> <p><img src="/img/fine-grained-authorization-with-apache-kudu.png" alt="png" class="img-responsive" /></p> <p><em>Note</em>: The examples below assume that Authorization has already been configured for Kudu, Impala, and Spark. For help configuring authorization see the Cloudera <a href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_auth_overview.html">authorization documentation</a>.</p> <h2 id="configuring-kudus-coarse-grained-authorization">Configuring Kudu’s Coarse-Grained Authorization</h2> <p>Kudu supports coarse-grained authorization of client requests based on the authenticated client Kerberos principal. The two levels of access which can be configured are:</p> <ul> <li><em>Superuser</em> – principals authorized as a superuser are able to perform certain administrative functionality such as using the kudu command line tool to diagnose or repair cluster issues.</li> <li><em>User</em> – principals authorized as a user are able to access and modify all data in the Kudu cluster. This includes the ability to create, drop, and alter tables as well as read, insert, update, and delete data.</li> </ul> <p>Access levels are granted using whitelist-style Access Control Lists (ACLs), one for each of the two levels. Each access control list either specifies a comma-separated list of users, or may be set to <code class="language-plaintext highlighter-rouge">*</code> to indicate that all authenticated users are able to gain access at the specified level.</p> <p><em>Note</em>: The default value for the User ACL is <code class="language-plaintext highlighter-rouge">*</code>, which allows all users access to the cluster.</p> <h3 id="example-configuration">Example Configuration</h3> <p>The first and most important step is to remove the default ACL of <code class="language-plaintext highlighter-rouge">*</code> from Kudu’s <a href="https://kudu.apache.org/docs/configuration_reference.html#kudu-master_user_acl"><code class="language-plaintext highlighter-rouge">–user_acl</code> configuration</a>. This will ensure only the users you list will have access to the Kudu cluster. Then, to allow the Impala service to access all of the data in Kudu, the Impala service user, usually impala, should be added to the Kudu <code class="language-plaintext highlighter-rouge">–user_acl</code> configuration. Any user that is not using Impala will also need to be added to this list. For example, an Apache Spark job might be used to load data directly into Kudu. Generally, a single user is used to run scheduled jobs of applications that do not support fine-grained authorization on their own. For this example, that user is <code class="language-plaintext highlighter-rouge">etl_service</code>. The full <code class="language-plaintext highlighter-rouge">–user_acl</code> configuration is:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nt">--user_acl</span><span class="o">=</span>impala,etl_service</code></pre></figure> <p>For more details see the Kudu <a href="https://kudu.apache.org/docs/security.html#_coarse_grained_authorization">authorization documentation</a>.</p> <h2 id="using-impalas-fine-grained-authorization">Using Impala’s Fine-Grained Authorization</h2> <p>Follow Impala’s <a href="https://impala.apache.org/docs/build/html/topics/impala_authorization.html">authorization documentation</a> to configure fine-grained authorization. Once configured, you can use Impala’s <a href="https://impala.apache.org/docs/build/html/topics/impala_grant.html"><code class="language-plaintext highlighter-rouge">GRANT</code> statements</a> to control the privileges of Kudu tables. These fine-grained privileges can be set at the database, table and column level. Additionally you can individually control <code class="language-plaintext highlighter-rouge">SELECT</code>, <code class="language-plaintext highlighter-rouge">INSERT</code>, <code class="language-plaintext highlighter-rouge">CREATE</code>, <code class="language-plaintext highlighter-rouge">ALTER</code>, and <code class="language-plaintext highlighter-rouge">DROP</code> privileges.</p> <p><em>Note</em>: A user needs the <code class="language-plaintext highlighter-rouge">ALL</code> privilege in order to run <code class="language-plaintext highlighter-rouge">DELETE</code>, <code class="language-plaintext highlighter-rouge">UPDATE</code>, or <code class="language-plaintext highlighter-rouge">UPSERT</code> statements against a Kudu table.</p> <p>Below is a brief example with a couple tables stored in Kudu:</p> <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">messages</span> <span class="p">(</span> <span class="n">name</span> <span class="n">STRING</span><span class="p">,</span> <span class="nb">time</span> <span class="nb">TIMESTAMP</span><span class="p">,</span> <span class="n">message</span> <span class="n">STRING</span><span class="p">,</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nb">time</span><span class="p">)</span> <span class="p">)</span> <span class="n">PARTITION</span> <span class="k">BY</span> <span class="n">HASH</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">PARTITIONS</span> <span class="mi">4</span> <span class="n">STORED</span> <span class="k">AS</span> <span class="n">KUDU</span><span class="p">;</span> <span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="k">TABLE</span> <span class="n">messages</span> <span class="k">TO</span> <span class="n">userA</span><span class="p">;</span> <span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">metrics</span> <span class="p">(</span> <span class="k">host</span> <span class="n">STRING</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> <span class="n">metric</span> <span class="n">STRING</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> <span class="nb">time</span> <span class="n">INT64</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> <span class="n">value</span> <span class="nb">DOUBLE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="k">host</span><span class="p">,</span> <span class="n">metric</span><span class="p">,</span> <span class="nb">time</span><span class="p">)</span> <span class="p">)</span> <span class="n">PARTITION</span> <span class="k">BY</span> <span class="n">HASH</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">PARTITIONS</span> <span class="mi">4</span> <span class="n">STORED</span> <span class="k">AS</span> <span class="n">KUDU</span><span class="p">;</span> <span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="k">TABLE</span> <span class="n">messages</span> <span class="k">TO</span> <span class="n">userB</span><span class="p">;</span></code></pre></figure> <h2 id="conclusion">Conclusion</h2> <p>This brief example that combines Kudu’s coarse-grained authorization and Impala’s fine-grained authorization should enable you to meet the security needs of your data workflow today. The pattern described here can be applied to other services and workflows using Kudu as well. For greater authorization flexibility, you can look forward to the near future when Kudu supports native fine-grained authorization on its own. The Apache Kudu contributors understand the importance of native fine-grained authorization and they are working on integrations with Apache Sentry and Apache Ranger.</p> </div> </article> </div> <div class="col-lg-3 recent-posts"> <h3>Recent posts</h3> <ul> <li> <a href="/2024/11/13/apache-kudu-1-17-1-release.html">Apache Kudu 1.17.1 Released</a> </li> <li> <a href="/2024/03/07/introducing-auto-incrementing-column.html">Introducing Auto-incrementing Column in Kudu</a> </li> <li> <a href="/2023/09/07/apache-kudu-1-17-0-released.html">Apache Kudu 1.17.0 Released</a> </li> <li> <a href="/2022/06/17/apache-kudu-1-16-0-released.html">Apache Kudu 1.16.0 Released</a> </li> <li> <a href="/2021/06/22/apache-kudu-1-15-0-released.html">Apache Kudu 1.15.0 Released</a> </li> <li> <a href="/2021/01/28/apache-kudu-1-14-0-release.html">Apache Kudu 1.14.0 Released</a> </li> <li> <a href="/2021/01/15/bloom-filter-predicate.html">Optimized joins & filtering with Bloom filter predicate in Kudu</a> </li> <li> <a href="/2020/09/21/apache-kudu-1-13-0-release.html">Apache Kudu 1.13.0 released</a> </li> <li> <a href="/2020/08/11/fine-grained-authz-ranger.html">Fine-Grained Authorization with Apache Kudu and Apache Ranger</a> </li> <li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li> <li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li> <li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li> <li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li> <li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li> <li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li> </ul> </div> </div> <footer class="footer"> <div class="row"> <div class="col-md-9"> <p class="small"> Copyright &copy; 2023 The Apache Software Foundation. </p> <p class="small"> Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. </p> </div> <div class="col-md-3"> <a class="pull-right" href="https://www.apache.org/events/current-event.html"> <img src="https://www.apache.org/events/current-event-234x60.png"/> </a> </div> </div> </footer> </div> <script src="/js/jquery.min.js"></script> <script> // Try to detect touch-screen devices. Note: Many laptops have touch screens. $(document).ready(function() { if ("ontouchstart" in document.documentElement) { $(document.documentElement).addClass("touch"); } else { $(document.documentElement).addClass("no-touch"); } }); </script> <script src="/js/bootstrap.min.js"></script> <script src="/js/anchor.js"></script> <script> anchors.options = { placement: 'right', visible: 'touch', }; anchors.add(); </script> </body> </html>