blog/2011/05/20/druid-part-deux.html

<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="Apache Druid"> <meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source"> <meta name="author" content="Apache Software Foundation"> <title>Druid | Druid, Part Deux: Three Principles for Fast, Distributed OLAP</title> <link rel="alternate" type="application/atom+xml" href="/feed"> <link rel="shortcut icon" href="/img/favicon.png"> <link rel="stylesheet" href="/assets/css/font-awesome-5.css"> <link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'> <link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1"> <link rel="stylesheet" href="/css/base.css?v=1.1"> <link rel="stylesheet" href="/css/header.css?v=1.1"> <link rel="stylesheet" href="/css/footer.css?v=1.1"> <link rel="stylesheet" href="/css/syntax.css?v=1.1"> <link rel="stylesheet" href="/css/docs.css?v=1.1"> <script> (function() { var cx = '000162378814775985090:molvbm0vggm'; var gcse = document.createElement('script'); gcse.type = 'text/javascript'; gcse.async = true; gcse.src = (document.location.protocol == 'https:' ? 
'https:' : 'http:') + '//cse.google.com/cse.js?cx=' + cx; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(gcse, s); })(); </script> </head> <body>  <script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> <div class="top-navigator"> <div class="container"> <div class="left-cont"> <a class="logo" href="/"><span class="druid-logo"></span></a> </div> <div class="right-cont"> <ul class="links"> <li class=""><a href="/technology">Technology</a></li> <li class=""><a href="/use-cases">Use Cases</a></li> <li class=""><a href="/druid-powered">Powered By</a></li> <li class=""><a href="/docs/latest/design/">Docs</a></li> <li class=""><a href="/community/">Community</a></li> <li class="header-dropdown"> <a>Apache</a> <div class="header-dropdown-menu"> <a href="https://www.apache.org/" target="_blank">Foundation</a> <a href="https://www.apache.org/events/current-event" target="_blank">Events</a> <a href="https://www.apache.org/licenses/" target="_blank">License</a> <a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> <a href="https://www.apache.org/security/" target="_blank">Security</a> <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a> </div> </li> <li class=" button-link"><a href="/downloads.html">Download</a></li> </ul> </div> </div> <div class="action-button menu-icon"> <span class="fa fa-bars"></span> MENU </div> <div class="action-button menu-icon-close"> <span class="fa fa-times"></span> MENU </div> </div> <script type="text/javascript"> var $menu = $('.right-cont'); var $menuIcon = $('.menu-icon'); var $menuIconClose = $('.menu-icon-close'); function showMenu() { $menu.fadeIn(100); $menuIcon.fadeOut(100); $menuIconClose.fadeIn(100); } $menuIcon.click(showMenu); function hideMenu() { $menu.fadeOut(100); $menuIconClose.fadeOut(100); $menuIcon.fadeIn(100); } $menuIconClose.click(hideMenu); $(window).resize(function() { if ($(window).width() >= 
840) { $menu.fadeIn(100); $menuIcon.fadeOut(100); $menuIconClose.fadeOut(100); } else { $menu.fadeOut(100); $menuIcon.fadeIn(100); $menuIconClose.fadeOut(100); } }); </script>  <link rel="stylesheet" href="/css/blogs.css"> <div class="blog druid-header"> <div class="row"> <div class="col-md-8 col-md-offset-2"> <div class="title-image-wrap"> <div class="title-spacer"></div> <img class="title-image" src="http://metamarkets.com/wp-content/uploads/2011/05/toyota-sized-470x288.jpg" alt="Druid, Part Deux: Three Principles for Fast, Distributed OLAP"/> </div> </div> </div> </div> <div class="container blog"> <div class="row"> <div class="col-md-8 col-md-offset-2"> <div class="blog-entry"> <h1>Druid, Part Deux: Three Principles for Fast, Distributed OLAP</h1> <p class="text-muted">by <span class="author text-uppercase">Eric Tschetter</span> · May 20, 2011</p> <p>In a <a href="/blog/2011/04/30/introducing-druid.html">previous blog post</a> we introduced the distributed indexing and query processing infrastructure we call Druid. In that post, we characterized the performance and scaling challenges that motivated us to build this system in the first place. Here, we discuss three design principles underpinning its architecture.</p> <p><strong>1. Partial Aggregates + In-Memory + Indexes => Fast Queries</strong></p> <p>We work with two representations of our data: <em>alpha</em> represents the raw, unaggregated event logs, while <em>beta</em> is its partially aggregated derivative. 
This <em>beta</em> is the basis against which all further queries are evaluated:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male  USA  1800  25   15.70
2011-01-01T01:00:00Z  bieberfever.com    google.com  Male  USA  2912  42   29.18
2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male  UK   1953  17   17.31
2011-01-01T02:00:00Z  bieberfever.com    google.com  Male  UK   3194  170  34.01
</code></pre></div>
<p>This is the most compact representation that preserves the finest grain of data, while enabling on-the-fly computation of all O(2^n) possible dimensional roll-ups.</p>
<p>The key to Druid’s speed is maintaining the <em>beta</em> data entirely in memory. Full scans are several orders of magnitude faster in memory than from disk. What we lose in having to compute roll-ups on the fly, we make up for with speed.</p>
<p>To support drill-downs on specific dimensions (such as results for only ‘bieberfever.com’), we maintain a set of inverted indices. These allow fast calculation (using AND and OR operations) of the rows matching a search query. The inverted index lets us scan only a limited subset of rows to compute final query results – and these scans are themselves distributed, as we discuss next.</p>
<p><strong>2. Distributed Data + Parallelizable Queries => Horizontal Scalability</strong></p>
<p>Druid’s performance depends on having memory, and lots of it. We achieve the requisite memory scale by dynamically distributing data across a cluster of nodes. As the data set grows, we can expand horizontally by adding more machines.</p>
<p>To facilitate rebalancing, we take chunks of <em>beta</em> data and index them into segments based on time ranges. For high-cardinality dimensions, distributing by time isn’t enough (we generally try to keep segments no larger than 20M rows), so we have introduced partitioning.
We store metadata about segments within the query layer and partitioning logic within the segment generation code.</p>
<p>We persist these segments in a storage system (currently S3) that is accessible from all nodes. If a node goes down, <a href="http://zookeeper.apache.org/">ZooKeeper</a> coordinates the remaining live nodes to reconstitute the missing <em>beta</em> set.</p>
<p>Downstream clients of the API are insulated from this rebalancing: Druid’s query API seamlessly handles changes in cluster topology.</p>
<p>Queries against the Druid cluster scale horizontally. We limit the aggregation operations we support to those that are inherently parallelizable: count, mean, variance and other parametric statistics. While less parallelizable operations, such as median, are not supported, this limitation is offset by rich support for histograms and higher-order moment stores. The co-location of processing with in-memory data on each node reduces network load and dramatically improves performance.</p>
<p>This architecture provides a number of extra benefits:</p>
<ul> <li>Segments are read-only, so the same segment can be served by multiple servers simultaneously. If we have a hotspot in a particular index, we can replicate that index to multiple servers and load balance across them.</li> <li>We can provide tiered classes of service for our data, with servers occupying different points in the “query latency vs. data size” spectrum.</li> <li>Our clusters can span data center boundaries.</li> </ul>
<p><strong>3. Real-Time Analytics: Immutable Past, Append-Only Future</strong></p>
<p>Our system for real-time analytics is centered, naturally, on time. Because past events happen once and never change, they need not be rewritable. We need only be able to append new events.</p>
<p>For real-time analytics, we have an event stream that flows into a set of real-time indexers. These are servers that advertise responsibility for the most recent 60 minutes of data and nothing more.
They aggregate the real-time feed and periodically push an index segment to our storage system. The segment is then loaded into the memory of a standard server and flushed from the real-time indexer.</p>
<p>Similarly, for long-range historical data that we want to make available, but not keep hot, we have deep-history servers. These use a memory-mapping strategy for addressing segments, rather than loading them all into memory. This provides access to long-range data while maintaining the high performance that our customers expect for near-term data.</p>
<h2 id="summary">Summary</h2>
<p>Druid’s power resides in providing users fast, arbitrarily deep exploration of large-scale transaction data. Queries over billions of rows that previously took minutes or hours to run can now be answered with sub-second response times.</p>
<p>We believe that the performance, scalability, and unification of real-time and historical data that Druid provides could be of broader interest. As such, we plan to open source our code base in the coming year.</p> </div> </div> </div> </div>  <footer class="druid-footer"> <div class="container"> <div class="text-center"> <p> <a href="/technology">Technology</a>&ensp;·&ensp; <a href="/use-cases">Use Cases</a>&ensp;·&ensp; <a href="/druid-powered">Powered by Druid</a>&ensp;·&ensp; <a href="/docs/latest/">Docs</a>&ensp;·&ensp; <a href="/community/">Community</a>&ensp;·&ensp; <a href="/downloads.html">Download</a>&ensp;·&ensp; <a href="/faq">FAQ</a> </p> </div> <div class="text-center"> <a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a>&ensp;·&ensp; <a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a>&ensp;·&ensp; <a title="GitHub" href="https://github.com/apache/druid" target="_blank"><span class="fab fa-github"></span></a> </div> <div class="text-center license"> Copyright
© 2020 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br> Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br> Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. </div> </div> </footer> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'UA-131010415-1'); </script> <script> function trackDownload(type, url) { ga('send', 'event', 'download', type, url); } </script> <script src="//code.jquery.com/jquery.min.js"></script> <script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script> <script src="/assets/js/druid.js"></script>  </body> </html>
