---
layout: post
status: PUBLISHED
published: true
title: Announcing the release of Samza 1.0
id: aa488579-6e6d-424d-bda5-140d085dd6f2
date: '2018-11-27 09:26:32 -0500'
categories: samza
tags: []
permalink: samza/entry/announcing-the-release-of-samza
---
<p>We’re thrilled to announce the release of Apache Samza 1.0.</p>
<p>Today Samza forms the backbone of hundreds of real-time production<br />
applications across a multitude of companies, such as LinkedIn, VMware,<br />
Slack, and Redfin, among many others. This release of Samza adds a variety of<br />
features and capabilities to Samza’s existing arsenal, coupled with new<br />
and improved <em>documentation</em>, <em>code</em> snippets, <em>examples</em>, and a<br />
brand-new <em>website design</em>! Here are a few selected highlights:</p>
<ul>
<li>
<p><strong>Stable</strong> high-level APIs that allow creating complex processing<br />
pipelines with ease.</p>
</li>
<li>
<p><strong>Beam Samza Runner</strong> now marries Beam’s best-in-class support for<br />
event-time-based windowed processing and sophisticated triggering<br />
with Samza’s stable and scalable stateful processing model.</p>
</li>
<li>
<p><strong>Table API</strong> that provides a common abstraction for accessing<br />
remote or local databases. Developers are now able to “join” an<br />
input event stream with such a Table.</p>
</li>
<li>
<p>Integration <strong>Test Framework</strong> to enable effortless testing of Samza<br />
jobs without deploying a Kafka, Yarn, or Zookeeper cluster.</p>
</li>
<li>
<p>Support for <strong>Apache Log4j2</strong> allowing improved logging performance,<br />
customization, and efficiency.</p>
</li>
<li>
<p><strong>Upgraded Kafka</strong> client and consumer.</p>
</li>
<li>
<p>An interactive <strong>shell for Samza SQL</strong> for seamless formulation,<br />
development, and testing of SamzaSQL queries.</p>
</li>
<li>
<p><strong>Side-input</strong> support that allows using log-compacted data sources<br />
to populate KV state for Samza applications.</p>
</li>
<li>
<p><strong>An improved website</strong> with detailed documentation and lots of code<br />
samples!</p>
</li>
</ul>
<p>In addition, Samza 1.0 brings numerous bug-fixes, upgrades, and<br />
improvements listed <a href="#jiralist">below</a>.</p>
<h2 id="new-features">New features</h2>
<p>Samza 1.0 brings full-feature support for the following:</p>
<h3 id="improved-stable-high-level-apis">Improved Stable High Level APIs</h3>
<p>Samza 1.0 brings <em>Descriptor APIs</em> that allow applications to specify<br />
their input and output <em>systems</em> and <em>streams</em> in code. Samza’s new<br />
<em>Context APIs</em> provide applications unified access to job-level,<br />
container-level, task-level, and application-level context and<br />
capabilities. This also simplifies Samza’s <em>ApplicationRunner</em><br />
interface.</p>
<p>This API evolution requires a few simple modifications to application<br />
code, which we describe in detail in our <a href="#upgradesteps">upgrade steps</a>.</p>
<h4 id="beam-runner-support">Beam Runner Support</h4>
<p>Samza’s Beam Runner enables executing Beam pipelines over Samza. This<br />
enables Samza applications to create complex processing pipelines that<br />
require event-time based processing, varying types of event-time based<br />
windowing, and more. This feature is supported in both the YARN and<br />
standalone deployment models.</p>
<h4 id="joining-streams-and-tables">Joining Streams and Tables</h4>
<p>Samza’s Table API provides developers with unified access to local<br />
and remote data sources such as <em>Key-Value stores</em> or <em>web services</em>,<br />
while providing features such as <em>rate-limiting, throttling,</em> and<br />
<em>caching</em> capabilities. This provides first-class API primitives for<br />
building Stream-Table join jobs. Learn more about the use, semantics,<br />
and examples for Table API <a href="http://TODO">here</a>.</p>
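<p>To make the join semantics concrete, here is a minimal conceptual sketch in plain Java. It does not use Samza’s Table API: a <code>HashMap</code> stands in for the table, a list stands in for the input stream, and the class, method, and variable names are all hypothetical. In this sketch, events with no matching table entry are simply dropped.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: plain Java collections stand in for a Samza
// table and input stream. None of these names are Samza APIs.
public class StreamTableJoinSketch {

    // Joins each page-view event (member id -> page) against a table that
    // maps member id -> company. Events whose key is missing from the table
    // are dropped, so this sketch behaves like an inner join.
    static List<String> join(List<Map.Entry<String, String>> pageViews,
                             Map<String, String> companyByMember) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> view : pageViews) {
            String company = companyByMember.get(view.getKey());
            if (company != null) {
                joined.add(view.getKey() + " viewed " + view.getValue()
                        + " (" + company + ")");
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // The "table": a keyed data set that events are looked up against.
        Map<String, String> companyByMember = new HashMap<>();
        companyByMember.put("m1", "LinkedIn");
        companyByMember.put("m2", "Slack");

        // The "stream": an ordered sequence of keyed events.
        List<Map.Entry<String, String>> pageViews = List.of(
                Map.entry("m1", "/jobs"),
                Map.entry("m3", "/feed"),   // no table entry for m3, so it is dropped
                Map.entry("m2", "/inbox"));

        for (String line : join(pageViews, companyByMember)) {
            System.out.println(line);
        }
    }
}
```

<p>In a real job the lookup could go to a local store or a remote service. Only the keyed-lookup shape of the join is illustrated here.</p>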
<h4 id="test-samza-without-zk-yarn-or-kafka">Test Samza without <em>ZK, Yarn</em> or <em>Kafka</em></h4>
<p>Samza 1.0 brings a test framework that allows testing Samza applications<br />
using <em>in-memory</em> input and output. Users can now set up tests and<br />
test pipelines for their applications without needing to set up any<br />
other services, such as Kafka, YARN, or Zookeeper.</p>
<h4 id="log4j2-support">Log4j2 support</h4>
<p>Samza now supports Apache Log4j 2 for system and application logging.<br />
Log4j 2 is an upgrade to Log4j that provides significant improvements<br />
over its predecessor, Log4j 1.x, such as better throughput and latency,<br />
custom log levels, and a pluggable logging architecture.</p>
<h4 id="kafka-upgrade">Kafka upgrade</h4>
<p>This release upgrades Samza to use Kafka’s high-level consumer (Kafka<br />
v0.11.1.62). This brings latency and throughput benefits for Samza<br />
applications that consume from Kafka, in addition to bug fixes. It<br />
also means Samza applications can now make better use of the<br />
underlying Kafka cluster.</p>
<h4 id="samzasql-shell">SamzaSQL Shell</h4>
<p>SamzaSQL now provides a shell in which users can type their SQL queries,<br />
while Samza does the heavy lifting of wiring the inputs and outputs and<br />
sizing the application in the background. This is great for testing and<br />
experimenting with queries while formulating your application logic, and<br />
is especially well suited to data scientists and tinkerers.</p>
<h4 id="side-inputs">Side-inputs</h4>
<p>Samza 1.0 brings the ability to leverage existing log-compacted data<br />
sources (e.g., Kafka topics) to populate KV state for Samza<br />
applications. If your data processing pipeline involves Hadoop-to-Kafka<br />
push, this feature removes the need for your Samza job to create<br />
separate Kafka topics to back KV state.</p>
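<p>The idea of populating KV state from a log-compacted source can be sketched in plain Java. This is a conceptual illustration under stated assumptions, not Samza’s side-input API: a list of key/value records stands in for the compacted topic, a <code>HashMap</code> stands in for the local store, and all names are hypothetical. The sketch assumes the usual log-compaction convention that the latest value for a key wins and a null value deletes the key.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: a list of {key, value} records stands in for a
// log-compacted topic, and a HashMap stands in for the local KV store.
// None of these names are Samza APIs.
public class SideInputSketch {

    // Replays the log in order to build the store. The latest value for a
    // key wins, and a null value (a "tombstone") removes the key.
    static Map<String, String> bootstrap(List<String[]> log) {
        Map<String, String> store = new HashMap<>();
        for (String[] record : log) {
            String key = record[0];
            String value = record[1];
            if (value == null) {
                store.remove(key);
            } else {
                store.put(key, value);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<String[]> log = Arrays.asList(
                new String[] {"m1", "LinkedIn"},
                new String[] {"m2", "Slack"},
                new String[] {"m1", "Redfin"},  // newer value overwrites the older one
                new String[] {"m2", null});     // tombstone removes m2

        // Only the latest surviving value per key remains in the store.
        System.out.println(bootstrap(log));     // prints {m1=Redfin}
    }
}
```

<p>Because the store is rebuilt purely by replaying the source, the job in this sketch needs no second topic of its own to recover that state.</p>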
<h4 id="improved-website-documentation-and-samples">Improved website, documentation, and samples</h4>
<p>We’ve redesigned the Samza website, making it easier to find details on<br />
key Samza concepts and patterns. All documentation has been revised and<br />
rewritten, keeping in mind the feedback we got from our customers. We’ve<br />
revised and added sample application code to showcase Samza 1.0 and the<br />
use of its new APIs.</p>
<h2 id="enhancements-and-upgrades"><a name="jiralist"></a> Enhancements and Upgrades</h2>
<p>This release brings the following enhancements, upgrades, and<br />
capabilities:</p>
<h3 id="api-enhancements-and-simplifications">API enhancements and simplifications</h3>
<p>SAMZA-1789: unify ApplicationDescriptor and ApplicationRunner for high-<br />
and low-level APIs in YARN and standalone environment</p>
<p>SAMZA-1804: System and stream descriptors</p>
<p>SAMZA-1858: Public APIs for shared context</p>
<p>SAMZA-1763: Add async methods to Table API</p>
<p>SAMZA-1786: Introduce the metadata store abstraction</p>
<p>SAMZA-1859: Zookeeper implementation of MetadataStore</p>
<p>SAMZA-1788: Add the LocationIdProvider abstraction</p>
<h3 id="upgrades-and-bug-fixes">Upgrades and Bug-fixes</h3>
<p>SAMZA-1768: Handle corrupted OFFSET file</p>
<p>SAMZA-1817: Long classpath support for non-split deployments</p>
<p>SAMZA-1719: Add caching support to table-API</p>
<p>SAMZA-1783: Add Log4j2 functionality in Samza</p>
<p>SAMZA-1868: Refactor KafkaSystemAdmin from using SimpleConsumer</p>
<p>SAMZA-1776: Refactor KafkaSystemConsumer to remove the usage of<br />
deprecated SimpleConsumer client</p>
<p>SAMZA-1730: Adding state validation in StreamProcessor before any<br />
lifecycle operation and group coordination</p>
<p>SAMZA-1695: Clear events in ScheduleAfterDebounceTime on session<br />
expiration</p>
<p>SAMZA-1647: Fix race conditions in StreamProcessor</p>
<p>SAMZA-1371: Some Samza Containers get stuck at “Starting BrokerProxy”</p>
<p>SAMZA-1648: Integration Test Framework &amp; Collection Stream Impl</p>
<p>SAMZA-1748: Failure tests in the standalone deployment</p>
<p>A source download of Samza 1.0 is available <a href="https://dist.apache.org/repos/dist/release/samza/1.0.0/">here</a>, and in Apache’s Maven repository. </p>
<h2 id="community">Community Developments</h2>
<p>A <a href="https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797">symposium</a><br />
on stream processing with Apache Samza and Apache Kafka was held on July<br />
19th and again on October 23rd. Each event drew more than 350<br />
participants from across the industry. The talks covered<br />
Samza’s Beam integration, its use at LinkedIn for real-time<br />
notifications, Kafka replication at Uber, and Kafka Cruise<br />
Control, among many other topics.</p>
<p>Samza was also the subject of a talk at <a href="https://www.youtube.com/watch?v=2y8QImf-RpI">Strange Loop'18</a>,<br />
which covered its scalability, performance, extensibility, and<br />
programmability in depth.</p>