Background

  • The Challenge
  • Sketch Origins
  • Sketch Elements
  • Presentations
  • Overview Slide Deck

Architecture And Design

  • The Major Sketch Families
  • Large Scale Computing
  • Key Features
  • Sketch Features Matrix
  • Components
  • Sketches by Component
  • Sketch Criteria

Memory Component

  • Memory Component
  • Memory Component Performance
  • Notes on Order Sensitivity
  • Notes on Concurrency

Sketch Families

    Distinct Counting

  • Features Matrix
  • Figures-of-Merit Comparison

CPC Sketches

  • CPC Sketch
  • CPC Sketch Performance

CPC Examples

  • CPC Sketch Java Example
  • CPC Sketch C++ Example
  • CPC Sketch Pig UDFs
  • CPC Sketch Hive UDFs

HyperLogLog Sketches

  • HLL Sketch
  • HLL Map Sketch

HLL Examples

  • HLL Sketch Java Example
  • HLL Sketch C++ Example
  • HLL Sketch Pig UDFs
  • HLL Sketch Hive UDFs

HLL Studies

  • HLL Sketch Performance
  • HLL vs Clearspring HLL++
  • HLL Sketch vs Druid HyperLogLogCollector

Theta Sketches

  • Theta Sketch Framework

Theta Examples

  • Concurrent Theta Sketch
  • Theta Sketch Java Example
  • Theta Sketch Spark Example
  • Theta Sketch Pig UDFs
  • Theta Sketch Hive UDFs

KMV Tutorial

  • The Inverse Estimate
  • Empty Sketch
  • First Estimator
  • Better Estimator
  • Rejection Rules
  • Update V(kth) Rule

Set Operations and P-sampling

  • Set Operations
  • Model & Test Set Operations
  • p-Sampling

Accuracy

  • Basic Accuracy
  • Accuracy Plots
  • Relative Error Table
  • SetOp Accuracy
  • Unions With Different k

Size

  • Theta Sketch Size

Speed

  • Update Speed
  • Merge Speed

Theta Sketch Theory

  • Theta Sketch Framework (PDF)
  • Theta Sketch Equations (PDF)
  • DataSketches (PDF)
  • Confidence Intervals Notes
  • Merging Algorithm Notes
  • Theta References

Tuple Sketches

  • Tuple Overview

Tuple Examples

  • Tuple Java Example
  • Tuple Engagement Example
  • Tuple Pig UDFs
  • Tuple Hive UDFs

Most Frequent

  • Frequency Sketches Overview

Frequent Item Sketches

  • Frequent Items Overview
  • Frequent Items Error Table
  • Frequent Items References
  • Frequent Items Performance

Most Frequent Examples

  • Frequent Items Java Example
  • Frequent Items C++ Example
  • Frequent Items Pig UDFs
  • Frequent Items Hive UDFs

Frequent Distinct Sketches

  • Frequent Distinct Tuples Sketch

Quantiles And Histograms

  • Quantiles and Ranks Tutorial
  • Quantiles Overview
  • KLL Floats sketch
  • KLL Sketch Accuracy and Size
  • REQ Floats sketch
  • Original QuantilesSketch

Quantiles Examples

  • Quantiles Sketch Java Example
  • KLL Quantiles Sketch C++ Example
  • Quantiles Sketch Pig UDFs
  • Quantiles Sketch Hive UDFs

Quantiles Studies

  • KLL sketch vs t-digest
  • Druid Approximate Histogram
  • Moments Sketch Study
  • Quantiles StreamA Study
  • Exact Quantiles for Studies

Quantiles Sketch Theory

  • Optimal Quantile Approximation in Streams
  • Quantiles References

Sampling

  • Reservoir Sampling
  • Reservoir Sampling Performance
  • VarOpt Sampling

Sampling Examples

  • Reservoir Sampling Java Example
  • Reservoir Sampling Pig UDFs
  • VarOpt Sampling Java Example
  • VarOpt Sampling Pig UDFs

System Integrations

  • Using Sketches in Apache Druid
  • Using Sketches in Apache Hive
  • Using Sketches in Apache Pig
  • Using Sketches in PostgreSQL

Community

  • Community
  • Downloads
  • Committer Process
  • Release Process For CPP Components
  • Release Process For Java Components
  • Transitioning from prior GitHub Site

Research

  • Research