Ingesting JSON Data Into Apache Kudu with StreamSets Data Collector

Posted 14 Apr 2016 by Pat Patterson

At the Hadoop Summit in Dublin this week, Ted Malaska, Principal Solutions Architect at Cloudera, and I presented "Ingest and Stream Processing - What Will You Choose?", a look at the big data streaming landscape with a focus on ingest. The session closed with a demo of StreamSets Data Collector, the open source graphical IDE for building ingest pipelines.

In the demo, I built a pipeline that reads JSON data from Apache Kafka, augments each record in JavaScript, and writes the results to both Apache Kudu (incubating) for analysis and Apache Kafka for visualization.
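
To give a feel for the shape of that pipeline, here is a minimal standalone sketch in plain JavaScript. It is not the script from the demo: the field names (temperature_c, temperature_f), the functions augmentRecord and runPipeline, and the in-memory arrays standing in for the Kudu and Kafka destinations are all assumptions made for illustration. Inside Data Collector the script would work on the stage's batch of records rather than on raw JSON strings.

```javascript
// Hypothetical augmentation step. The field names are illustrative
// and do not come from the demo.
function augmentRecord(value) {
  // Shallow-copy the record so the caller's object is not mutated.
  var out = Object.assign({}, value);
  // Example enrichment: derive a Fahrenheit reading from a Celsius one.
  if (typeof out.temperature_c === 'number') {
    out.temperature_f = out.temperature_c * 9 / 5 + 32;
  }
  return out;
}

// Stand-in for the pipeline: parse each JSON message, augment it,
// and fan the result out to every sink (plain arrays in this sketch).
function runPipeline(messages, sinks) {
  messages.forEach(function (msg) {
    var record = augmentRecord(JSON.parse(msg));
    sinks.forEach(function (sink) {
      sink.push(record);
    });
  });
}
```

Calling runPipeline with an array of JSON strings and two empty arrays leaves the same augmented record in both, which mirrors the way the demo pipeline sends each record to two destinations.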

Here’s a recording of the session:

YouTube video

The Apache Kudu destination is new in StreamSets Data Collector, released this week and available for download.

Learn more about StreamSets at

About the Author

Pat Patterson was recently hired as Community Champion at StreamSets. Prior to StreamSets, Pat was a developer evangelist at Salesforce, focused on identity, integration and the Internet of Things and, before that, managed the OpenSSO community at Sun Microsystems. Pat enjoys hacking code at every level - from kernel drivers in C to web front ends in JavaScript.