Evolving An ETL Pipeline For Better Productivity - Episode 83


Episode description


Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components onto a paid service. He explains how their original implementation was built, why they decided to migrate to a paid service, and how they made that transition. He also discusses how the abstractions provided by DataCoral allow his data scientists to remain productive without requiring dedicated data engineers. If you are either considering how to build a data pipeline or debating whether to migrate your existing ETL to a service, this is definitely worth listening to for some perspective.

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new pipelines and tuning their workflows, you need a project management system designed by engineers, for engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Data Engineering Podcast listeners get 2 months free on any plan by going to today and signing up for a free trial. Support the show and get your data projects in order!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to to learn more and take advantage of our partner discounts when you register.
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to to learn more and take advantage of our partner discounts when you register.
  • Go to to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at
  • Your host is Tobias Macey and today I’m interviewing Aaron Gibralter and Raghu Murthy about the experience of Greenhouse migrating their data pipeline to DataCoral
  • Introduction
  • How did you get involved in the area of data management?
  • Aaron, can you start by describing what Greenhouse is and some of the ways that you use data?
  • Can you describe your overall data infrastructure and the state of your data pipeline before migrating to DataCoral?
    • What are your primary sources of data and what are the targets that you are loading them into?
  • What were your biggest pain points and what motivated you to re-evaluate your approach to ETL?
    • What were your criteria for your replacement technology and how did you gather and evaluate your options?
  • Once you made the decision to use DataCoral can you talk through the transition and cut-over process?
    • What were some of the unexpected edge cases or shortcomings that you experienced when moving to DataCoral?
    • What were the big wins?
  • What was your evaluation framework for determining whether your re-engineering was successful?
  • Now that you are using DataCoral how would you characterize the experiences of yourself and your team?
    • If you have freed up time for your engineers, how are you allocating that spare capacity?
  • What do you hope to see from DataCoral in the future?
  • What advice do you have for anyone else who is either evaluating a re-architecture of their existing data platform or planning out a greenfield project?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

More Episodes

Navigating Boundless Data Streams With The Swim Kernel - Episode 98

September 18th, 2019


The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for …

Building A Reliable And Performant Router For Observability Data - Episode 97

September 10th, 2019


The first stage in every data project is collecting information and routing it to a storage system for later analysis. For operational data …

Building A Community For Data Professionals at Data Council - Episode 96

September 2nd, 2019


Data professionals are working in a domain that is rapidly evolving. In order to stay current we need access to deeply technical …

Building Tools And Platforms For Data Analytics - Episode 95

August 26th, 2019


Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users …

A High Performance Platform For The Full Big Data Lifecycle - Episode 94

August 19th, 2019


Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One …

Digging Into Data Replication At Fivetran - Episode 93

August 12th, 2019


The extract and load pattern of data replication is the most commonly needed process in data engineering workflows. Because of the myriad sources and destinations that are available, it is also among the most …

Solving Data Discovery At Lyft - Episode 92

August 5th, 2019


Data is only valuable if you use it for something, and the first step is knowing that it is available. As organizations grow and data …

Simplifying Data Integration Through Eventual Connectivity - Episode 91

July 29th, 2019


The ETL pattern that has become commonplace for integrating data from multiple sources has proven useful, but complex to maintain. For a …

Straining Your Data Lake Through A Data Mesh - Episode 90

July 22nd, 2019


The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the …

Data Labeling That You Can Feel Good About - Episode 89

July 15th, 2019


Successful machine learning and artificial intelligence projects require large volumes of data that is properly labelled. The challenge is …

Scale Your Analytics On The Clickhouse Data Warehouse - Episode 88

July 8th, 2019


The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented …

Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection - Episode 87

July 2nd, 2019


Anomaly detection is a capability that is useful in a variety of problem domains, including finance, internet of things, and systems …

The Workflow Engine For Data Engineers And Data Scientists - Episode 86

June 25th, 2019


Building a data platform that works equally well for data engineering and data science is a task that requires familiarity with the needs of …

Maintaining Your Data Lake At Scale With Spark - Episode 85

June 17th, 2019


Building and maintaining a data lake is a choose your own adventure of tools, services, and evolving best practices. The flexibility and …

Managing The Machine Learning Lifecycle - Episode 84

June 10th, 2019


Building a machine learning model can be difficult, but that is only half of the battle. Having a perfect model is only useful if you are …

Data Lineage For Your Pipelines - Episode 82

May 27th, 2019


Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there’s Pachyderm, the platform for …

Build Your Data Analytics Like An Engineer - Episode 81

May 20th, 2019


In recent years the traditional approach to building data warehouses has shifted from transforming records before loading, to transforming …

Using FoundationDB As The Bedrock For Your Distributed Systems - Episode 80

May 7th, 2019


The database market continues to expand, offering systems that are suited to virtually every use case. But what happens if you need something customized to your application? FoundationDB is a distributed …

Running Your Database On Kubernetes With KubeDB - Episode 79

April 29th, 2019


Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created as a way of providing …

Unpacking Fauna: A Global Scale Cloud Native Database - Episode 78

April 22nd, 2019


One of the biggest challenges for any business trying to grow and reach customers globally is how to scale their data storage. FaunaDB is a cloud native database built by the engineers behind Twitter’s …

Index Your Big Data With Pilosa For Faster Analytics - Episode 77

April 15th, 2019


Database indexes are critical to ensure fast lookups of your data, but they are inherently tied to the database engine. Pilosa is rewriting that equation by providing a flexible, scalable, performant engine for …

Serverless Data Pipelines On DataCoral - Episode 76

April 8th, 2019


How much time do you spend maintaining your data pipeline? How much end user value does that provide? Raghu Murthy founded DataCoral as a way to abstract the low level details of ETL so that you can focus on …

Why Analytics Projects Fail And What To Do About It - Episode 75

April 1st, 2019


Analytics projects fail all the time, resulting in lost opportunities and wasted resources. There are a number of factors that contribute to …

Building An Enterprise Data Fabric At CluedIn - Episode 74

March 25th, 2019


Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grow. Enterprise organizations feel this acutely due to the silos that occur …

A DataOps vs DevOps Cookoff In The Data Kitchen - Episode 73

March 18th, 2019


Delivering a data analytics project on time and with accurate information is critical to the success of any business. DataOps is a set of practices to increase the probability of success by creating value early …

Customer Analytics At Scale With Segment - Episode 72

March 4th, 2019


Customer analytics is a problem domain that has given rise to its own industry. In order to gain a full understanding of what your users are …

Deep Learning For Data Engineers - Episode 71

February 25th, 2019


Deep learning is the latest class of technology that is gaining widespread interest. As data engineers we are responsible for building and …

The Alluxio Distributed Storage System - Episode 70

February 19th, 2019


Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is …

Building Machine Learning Projects In The Enterprise - Episode 69

February 11th, 2019


Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and …

Cleaning And Curating Open Data For Archaeology - Episode 68

February 4th, 2019


Archaeologists collect and create a variety of data as part of their research and exploration. Open Context is a platform for cleaning, …

Managing Database Access Control For Teams With strongDM - Episode 67

January 29th, 2019


Controlling access to a database is a solved problem… right? It can be straightforward for small teams and a small number of storage …

Building Enterprise Big Data Systems At LEGO - Episode 66

January 21st, 2019


Building internal expertise around big data in a large organization is a major competitive advantage. However, it can be a difficult process …

TimescaleDB: The Timeseries Database Built For SQL And Scale - Episode 65

January 14th, 2019


The past year has been an active one for the timeseries market. New products have been launched, more businesses have moved to streaming analytics, and the team at Timescale has been keeping busy. In this …

Performing Fast Data Analytics Using Apache Kudu - Episode 64

January 7th, 2019


The Hadoop platform is purpose built for processing large, slow moving data in long-running batch jobs. As the ecosystem around it has grown, so has the need for fast data analytics on fast moving data. To fill …

Simplifying Continuous Data Processing Using Stream Native Storage In Pravega with Tom Kaitchuck - Episode 63

December 31st, 2018


As more companies and organizations are working to gain a real-time view of their business, they are increasingly turning to stream …

Continuously Query Your Time-Series Data Using PipelineDB with Derek Nelson and Usman Masood - Episode 62

December 24th, 2018


Processing high velocity time-series data in real-time is a complex challenge. The team at PipelineDB has built a continuous query engine that simplifies the task of computing aggregates across incoming streams …

Advice On Scaling Your Data Pipeline Alongside Your Business with Christian Heinzmann - Episode 61

December 17th, 2018


Every business needs a pipeline for their critical data, even if it is just pasting into a spreadsheet. As the organization grows and gains more customers, the requirements for that pipeline will change. In this …

Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

December 10th, 2018


Apache Spark is a popular and widely used tool for a variety of data oriented projects. With the large array of capabilities, and the …

Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt - Episode 59

December 3rd, 2018


Distributed systems are complex to build and operate, and there are certain primitives that are common to a majority of them. Rather than re-implement the same capabilities every time, many projects build on top …

Set Up Your Own Data-as-a-Service Platform On Dremio with Tomer Shiran - Episode 58

November 26th, 2018


When your data lives in multiple locations, belonging to at least as many applications, it is exceedingly difficult to ask complex questions …

Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

November 19th, 2018


Modern applications and data platforms aspire to process events and data in real time at scale and with low latency. Apache Flink is a true stream processing engine with an impressive set of capabilities for …

How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

November 11th, 2018


A data lake can be a highly valuable resource, as long as it is well built and well managed. Unfortunately, that can be a complex and time-consuming effort, requiring specialized knowledge and diverting …

Self Service Business Intelligence And Data Sharing Using Looker with Daniel Mintz - Episode 55

November 5th, 2018


Business intelligence is a necessity for any organization that wants to be able to make informed decisions based on the data that they …

Using Notebooks As The Unifying Layer For Data Roles At Netflix with Matthew Seal - Episode 54

October 29th, 2018


Jupyter notebooks have gained popularity among data scientists as an easy way to do exploratory analysis and build interactive reports. …

Of Checklists, Ethics, and Data with Emily Miller and Peter Bull (Cross Post from Podcast.__init__) - Episode 53

October 22nd, 2018


As data science becomes more widespread and has a bigger impact on the lives of people, it is important that those projects and products are built with a conscious consideration of ethics. Keeping ethical …

Improving The Performance Of Cloud-Native Big Data At Netflix Using The Iceberg Table Format with Ryan Blue - Episode 52

October 15th, 2018


With the growth of the Hadoop ecosystem came a proliferation of implementations for the Hive table format. Unfortunately, with no formal …

Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51

October 9th, 2018


One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed …

Building A Knowledge Graph From Public Data At Enigma With Chris Groskopf - Episode 50

October 1st, 2018


There are countless sources of data that are publicly available for use. Unfortunately, combining those sources and making them useful in …

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

September 24th, 2018


As your data needs scale across an organization the need for a carefully considered approach to collection, storage, organization, and access …

Take Control Of Your Web Analytics Using Snowplow With Alexander Dean - Episode 48

September 17th, 2018


Every business with a website needs some way to keep track of how much traffic they are getting, where it is coming from, and which actions …

Keep Your Data And Query It Too Using Chaos Search with Thomas Hazel and Pete Cheslock - Episode 47

September 10th, 2018


Elasticsearch is a powerful tool for storing and analyzing data, but when using it for logs and other time oriented information it can become …

An Agile Approach To Master Data Management with Mark Marinelli - Episode 46

September 3rd, 2018


With the proliferation of data sources to give a more comprehensive view of the information critical to your business it is even more …

Protecting Your Data In Use At Enveil with Ellison Anne Williams - Episode 45

August 27th, 2018


There are myriad reasons why data should be protected, and just as many ways to enforce it in transit or at rest. Unfortunately, there is …

Graph Databases In Production At Scale Using DGraph with Manish Jain - Episode 44

August 20th, 2018


The way that you store your data can have a huge impact on the ways that it can be practically used. For a substantial number of use cases, the optimal format for storing and querying that information is as a …

Putting Airflow Into Production With James Meickle - Episode 43

August 13th, 2018


The theory behind how a tool is supposed to work and the realities of putting it into practice are often at odds with each other. Learning …

Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42

August 6th, 2018


One of the longest running and most popular open source database projects is PostgreSQL. Because of its extensibility and a community focus …

Mobile Data Collection And Analysis Using Ona And Canopy With Peter Lubell-Doughtie - Episode 41

July 30th, 2018


With the attention being paid to the systems that power large volumes of high velocity data it is easy to forget about the value of data …

Ceph: A Reliable And Scalable Distributed Filesystem with Sage Weil - Episode 40

July 16th, 2018


When working with large volumes of data that you need to access in parallel across multiple instances you need a distributed filesystem that …

Building Data Flows In Apache NiFi With Kevin Doran and Andy LoPresto - Episode 39

July 8th, 2018


Data integration and routing is a constantly evolving problem and one that is fraught with edge cases and complicated requirements. The …

Leveraging Human Intelligence For Better AI At Alegion With Cheryl Martin - Episode 38

July 2nd, 2018


Data is often messy or incomplete, requiring human intervention to make sense of it before being usable as input to machine learning …

Package Management And Distribution For Your Data Using Quilt with Kevin Moore - Episode 37

June 25th, 2018


Collaboration, distribution, and installation of software projects is largely a solved problem, but the same cannot be said of data. Every …

User Analytics In Depth At Heap with Dan Robinson - Episode 36

June 17th, 2018


Web and mobile analytics are an important part of any business, and difficult to get right. The most frustrating part is when you realize …

CockroachDB In Depth with Peter Mattis - Episode 35

June 11th, 2018


With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. With the first wave of cloud era databases the ability to …

ArangoDB: Fast, Scalable, and Multi-Model Data Storage with Jan Steeman and Jan Stücke - Episode 34

June 4th, 2018


Using a multi-model database in your applications can greatly reduce the amount of infrastructure and complexity required. ArangoDB is a …

The Alooma Data Pipeline With CTO Yair Weinberger - Episode 33

May 28th, 2018


Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new …

PrestoDB and Starburst Data with Kamil Bajda-Pawlikowski - Episode 32

May 21st, 2018


Most businesses end up with data in a myriad of places with varying levels of structure. This makes it difficult to gain insights from across …

Brief Conversations From The Open Data Science Conference: Part 2 - Episode 31

May 14th, 2018


The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up …

Brief Conversations From The Open Data Science Conference: Part 1 - Episode 30

May 7th, 2018


The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up …

Metabase Self Service Business Intelligence with Sameer Al-Sakran - Episode 29

April 30th, 2018


Business Intelligence software is often cumbersome and requires specialized knowledge of the tools and data to be able to ask and answer …

Octopai: Metadata Management for Better Business Intelligence with Amnon Drori - Episode 28

April 23rd, 2018


The information about how data is acquired and processed is often as important as the data itself. For this reason metadata management systems are built to track the journey of your business data to aid in …

Data Engineering Weekly with Joe Crobak - Episode 27

April 15th, 2018


The rate of change in the data engineering industry is alternately exciting and exhausting. Joe Crobak found his way into the work of data management by accident as so many of us do. After being engrossed with …

Defining DataOps with Chris Bergh - Episode 26

April 8th, 2018


Managing an analytics project can be difficult due to the number of systems involved and the need to ensure that new information can be …

ThreatStack: Data Driven Cloud Security with Pete Cheslock and Patrick Cable - Episode 25

April 1st, 2018


Cloud computing and ubiquitous virtualization have changed the ways that our applications are built and deployed. This new environment requires a new way of tracking and addressing the security of our systems. …

MarketStore: Managing Timeseries Financial Data with Hitoshi Harada and Christopher Ryan - Episode 24

March 25th, 2018


The data that is used in financial markets is time oriented and multidimensional, which makes it difficult to manage in either relational or …

Stretching The Elastic Stack with Philipp Krenn - Episode 23

March 19th, 2018


Search is a common requirement for applications of all varieties. Elasticsearch was built to make it easy to include search functionality in projects built in any language. From that foundation, the rest of the …

Database Refactoring Patterns with Pramod Sadalage - Episode 22

March 12th, 2018


As software lifecycles move faster, the database needs to be able to keep up. Practices such as version controlled migration scripts and …

The Future Data Economy with Roger Chen - Episode 21

March 5th, 2018


Data is an increasingly sought after raw material for business in the modern economy. One of the factors driving this trend is the increase in applications for machine learning and AI which require large …

Honeycomb Data Infrastructure with Sam Stokes - Episode 20

February 26th, 2018


One of the sources of data that often gets overlooked is the systems that we use to run our businesses. This data is not used to directly provide value to customers or understand the functioning of the business, …

Data Teams with Will McGinnis - Episode 19

February 19th, 2018


The responsibilities of a data scientist and a data engineer often overlap and occasionally come to cross purposes. Despite these challenges …

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

February 11th, 2018


As communications between machines become more commonplace the need to store the generated data in a time-oriented manner increases. The market for timeseries data stores has many contenders, but they are not …

Pulsar: Fast And Scalable Messaging with Rajan Dhabalia and Matteo Merli - Episode 17

February 4th, 2018


One of the critical components for modern data infrastructure is a scalable and reliable messaging system. Publish-subscribe systems have …

Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16

January 29th, 2018


Sharing data across multiple computers, particularly when it is large and changing, is a difficult problem to solve. In order to provide a …

Snorkel: Extracting Value From Dark Data with Alex Ratner - Episode 15

January 22nd, 2018


The majority of the conversation around machine learning and big data pertains to well-structured and cleaned data sets. Unfortunately, that is just a small percentage of the information that is available, so …

CRDTs and Distributed Consensus with Christopher Meiklejohn - Episode 14

January 15th, 2018


As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources the requirement to …

Citus Data: Distributed PostGreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens - Episode 13

January 8th, 2018


PostgreSQL has become one of the most popular and widely used databases, and for good reason. The level of extensibility that it supports has …

Wallaroo with Sean T. Allen - Episode 12

December 25th, 2017


Data oriented applications that need to operate on large, fast-moving streams of information can be difficult to build and scale due to the …

SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11

December 18th, 2017


Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in …

Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10

December 10th, 2017


To process your data you need to know what shape it has, which is why schemas are important. When you are processing that data in multiple …

with Bryon Jacob - Episode 9

December 3rd, 2017


We have tools and platforms for collaborating on software projects and linking them together, wouldn’t it be nice to have the same …

Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8

November 22nd, 2017


With the wealth of formats for sending and storing data it can be difficult to determine which one to use. In this episode Doug Cutting, …

Buzzfeed Data Infrastructure with Walter Menendez - Episode 7

November 14th, 2017


Buzzfeed needs to be able to understand how its users are interacting with the myriad articles, videos, etc. that they are posting. This lets them produce new content that will continue to be well-received. To …

Astronomer with Ry Walker - Episode 6

August 6th, 2017


Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform …

Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5

June 18th, 2017


Yelp needs to be able to consume and process all of the user interactions that happen in their platform in as close to real-time as possible. To achieve that goal they embarked on a journey to refactor their …

ScyllaDB with Eyal Gutkind - Episode 4

March 18th, 2017


If you like the features of Cassandra DB but wish it ran faster with fewer resources then ScyllaDB is the answer you have been looking for. In this episode Eyal Gutkind explains how Scylla was created and how it …

Defining Data Engineering with Maxime Beauchemin - Episode 3

March 5th, 2017


What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this …

Dask with Matthew Rocklin - Episode 2

January 22nd, 2017


There is a vast constellation of tools and platforms for processing and analyzing your data. In this episode Matthew Rocklin talks about how …

Pachyderm with Daniel Whitenack - Episode 1

January 14th, 2017


Do you wish that you could track the changes in your data the same way that you track the changes in your code? Pachyderm is a platform for …

Introducing The Show - Episode 0

January 8th, 2017

  • Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure
  • Go to to subscribe …