Plumbers of Data Science

85 Episodes | Produced by Andreas Kretz

Data Engineering is the plumbing of data science. Almost invisible, but super important and a big mess when done wrong. I talk about trends, tools and techniques in Data Engineering. I want to help you get started and inspire you to create and learn Data Engineering. Check out my Data Engineering Academ…

85 Episodes | 2017 - 2021

#86 The Ultimate Data Engineering Introduction

January 14th, 2021


The Podcast is back!!!! I promise I am going to keep it up to date this time ;)
In this episode I talk about my newest Data Engineering course. I …

#085 Big Data and Data Science Landscape plus trying to read Tweets with Nifi

May 28th, 2019


We are looking into the network communication protocol map. I first saw this like 10 years ago and it's awesome.

Then we check out the Big Data and …

#084 Behind the scenes: Audio podcast, free transcriptions and GitHub

May 27th, 2019


Today's podcast is a bit of a behind the scenes. 

What it takes to do an audio podcast. How you can get audio to text transcriptions for free.

Also GitHub questions on how to work with branches on the Cookbook.

#083 Data Engineering at OLX Case Study

May 27th, 2019


Today, a case study about OLX with a guest. It was super fun!

Here are the slides Alexey and I talked about:

#082 Reading Tweets With Apache Nifi & IaaS vs PaaS vs SaaS

May 27th, 2019


In this episode we install the Nifi docker container and look into how we can extract the twitter data.

We are also talking about the differences between infrastructure as a service, platform as a service and …

#081 How to get tweets from the Twitter API

May 27th, 2019


In this episode we look into the Twitter API documentation, which I love by the way.

How can we get old tweets for certain hashtags and how to get …

#080 How To Find A Job In Germany & Answering Mails

May 27th, 2019


Tips on how you find a job in Germany and two super interesting mails.

#079 Trying to stay true to myself and making the cookbook public on GitHub

May 27th, 2019


The cookbook, my YouTube, it will all be free, forever! Check out the data engineering cookbook on GitHub:

#078 Cookbook collaboration and updates

May 27th, 2019


Updates of the cookbook and how to collaborate on it

#077 Lambda and Kappa Architecture

May 27th, 2019


In this episode we talk about the Lambda Architecture with stream and batch processing, as well as an alternative, the Kappa Architecture, which consists only of streaming. Also data engineer vs data scientist and we discuss …

#076 Cloud vs On Premise How To Decide

May 27th, 2019


How do you choose between cloud and on-premise? The pros and cons, and what you have to think about, because there are good reasons not to go cloud.

Also thoughts on how to choose between the cloud providers by just comparing …

#075 Creating the Course Structure For My Data Engineering Course

May 27th, 2019


In this episode we go over the ideas I have for the data engineering course structure. It was your chance to influence what we put in there.

#074 Starting My Data Engineering Online Course

May 27th, 2019


In this video we go over some of the 100+ comments I received on LinkedIn about a data engineering training. 

#073 Data Engineering At LinkedIn Case Study

May 27th, 2019


Let's check out how LinkedIn is processing data

#072 Data Engineering At Twitter Case Study

May 27th, 2019


How is Twitter doing Data Engineering? Oh man, they have a lot of cool things to share these tweets. 

#071 Data Engineering At Spotify Case Study

May 27th, 2019


In this episode we are looking at the data engineering at Spotify, my favorite music streaming service. How do they process all that data?

#070 The Engineering Culture At Spotify

May 27th, 2019


In this podcast we look at the engineering culture at Spotify, my favorite music streaming service. 

The process behind the development of Spotify is …

#069 Data Engineering At Pinterest Case Study

May 27th, 2019


A look into how Pinterest is doing data engineering.

#068 A Budget Data Science PC Build

May 27th, 2019


Configuring a sub 1000 dollar PC for data engineering and machine learning

Link to the builds:

900$ build:

1500$ …

#067 Data Engineering At NASA Case Study

May 27th, 2019


A look into how NASA is doing data engineering.

#066 How To Do Data Science From A Data Engineers Perspective

May 27th, 2019


A simple introduction to doing data science in the context of the Internet of Things.

#065 Data Engineering At CERN Case Study

May 27th, 2019


A look into how CERN is doing Data Engineering. They get huge amounts of data from the Large Hadron Collider. Let's check it out.

#064 Data Engineering At Case Study

May 27th, 2019


A look into how is doing data engineering.

#063 Data Engineering At Airbnb Case Study

May 27th, 2019


A look into how Airbnb is doing Data Engineering.

#062 Data Engineering At Netflix Case Study

May 27th, 2019


How Netflix is doing Data Engineering using their Keystone platform

#061 Reworking My Cookbook For Data Engineering

May 27th, 2019


I decided to rework the cookbook focusing more on case studies and less on explaining tools.

People keep asking me for a path to become a data …

#060 What Is Hadoop And Is Hadoop Still Relevant In 2019?

May 27th, 2019


An introduction to Hadoop HDFS, YARN and MapReduce.

Yes, Hadoop is still relevant in 2019 even if you look into serverless tools. 

#059 A Look Into The Siemens Mindsphere IoT Platform?

May 27th, 2019


The Internet of things is a huge deal. There are many platforms available. But, which one is actually good?

Join me on a 50 minute dive into the Siemens Mindsphere online documentation.

I have to say I was super …

#058 Guitars And Data Live Stream

May 27th, 2019


A stream full of mediocre guitar playing and great Q&A about Hadoop. 

#057 Introducing The Plumbers Medium Publication

May 27th, 2019


I have created a Medium Publication especially for us Plumbers of Data Science who work in Data Engineering and Big Data.

It's called, you guessed it, Plumbers of Data Science.

#056 NoSQL Key Value Stores Explained With HBase

May 27th, 2019


What is the difference between SQL and NoSQL?

In this episode I show you on the example of HBase how a key/value store works. 

#055 Data Warehouse vs Data Lake

May 27th, 2019


On this podcast I talk about data warehouses and data lakes.

When do people use which? What are the pros and cons of both?

Architecture examples for both and does it make sense to completely move to a data lake?

#054 How to Market Yourself in 2019 Student or Professional

May 27th, 2019


In this episode I talk about how you can gain a competitive edge on the job market. It's super simple, you can and should start with it TODAY by …

#053 The Data Science Depression Is Coming? What You Can Do

May 27th, 2019


The Data Science Hype is still strong. Where's the industry going, towards a cliff? Here's what you can do.

#052 Data Engineering Cookbook Live Stream

May 27th, 2019


In this episode I show you the first version of my data engineering cookbook.

#051 Five Books To Buy As A Data Engineer & My Book Buying Strategy

May 27th, 2019


Getting a book and reading it cover to cover is useless. In this episode I show you my strategy of buying books complementary to your work. And 5 great books I read over the years that helped me get where I am now.

#050 Data Engineer Scientist or Analyst Which One Is For You?

May 27th, 2019


In this podcast we talk about the differences between data scientists, analysts and engineers, which are the three main data science jobs.

All three are super important.

#049 I Found A REAL Use For Blockchain, At Least I Thought So

May 27th, 2019


After all the BS solutions using Blockchain I thought I finally found one that makes sense. Of all the possibilities it's the EU data protection law …

#048 From Wannabe Data Scientist To Engineer My Journey

May 27th, 2019


In this episode Kate Strachnyi interviews me for her Humans of Data Science podcast. We talk about how I found out that I am more into the engineering part of data science.

#047 The Truth About Data Science Salary For Graduates

May 27th, 2019


In this episode I show you how much data science graduates are actually paid in Germany.

All over the internet you can find that Data Science salary …

#046 How To Use GitHub for LaTeX Version Control

May 27th, 2019


In this podcast I am showing you how I use GitHub to write my Data Engineering Cookbook with LaTeX.

#045 Why I Use LaTeX to Write Professionally And You Should Too

December 7th, 2018


What is the best editing tool to write a thesis, a dissertation or a paper? NOT Word or Pages! It's LaTeX.
In today's video I show you why I decided …

#044 How to Increase Your Chances for Internships or a Full-time Job

November 27th, 2018


You have certifications or a university degree, but can't find a job?
Sharing your ideas and knowledge will increase your chances!
Here's how you can do that.

"Day One" by Declan DP

#041 Agile Development Is Important But Please Don't Do Scrum

October 18th, 2018


I love agile development. People keep telling you to do Scrum, like it's the only and best choice to be agile. It's not. Here's my take on scrum and my four main beefs with it. Watch out for these issues if you are …

#040 Huge Big Data News! Cloudera and Hortonworks Merge

October 9th, 2018


So, Cloudera and Hortonworks merge... In today's Plumbers of Data Science Podcast I talk about what these big data vendors do. How they enable companies, admins and developers to do data science and many more things.

#039 Is ETL Dead For Data Science and Big Data?

October 3rd, 2018


Is ETL dead in Data Science and Big Data?
In today's podcast I share with you my views on your questions regarding ETL (extract, transform, load).

Data Lakes & Data Warehouses: what is the difference?
Is ETL still …

#38 Morning advice to beginner Data Scientists and Data Engineers

September 27th, 2018


What's the difference between Data Scientists & Data Analysts?
What to do to find internships or a full time job?
Data Scientist and Engineer in …

#037 How To Boost Teamwork With Version Control

September 12th, 2018


Without the proper tools and techniques of version control the team's efficiency goes down the drain. In this episode I talk about how tools like …

#036 Why Distributed Processing Is Super Important

September 10th, 2018


You need to become comfortable with distributed processing. Data Science or the Internet of Things, the amount of data that is getting produced and processed grows like crazy. In this podcast I talk about how a platform …

#035 Learning By Doing Is The Best Thing Ever!

September 6th, 2018


For me, school and university were hard. The lectures, sitting down and getting told how things work.
Reading books and learning dry stuff was a drag. …

#034 Talent Stacks For Data Engineers

September 4th, 2018


Becoming an expert in a single skill is not the way to go for a data engineer. In this episode I talk about which talents go well together, both technical and personal ones, so that you build up a stack of …

#033 How APIs Rule The World

September 3rd, 2018


Strong APIs make a good platform. In this episode I talk about why you need APIs and why Twitter is a great example. Especially JSON APIs are my personal favorite. Because JSON is also important in the Big Data world, …

#032 How to Design Security Zones and Lambda Architecture

August 30th, 2018


Security is everything! That's why today, I took some time to give you some tips about how to make a good design. The Lambda Architecture with stream and batch processing is one of the cornerstones for Big Data and Data …

#031 IT Networking Infrastructure and Linux

August 29th, 2018


The understanding of how information is transported over the network is super important. OS wise you will mostly encounter Linux so here are some important Linux basics you need to know.

Firewalls, Ports, IP addresses, …

#030 Why the hardware and the GPU is super important

August 28th, 2018


Knowing the hardware is super important for a data engineer. Even if you are using cloud servers. CPU, RAM, GPU, HDD, SSD...
Especially the GPU is a great help to Data Scientists who are doing machine learning.

#029 A New Mission

August 27th, 2018


I am bringing the Podcast back! Let's call it season 2.
New name, new mission: Helping you become a data engineer.
Daily podcast, recorded in my car or my office, getting you up to speed ASAP.

4 Vs Of Big Data Are Enough!

May 23rd, 2018


8 V's, 10 V's, 12 V's. The best way to explain Big Data is to use the four V's:

Volume, Velocity, Variety and Veracity.

In this podcast episode I talk …

Why Companies Badly Need Data Scientists And Engineers

May 18th, 2018


In this episode I give you my take on why companies badly need data scientists and engineers.
Because in this data driven world, you can accomplish a lot with just a few people.
All you need is a vision, some sense for …

What You Need To Know About Data Engineering

May 16th, 2018


This podcast is all about what you as a data engineer really do.
From building platforms to collaboration with data scientists and customers.

I'm a Big Data Engineer and it's Super Awesome!

May 15th, 2018


There is this other data science job called data engineer and it's super important. Because data science does not equal data scientist.
In today's podcast I talk about how I finally realized that data engineering is my …

BI vs Data Science vs Big Data

April 4th, 2018


I have recently been asked: "What is the difference between BI, Data Science and Big Data?" So, I thought I'd make a quick podcast about this for you guys. I think this will help beginners especially.

How Much Big Data Do You Need To Learn As A Data Scientist?

March 20th, 2018


Big Data tools are very important. But how deep should you really go as a data scientist?
How do you best learn all this stuff?
Some questions I try to …

Working With Time Series Data And Missing Values

March 16th, 2018


Time series data is tricky. Especially if you have missing data.
In this episode I talk about a few things you can do to handle this problem.

Hadoop For Data Scientists An Introduction

March 13th, 2018


Hey Podcast, in this episode I talk about the core functions of Hadoop.

Three Methods of Streaming Data

March 9th, 2018


There are three different methods of streaming: at least once, at most once and exactly once. Listen to why it makes a huge difference which one you …

Dirty Data, Unicorn Scientists

February 26th, 2018


Where to get dirty data to practice cleaning it?
What are unicorn data scientists and what is THE skill you need if you aren't one?

DS Office Hours Nr. 3

February 9th, 2018


How to define a data science problem.

NoSQL Vs SQL How To Choose

February 9th, 2018


NoSQL databases like HBase are awesome!
But why and when should you use them?
How does a key value store like HBase work?
Today I am talking about …

Creating A Gaming-AI-Bot

February 5th, 2018


Creating a Gaming-AI with Reinforcement Learning

Losing $$ With Data Science

January 31st, 2018


Losing money with data science in the short term does not matter. It's about the long run, not quick sales.
This is a story about how this happened in the insurance industry. And how to go at it to turn this loss into …

Swish Sweden's Awesome Fintech

January 22nd, 2018


Swish the Swedish Fintech that Blew My Mind in 2017

Data Science Office Hours

January 18th, 2018


Chat and Q&A about data science with data scientists

Gartner's Hype Cycle Explained

January 15th, 2018


It is very important for me to keep track of emerging technologies and trends. You want to know where the industry is headed. You don't want to miss …

How to Show That ML and AI Works

January 12th, 2018


How to convince people that machine learning actually works? It's simpler than you think.
You have the data. Data doesn't lie!

Analytics on Edge Devices

January 11th, 2018


Unavailable cellphone coverage really pissed me off. You cannot transmit data and do cloud based analytics during that time. That is why edge devices in the field have to get more and more analytics capabilities …

BigData and Catastrophic Success

January 8th, 2018


In this episode I talk about the 4 Vs of big data. And how Big Data can save you from catastrophic success.

Machine Learning In Production

January 6th, 2018


Doing machine learning in production is very different than for proof of concepts or in education.
One of the hardest parts is keeping models updated.

Data Science VS Big Data

January 3rd, 2018


Choose between Big Data & Data Science

Agriculture and DS DailyKayy 004

January 3rd, 2018


How Data Science Transforms Agriculture | DailyKayy 004

DS Preventing Insurance Fraud

December 15th, 2017


Insurance companies use data science to detect fraudulent cases, saving themselves and their customers money.

How Data Transforms Healthcare

December 12th, 2017


How Smartwatches and Fitbits Transform Healthcare

Learn Data Science Go Docker!

December 7th, 2017


Docker is so awesome for beginners. Preconfigured images let you start coding in minutes. No annoying dev environment setup.

Measure Everything!

December 6th, 2017


Social media, Product development. Get that data and analyze it!
Start winning!

Prime Video, Tesla and

November 29th, 2017


Prime Video X-Ray Feature, Tesla & Comma AI
