Cover art for podcast Data Engineering Podcast

Data Engineering Podcast

420 EpisodesProduced by Tobias MaceyWebsite

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

1:08:48

Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

Summary

The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as star and snowflake schemas, data vault modeling, and wide tables. The challenge with many of those approaches is that they are optimized for answering known questions but brittle and cumbersome when exploring unknowns. In this episode Ahmed Elsamadisi shares his journey to find a more flexible and universal data model in the form of the "activity schema" that is powering the Narrator platform, and how it has allowed his customers to perform self-service exploration of their business domains without being blocked by schema evolution in the data warehouse. This is a fascinating exploration of what can be done when you challenge your assumptions about what is possible.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
  • Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold.
  • Your host is Tobias Macey and today I’m interviewing Ahmed Elsamadisi about Narrator, a platform to enable anyone to go from question to data-driven decision in minutes
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Narrator is and the story behind it?
  • What are the challenges that you have seen organizations encounter when attempting to make analytics a self-serve capability?
  • What are the use cases that you are focused on?
  • How does Narrator fit within the data workflows of an organization?
  • How is the Narrator platform implemented?
    • How has the design and focus of the technology evolved since you first started working on Narrator?
  • The core element of the analyses that you are building is the "activity schema". Can you describe the design process that led you to that format?
    • What are the challenges that are posed by more widely used modeling techniques such as star/snowflake or data vault?
      • How does the activity schema address those challenges?
  • What are the performance characteristics of deriving models from an activity schema/timeseries table?
  • For someone who wants to use Narrator, what is involved in transforming their data to map into the activity schema?
    • Can you talk through the domain modeling that needs to happen when determining what entities and actions to capture?
  • What are the most interesting, innovative, or unexpected ways that you have seen Narrator used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Narrator?
  • When is Narrator the wrong choice?
  • What do you have planned for the future of Narrator?
Contact Info Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Educational emoji reaction

Educational

Interesting emoji reaction

Interesting

Funny emoji reaction

Funny

Agree emoji reaction

Agree

Love emoji reaction

Love

Wow emoji reaction

Wow

Listen to Data Engineering Podcast

RadioPublic

A free podcast app for iPhone and Android

  • User-created playlists and collections
  • Download episodes while on WiFi to listen without using mobile data
  • Stream podcast episodes without waiting for a download
  • Queue episodes to create a personal continuous playlist
RadioPublic on iOS and Android
Or by RSS
RSS feed
https://www.dataengineeringpodcast.com/rss

Connect with listeners

Podcasters use the RadioPublic listener relationship platform to build lasting connections with fans

Yes, let's begin connecting
Browser window

Find new listeners

  • A dedicated website for your podcast
  • Web embed players designed to convert visitors to listeners in the RadioPublic apps for iPhone and Android
Clicking mouse cursor

Understand your audience

  • Capture listener activity with affinity scores
  • Measure your promotional campaigns and integrate with Google and Facebook analytics
Graph of increasing value

Engage your fanbase

  • Deliver timely Calls To Action, including email acquistion for your mailing list
  • Share exactly the right moment in an episode via text, email, and social media
Icon of cellphone with money

Make money

  • Tip and transfer funds directly to podcastsers
  • Earn money for qualified plays in the RadioPublic apps with Paid Listens