Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry
Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16
Sharing data across multiple computers, particularly when it is large and changing, is a difficult problem to solve. The Dat Project was created to provide a simpler way to distribute and version data sets among collaborators. In this episode Danielle Robinson and Joe Hand explain how the project got started, how it functions, and some of the many ways that it can be used. They also explain the team’s plans for upcoming features and uses that you can watch for in future releases.
Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
When you’re ready to launch your next project, you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show.
Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today. Enterprise add-ons and professional support are available for added peace of mind.
The O’Reilly AI Conference is also coming up. Happening April 29th to the 30th in New York, it will give you a solid understanding of the latest breakthroughs and best practices in AI for business. Go to dataengineeringpodcast.com/aicon-new-york to register and save 20%.
If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world, then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data-driven businesses to get together and learn how to be more effective. To save 60% on your tickets, go to dataengineeringpodcast.com/odsc-east-2018 and register.
Your host is Tobias Macey and today I’m interviewing Danielle Robinson and Joe Hand about the Dat Project, a distributed data sharing protocol for building applications of the future.
How did you get involved in the area of data management?
What is the Dat project and how did it get started?
How have the grants to the Dat project influenced the focus and pace of development that was possible?
Now that you have established a non-profit organization around Dat, what are your plans to support future sustainability and growth of the project?
Can you explain how the Dat protocol is designed and how it has evolved since it was first started?
How does Dat manage conflict resolution and data versioning when replicating between multiple machines?
One of the primary use cases that is mentioned in the documentation and website for Dat is that of hosting and distributing open data sets, with a focus on researchers. How does Dat help with that effort and what improvements does it offer over other existing solutions?
One of the difficult aspects of building a peer-to-peer protocol is that of establishing a critical mass of users to add value to the network. How have you approached that effort and how much progress do you feel that you have made?
How does the peer-to-peer nature of the platform affect the architectural patterns for people wanting to build applications that are delivered via Dat, versus the common three-tier architecture oriented around persistent databases?
What mechanisms are available for content discovery, given the fact that Dat URLs are private and unguessable by default?
For someone who wants to start using Dat today, what is involved in creating and/or consuming content that is available on the network?
What have been the most challenging aspects of building and promoting Dat?
What are some of the most interesting or inspiring uses of the Dat protocol that you are aware of?