The podcast about Python and the people who make it great
•38:00
Doing Dask Powered Data Science In The Saturn Cloud
Summary
A perennial problem of doing data science is that it works great on your laptop, until it doesn’t. Another problem is being able to recreate your environment to collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy-to-use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signell, head of open source at Saturn Cloud, explains how she is working with the product team and PyData community to reduce the points of friction that data scientists encounter as they are getting their work done.
Announcements
Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Your host as usual is Tobias Macey, and today I’m interviewing Julia Signell about building distributed processing workflows in Python through the power of Dask.
Interview
Introductions
How did you get introduced to Python?
Can you describe what you are building at Saturn Cloud?
Who are your target users and how does that inform the features and priorities that you build into your platform?
What are the roadblocks that data scientists typically encounter when working on their laptop/workstation?
How does open source factor into the Saturn product?
What are some of the projects that you are collaborating with/contributing to as part of your work at Saturn?
How has your experience at Anaconda informed your work at Saturn?
Can you describe how the Saturn Cloud platform is architected?
How has it changed or evolved since it was first launched?
Can you describe the learning curve that data scientists go through when adopting Dask?
What are some examples of projects or workflows that Dask enables which are not possible/practical to do locally?
How would you characterize the overall awareness/adoption of Dask in the Python data science community?
What are the most interesting, innovative, or unexpected ways that you have seen Saturn Cloud used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Saturn Cloud?
When is Saturn Cloud the wrong choice?
What do you have planned for the future of Saturn Cloud?