Although this is not directly related to machine learning or deep learning, a data warehouse is something data scientists have to deal with while building machine learning models. I hope it is useful for people to see an engineering, business, and data science perspective on choosing a data warehouse.
We recently selected a data warehouse after doing some basic POCs with several options: AWS Redshift, AWS Athena, Snowflake, and Google BigQuery. In these slides, I share the aspects of our business that led us to choose Snowflake, and the pros and cons of the different options.
The slides mainly cover our POC experiences with the different data warehouses. One thing missing from the slides is BigQuery: I recently became a Google Certified Professional - Data Engineer, so I already knew its capabilities and did not do a POC. Our technology stack was heavily based on AWS, so switching to GCP would only have made sense if BigQuery offered significant advantages, and I could not see any when comparing the data warehouses.
I will write a follow-up post in the coming weeks to share the good parts of Snowflake. If you have questions, leave a comment and I will try to get back to you as soon as I can.
Thanks and Regards,
In the first post of this series about data warehousing, I talked about how we use and manage our Amazon Redshift cluster at Drivy. One of the most significant issues we had at the time was how to isolate compute from storage to maximize read concurrency, so that we could do more and more data analysis and onboard more people to the team. I briefly introduced Amazon Redshift Spectrum and promised to explain how we were going to use it in a second blog post. That turned out not to be the case, because we ultimately chose another data-warehousing technology, Snowflake, which addresses the issue mentioned above, among other things that I'll explain here.
Snowflake is a data warehouse built on top of the Amazon Web Services or Microsoft Azure cloud infrastructure. There's no hardware or software to select, install, configure, or manage, so it's ideal for organizations that don't want to dedicate resources to the setup, maintenance, and support of in-house servers. Data can also be moved easily into Snowflake using an ETL solution like Stitch.
But what sets Snowflake apart is its architecture and data sharing capabilities. The Snowflake architecture allows storage and compute to scale independently, so customers can use and pay for storage and computation separately. The sharing functionality also makes it easy for organizations to quickly share governed and secure data in real time.
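To make the compute/storage separation a bit more concrete, here is a minimal Python sketch that only generates Snowflake SQL: one virtual warehouse (compute cluster) per workload, each granted to its own role, while all of them query the same stored tables. The warehouse and role names, sizes, and suspend timeouts are hypothetical choices for illustration, not Drivy's actual setup. Executing the statements would require a Snowflake connection (for example a cursor from `snowflake-connector-python`), which is left out here.

```python
from dataclasses import dataclass

# Values accepted by Snowflake's WAREHOUSE_SIZE parameter (subset used in this sketch).
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


@dataclass(frozen=True)
class WarehouseSpec:
    """Compute settings for one workload (hypothetical names, illustrative values)."""
    name: str
    size: str = "XSMALL"
    auto_suspend_s: int = 60  # suspend (no compute credits used) after this many idle seconds


def create_warehouse_sql(spec: WarehouseSpec) -> str:
    """Build a CREATE WAREHOUSE statement for one isolated compute cluster."""
    if spec.size not in VALID_SIZES:
        raise ValueError(f"unsupported size: {spec.size}")
    if not spec.name.isidentifier():
        raise ValueError(f"not a simple identifier: {spec.name}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {spec.name} "
        f"WITH WAREHOUSE_SIZE = '{spec.size}' "
        f"AUTO_SUSPEND = {spec.auto_suspend_s} "
        "AUTO_RESUME = TRUE "
        "INITIALLY_SUSPENDED = TRUE;"
    )


def grant_usage_sql(warehouse: str, role: str) -> str:
    """Allow one role to run queries on one warehouse."""
    return f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};"


if __name__ == "__main__":
    # One warehouse per workload: both use the same stored tables,
    # but ETL jobs and analyst queries never compete for the same compute.
    plan = {
        "etl_role": WarehouseSpec("etl_wh", size="MEDIUM"),
        "analyst_role": WarehouseSpec("analyst_wh", size="SMALL", auto_suspend_s=300),
    }
    for role, spec in plan.items():
        print(create_warehouse_sql(spec))
        print(grant_usage_sql(spec.name, role))
```

Because each warehouse suspends itself when idle and resumes on the next query, compute is paid per workload, while the storage they share is billed separately, which is the independent scaling described above.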
For more info, please check this website: https://www.stitchdata.com/resources/snowflake/
One more thing: if anyone wants to know more about Snowflake online classes, please feel free to ping me.
Thanks & Regards,