With the rise of cloud computing and big data, columnar databases have grown in popularity. One of the main reasons is their efficiency for analytical queries, and therefore for business intelligence tools. This post aims to identify the key differences between columnar and row-oriented databases and point you in the right direction for your future data warehouse.
Databricks launched a new open-source product called Delta Lake at the Spark AI Summit 2019. Delta Lake is billed as bringing ACID transactions to Apache Spark and big data workloads.
I recently had to connect my Azure Databricks instance to our Azure Data Lake Storage (Generation 1) and ran into some problems getting everything set up. I am sure I am not the only one having these problems, so if you are as well, here is a short guide to get you connected.
Pandas is one of the most widely used Python libraries in data science and analytics. It is a large library with immense capabilities. In this introductory blog post on Pandas I will cover the basics of its famous data structure, the DataFrame, and some basic data wrangling techniques.
In this blog post I will lay out five reasons you should consider Databricks before starting your next data science project.