Business Intelligence (BI) dashboards are becoming increasingly prevalent in businesses. Building an effective dashboard that follows best practices is part of a comprehensive BI process. In this post, we will cover four of the most important things to keep in mind when assembling your dashboard. Good dashboard design simplifies large amounts of data to answer important questions raised by the business. To answer these questions, the dashboard needs to tell a clear, well-defined story, express the meaning of the data through clear visualizations, and let the viewer dig into the details when necessary.
Bad Example
A quick Google search turns up a plethora of terribly designed dashboards. Here is one example:
[Image: Terrible dashboard design]
There is simply too much going on in such a small space, all at once. It is cluttered and distracting.
How to Create Beautiful Dashboards?
So how can you avoid…
With the rise of cloud computing and big data, columnar databases have grown in popularity. One of the main reasons is their efficiency for analytical queries, which makes them a good fit for business intelligence tools. This post identifies the key differences between columnar and traditional row-oriented databases and points you in the right direction for your future data warehouse.
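To make the layout difference concrete, here is a toy sketch in plain Python, not a model of any real database engine (real systems add disk pages, compression and indexes). The same hypothetical three-row table is stored row-wise and column-wise. An analytical query such as "sum one column" only has to scan one list in the columnar layout, while reassembling a single full row is where the column layout pays a price.

```python
# Toy illustration only: the table contents below are hypothetical.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 45.5},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)


def total_amount_row_store(records):
    # Row layout: every full record is visited just to read one field.
    return sum(record["amount"] for record in records)


def total_amount_column_store(cols):
    # Column layout: only the "amount" column is scanned.
    return sum(cols["amount"])


def fetch_row_column_store(cols, index):
    # Point lookups are the column layout's weak spot:
    # the row must be stitched back together from every column.
    return {name: values[index] for name, values in cols.items()}


print(total_amount_row_store(rows), total_amount_column_store(columns))
print(fetch_row_column_store(columns, 1))
```

Both totals are the same; what differs is how much data each layout has to touch to produce them, which is the intuition behind columnar stores favoring analytical workloads.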
I recently had to connect my Azure Databricks instance to our Azure Data Lake Storage (Gen1) and ran into some problems getting everything set up. I am sure I am not the only one having these problems, so if you are too, here is a short guide to getting connected.
This post describes the purpose of SQL window functions and how to use them. If you have looked up window functions in the official documentation, you might have noticed that they can be quite difficult to understand, but I am here to clarify a few things and help you make sense of them.
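As a quick taste of what a window function does, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database with hypothetical sales data. It assumes the bundled SQLite is version 3.25 or newer, the first release with window function support. Unlike `GROUP BY`, the `OVER (PARTITION BY ...)` clause computes an aggregate per group while keeping every individual row in the result.

```python
import sqlite3

# In-memory database with hypothetical data; needs SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", 1, 100.0), ("EU", 2, 150.0), ("US", 1, 80.0), ("US", 2, 60.0)],
)

query = """
SELECT region,
       month,
       amount,
       -- running total within each region, in month order
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
       -- rank of each amount within its region (1 = largest)
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, month
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

All four input rows come back, each annotated with its region's running total and rank, which is exactly what a plain `GROUP BY region` could not give you.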
Jupyter Notebook is by far my all-time favorite tool. It is the go-to tool for data exploration for data scientists and data analysts everywhere.
Pandas is one of the most widely used Python libraries in data science and analytics, and it is a large library with immense capabilities. In this introductory post on Pandas, I will cover the basics of its best-known data structure, the DataFrame, along with some basic data wrangling techniques.
K-Means clustering is one of the most popular unsupervised machine learning algorithms. It is a clustering algorithm, meaning its purpose is to group unlabeled data points into clusters based on shared qualities and characteristics.
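The core idea fits in a few lines. Below is a minimal, dependency-free sketch of the classic Lloyd's algorithm for K-Means on 2-D points: pick k starting centroids at random, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The function name `kmeans` and its parameters are my own for illustration; in practice you would more likely reach for scikit-learn's `KMeans`, which also uses a smarter initialization (k-means++).

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) on 2-D points.

    Returns (centroids, labels). Initial centroids are k points sampled at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Two obviously separated, unlabeled groups of hypothetical points.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Note that no labels are ever given to the algorithm: the cluster ids it returns are arbitrary, and only the grouping itself is meaningful.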
There are hundreds of Python libraries aimed at making data scientists' lives easier. Some are good and some are bad; some are large libraries covering many areas, and some do only a couple of things very well. Here is a list of five Python libraries that every data scientist should have installed in their environment.
Word clouds are a popular way of “summarizing” a large repository or corpus of text. In this tutorial, we will take Winston Churchill’s speech “We Shall Fight on the Beaches” and create the word cloud pictured below.
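Under the hood, a word cloud sizes each word by how often it appears. Here is a minimal sketch of that counting step using only the standard library, applied to the speech's most famous line. The tiny stop-word list is a hypothetical stand-in (real tools ship much larger ones), and the drawing step, which a third-party package such as `wordcloud` can handle, is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"we", "shall", "on", "the", "in", "and", "of", "to", "a"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words and count the non-stop-words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


excerpt = (
    "We shall fight on the beaches, we shall fight on the landing grounds, "
    "we shall fight in the fields and in the streets, we shall fight in the hills; "
    "we shall never surrender"
)
freqs = word_frequencies(excerpt)
print(freqs.most_common(3))
```

Without the stop-word filter, filler words like "we" and "shall" would dominate the cloud, which is why removing them is usually the first step.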
In this blog post, I will lay out five reasons you should consider Databricks before starting your next data science project.