Kara Annanie and Stian Ulriksen are presenting at SQLSaturday in Denver on the topic of Machine Learning in Databricks. Come check it out!
On September 17th, I will be presenting an introduction to Azure Databricks for the Boulder SQL User Group at Datavail Corporation in Broomfield. Not sure what Databricks is, or if it is for you? Check out this short post about some of the key capabilities. Come check it out if you are in the area! https://www.meetup.com/BoulderSql/events/261149223/ Hope to see you there!
Business Intelligence dashboards are becoming more and more prevalent in businesses. Building an effective dashboard that follows best practices requires a comprehensive BI process. In this post, we will cover 4 of the most important things to keep in mind when assembling your dashboard. Good dashboard design simplifies large amounts of data to answer important questions raised by the business. In order to answer these questions, the dashboard needs to tell a clear and defined story while expressing the meaning of the data in clean visualizations, allowing the viewer to dig into the details if necessary. Bad Example A quick Google search turns up a plethora of terribly designed dashboards. Here is one example: Terrible dashboard design There is simply too much going on in such a small space, all at once. It is cluttered and distracting. How to Create Beautiful Dashboards? So how can you avoid…
Logistic regression is a model that is appropriate to use when the dependent variable is binary, i.e. 0s and 1s, True or False, Yes or No. Logistic regression is part of the regression analysis family and can therefore be used as a predictive analytics model.
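To make the idea concrete, here is a minimal sketch of fitting a logistic regression on a toy binary outcome using scikit-learn. The data and variable names are invented for illustration and are not from the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary outcome: hours studied vs. pass (1) / fail (0).
X = np.array([[0.5], [1.0], [1.5], [2.0], [4.0], [4.5], [5.0], [5.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Predicted class labels for two new observations.
print(model.predict([[1.0], [5.0]]))

# Predicted probability of passing for a student who studied 3 hours.
print(model.predict_proba([[3.0]])[0, 1])
```

Because the dependent variable is binary, the model outputs a probability between 0 and 1, which is then thresholded (by default at 0.5) to produce a class prediction.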
Pandas in Python is an awesome library to help you wrangle your data, but it can only get you so far. When you start moving into the Big Data space, PySpark is much more effective at accomplishing what you want. This post aims at helping you migrate what you know about Pandas to PySpark. If you are new to Spark, check out this post about Databricks, and go spin up a cluster to play around. Apache Spark and PySpark Before we get going, let’s take a step back and talk about Apache Spark. Spark is a fast and general engine for large-scale data processing. Spark uses distributed computing to achieve higher speeds on large datasets. When you submit a request to Spark, the driver node distributes the workload to a number of worker nodes that process parts of the request in parallel. Think of it as an improvement to original…
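As a taste of the migration, here is a hedged sketch mapping a few common Pandas operations to their PySpark counterparts. The Pandas side runs as-is; the PySpark equivalents are shown as comments and assume an existing `SparkSession` named `spark` and a Spark DataFrame also named `df` — names chosen for illustration, not taken from the original post.

```python
import pandas as pd

# Pandas: build a DataFrame from a dict of columns.
df = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "sales": [10, 25, 15]})
# PySpark: spark.createDataFrame([("Ann", 10), ("Bo", 25), ("Cy", 15)], ["name", "sales"])

# Pandas: filter rows with boolean indexing.
big = df[df["sales"] > 12]
# PySpark: df.filter(df.sales > 12)

# Pandas: add a derived column.
df["double"] = df["sales"] * 2
# PySpark: df.withColumn("double", df.sales * 2)

# Pandas: aggregate a column.
total = df["sales"].sum()
# PySpark: df.agg({"sales": "sum"}).collect()[0][0]

print(big["name"].tolist(), total)
```

The shapes of the APIs are similar, but remember the key difference: Pandas operations run eagerly on a single machine, while PySpark builds a lazy plan that is executed across the cluster only when an action (like `collect`) is called.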
With the rise of cloud computing and big data, columnar databases have increased in popularity. One of the main reasons for this rise is their efficiency for analytical queries, which makes them a natural fit for business intelligence tools. This post aims to identify the key differences between row-oriented and columnar databases and point you in the right direction for your future data warehouse.
Databricks launched a new open source product, called Delta Lake, at the Spark + AI Summit 2019. Delta Lake promises to bring ACID transactions to Apache Spark and big data workloads.
I recently had to connect my Azure Databricks instance to our Azure Data Lake Storage (Generation 1) and ran into some problems getting everything set up. I am sure I am not the only one out there having these problems, so if you do as well, here is a little guide to get you connected.
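For reference, this is a sketch of the OAuth service-principal configuration commonly used to mount ADLS Gen1 from a Databricks notebook. The application ID, secret scope and key names, directory (tenant) ID, store name, and mount point are all placeholders you would replace with your own values, and the snippet only runs inside Databricks because it relies on `dbutils`.

```python
# Service-principal (Azure AD app) credentials for ADLS Gen1.
# All angle-bracketed values are placeholders for your own setup.
configs = {
    "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
    "dfs.adls.oauth2.client.id": "<application-id>",
    "dfs.adls.oauth2.credential": dbutils.secrets.get(
        scope="<secret-scope>", key="<secret-key>"
    ),
    "dfs.adls.oauth2.refresh.url": (
        "https://login.microsoftonline.com/<directory-id>/oauth2/token"
    ),
}

# Mount the lake so it is reachable under /mnt/<mount-name> on every cluster.
dbutils.fs.mount(
    source="adl://<storage-account-name>.azuredatalakestore.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
```

Storing the client secret in a Databricks secret scope (rather than hard-coding it in the notebook) is the part that most often trips people up, and it is worth setting up first.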
The purpose of a pie chart is to tell a story about the parts-to-whole aspect of your data. In other words, it describes how big one part is compared to other parts. This all sounds like a visualization with a good purpose, so why shouldn’t you use it?
Join Kara Annanie and me at the Denver SQL User Group meeting at Microsoft’s offices in Denver on Thursday. We will be presenting on Azure Databricks and how you can get started using it. https://www.meetup.com/Denver-SQL-Server-User-Group/events/jnhjcqyzgbxb/