This post describes the purpose of SQL window functions and how to use them. If you have looked up window functions in the official documentation, you might have found it quite difficult to understand, but I am here to clarify a few things and help you make sense of it.
Jupyter Notebook is by far my all-time favorite tool. It is a go-to tool for data exploration for many data scientists and data analysts.
Before we get started, let’s answer the question “what is
The goal of the platform is to make collaboration easier for teams on large projects. Git increases speed, integrity, and workflow efficiency. Git is used by more than 1,700 companies around the world (through different Git platforms such as GitHub and GitLab).
Pandas is one of the most widely used Python libraries in data science and analytics. Pandas is a large library with immense capabilities. In this introductory blog post to Pandas I will cover the basics of its famous data structure, the DataFrame, and some basic data wrangling techniques.
K-Means clustering is one of the most popular unsupervised machine learning algorithms. It is a clustering algorithm, meaning its purpose is to group unlabeled data by shared qualities and characteristics.
There are hundreds of Python libraries aimed at making life easier for data scientists. Some are good and some are bad; some are large libraries covering many areas, and some do only a couple of things very well. Here is a list of 5 Python libraries that every data scientist should have installed in their environment.
Word clouds are a popular way of “summarizing” a large repository or corpus of text. In this tutorial we will take Winston Churchill’s speech “We Shall Fight on the Beaches” and create the word cloud pictured below.
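To make the grouping idea concrete, here is a minimal sketch of K-Means in plain Python. It is not taken from the post itself: the `kmeans` function, the choice to seed the centroids with the first `k` points, and the six sample points are all assumptions made for illustration. In practice you would likely reach for a library implementation instead.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters with a basic assign/update loop."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical unlabeled data: two loose groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No labels are supplied at any point; the groups emerge only from the distances between points, which is what makes the algorithm unsupervised.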
In this blog post I will lay out five (5) reasons you should consider Databricks before starting your next data science project.
Both correlation and causation are statistical terms that are often misunderstood. Understanding both of these concepts is vital before making a decision based on data.
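As a rough illustration of why the two are easy to confuse, the sketch below generates synthetic data in which a hidden third variable drives two others. The variable names (`temperature`, `ice_cream_sales`, `sunburn_cases`), the noise levels, and the `pearson` helper are all made up for this example and are not from the post.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder: temperature drives both quantities below.
temperature = [rng.gauss(25, 5) for _ in range(1000)]
ice_cream_sales = [t + rng.gauss(0, 2) for t in temperature]
sunburn_cases = [t + rng.gauss(0, 2) for t in temperature]

# The two series are strongly correlated even though neither causes the other.
r = pearson(ice_cream_sales, sunburn_cases)

# Subtracting the confounder leaves only independent noise.
sales_noise = [s - t for s, t in zip(ice_cream_sales, temperature)]
sunburn_noise = [b - t for b, t in zip(sunburn_cases, temperature)]
r_without_confounder = pearson(sales_noise, sunburn_noise)

print(round(r, 2), round(r_without_confounder, 2))
```

The first coefficient comes out high and the second close to zero, which is the pattern you would expect when a shared cause, rather than a direct link, explains the relationship.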