Before we get started, let’s answer the question “What is Git?” Git is a distributed version control system for tracking changes to source code during software development.

The goal of Git is to make collaboration easier for teams working on large projects, with an emphasis on speed, data integrity, and efficient workflows. Git is used by more than 1,700 companies around the world (through hosting platforms such as GitHub and GitLab).

Data Analytics

Pandas is one of the most widely used Python libraries in data science and analytics. It is a large library with immense capabilities. In this introductory blog post on Pandas, I will cover the basics of its famous data structure, the DataFrame, and some basic data wrangling techniques.
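As a small taste of what a DataFrame looks like, here is a minimal sketch. The column names and values are made up purely for illustration; the only assumption is that pandas is installed and imported as `pd`.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Denver"],
    "sales": [120, 95, 80, 150],
})

# Basic wrangling: keep only the rows where sales exceed 90.
big = df[df["sales"] > 90]

# Basic wrangling: total sales per city.
totals = df.groupby("city")["sales"].sum()

print(big)
print(totals)
```

Filtering with a boolean mask and aggregating with `groupby` are two of the most common wrangling steps, which is why they are shown here.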

K-Means clustering is one of the most popular unsupervised machine learning algorithms. It is a clustering algorithm, meaning its purpose is to group unlabeled data by shared qualities and characteristics.
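To make the idea concrete, here is a minimal sketch of the standard K-Means loop (often called Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the sample points are my own choices for illustration, not a library API. In practice you would more likely reach for a tested implementation such as the one in scikit-learn.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group equal-length tuples into k clusters (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Nothing moved, so the assignment is stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Note that no labels go in: the algorithm only sees the coordinates and discovers the two groups on its own, which is what makes it unsupervised.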

There are hundreds of Python libraries that aim to make life easier for data scientists. Some are good and some are bad; some are large libraries covering many areas, and some do only a couple of things but do them very well. Here is a list of 5 Python libraries that every data scientist should have installed in their environment.

There are a lot of blogs out there focusing on data science. Over the past few years, I have found a handful that I read regularly. I’ve tried my best to narrow them down to my 5 favorite blogs.