With the rise of cloud computing and big data, columnar databases have increased in popularity. One of the main reasons for its rise in popularity is due to its efficiency for analytical queries and therefore business intelligence tools. This post aims to identify the key differences between these two database types and point you in the right direction for your future data warehouse.

Rowise (Row-oriented) Database

In a row-oriented database, data is stored (and retrieved) one row at a time. For example, if you are looking for all employees in a given department, the database will fetch all the information in all the rows matching that department. If you are only looking for a specific column of the dataset, loading everything in a row provides a lot of unnecessary data and creates inefficiencies.

Row-oriented databases are ideal for scenarios where you are looking up individual transactions and are not suited for aggregations.

The best examples of a row-oriented database is a Relation Database.

Columnar (Column-oriented) Database

In a columnar database, the data is stored (and retrieved) in columns. When retrieved, only the columns and data needed are loaded. Because it only retrieves the data needed, it is best suited for analytical processing and business intelligence tools.

Columnar databases also perform better in aggregations because of high compression rates. This is due to fewer distinct or unique values in columns (think departments in a table of employees).

The best example of a columnar database is HBase (click here to read more about HBase).

Conclusion

Both of these types of databases do a great job, and one is not better than the other. They serve different purposes as we discussed, and it is important to understand the different use cases.

Hopefully this post helped you better understand what these two types of databases are.

I have half a decade of experience working with data science and data engineering in a variety of fields both professionally and in academia. I ahve demonstrated advanced skills in developing machine learning algorithms, econometric models, intuitive visualizations and reporting dashboards in order to communicate data and technical terminology in an easy to understand manner for clients of varying backgrounds.

Write A Comment