Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Document Databases
2.1. MongoDB
2.2. Features
3. Columnar Databases
3.1. HBase Columnar Database
3.2. Features
4. Graph Databases
4.1. Neo4J Graph Database
4.2. Features
5. Frequently Asked Questions
5.1. What are the examples of columnar databases?
5.2. Is MongoDB a graph database?
5.3. Is NoSQL a graph database?
5.4. What is meant by a document database?
6. Conclusion
Last Updated: Mar 27, 2024
Easy

Operational Databases - Part 2

Introduction

This blog discusses the various types of databases used in the software industry. Operational databases are critical components of Big Data Analytics, and storing data efficiently according to the requirements is crucial for returning query results within the required time. In the previous blog, we discussed two major database design choices: Relational and Non-Relational Databases. We recommend reading that blog first to get a better understanding of operational databases.

In this blog, we will discuss Document, Columnar and Graph Databases with examples. So, let’s get started.

Document Databases

There are two types of document databases. The first is a repository for full document-style content (Word files, complete web pages, and so on). The second stores document components, either permanently as static entities or for dynamic assembly into complete documents. JavaScript Object Notation (JSON) and/or Binary JSON (BSON) provide the structure of the documents and their parts. Document databases come in handy when you need to create a large number of reports that must be built dynamically from elements that change regularly. In healthcare, for example, the content of a member's documents varies depending on the member profile (age, residency, and income level), the healthcare plan, and eligibility for government programmes.
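To make dynamic assembly more concrete, here is a minimal Python sketch, assuming the document parts are kept as JSON-like fragments keyed by name. The part names and the age and income thresholds are hypothetical; a real document database would store the parts as documents and select them with queries.

```python
import json

# Hypothetical reusable document parts, each stored as a JSON-like fragment.
PARTS = {
    "header": {"type": "header", "text": "Member Benefits Summary"},
    "standard_plan": {"type": "section", "text": "Standard plan coverage"},
    "senior_benefits": {"type": "section", "text": "Additional senior wellness benefits"},
    "subsidy": {"type": "section", "text": "Government programme eligibility"},
}


def assemble_document(profile: dict) -> dict:
    """Compose a member document from stored parts based on the member profile.

    The age (65) and income (20000) thresholds are illustrative assumptions.
    """
    sections = [PARTS["header"], PARTS["standard_plan"]]
    if profile["age"] >= 65:
        sections.append(PARTS["senior_benefits"])
    if profile["income"] < 20000:
        sections.append(PARTS["subsidy"])
    return {"member_id": profile["member_id"], "sections": sections}


senior = assemble_document({"member_id": 42, "age": 70, "income": 15000})
younger = assemble_document({"member_id": 7, "age": 30, "income": 50000})
print(json.dumps(senior, indent=2))
print(len(younger["sections"]))  # 2
```

Because each part is stored once and referenced by every document that needs it, a change to a part is picked up the next time documents are assembled.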

JSON is a data-interchange format based on a subset of the JavaScript programming language. It is text-based, easy for humans to read and write, and easy for computers to parse and generate. JSON is built on two basic structures that are supported by most, if not all, modern programming languages. The first is a collection of name/value pairs, represented programmatically as objects, records, keyed lists, and so on. The second is an ordered list of values, represented programmatically as arrays, lists, or sequences. BSON is a binary serialisation of JSON-like documents, designed to improve performance and scalability.
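A short Python sketch of the two JSON structures, using only the standard `json` module: a JSON object decodes to a `dict` (name/value pairs) and a JSON array decodes to a `list` (ordered values). The record contents are made up for illustration.

```python
import json

# A JSON object (name/value pairs) containing a JSON array (ordered values).
text = '{"name": "Asha", "age": 34, "plans": ["dental", "vision"]}'

record = json.loads(text)                # object -> dict
print(type(record).__name__)             # dict
print(type(record["plans"]).__name__)    # list
print(record["plans"][0])                # dental

# Encode back to JSON text; sort_keys makes the key order deterministic.
encoded = json.dumps(record, sort_keys=True)
print(encoded)  # {"age": 34, "name": "Asha", "plans": ["dental", "vision"]}
```

BSON encodes these same structures in binary form; in Python it is usually handled through the `bson` package that ships with the PyMongo driver rather than through the standard library.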

MongoDB

MongoDB takes its name from “hu(mongo)us database”. It is open source: it was originally released under the GNU AGPL v3.0 licence, and since 2018 newer versions of the server have been released under the Server Side Public License (SSPL). It was developed by a firm called 10gen (now MongoDB, Inc.), which also offers commercial licences with full support.

MongoDB is gaining in popularity, and it could be a suitable data store for your big data implementation. A MongoDB deployment is made up of databases that contain "collections". A collection is made up of "documents", and each document is made up of fields. You can index a collection in much the same way that you can index a table in a relational database, which improves data lookup performance.
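The database, collection, document and field hierarchy, plus a field index, can be sketched in plain Python. This is an in-memory model under stated assumptions, not MongoDB's API; the collection name, fields and `build_index` helper are hypothetical, and a dict stands in for the index only for brevity (MongoDB's own indexes are B-tree based).

```python
from collections import defaultdict

# In-memory model only (not the MongoDB API): a database maps collection
# names to lists of documents, and each document is a dict of fields.
database = {
    "members": [
        {"_id": 1, "name": "Asha", "city": "Pune"},
        {"_id": 2, "name": "Ravi", "city": "Delhi"},
        # Documents in one collection need not share the same fields.
        {"_id": 3, "name": "Meera", "city": "Pune", "plan": "gold"},
    ]
}


def build_index(collection: list, field: str) -> dict:
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(collection):
        if field in doc:
            index[doc[field]].append(pos)
    return index


members = database["members"]
city_index = build_index(members, "city")

# Use the index instead of scanning every document in the collection.
pune_names = [members[pos]["name"] for pos in city_index.get("Pune", [])]
print(pune_names)  # ['Asha', 'Meera']
```

With the real PyMongo driver, the equivalent steps would be `db.members.create_index("city")` followed by `db.members.find({"city": "Pune"})`.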

However, rather than returning all matching data at once, a MongoDB query returns a "cursor", which serves as a pointer to the result set. This is an important feature because it allows you to count or sort data without having to extract it first. MongoDB stores documents as BSON, a binary encoding of JSON-like documents.
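The cursor idea can be sketched as a lazy iterator. `ToyCursor` below is a hypothetical class, not PyMongo's `Cursor`: it holds a filter, hands matching documents to the client in batches only when iterated, and answers `count()` without sending any batches.

```python
from typing import Callable, Iterator, List


class ToyCursor:
    """Hypothetical stand-in for a database cursor (not PyMongo's Cursor class).

    It holds a filter and only pulls matching documents, batch by batch,
    when the caller iterates over it.
    """

    def __init__(self, store: List[dict], predicate: Callable[[dict], bool], batch_size: int = 2):
        self._store = store
        self._predicate = predicate
        self._batch_size = batch_size
        self.batches_fetched = 0  # how many batches have been handed to the client

    def _batches(self) -> Iterator[List[dict]]:
        batch = []
        for doc in self._store:
            if self._predicate(doc):
                batch.append(doc)
                if len(batch) == self._batch_size:
                    self.batches_fetched += 1
                    yield batch
                    batch = []
        if batch:
            self.batches_fetched += 1
            yield batch

    def __iter__(self) -> Iterator[dict]:
        for batch in self._batches():
            yield from batch

    def count(self) -> int:
        # Evaluated on the "server" side: no batches are sent to the client.
        return sum(1 for doc in self._store if self._predicate(doc))


store = [{"_id": i, "age": 20 + i} for i in range(10)]
cursor = ToyCursor(store, lambda d: d["age"] >= 25)
print(cursor.count(), cursor.batches_fetched)  # 5 0
first = next(iter(cursor))
print(first["_id"], cursor.batches_fetched)    # 5 1
```

With the real PyMongo driver, `collection.find(filter)` returns a cursor that fetches results in batches, and `collection.count_documents(filter)` performs the count on the server.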

Features

  • A querying service that supports ad hoc queries, distributed queries, and full-text search.
  • A sharding service that spreads a single database across a cluster of servers, either within a single data centre or across multiple data centres. The service is driven by a shard key, which determines how documents are distributed across the instances.
  • MapReduce, which supports analytics and aggregation across collections and documents. (From MongoDB 5.0 onwards, map-reduce is deprecated in favour of the aggregation pipeline.)
  • GridFS, a specification for storing large objects (such as files bigger than the 16 MB BSON document size limit) by splitting them into chunks stored as separate documents.
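The role of the shard key can be illustrated with a small hash-based routing sketch. The shard names and the `customer_id` key are hypothetical, and SHA-256 is used only for illustration; MongoDB itself groups shard-key ranges into chunks and uses a balancer to move them, which this sketch does not model.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(shard_key_value: str, shards: list = SHARDS) -> str:
    """Choose a shard by hashing the document's shard-key value.

    Simplification: MongoDB actually groups shard-key ranges into chunks and
    a balancer migrates chunks between shards; this only shows the idea that
    the shard key alone decides where a document lives.
    """
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


documents = [{"_id": i, "customer_id": f"cust-{i}"} for i in range(1000)]
placement = Counter(shard_for(doc["customer_id"]) for doc in documents)
print(placement)  # roughly even counts across the three shards

# The same key always routes to the same shard, so lookups by key hit one shard.
print(shard_for("cust-7") == shard_for("cust-7"))  # True
```

Hashing spreads sequential keys evenly; the trade-off is that a range query over the shard key must then visit every shard.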

Columnar Databases

Relational databases are row oriented, because the data in each row of a table is stored together. In a columnar, or column-oriented, database, the data is stored by column instead: the values of each column are kept together. Although this may appear to be a minor distinction, it is the most fundamental feature of columnar databases. Columns are easy to add, and they can be added row by row, which gives you a lot of flexibility, performance, and scalability. If you have a large volume of varied data, a columnar database may be a good fit. It is highly customizable; you simply keep adding columns.
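The difference in layout can be sketched with ordinary Python lists and dicts. This is a conceptual illustration with made-up data, not how any particular engine lays out bytes on disk.

```python
# The same three-row table stored two ways (illustrative data).
rows = [
    {"id": 1, "name": "Asha", "salary": 50000},
    {"id": 2, "name": "Ravi", "salary": 62000},
    {"id": 3, "name": "Meera", "salary": 58000},
]

# Row-oriented: each record's values are kept together.
row_store = [(r["id"], r["name"], r["salary"]) for r in rows]

# Column-oriented: each column's values are kept together.
column_store = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "salary": [r["salary"] for r in rows],
}

# An aggregate over one column reads a single list...
total_col = sum(column_store["salary"])
# ...whereas the row store must touch every record to pick out one field.
total_row = sum(rec[2] for rec in row_store)
print(total_col, total_row)  # 170000 170000

# Adding a column appends a new list without rewriting the existing ones.
column_store["dept"] = ["HR", "IT", "IT"]
```

Both layouts give the same answer; the column layout simply keeps the values an analytical query needs next to each other.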

HBase Columnar Database

HBase is one of the most widely used columnar databases. It is an Apache Software Foundation project released under the Apache License, Version 2.0. HBase uses the Hadoop Distributed File System (HDFS) and the MapReduce engine for its core data storage and processing needs.

HBase is modelled on Google's Bigtable (an efficient way of storing non-relational data). As a result, HBase implementations are highly scalable, sparse, distributed, and durable multidimensional sorted maps. The map is indexed by a row key, a column key, and a timestamp, and each value is an uninterpreted array of bytes. HBase is an excellent choice when your big data system requires random, real-time read/write access to data. It is frequently used to store results for later analysis.
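The "multidimensional sorted map" can be sketched as nested Python dicts: row key, then column key, then timestamp, mapping to raw bytes. This models only HBase's logical view; `put`, `get_latest` and `scan` are hypothetical helpers, not the HBase client API, and the row and column names are made up.

```python
from collections import defaultdict
from typing import Iterator, Optional

# Simplified model of HBase's logical layout (not the HBase client API):
# row key -> "family:qualifier" column key -> timestamp -> uninterpreted bytes.
table = defaultdict(lambda: defaultdict(dict))


def put(row: bytes, column: str, ts: int, value: bytes) -> None:
    table[row][column][ts] = value


def get_latest(row: bytes, column: str) -> Optional[bytes]:
    """Return the newest version of a cell, or None if the cell is absent (sparse)."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def scan(start: bytes, stop: bytes) -> Iterator[bytes]:
    """Yield row keys in [start, stop) in sorted byte order, like a range scan."""
    for row in sorted(table):
        if start <= row < stop:
            yield row


put(b"user#002", "info:name", 100, b"Ravi")
put(b"user#001", "info:name", 100, b"Asha")
put(b"user#001", "info:name", 200, b"Asha K")   # newer version of the same cell
put(b"user#003", "stats:visits", 150, (7).to_bytes(4, "big"))

print(get_latest(b"user#001", "info:name"))       # b'Asha K'
print(get_latest(b"user#003", "info:name"))       # None (no such cell)
print(list(scan(b"user#001", b"user#003")))       # [b'user#001', b'user#002']
```

Sparseness falls out naturally: a row stores only the cells that were written, so `user#003` has no `info:name` entry at all.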

Features

  • HBase provides programmatic access via a Java API.
  • HBase supports LAN and WAN failover and recovery through the use of region servers. At its core is a master server that monitors the region servers and manages the cluster's metadata.
  • HBase provides strongly consistent reads and writes: although it is not a full "ACID" implementation, it is not based on an eventually consistent model either. This means you can use it for high-speed requirements as long as you don't need an RDBMS's "additional features", such as full transaction support or typed columns.
  • HBase transparently and automatically splits and redistributes its content as tables grow, while the underlying file system distributes the stored data across the cluster.
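Automatic splitting can be pictured with a toy region map: each region owns a contiguous, sorted range of row keys, a lookup finds the region by binary search over the start keys, and an oversized region splits at its middle key. `RegionMap` and its row-count threshold are hypothetical simplifications; real HBase records regions in its `hbase:meta` table and splits them based on data size, not row count.

```python
import bisect


class RegionMap:
    """Toy model: regions are contiguous, sorted row-key ranges [start, next_start).

    Hypothetical simplification: splits are triggered by a row count here,
    whereas real HBase splits regions based on their stored data size.
    """

    def __init__(self, max_rows: int = 3):
        self.starts = [b""]       # the first region covers every key
        self.rows = {b"": []}     # region start key -> sorted row keys it holds
        self.max_rows = max_rows

    def region_for(self, row: bytes) -> bytes:
        """Return the start key of the region that owns `row`."""
        i = bisect.bisect_right(self.starts, row) - 1
        return self.starts[i]

    def put(self, row: bytes) -> None:
        start = self.region_for(row)
        keys = self.rows[start]
        if row not in keys:
            bisect.insort(keys, row)
        if len(keys) > self.max_rows:
            self._split(start)

    def _split(self, start: bytes) -> None:
        # Split the region at its middle key; the upper half becomes a new region.
        keys = self.rows[start]
        mid = len(keys) // 2
        split_key = keys[mid]
        self.rows[start] = keys[:mid]
        self.rows[split_key] = keys[mid:]
        bisect.insort(self.starts, split_key)


regions = RegionMap(max_rows=3)
for key in [b"row1", b"row2", b"row3", b"row4", b"row5"]:
    regions.put(key)

print(regions.starts)                 # [b'', b'row3']
print(regions.region_for(b"row5"))    # b'row3'
```

Clients never choose a region themselves; the key range alone decides which region, and therefore which region server, serves a row.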

Graph Databases

The basic structure of a graph database is known as "node-relationship". This structure is particularly beneficial when dealing with highly interconnected data. Both nodes and relationships support properties, which are key-value pairs where the data is kept. These databases are navigated by following the relationships.

This kind of storage and navigation is difficult in RDBMSs, because their rigid table structures make it hard to follow relationships between data wherever they may lead. A graph database could, for example, be used to manage geographic data for oil exploration or to model and optimise a telecommunications provider's networks.
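A minimal Python sketch of the node-relationship model: nodes and relationships both carry properties, and a breadth-first traversal follows relationships outward for a bounded number of hops. The node names, relationship types and the `reachable` helper are hypothetical, and the in-memory lists stand in for a graph store.

```python
from collections import deque

# Nodes and relationships both carry properties (key-value pairs).
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "carol": {"label": "Person", "name": "Carol"},
    "acme": {"label": "Company", "name": "Acme"},
}
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("bob", "KNOWS", "carol", {"since": 2019}),
    ("carol", "WORKS_AT", "acme", {"role": "engineer"}),
]


def neighbours(node_id):
    """Yield (relationship type, target node, properties) for outgoing relationships."""
    for src, rel_type, dst, props in relationships:
        if src == node_id:
            yield rel_type, dst, props


def reachable(start, max_hops):
    """Breadth-first traversal: follow relationships outward up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue
        for _, nxt, _ in neighbours(current):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    del seen[start]
    return seen  # node id -> number of hops from start


print(reachable("alice", 3))  # {'bob': 1, 'carol': 2, 'acme': 3}
```

In a relational schema, each extra hop would typically mean another self-join or a recursive query, whereas here the traversal simply keeps following relationships.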

Neo4J Graph Database

Neo4J is one of the most widely used graph databases (www.neo4j.org). It is a free, open-source project released under the GNU General Public License v3.0. Neo Technology offers a commercially supported version under the GNU AGPL v3.0 licence as well as under commercial licensing. Neo4J is an ACID-compliant transactional database that provides high availability through clustering. It is reliable and scalable, and graphs are simple to build with it because nodes, relationships, and their properties map naturally onto real-world connections such as human interactions. Because it does not require a schema or data typing, it is inherently very flexible.

Features

  • Integration: To provide seamless compatibility with non-graph data stores, Neo4J supports transaction management with rollback.
  • Synchronisation: Neo4J provides event-driven behaviours via an event bus, periodic synchronisation with itself or an RDBMS as the master, and traditional batch synchronisation.
  • Resilience: Neo4J supports both cold (when the database is not running) and hot (while the database is active) backups, as well as a high-availability clustering option. Standard notifications can be integrated with existing operations management systems.
  • Query language: Neo4J supports Cypher, a declarative language intended primarily for querying graphs and their components. Cypher commands are ad hoc queries of graph data that are loosely based on SQL syntax.
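As an illustration of what a Cypher pattern expresses, the query in the comment below matches people whom Alice knows. The Python that follows is a toy in-memory evaluation of that one pattern under stated assumptions (hypothetical node ids, labels and `match_friends` helper); it is not how Neo4J executes Cypher.

```python
# Cypher (shown for reference; it would run against a real Neo4J instance):
#   MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend:Person)
#   RETURN friend.name
#
# Below: a toy in-memory evaluation of that single pattern.
nodes = {
    1: {"labels": {"Person"}, "name": "Alice"},
    2: {"labels": {"Person"}, "name": "Bob"},
    3: {"labels": {"Person"}, "name": "Carol"},
    4: {"labels": {"Company"}, "name": "Acme"},
}
rels = [(1, "KNOWS", 2), (1, "KNOWS", 3), (1, "WORKS_AT", 4), (2, "KNOWS", 3)]


def match_friends(name: str) -> list:
    """Return friend names for (p:Person {name})-[:KNOWS]->(friend:Person)."""
    starts = {nid for nid, n in nodes.items()
              if "Person" in n["labels"] and n["name"] == name}
    return [nodes[dst]["name"]
            for src, rel_type, dst in rels
            if src in starts and rel_type == "KNOWS" and "Person" in nodes[dst]["labels"]]


print(match_friends("Alice"))  # ['Bob', 'Carol']
```

The declarative Cypher version states only the shape of the pattern; working out how to find the matches is left to the database.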

Frequently Asked Questions

What are the examples of columnar databases?

Examples of columnar databases include Bigtable, Cassandra, HBase, Vertica, Druid, Accumulo, and Hypertable.

Is MongoDB a graph database?

No. MongoDB is a general-purpose document database, but it does provide graph and tree traversal capabilities through the $graphLookup stage of its aggregation pipeline.
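Below is a `$graphLookup` stage written as a Python dict, in the form it could be passed to PyMongo's `collection.aggregate([...])`, together with a toy emulation of its recursive lookup for one document. The `employees` data and the `graph_lookup` helper are hypothetical. The emulation ignores array-valued fields, `depthField` and `restrictSearchWithMatch`, and its ordered output is only for readability, since the real stage does not guarantee the order of the documents it returns.

```python
from typing import List

employees = [
    {"_id": 1, "name": "Dev", "reportsTo": None},
    {"_id": 2, "name": "Eliot", "reportsTo": "Dev"},
    {"_id": 3, "name": "Ron", "reportsTo": "Eliot"},
    {"_id": 4, "name": "Andrew", "reportsTo": "Eliot"},
]

# The aggregation stage: walk the "reportsTo" chain upward through "employees".
stage = {
    "$graphLookup": {
        "from": "employees",
        "startWith": "$reportsTo",
        "connectFromField": "reportsTo",
        "connectToField": "name",
        "as": "reportingHierarchy",
    }
}


def graph_lookup(collection: List[dict], doc: dict, spec: dict) -> List[dict]:
    """Toy emulation of $graphLookup's recursive search for a single input document."""
    max_depth = spec.get("maxDepth")  # None means unlimited recursion
    start = doc.get(spec["startWith"].lstrip("$"))
    frontier = [start] if start is not None else []
    results, seen, depth = [], set(), 0
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = []
        for value in frontier:
            for other in collection:
                if other.get(spec["connectToField"]) == value and other["_id"] not in seen:
                    seen.add(other["_id"])
                    results.append(other)
                    next_frontier.append(other.get(spec["connectFromField"]))
        frontier = [v for v in next_frontier if v is not None]
        depth += 1
    return results


chain = graph_lookup(employees, employees[2], stage["$graphLookup"])
print([e["name"] for e in chain])  # ['Eliot', 'Dev']
```

Setting `"maxDepth": 0` in the stage limits the search to the direct match (here, only Ron's immediate manager).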

Is NoSQL a graph database?

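Not exactly. NoSQL is an umbrella term for non-relational databases, which include document, columnar (wide-column), key-value, and graph databases. A graph database is therefore one type of NoSQL database: a data management technology designed to store and query highly connected data as nodes and relationships.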

What is meant by a document database?

A document database is a type of non-relational database designed to store and query data as JSON-like documents. Document databases make it easier for developers to store and query data because they use the same document-model format that developers use in their application code.

Conclusion

This blog discussed three database design choices: document, graph, and columnar databases. For each type, we also discussed an example database and its features.

We hope that this blog has helped you enhance your knowledge of operational databases. If you would like to learn more, check out our articles on Databases (DBMS). Do upvote our blog to help other ninjas grow.


If you want to explore preparation strategy for SDE placements, please have a look at this YouTube tutorial.

Head over to our practice platform Coding Ninjas Studio to practise top problems, attempt mock tests, read interview experiences, and much more!

You can follow these links to get hands-on with various state-of-the-art topics: SQL Problems, Coding Interviews, and Video Resources.

Happy Coding!
