Introduction
This blog post discusses the main types of databases used in the software industry. Operational databases are critical components of big data analytics: data must be stored efficiently, in a way that suits the workload, so that queries return results within the required time. In the previous post, we discussed two major database design choices: relational and non-relational databases. It is recommended that you read that post first for a better understanding of operational databases.
In this post, we discuss document, columnar, and graph databases, with examples. So, let’s get started.
Document Databases
There are two types of document databases. The first is a repository for complete, document-style content (Word files, complete web pages, and so on). The second stores document components, either permanently as static entities or for dynamic assembly into larger documents. JavaScript Object Notation (JSON) and/or Binary JSON (BSON) provide the structure of the documents and their parts. Document databases are useful when you need to generate a large number of reports that must be built dynamically from frequently changing elements. In healthcare, for example, the content of a generated document varies with the member profile (age, residency, and income level), the healthcare plan, and eligibility for government programmes.
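To make the idea of dynamic assembly concrete, here is a minimal Python sketch, assuming each stored part is a JSON-like dict carrying simple applicability conditions. The part texts, the condition keys (`min_age`, `max_income`, `residency`), and the `matches` and `assemble` functions are all hypothetical; a real system would keep the parts in the database and select them with queries rather than filtering in memory.

```python
import json

# Hypothetical document parts, stored as JSON-like dicts. The "when" field
# holds conditions on the member profile; all of them must hold.
PARTS = [
    {"part_id": "intro", "text": "Your coverage summary", "when": {}},
    {"part_id": "senior", "text": "Senior care benefits", "when": {"min_age": 65}},
    {"part_id": "low_income", "text": "Premium assistance programme",
     "when": {"max_income": 30000}},
    {"part_id": "state_ny", "text": "New York plan addendum",
     "when": {"residency": "NY"}},
]


def matches(when: dict, member: dict) -> bool:
    """Return True if the member profile satisfies every condition."""
    if "min_age" in when and member["age"] < when["min_age"]:
        return False
    if "max_income" in when and member["income"] > when["max_income"]:
        return False
    if "residency" in when and member["residency"] != when["residency"]:
        return False
    return True


def assemble(member: dict) -> dict:
    """Build a document from the stored parts that apply to this member."""
    sections = [p["text"] for p in PARTS if matches(p["when"], member)]
    return {"member_id": member["id"], "sections": sections}


member = {"id": 42, "age": 70, "income": 25000, "residency": "CA"}
print(json.dumps(assemble(member)))
```

Because each part is stored once and selected per member, a change to, say, the premium-assistance wording shows up in every newly generated report without editing individual documents.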
At its core, JSON is a data-interchange format based on a subset of the JavaScript programming language. Although it derives from a programming language, it is plain text and relatively easy for people to read and write, and it is also simple for computers to parse and generate. JSON is built on two basic structures, both supported by most, if not all, modern programming languages. The first is a collection of name/value pairs, represented programmatically as an object, record, dictionary, or keyed list. The second is an ordered list of values, represented as an array, list, or sequence. BSON is a binary serialisation of JSON-like structures, designed to improve performance and scalability.
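The two JSON structures map directly onto built-in types, which the sketch below shows with Python's standard `json` module. It also hand-encodes a tiny BSON document with `struct` to illustrate what "binary serialisation" means in practice; for `{"hello": "world"}` it produces the same bytes as the example in the BSON specification. The `bson_encode` helper is hypothetical and deliberately minimal (strings and 32-bit integers only); real code would use a complete BSON library, such as the `bson` package shipped with MongoDB's Python driver.

```python
import json
import struct

# JSON: an object (name/value pairs) containing an array (ordered values).
text = '{"name": "Ada", "born": 1815, "fields": ["mathematics", "computing"]}'
doc = json.loads(text)
print(type(doc).__name__)            # dict: the JSON object
print(type(doc["fields"]).__name__)  # list: the JSON array


def bson_encode(flat_doc: dict) -> bytes:
    """Encode a flat dict of str/int values as a BSON document (sketch only).

    Handles just two element types: UTF-8 string (0x02) and int32 (0x10).
    """
    body = b""
    for key, value in flat_doc.items():
        name = key.encode("utf-8") + b"\x00"  # element names are C strings
        if isinstance(value, bool):
            raise TypeError("bool is not handled in this sketch")
        if isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)  # little-endian int32
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # The string length prefix counts the bytes plus the trailing NUL.
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Document = int32 total length (including itself) + elements + 0x00.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(bson_encode({"hello": "world"}))
```

The length prefixes on the document and on each string let a reader skip over data without parsing it, which is part of what makes BSON efficient to traverse.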
MongoDB
MongoDB takes its name from the word “humongous”. It was originally released as open source under the GNU AGPL v3.0 licence; releases since late 2018 use the Server Side Public License (SSPL) instead. It was developed by a company called 10gen (renamed MongoDB, Inc. in 2013), which also offers commercial licences with full support.
MongoDB is popular and may be a suitable data store for your big data implementation. A MongoDB deployment is made up of databases that contain "collections". A collection is made up of "documents", and each document is made up of fields. You can index the fields of a collection much as you index columns in a relational database, which improves data lookup performance.
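The hierarchy of database, collection, document, and field, and the effect of an index, can be modelled in a few lines of plain Python. This is a conceptual sketch only: the `Collection` class is hypothetical, and although its `create_index` and `find` methods echo MongoDB's vocabulary, they are not the MongoDB driver API. Real MongoDB indexes are B-tree structures rather than the hash map used here.

```python
from collections import defaultdict


class Collection:
    """Hypothetical in-memory model of a MongoDB-style collection."""

    def __init__(self):
        self.docs = []     # each document is a dict of field -> value
        self.indexes = {}  # field name -> {field value: [document positions]}

    def insert(self, doc: dict) -> None:
        pos = len(self.docs)
        self.docs.append(doc)
        # Keep any existing indexes up to date.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].append(pos)

    def create_index(self, field: str) -> None:
        index = defaultdict(list)
        for pos, doc in enumerate(self.docs):
            if field in doc:
                index[doc[field]].append(pos)
        self.indexes[field] = index

    def find(self, field: str, value) -> list:
        if field in self.indexes:
            # Indexed lookup: go straight to the matching positions.
            return [self.docs[p] for p in self.indexes[field].get(value, [])]
        # No index on this field: scan every document.
        return [doc for doc in self.docs if doc.get(field) == value]


db = {"members": Collection()}  # a database holds named collections
members = db["members"]
members.insert({"name": "Ada", "city": "London"})
members.insert({"name": "Alan", "city": "Manchester"})
members.create_index("city")
print(members.find("city", "London"))  # [{'name': 'Ada', 'city': 'London'}]
```

Without an index, `find` must examine every document; with one, it jumps straight to the matching positions, which is the lookup improvement described above.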
Queries in MongoDB return a "cursor", which serves as a pointer to the result set rather than the results themselves. This is an important feature: results are fetched from the server in batches as the application iterates, so you can count, sort, or page through data without first pulling the entire result set into memory. MongoDB stores documents in BSON, a binary encoding of JSON-like documents.
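A cursor's key property is laziness: results arrive in batches only as the application asks for them. The following minimal sketch simulates that behaviour over an in-memory list. The `Cursor` class, its `batch_size` default, and the `batches_fetched` counter are hypothetical stand-ins for server round trips, not the MongoDB driver's cursor API.

```python
import itertools
from collections import deque


class Cursor:
    """Hypothetical lazy cursor that fetches matching documents in batches."""

    def __init__(self, source: list, predicate, batch_size: int = 2):
        self._source = source        # stands in for the server-side data
        self._predicate = predicate  # stands in for the query filter
        self._batch_size = batch_size
        self._pos = 0
        self._buffer = deque()
        self.batches_fetched = 0     # simulated round trips so far

    def _fetch_batch(self) -> None:
        """Simulate one round trip returning up to batch_size matches."""
        self.batches_fetched += 1
        while self._pos < len(self._source) and len(self._buffer) < self._batch_size:
            doc = self._source[self._pos]
            self._pos += 1
            if self._predicate(doc):
                self._buffer.append(doc)

    def __iter__(self):
        return self

    def __next__(self) -> dict:
        if not self._buffer:
            self._fetch_batch()
            if not self._buffer:
                raise StopIteration
        return self._buffer.popleft()


docs = [{"n": i} for i in range(10)]
cursor = Cursor(docs, lambda d: d["n"] % 2 == 0)
first_two = list(itertools.islice(cursor, 2))
print(first_two, cursor.batches_fetched)  # [{'n': 0}, {'n': 2}] 1
```

Taking only the first two matches costs a single simulated round trip; the remaining batches are fetched only if iteration continues.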
Features
- A querying service that supports ad hoc queries, distributed queries, and full-text search.
- A sharding service that spreads a single database across a cluster of servers, either in one data centre or across several data centres. A shard key determines how documents are distributed across the instances.
- MapReduce, for analytics and aggregation over the documents in a collection (deprecated since MongoDB 5.0 in favour of the aggregation pipeline).
- GridFS, a specification for storing large files (for example, files larger than the 16 MB BSON document size limit) by splitting them into chunks stored as separate documents.
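Three of the features above (shard-key routing, MapReduce, and GridFS chunking) describe concrete mechanisms that can each be sketched in a few lines of Python. Treat these as simplified illustrations under stated assumptions: `shard_for` uses an MD5 hash modulo the shard count as a stand-in, whereas MongoDB assigns ranges of (optionally hashed) shard-key values to shards as chunks; `map_reduce_count` shows the map/group/reduce pattern in plain Python rather than MongoDB's `mapReduce` command, which takes JavaScript functions; and `to_chunks` mimics the `files_id`/`n`/`data` shape of GridFS chunk documents with the default 255 KiB chunk size. All three function names are hypothetical.

```python
import hashlib
from collections import defaultdict


# --- Sharding: route a document to a shard by hashing its shard key ---
def shard_for(doc: dict, shard_key: str, num_shards: int) -> int:
    """Return a shard index derived from a hash of the shard-key value.

    Simplification: MongoDB assigns ranges of key values (chunks) to shards;
    here we just take the hash modulo the number of shards.
    """
    digest = hashlib.md5(str(doc[shard_key]).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# --- MapReduce: count documents per value of a field ---
def map_reduce_count(docs: list, field: str) -> dict:
    """Map each document to (value, 1), group by key, reduce by summing."""
    grouped = defaultdict(list)
    for doc in docs:                      # map: emit (key, 1) and group by key
        grouped[doc[field]].append(1)
    return {key: sum(ones) for key, ones in grouped.items()}  # reduce


# --- GridFS-style chunking: split a large payload into chunk documents ---
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KiB


def to_chunks(file_id: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into documents shaped like GridFS fs.chunks entries."""
    return [
        {"files_id": file_id, "n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]


docs = [
    {"user": "u1", "city": "London"},
    {"user": "u2", "city": "Paris"},
    {"user": "u3", "city": "London"},
]
print({d["user"]: shard_for(d, "user", 3) for d in docs})
print(map_reduce_count(docs, "city"))  # {'London': 2, 'Paris': 1}
payload = b"x" * (600 * 1024)
chunks = to_chunks("file-1", payload)
print(len(chunks), [len(c["data"]) for c in chunks])  # 3 [261120, 261120, 92160]
```

Because the hash is deterministic, the same shard-key value always routes to the same shard, which is what lets a router locate a document again without asking every server.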