Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Documents
3. Collections
4. Comparison With Relational Databases
5. CRUD Operations
6. Features
7. Use Cases
8. Some Popular Document Databases
9. Advantages of Document Databases
10. Disadvantages of Document Databases
11. FAQs
11.1. What are document databases?
11.2. What are documents in a document database?
11.3. What is the key difference between document databases and relational databases?
11.4. Why are document databases said to be schema-less?
12. Conclusion
Last Updated: Mar 27, 2024

Document Databases in Big Data

Author ANKIT KUMAR

Introduction

Document databases are one of the most common types of NoSQL databases. A document database stores data as JSON-like documents rather than as rows and columns. JSON serves as the native format in which the data is both stored and queried. Each document holds its data as field-value pairs.

Document databases use flexible documents rather than fixed rows and columns to store data. They are among the most popular alternatives to tabular, relational databases.

Documents

Documents are the records in a document database. They are equivalent to tuples (rows) in a relational database.

Below is an example of a JSON document that stores four field-value pairs. The values can be a variety of types and structures, including strings, numbers, dates, arrays, or objects. 

{
"ID" : "07",
"Name" : "MS Dhoni",
"Team" : "CSK",
"Responsibility" : "Captain"
}

Here ID, Name, Team, and Responsibility are the fields, and the data corresponding to each field is the value of that field.
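As a minimal sketch, the document above can be loaded with Python's standard `json` module, which turns it into a dictionary of field-value pairs. The variable names and the extra `Roles` field are illustrative assumptions, not part of the original example.

```python
import json

# The example document as a JSON string. Strict JSON does not allow a
# trailing comma after the last field-value pair.
raw = """
{
  "ID": "07",
  "Name": "MS Dhoni",
  "Team": "CSK",
  "Responsibility": "Captain"
}
"""

# json.loads parses the text into a Python dict: one key per field.
player = json.loads(raw)

# Iterate over the field-value pairs.
for field, value in player.items():
    print(f"{field}: {value}")

# Values are not limited to strings; arrays and nested objects are allowed.
# "Roles" is a hypothetical field added only for illustration.
player["Roles"] = ["Captain", "Wicket-keeper"]
print(json.dumps(player, indent=2))
```

Because the parsed document is an ordinary dictionary, application code can read and modify fields directly without any mapping layer.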

Collections

A group of similar documents is known as a collection. Here, similar does not mean that each document must have exactly the same fields. One of the major advantages of documents is that each one can have different fields.

We can create a collection of documents (say players) and store the necessary information in it. Each document can have different fields representing the attributes of the players. In simple terms, the collections are equivalent to tables in a relational database.
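A rough sketch of this idea, modelling a collection as a plain Python list of dictionaries. The second and third documents ("Player B", "Player C", "Team X", "Team Y", "Nickname") are made-up placeholders, used only to show that documents in one collection may carry different fields.

```python
# A hypothetical "players" collection: documents share some fields, not all.
players = [
    {"ID": "07", "Name": "MS Dhoni", "Team": "CSK", "Responsibility": "Captain"},
    {"ID": "18", "Name": "Player B", "Team": "Team X"},  # no Responsibility
    {"ID": "45", "Name": "Player C", "Team": "Team Y", "Nickname": "Example"},
]

# Collect every field that appears anywhere in the collection.
all_fields = set()
for doc in players:
    all_fields.update(doc.keys())

# Fields that are present in every document.
common_fields = set(players[0])
for doc in players[1:]:
    common_fields &= set(doc)

print(sorted(all_fields))
print(sorted(common_fields))

# A field may be absent, so read it with .get() and a default value.
for doc in players:
    print(doc["Name"], "->", doc.get("Responsibility", "not set"))
```

The trade-off is that code reading the collection has to tolerate missing fields, which is why the last loop uses a default instead of assuming every document has a `Responsibility`.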

Comparison With Relational Databases

The factors that differentiate document databases from relational databases are:

  • Better Data Model

Documents are often more natural to work with because they map directly to objects in code. In many cases there is no need to decompose data across tables, perform costly joins, or integrate an additional Object Relational Mapping (ORM) layer. Whereas a relational database organises data as tuples (rows), a document holds its data as properties, each of which is a field-value pair.

  • Acceptance of JSON

JSON has become very popular. More and more developers use it because it is lightweight, language-independent, and human-readable. Information that would require join operations across tables in a relational database can often be read directly from a single JSON document.

  • Flexibility of Schema

One of the major advantages of document databases is that they provide a flexible schema. There is no requirement to define the schema and declare the attributes of each column beforehand, as there is in a relational database. Different documents can have different field-value pairs.

  • Easy to Work With

Users no longer have to worry about manually splitting related data across multiple tables when storing it or rejoining it when retrieving it. They also do not need to rely on an ORM to handle data manipulation for them. Instead, they can work directly with the data in their applications.
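The contrast described in this section can be sketched with Python's standard `sqlite3` and `json` modules. The table names, column names, and the nested `Team` object are assumptions chosen for illustration; a real document database would expose its own API rather than plain dictionaries.

```python
import json
import sqlite3

# Relational version: the data is split across two tables and rejoined on read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, "
    "team_id INTEGER REFERENCES teams(id))"
)
conn.execute("INSERT INTO teams VALUES (1, 'CSK')")
conn.execute("INSERT INTO players VALUES (7, 'MS Dhoni', 1)")

row = conn.execute(
    "SELECT players.name, teams.name FROM players "
    "JOIN teams ON players.team_id = teams.id WHERE players.id = 7"
).fetchone()
print(row)  # ('MS Dhoni', 'CSK')
conn.close()

# Document version: the related data is nested inside one document,
# so a single lookup returns everything without a join.
player_doc = {
    "ID": "07",
    "Name": "MS Dhoni",
    "Team": {"Name": "CSK"},
}
print(player_doc["Name"], player_doc["Team"]["Name"])

# The document maps directly to a JSON string and back.
restored = json.loads(json.dumps(player_doc))
print(restored == player_doc)
```

Nesting works well when the related data is read together with its parent. Data shared by many documents is duplicated in each of them, which is the cost of avoiding the join.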

CRUD Operations

Document databases typically include an API or query language that enables developers to perform CRUD (create, read, update, and delete) operations.

  • Create

Each document is assigned a unique identifier when it is created.

  • Read

With the help of unique identifiers or field values, the query language or the API is able to search for a particular document. This is how the documents are read from the document database. Indexes can be added to the database in order to increase read performance.

  • Update

Document databases also make it possible to update a document either partially or as a whole.

  • Delete

Document databases also allow us to delete the documents.
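The four operations above can be sketched with a toy in-memory collection. `ToyCollection` and its method names are hypothetical and do not correspond to the API of any real document database; the sketch only shows the shape of each operation.

```python
import uuid


class ToyCollection:
    """In-memory stand-in for a collection; not a real database API."""

    def __init__(self):
        self._docs = {}  # maps document id -> document (a dict)

    def create(self, doc):
        """Store a copy of doc under a new unique id and return that id."""
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def read(self, doc_id=None, **field_values):
        """Return documents matching the id, or matching all field values."""
        if doc_id is not None:
            doc = self._docs.get(doc_id)
            return [doc] if doc is not None else []
        return [
            d for d in self._docs.values()
            if all(d.get(k) == v for k, v in field_values.items())
        ]

    def update(self, doc_id, changes):
        """Partially update a document; return True if it existed."""
        if doc_id not in self._docs:
            return False
        self._docs[doc_id].update(changes)
        return True

    def delete(self, doc_id):
        """Remove a document; return True if it existed."""
        return self._docs.pop(doc_id, None) is not None


players = ToyCollection()
pid = players.create({"Name": "MS Dhoni", "Team": "CSK"})   # Create
players.update(pid, {"Responsibility": "Captain"})          # Update (partial)
print(players.read(Team="CSK"))                             # Read by field value
print(players.delete(pid))                                  # Delete
```

Reading by field value here scans every document. A real database would typically use an index on the queried field to avoid that scan.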

Features

The features of the document databases are:

  • Document Model

Unlike relational databases, where data is stored in tables, or graph databases, where data is stored as nodes and edges, a document database stores data in documents made up of field-value pairs.

  • Flexible Schema

All the documents in the collection need not have the exact same fields. The documents may have different field-value pairs.

  • Resilient and Distributed

Document databases allow horizontal scaling and data distribution. They provide resiliency through replication.

  • Querying

Document databases come with an API or query language that developers can use to perform CRUD operations on the database. Developers can search for documents based on unique identifiers or field values.
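To make the "Resilient and Distributed" feature more concrete, here is a simplified sketch of hash-based data distribution with replication. The shard count, replica count, and function names are assumptions made for illustration; real document databases use their own, more elaborate placement and failover schemes.

```python
import hashlib

NUM_SHARDS = 3         # hypothetical cluster size
REPLICAS_PER_DOC = 2   # each document is stored on this many shards


def shards_for(doc_id):
    """Return the shard numbers that hold copies of the document."""
    # A stable hash (unlike the built-in hash()) gives the same result
    # on every run and on every machine.
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    primary = int(digest, 16) % NUM_SHARDS
    # Replicas go on the following shards, wrapping around at the end.
    return [(primary + i) % NUM_SHARDS for i in range(REPLICAS_PER_DOC)]


# One dict per shard, each mapping document id -> document.
shards = [{} for _ in range(NUM_SHARDS)]


def put(doc_id, doc):
    """Write a copy of the document to every shard responsible for it."""
    for s in shards_for(doc_id):
        shards[s][doc_id] = dict(doc)


def get(doc_id, down=()):
    """Read from the first available replica; `down` lists failed shards."""
    for s in shards_for(doc_id):
        if s not in down and doc_id in shards[s]:
            return shards[s][doc_id]
    return None


put("07", {"Name": "MS Dhoni", "Team": "CSK"})
primary = shards_for("07")[0]
# The document is still readable when its primary shard is unavailable.
print(get("07", down={primary}))
```

Adding shards spreads documents over more machines (horizontal scaling), while the extra copy on a second shard is what keeps a document readable after a single failure.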

Use Cases

There are various use cases of the document databases. A few of them are listed below.

  • Content Management
  • Book Database
  • Payment processing
  • Internet of Things (IoT)
  • Mobile applications
  • Real-time analytics

Some Popular Document Databases

Some of the most popular document databases are:

  • MongoDB
  • Amazon DocumentDB
  • Cosmos DB
  • ArangoDB
  • Couchbase Server
  • CouchDB

Advantages of Document Databases 

There are various reasons to use document databases. The major advantages of the document databases are:

  • Open formats

Documents are described using open formats such as XML, JSON, and their derivatives.

  • No foreign keys

The documents are independent of each other, so no foreign keys need to be declared between them.

  • Schema-less

There is no restriction on the structure of the data storage. All the documents in the collection need not have the exact same fields. This provides for a schema-less database.

  • Easy Querying

With the help of API and query languages, querying the documents in the database is easy and simple.

Disadvantages of Document Databases

Alongside these advantages and features, document databases also have some drawbacks. They are described below.

  • ACID transaction

Some document databases do not support multi-document ACID transactions, although several (for example, MongoDB since version 4.0) have added them. Without such support, a change involving two collections requires two separate queries (one per collection), so the change as a whole is not atomic.

  • Consistency-check Limitations

Because the database does not enforce relationships between documents, related documents may fall out of sync with each other or contain duplicate data.

  • Security

Document databases must be configured and handled with care. As more companies adopt NoSQL databases, these databases have also become prone to data leaks.
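The atomicity problem described under "ACID transaction" can be illustrated with a small simulation. The two collections are modelled as plain dictionaries, and the `fail` flag is a hypothetical stand-in for a crash between the two writes; no real database is involved.

```python
# Two hypothetical collections, modelled as dicts keyed by document id.
accounts = {"A": {"balance": 100}, "B": {"balance": 50}}
transfers = {}


def write_transfer_log(transfer_id, amount, fail=False):
    """Second write; `fail` simulates a crash between the two writes."""
    if fail:
        raise RuntimeError("simulated failure before second write")
    transfers[transfer_id] = {"from": "A", "to": "B", "amount": amount}


def transfer_without_transaction(amount, fail=False):
    # Write 1: update documents in the accounts collection.
    accounts["A"]["balance"] -= amount
    accounts["B"]["balance"] += amount
    # Write 2: record the transfer in a different collection.
    write_transfer_log("t1", amount, fail=fail)


try:
    transfer_without_transaction(30, fail=True)
except RuntimeError:
    pass

# The first write took effect but the second never happened,
# so the two collections now disagree: the change was not atomic.
print(accounts)
print(transfers)
```

With a multi-document transaction, both writes would either commit together or be rolled back together. Without one, the application has to detect and repair the partial update itself.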

FAQs

What are document databases?

A document database stores data in the form of JSON documents rather than columns and rows. It stores data in field-value pairs.

What are documents in a document database?

They are the records in a document database. They are equivalent to tuples in a relational database.

What is the key difference between document databases and relational databases?

Document databases typically use flexible JSON-like documents with field-value pairs to model data. Relational databases typically use rigid tables with fixed rows and columns to model data.

Why are document databases said to be schema-less?

There is no requirement to define the schema and declare the attributes of each column beforehand, and there is no restriction on the structure of the stored data. The documents in a collection need not all have the same fields. This is why document databases are said to be schema-less.

Conclusion

In this article, we have extensively discussed the document databases in Big Data.

  • A document database stores data in the form of JSON documents rather than columns and rows. It stores data in field-value pairs.
  • Documents are much more natural to work with because they map to objects in code. There is no need to decompose data across tables, perform costly joins, or integrate an additional Object Relational Mapping (ORM) layer.
  • There is no restriction on the structure of the data storage. All the documents in the collection need not have the exact same fields. 

We hope that this blog has helped you enhance your knowledge regarding document databases in Big Data. Happy Coding!
