Collections
A group of similar documents is known as a collection. By similar, we don't mean that each document should have the exact same fields. One of the major advantages of documents is that each one can have different fields.
We can create a collection of documents (say, players) and store the necessary information in it. Each document can have different fields representing the attributes of a player. In simple terms, collections are roughly equivalent to tables in a relational database.
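As a minimal sketch of the players collection described above, the documents can be modelled as plain Python dictionaries in a list. The player names and fields are made up for illustration, and the helper functions are hypothetical rather than part of any database API.

```python
# A collection modelled as a list of dict "documents".
# Names and fields are invented for illustration only.
players = [
    {"_id": 1, "name": "Asha", "sport": "cricket", "batting_average": 48.2},
    {"_id": 2, "name": "Ben", "sport": "tennis", "ranking": 17, "handed": "left"},
    {"_id": 3, "name": "Chen", "sport": "chess"},
]


def field_names(collection):
    """Return every field used by at least one document in the collection."""
    names = set()
    for doc in collection:
        names.update(doc.keys())
    return names


def documents_with(collection, field):
    """Return only the documents that actually contain the given field."""
    return [doc for doc in collection if field in doc]


# The documents sit in one collection even though their fields differ.
print(sorted(field_names(players)))
print([doc["name"] for doc in documents_with(players, "ranking")])
```

Only one of the three documents has a ranking field, yet all three belong to the same collection, which is the flexibility the section describes.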
Comparison With Relational Databases
The factors that differentiate document databases from relational databases are:
Documents are much more natural to work with because they map to objects in code. There is no need to decompose data across tables, perform costly joins, or integrate an additional Object Relational Mapping (ORM) layer. While a relational database organises data as tuples (rows) with fixed columns, a document holds properties, each of which is a field-value pair.
JSON has become very popular. More and more developers are using JSON as it is lightweight, language-independent, and human-readable. Related information can be retrieved from a single JSON document instead of being assembled through complex join operations, as in a relational database.
One of the major advantages of document databases is that they provide a flexible schema. One does not have to define the schema and the attributes of each column beforehand, as in a relational database. The documents can have different field-value pairs.
Users no longer have to worry about manually splitting related data across multiple tables when storing it or rejoining it when retrieving it. They also do not need to rely on an ORM to handle data manipulation for them. Instead, they can work directly with the data in their applications.
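The contrast above can be sketched in a few lines of Python. The order data, the two "tables", and the function names are all assumptions made for this illustration; the relational side is simulated with lists of rows rather than a real database.

```python
import json

# Relational style: one order split across two "tables" (lists of rows).
orders = [{"order_id": 101, "customer": "Asha"}]
order_items = [
    {"order_id": 101, "product": "racket", "qty": 1},
    {"order_id": 101, "product": "balls", "qty": 3},
]


def load_order_relational(order_id):
    """Rejoin the rows by hand, the work a JOIN or an ORM would otherwise do."""
    order = next(row for row in orders if row["order_id"] == order_id)
    items = [
        {"product": row["product"], "qty": row["qty"]}
        for row in order_items
        if row["order_id"] == order_id
    ]
    return {**order, "items": items}


# Document style: the whole order is a single JSON document with nested items.
order_json = """
{
  "order_id": 101,
  "customer": "Asha",
  "items": [
    {"product": "racket", "qty": 1},
    {"product": "balls", "qty": 3}
  ]
}
"""


def load_order_document(raw):
    """A single parse yields the nested object the application works with."""
    return json.loads(raw)


# Both routes produce the same object; the document route needs no rejoining.
assert load_order_relational(101) == load_order_document(order_json)
```

The document version keeps related data together, so reading it back is a single lookup and parse rather than a reassembly step.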
CRUD Operations
Document databases typically include an API or query language that enables developers to perform CRUD (create, read, update, and delete) operations.
Each document comes with a unique identifier when created.
With the help of unique identifiers or field values, the query language or the API is able to search for a particular document. This is how the documents are read from the document database. Indexes can be added to the database in order to increase read performance.
Document databases also provide the facility to update a document either partially or in full.
Document databases also allow us to delete the documents.
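The four operations above can be sketched with a toy in-memory collection. The Collection class and its method names are hypothetical and are not the API of any real document database; a real product would expose its own API or query language.

```python
import uuid


class Collection:
    """Toy in-memory collection; an illustration, not a real database API."""

    def __init__(self):
        self._docs = {}  # maps unique identifier -> document

    def create(self, fields):
        """Store a new document and return its generated unique identifier."""
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = {**fields, "_id": doc_id}
        return doc_id

    def read(self, doc_id):
        """Look a document up by its unique identifier."""
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def find(self, **criteria):
        """Search by field values. This is a linear scan, the cost an index avoids."""
        return [
            dict(doc)
            for doc in self._docs.values()
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]

    def update(self, doc_id, changes):
        """Partial update: only the supplied fields change."""
        self._docs[doc_id].update(changes)

    def replace(self, doc_id, fields):
        """Whole-document update: every field except the identifier is replaced."""
        self._docs[doc_id] = {**fields, "_id": doc_id}

    def delete(self, doc_id):
        """Remove a document; returns True if it existed."""
        return self._docs.pop(doc_id, None) is not None


players = Collection()
player_id = players.create({"name": "Asha", "sport": "cricket"})
players.update(player_id, {"team": "Blue"})          # partial update
print(players.read(player_id))                       # read by identifier
print(players.find(sport="cricket"))                 # read by field value
players.replace(player_id, {"name": "Asha Rao"})     # whole-document update
players.delete(player_id)
```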
Features
The features of the document databases are:
Unlike relational databases, where the data is stored in tables, or graph databases, where the data is stored using nodes and edges, the data in a document database is stored in documents made up of field-value pairs.
All the documents in the collection need not have the exact same fields. The documents may have different field-value pairs.
- Resilient and Distributed
Document databases allow horizontal scaling and data distribution. They provide resiliency through replication.
Document databases come with an API or query language that developers can use to perform CRUD operations on the database. Developers can search for documents based on unique identifiers or field values.
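The horizontal scaling and replication mentioned under "Resilient and Distributed" can be illustrated with a rough sketch. Everything here is an assumption made for the example: the ShardedStore class, the node names, and the choice of hashing the identifier modulo the node count. Real document databases use their own placement and replication strategies.

```python
import hashlib


class ShardedStore:
    """Toy store that spreads documents over nodes and keeps replica copies."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("cannot keep more replicas than there are nodes")
        self.node_names = list(node_names)
        self.replicas = replicas
        self.nodes = {name: {} for name in self.node_names}

    def owners(self, doc_id):
        """Pick the nodes for a document by hashing its identifier (an assumption)."""
        digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.node_names)
        return [
            self.node_names[(start + offset) % len(self.node_names)]
            for offset in range(self.replicas)
        ]

    def put(self, doc_id, doc):
        """Write a copy of the document to every owning node (replication)."""
        for name in self.owners(doc_id):
            self.nodes[name][doc_id] = dict(doc)

    def get(self, doc_id, down=()):
        """Read from the first owning node that is not marked as down."""
        for name in self.owners(doc_id):
            if name not in down and doc_id in self.nodes[name]:
                return dict(self.nodes[name][doc_id])
        return None


store = ShardedStore(["node-a", "node-b", "node-c"], replicas=2)
store.put("player-1", {"name": "Asha"})
first_owner = store.owners("player-1")[0]
# The document is still readable when its first owner is unavailable.
print(store.get("player-1", down={first_owner}))
```

Adding nodes spreads documents across more machines, and the replica copies are what keep a document readable when one node fails.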
Use Cases
There are various use cases of the document databases. A few of them are listed below.
- Content Management
- Book Database
- Payment processing
- Internet of Things (IoT)
- Mobile applications
- Real-time analytics
Some Popular Document Databases
Some of the most popular document databases are:
- MongoDB
- Amazon DocumentDB
- Cosmos DB
- ArangoDB
- Couchbase Server
- CouchDB
Advantages of Document Databases
There are various reasons to use document databases. The major advantages of the document databases are:
They use JSON, XML, and related formats to describe documents.
The documents are independent of each other.
There is no restriction on the structure of the data storage. All the documents in the collection need not have the exact same fields. This provides for a schema-less database.
With the help of APIs and query languages, querying the documents in the database is simple.
Disadvantages of Document Databases
Alongside these advantages and features, document databases also have some drawbacks. They are described below.
Not every document database supports multi-document ACID transactions, although some now do (MongoDB, for example, added them in version 4.0). Where they are unavailable or not used, a change involving two collections necessitates running two separate queries (one per collection), and a failure between them leaves the change half applied. This violates the atomicity requirement.
- Consistency-check Limitations
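Document databases generally do not enforce relationships between documents. As a result, a document may refer to another document that does not exist, or the same data may be duplicated across documents.
Document databases must be handled with care. As more companies adopt NoSQL databases, misconfigured or poorly secured deployments can also lead to data leaks.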
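The atomicity concern described above can be sketched with two dictionaries standing in for two collections. The account data and function names are invented for this example, and the snapshot-and-restore wrapper is only a toy stand-in for a transaction, not how any database implements one.

```python
import copy


def transfer_two_writes(accounts, ledger, fail_between=False):
    """Update two collections with two separate writes, as the text describes."""
    accounts["alice"]["balance"] -= 30                 # write 1: first collection
    if fail_between:
        raise RuntimeError("crash between the two writes")
    ledger["t1"] = {"from": "alice", "amount": 30}     # write 2: second collection


def transfer_all_or_nothing(accounts, ledger, fail_between=False):
    """Toy transaction: restore both collections if any write fails."""
    saved_accounts = copy.deepcopy(accounts)
    saved_ledger = copy.deepcopy(ledger)
    try:
        transfer_two_writes(accounts, ledger, fail_between)
    except Exception:
        accounts.clear()
        accounts.update(saved_accounts)
        ledger.clear()
        ledger.update(saved_ledger)
        raise


# Without a transaction, a failure leaves the change half applied.
accounts, ledger = {"alice": {"balance": 100}}, {}
try:
    transfer_two_writes(accounts, ledger, fail_between=True)
except RuntimeError:
    pass
print(accounts, ledger)  # the balance changed, but no ledger entry exists

# With the all-or-nothing wrapper, the failed change is undone.
accounts, ledger = {"alice": {"balance": 100}}, {}
try:
    transfer_all_or_nothing(accounts, ledger, fail_between=True)
except RuntimeError:
    pass
print(accounts, ledger)  # both collections are back to their original state
```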
FAQs
What are document databases?
A document database stores data in the form of JSON documents rather than columns and rows. It stores data in field-value pairs.
What are documents in a document database?
They are the records in a document database. They are equivalent to tuples in a relational database.
What is the key difference between document databases and relational databases?
Document databases typically use flexible JSON-like documents with field-value pairs to model data. Relational databases typically use rigid tables with fixed rows and columns to model data.
Why are document databases said to be schema-less?
One does not have to compulsorily define the schema and provide the attributes of the column beforehand. There is no restriction on the structure of the data storage. All the documents in the collection need not have the exact same fields. This is why document databases are said to be schema-less.
Conclusion
In this article, we have extensively discussed the document databases in Big Data.
- A document database stores data in the form of JSON documents rather than columns and rows. It stores data in field-value pairs.
- Documents are much more natural to work with because they map to objects in code. There is no need to decompose data across tables, perform costly joins, or integrate an additional Object Relational Mapping (ORM) layer.
- There is no restriction on the structure of the data storage. All the documents in the collection need not have the exact same fields.
We hope that this blog has helped you enhance your knowledge regarding document databases in Big Data. Happy Coding!