Introduction
In past decades, the data being generated was structured and far smaller in volume than it is today, and RDBMSs were used to store it. Nowadays, however, we process massive amounts of semi-structured data such as emails, JSON, XML, and CSV files. RDBMSs struggle to store and manage this kind of data at scale, and this is where HBase came into the picture.
HBase is a database modeled after Google's Bigtable, designed to provide quick random access to vast amounts of structured data. It is an open-source, multidimensional, distributed, scalable NoSQL database written in Java that runs on top of HDFS (Hadoop Distributed File System). It is designed to store large, sparse data sets, and it lets users read and write individual records with low latency.
For example, if an HBase table holds 5 billion records and we want to retrieve 20 specific items, HBase can return them quickly without scanning the whole table.
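Part of what makes such lookups fast is that HBase keeps rows sorted by row key (compared as raw bytes), so finding a key, or a contiguous range of keys, does not require reading every record. The sketch below is a toy in-memory model of that idea using binary search over sorted keys; the class `SortedRowStore` and its methods are hypothetical, not HBase's API. Real HBase buffers writes in an in-memory MemStore and flushes them to sorted HFiles on HDFS rather than using a Python list.

```python
from bisect import bisect_left


class SortedRowStore:
    """Hypothetical toy store that keeps rows sorted by row key (not HBase's API)."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = []  # row data, parallel to self._keys

    def put(self, row_key, data):
        # Binary search for the key's position in the sorted key list.
        i = bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            self._values[i] = data          # overwrite an existing row
        else:
            self._keys.insert(i, row_key)   # insert while keeping keys sorted
            self._values.insert(i, data)

    def get(self, row_key):
        # O(log n) lookup by key: the other rows are never scanned.
        i = bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._values[i]
        return None

    def scan(self, start_key, stop_key):
        """Return (key, data) pairs with start_key <= key < stop_key, in key order."""
        i = bisect_left(self._keys, start_key)
        result = []
        while i < len(self._keys) and self._keys[i] < stop_key:
            result.append((self._keys[i], self._values[i]))
            i += 1
        return result


# Usage: zero-padded keys so that lexicographic order matches numeric order.
store = SortedRowStore()
for n in range(1000):
    store.put(f"user{n:04d}", {"name": f"User {n}"})

print(store.get("user0042"))                                # {'name': 'User 42'}
print([k for k, _ in store.scan("user0010", "user0013")])  # ['user0010', 'user0011', 'user0012']
```

The zero-padding in the usage example matters because keys are compared lexicographically: without it, `user10` would sort before `user2`, and a range scan would return rows in an unexpected order.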
Storage Mechanism in HBase
HBase is a column-oriented (more precisely, column-family-oriented) database in which data is stored in tables. An HBase table schema defines only the column families; the individual columns inside a family do not need to be declared in advance. A table can have multiple column families, and each family can hold any number of columns. Values belonging to the same column family are stored together on disk. Every cell carries a timestamp that identifies the version of the value stored in it; by default, this is the time at which the value was written.
- Table: The table is a collection of rows.
- Row: A collection of column families, identified by a unique row key.
- Column family: It is a collection of columns.
- Column: Identified as family:qualifier; it stores its cell values as key-value pairs.
- Timestamp: Identifies the version of a cell's value.
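The hierarchy above is often described as a sorted, multidimensional map: row key, then column family, then column qualifier, then timestamp, leading to a value. The following is a minimal toy sketch of that logical layout using nested Python dicts, under the assumption that only the lookup behavior matters; the class `ToyHTable` and its methods are hypothetical, not part of any HBase client library. It shows that only column families are fixed up front, that new columns can be added freely, that absent cells simply are not stored (sparsity), and that each timestamp is a separate version of a cell.

```python
import time


class ToyHTable:
    """Hypothetical toy model of an HBase table's logical layout:
    row key -> column family -> column qualifier -> timestamp -> value.
    """

    def __init__(self, column_families):
        # Only the column families are fixed when the table is created.
        self.families = set(column_families)
        self.rows = {}

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            # Default version: current time in milliseconds since the epoch.
            ts = int(time.time() * 1000)
        row_data = self.rows.setdefault(row, {})
        versions = row_data.setdefault(family, {}).setdefault(qualifier, {})
        versions[ts] = value  # each timestamp is a separate version of the cell

    def get(self, row, family, qualifier, ts=None):
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None  # cells that were never written take no space
        if ts is None:
            return versions[max(versions)]  # newest version wins
        return versions.get(ts)


table = ToyHTable(["personal", "contact"])
table.put("row1", "personal", "name", "Alice", ts=1)
table.put("row1", "personal", "name", "Alicia", ts=2)
table.put("row1", "contact", "email", "alice@example.com", ts=1)
# A new column in an existing family needs no schema change:
table.put("row2", "personal", "city", "Pune", ts=1)

print(table.get("row1", "personal", "name"))        # Alicia
print(table.get("row1", "personal", "name", ts=1))  # Alice
```

This toy keeps every version forever; HBase instead retains a configurable number of versions per column family and also stores rows in sorted key order, which plain dicts do not model.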