Introduction
Big data refers to data sets that are huge in volume and diverse in type. A model should be designed with a clear view of how all the elements of the data architecture fit together. This may take time in the initial stages, but it saves a lot of development time later on. Big data is more of a strategy than a project.
When creating an environment to support big data, good design principles are essential, whether you are dealing with analytics, storage, applications, or reporting. The environment includes considerations for infrastructure software, hardware, management and operational software, APIs, and development tools. An architecture should address the following foundational requirements:
- Capture
- Integrate
- Organize
- Analyze
- Act
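The five requirements can be pictured as stages of a pipeline, each consuming the previous stage's output. The Python sketch below is a toy, in-memory illustration of that flow only: the stage functions, the event-count analysis, and the alert threshold are hypothetical choices made for this example, not part of any specific big data product.

```python
from collections import Counter
from typing import Iterable

# Hypothetical stage functions illustrating Capture -> Integrate -> Organize -> Analyze -> Act.


def capture(sources: dict[str, Iterable[str]]) -> list[tuple[str, str]]:
    """Capture: pull raw records from each named source, tagging each with its origin."""
    return [(name, record) for name, records in sources.items() for record in records]


def integrate(raw: list[tuple[str, str]]) -> list[dict]:
    """Integrate: bring records from different sources into one common shape."""
    return [{"source": src, "event": rec.strip().lower()} for src, rec in raw]


def organize(records: list[dict]) -> dict[str, list[dict]]:
    """Organize: group integrated records by event type."""
    grouped: dict[str, list[dict]] = {}
    for r in records:
        grouped.setdefault(r["event"], []).append(r)
    return grouped


def analyze(grouped: dict[str, list[dict]]) -> Counter:
    """Analyze: count how often each event type occurs."""
    return Counter({event: len(rs) for event, rs in grouped.items()})


def act(counts: Counter, threshold: int) -> list[str]:
    """Act: flag event types whose frequency meets the threshold (e.g. to trigger an alert)."""
    return sorted(event for event, n in counts.items() if n >= threshold)


def run_pipeline(sources: dict[str, Iterable[str]], threshold: int = 2) -> list[str]:
    """Compose the five stages in order."""
    return act(analyze(organize(integrate(capture(sources)))), threshold)


if __name__ == "__main__":
    sources = {
        "web_logs": ["Login", "checkout", "login"],
        "mobile_app": ["LOGIN ", "search"],
    }
    print(run_pipeline(sources))  # prints ['login']
```

In a real deployment each stage would typically be handled by separate systems (ingestion, integration, storage, analytics, applications), but the order in which the stages compose stays the same.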
Big Data Stack Representation
The figure below presents a layered reference architecture that can serve as a framework for big data technology and addresses the requirements of big data projects.
Source: Big Data for Dummies
This is an extensive stack, and you may initially focus on the aspects that matter for the specific problems you need to address. Let us look at the layers in more detail.
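- Redundant Physical Infrastructure: This layer is fundamental to the stability and operation of the big data infrastructure. To support huge volumes of data, the physical infrastructure for big data has to differ from that of traditional data environments, and redundancy ensures that the failure of a single component does not bring the system down.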
- Security Infrastructure: Data security is critical for big data analysis in companies. The data needs to be protected, and its handling must also meet the relevant compliance requirements.
- Operational Databases: To run your business, you have to incorporate all the data sources that give you a complete picture of the business and show how data affects it.
- Interfaces and feeds to and from the Internet: This layer covers both internally managed data feeds and data from external sources. Interfaces exist at every level and between every layer of the stack, and without integration services, big data isn't possible.
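The idea behind a redundant physical layer can be sketched as block replication: each block of data is copied to several distinct nodes, so losing one machine does not lose data. The sketch below assumes a hypothetical cluster of five nodes, a replication factor of 3, and a simple hash-based placement rule; real distributed file systems use their own, more elaborate placement policies.

```python
import hashlib


def place_replicas(block_id: str, nodes: list[str], replication_factor: int = 3) -> list[str]:
    """Pick `replication_factor` distinct nodes for a block.

    Hypothetical rule: start at a hash-derived offset and take consecutive nodes,
    wrapping around the node list.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    start = int(hashlib.sha256(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def readable_after_failure(placement: dict[str, list[str]], failed: set[str]) -> bool:
    """A block stays readable if at least one of its replicas sits on a healthy node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(1, 6)]
    placement = {b: place_replicas(b, nodes) for b in ["block-a", "block-b", "block-c"]}
    for block, replicas in placement.items():
        print(block, "->", replicas)
    # With 3 replicas per block, any single node failure leaves every block readable.
    print(readable_after_failure(placement, {"node-1"}))  # prints True
```

With three copies per block, any two nodes can fail without data loss; only losing every node that holds a given block makes it unreadable.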
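Much of the interfaces-and-feeds layer comes down to mapping each feed's native format onto a shared schema so that downstream layers see one kind of record. The sketch below assumes two hypothetical feeds: an internal one that reports amounts in cents with Unix timestamps, and an external one that reports amounts as strings with ISO-8601 timestamps. All field names (`cust`, `amount_cents`, `customerId`, and so on) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical common schema shared by all feeds."""
    source: str
    customer_id: str
    amount: float          # currency units
    timestamp: datetime    # timezone-aware, UTC


def from_internal(row: dict) -> Record:
    # Hypothetical internal feed: integer customer id, amount in cents, epoch seconds.
    return Record(
        source="internal",
        customer_id=str(row["cust"]),
        amount=row["amount_cents"] / 100,
        timestamp=datetime.fromtimestamp(row["ts"], tz=timezone.utc),
    )


def from_external(item: dict) -> Record:
    # Hypothetical external feed: string amount, ISO-8601 timestamp with an explicit offset.
    return Record(
        source="external",
        customer_id=item["customerId"],
        amount=float(item["amount"]),
        timestamp=datetime.fromisoformat(item["time"]),
    )


def merge_feeds(internal_rows: list[dict], external_items: list[dict]) -> list[Record]:
    """Normalize both feeds to Record and return them in time order."""
    records = [from_internal(r) for r in internal_rows] + [from_external(i) for i in external_items]
    return sorted(records, key=lambda r: r.timestamp)


if __name__ == "__main__":
    internal = [{"cust": 42, "amount_cents": 1999, "ts": 1704067200}]
    external = [{"customerId": "42", "amount": "5.50", "time": "2024-01-01T00:00:30+00:00"}]
    for rec in merge_feeds(internal, external):
        print(rec)
```

Keeping one small adapter per feed means a new source only needs a new adapter, while the layers above it keep working against the same `Record` shape.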