Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
Mostly, unstructured data of size petabytes or exabytes which are increasing exponentially are referred to as Big Data. These data are globally present and are distributed. Due to a drastic increase in storage, here SQL or Oracle server fails, thus Hadoop, Spark, and MapReduce come into existence.
These data are classified into three categories:
Structured Data
Unstructured Data
Semi-structured Data
Let us discuss about each of these Types of Data Structure and the key differences between them.
Structured Data
Any data that are accessible and are stored or processed in the form of fixed-format is termed structured data. The employee table in the Database is an example of structured data. Banking transaction data and website-related data are all examples of structured data. The attributes present in structured data must be related to each other in some form. These data are stored in a relational database.
Unstructured Data
Irregular and ambiguous data, having no predefined data model and no pre-defined structure, are referred to as unstructured data. However, the most straightforward way to extract information is from unstructured data due to the presence of Artificial Intelligence. These data can be a combination of text, numbers, audio, video, images, messages, social media posts and many more. Twitter, Instagram, Facebook, and Google all are made up of unstructured data.
Semi-structured Data
These kinds of data falls between structured and unstructured data. It is a combination of partly structured data and partly unstructured data. For example, emails, XML, and WWW are all semi-structured data.
Difference between Structured, Semi-structured and Unstructured data
Structured Data
Unstructured Data
Semi-structured Data
Well organised data
Not organised at all
Partially organised
It is less flexible and difficult to scale. It is schema dependent.
It is flexible and scalable. It is schema independent.
It is more flexible and simpler to scale than structured data but lesser than unstructured data.
It is based on relational database.
It is based on character and binary data.
It is based on XML/ RDF
Versioning over tuples,row,tables
Versioning is like as a whole data.
Versioning over tuples is possible.
Easy analysis
Difficult analysis
Difficult analysis compared to structured data but easier when compared to unstructured data.
Financial data, bar codes are some of the examples of structured data.
Media logs, videos, audios are some of the examples of unstructured data.
Tweets organised by hashtags, folder organised by topics are some of the examples of unstructured data.
Till now, I assume you must have got a basic idea about the difference between structured, semi-structured and unstructured data.
Frequently Asked Questions
What is Hadoop? Why is it used?
Hadoop is a software framework which is used to store data and run applications on clusters. It helps in providing massive storage for any kind of data or information. It provides enormous processing power and also the ability that can handle virtually limitless concurrent tasks or jobs running simultaneously.
What is scalability?
Scalability is the measure or unit of a system's ability that can either increase or decrease the performance and also the cost with respect to changes in application and also sometimes, system processing demands. Scalability can also be referred to as differentiability, or expandability.
What is ambiguous data? Explain with an example.
Ambiguous data are those information that exists as the same thing in two different attributes. We can also say that these are those types of data which are not specific or are uncertain. For example, a person with the same first and last name may give rise to ambiguity.
Conclusion
This article taught us about different types of big data. We discussed each data individually and also the difference between structured, semi-structured and unstructured data.
We hope you could easily take away all critical and conceptual techniques by walking over the given examples.