BIG DATA, BIG DATA PROBLEMS AND HADOOP

Nivedita Shinde
Sep 17, 2020

What is Big Data?

Big data refers to large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity (speed) at which it is created and collected, and the variety (scope) of the data points being covered.

Types of Big Data:

  1. Structured Data:

Data that can be stored and processed in a fixed format is called structured data. Data stored in a relational database management system is one example of ‘structured’ data.

Structured data is considered the most ‘traditional’ form of data storage, since even the earliest database management systems (DBMS) could store, process and access structured data.

  2. Unstructured Data:

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.

  3. Semi-structured Data:

Semi-structured data does not conform to a formal data model but still has some structure. It lacks a fixed or rigid schema. It does not reside in a relational database, but it has some organisational properties, such as tags or keys, that make it easier to analyse. JSON and XML documents are common examples.
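To make the three types concrete, here is a small sketch holding the same kind of customer information in each form, using only Python's standard library. The table, field names and records are hypothetical examples, not taken from any real dataset.

```python
import json
import sqlite3

# Structured: a fixed schema where every row has the same typed columns
# (an in-memory SQLite table stands in for a relational database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", (1, "Asha", "Pune"))
structured_rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
conn.close()

# Semi-structured: self-describing JSON with keys but no rigid schema;
# the second record carries an extra "phones" field the first one lacks.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi", "phones": ["555-0100"]}]'
)

# Unstructured: free text with no data model; facts like the city are buried in prose.
unstructured = "Asha from Pune called on Monday to say the delivery arrived two days late."

print(structured_rows)                                             # [(1, 'Asha', 'Pune')]
print(sorted({key for record in semi_structured for key in record}))  # ['id', 'name', 'phones']
print("Pune" in unstructured)                                      # True
```

The structured row can be queried by column name, the JSON records can be inspected key by key even though they differ, while the free text only supports searching or parsing the prose itself.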

Characteristics of Big Data:

  1. Volume: ‘Volume’ is the first characteristic to consider when dealing with Big Data. The name Big Data itself refers to enormous size, and whether a particular dataset can be considered Big Data at all depends on its volume.
  2. Velocity: The term ‘velocity’ refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines its real potential. Big Data velocity deals with the rate at which data flows in.
  3. Variety: ‘Variety’ refers to the heterogeneous sources and nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data that most applications considered.
  4. Veracity: Big Data veracity refers to the biases, noise, and abnormalities in data. In other words: is the data being stored and mined actually meaningful to the problem being analyzed?
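Veracity is usually handled by validating records before analysing them. The sketch below is a minimal illustration, assuming hypothetical sensor readings where a missing temperature or a value outside an assumed plausible range counts as noise; real pipelines would use rules specific to their own data.

```python
# Hypothetical sensor readings; None and implausible values stand in for noise.
readings = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": None},
    {"sensor": "s3", "temp_c": 999.0},
    {"sensor": "s1", "temp_c": 22.1},
]

# Assumed sensible bounds for an air temperature in degrees Celsius.
PLAUSIBLE_RANGE = (-50.0, 60.0)


def is_trustworthy(record):
    """Keep a record only if its value is present and inside the plausible range."""
    value = record["temp_c"]
    return value is not None and PLAUSIBLE_RANGE[0] <= value <= PLAUSIBLE_RANGE[1]


clean = [r for r in readings if is_trustworthy(r)]
rejected = len(readings) - len(clean)
print(f"kept {len(clean)} of {len(readings)} readings")  # kept 2 of 4 readings
```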

Problems in storing Big Data:

  1. Infrastructure.
  2. Cost.
  3. Security.
  4. Corruption.
  5. Scale.
  6. UI and accessibility.
  7. Compatibility.

HADOOP:

Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Hadoop stores the data across the cluster with the Hadoop Distributed File System (HDFS) and processes it in parallel with the MapReduce programming model, which runs the computation on the nodes that hold the data.
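The MapReduce model can be illustrated with word count, the classic example. The sketch below is a single-process Python simulation of the map, shuffle and reduce steps; it does not use Hadoop's actual API (real Hadoop jobs are typically written in Java against the `org.apache.hadoop.mapreduce` package, or as scripts run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # In Hadoop, each map call could run on a different node holding a block of the input.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    lines = ["big data needs big storage", "Hadoop stores big data"]
    print(word_count(lines))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'storage': 1, 'stores': 1}
```

Because each map call looks at only one line and each reduce call at only one key, both steps can be spread across many machines; that independence is what lets Hadoop scale the work out across a cluster.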

Hadoop Tools for Big Data:

THANK YOU!!!
