Big Data Analytics Unit1
• WHAT IS DATA ?
• RAW FACTS THAT CARRY NO MEANING ON THEIR OWN ARE CALLED DATA. MEASUREMENTS, COLLECTIONS, QUANTITIES OF THINGS,
AND RECORDS OF ACTIONS OR EVENTS ARE ALL “DATA”.
• FROM RAW DATA ALONE WE CANNOT JUDGE WHETHER A GIVEN VALUE IS TRUE OR FALSE.
• WHAT IS INFORMATION ?
• DATA THAT HAS BEEN ORGANIZED (STRUCTURED) IS KNOWN AS INFORMATION.
• DATA THAT HAS BEEN PROCESSED AND PRESENTED IN A PARTICULAR FORMAT IS KNOWN AS INFORMATION.
DAY-1 27-08-24
STATES OF DATA ?
• Data passes through 3 states. They are:
• 1.CAPTURE
• 2.TRANSFORM
• 3.STORE
• 1. CAPTURE :
• Gathering new input data, or retrieving previously stored data, for the present analysis is known as “Capture”.
• 2. TRANSFORM :
• Refining unstructured or tabular data into the exact information required is called “Transform”.
• 3. STORE :
• After processing, the result of the analysis is stored on a storage medium such as magnetic tape.
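The three states above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample CSV text, the function names `capture`, `transform`, and `store`, and the output file name are all hypothetical, and a JSON file in the temporary directory stands in for the storage medium.

```python
import csv
import io
import json
import os
import tempfile

# Hypothetical raw input; in practice this would come from a file, sensor, or database.
RAW_CSV = """city,temperature_c
Hyderabad,31.5
Chennai,
Delhi,28.0
"""


def capture(raw_text):
    """Capture: read the input data into memory as a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing reading and convert text to numbers."""
    cleaned = []
    for row in rows:
        if row["temperature_c"].strip() == "":
            continue  # skip incomplete records
        cleaned.append({"city": row["city"], "temperature_c": float(row["temperature_c"])})
    return cleaned


def store(records, path):
    """Store: persist the processed result (a JSON file stands in for the storage medium)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle)


captured = capture(RAW_CSV)
transformed = transform(captured)
output_path = os.path.join(tempfile.gettempdir(), "states_of_data_demo.json")
store(transformed, output_path)
print(transformed)
```

The row for Chennai has no reading, so it is captured but dropped during the transform step before anything is stored.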
• TYPES OF DATA :
• DATA IS CLASSIFIED INTO 3 TYPES:
• 1.STRUCTURED DATA:
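• THE DATA HAS A PROPER, FIXED FORMAT ASSOCIATED WITH IT (FOR EXAMPLE, ROWS AND COLUMNS IN A TABLE).
• 2.UNSTRUCTURED DATA :
• THE DATA DOES NOT HAVE A PROPER FORMAT (FOR EXAMPLE, FREE TEXT, IMAGES, AUDIO).
• 3.SEMI STRUCTURED DATA :
• THE DATA DOES NOT FOLLOW A FIXED TABULAR FORMAT, BUT IT CARRIES TAGS OR KEYS THAT DESCRIBE ITS CONTENT (FOR EXAMPLE, JSON OR XML).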
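A small Python sketch may make the difference between the three types concrete. The sample records, names, and field names below are invented for illustration only.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = list(csv.DictReader(io.StringIO("id,name,marks\n1,Asha,82\n2,Ravi,74\n")))

# Semi-structured: no fixed schema, but keys describe each value,
# and different records may carry different fields.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "skills": ["python", "sql"]},'
    ' {"id": 2, "name": "Ravi"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Asha scored well this term; Ravi needs help with SQL."

# Structured data can be read by column name directly.
marks = [int(row["marks"]) for row in structured]

# Semi-structured data needs a check for optional fields.
skills = [record.get("skills", []) for record in semi_structured]

# Unstructured data needs parsing (here, a naive word split) before it can be used.
words = unstructured.replace(";", " ").replace(".", " ").split()

print(marks, skills, len(words))
```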
INTRODUCTION TO BIG DATA :
• WHAT IS BIG DATA ?
• IT IS ALSO DATA, BUT HUGE IN SIZE AND STILL GROWING EXPONENTIALLY WITH TIME.
• WHEN THE DATA IS THIS LARGE AND COMPLEX, TRADITIONAL TECHNIQUES CANNOT STORE AND PROCESS IT EFFICIENTLY.
• THE TERM "BIG DATA" TYPICALLY DESCRIBES DATASETS THAT ARE TOO LARGE, TOO DIVERSE, OR TOO COMPLEX TO BE ANALYSED USING TRADITIONAL
DATA PROCESSING TOOLS AND METHODS.
• THE GROWTH OF BIG DATA IS DRIVEN BY THE WIDESPREAD USE OF DIGITAL TECHNOLOGIES, SUCH AS SOCIAL MEDIA, MOBILE DEVICES, AND THE
INTERNET OF THINGS (IOT).
• THESE TECHNOLOGIES GENERATE MASSIVE AMOUNTS OF DATA THAT CAN BE COLLECTED, STORED, AND ANALYSED FOR INSIGHTS INTO CONSUMER
BEHAVIOUR, MARKET TRENDS, AND OTHER BUSINESS INSIGHTS.
• BIG DATA IS HANDLED WITH STORAGE AND MANAGEMENT SYSTEMS SUCH AS HADOOP, APACHE SPARK, AND NOSQL DATABASES, AS WELL AS DATA PROCESSING AND ANALYSIS TOOLS SUCH AS PYTHON, R, AND SQL.
• STORING THE DATA :
• A HUGE AMOUNT OF DATA IS BEING GENERATED AND WE NEED TO STORE IT. THE FOLLOWING STEPS TAKE PLACE
WHEN STORING BIG DATA :
• 1. ANALYSE THE DATA SOURCES.
• 2. ELIMINATE THE DUPLICATE DATA.
• 3. ESTABLISH A NOSQL DATABASE.
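As a rough sketch of the duplicate-elimination step, the Python below merges records from two sources and keeps one copy of each. The sample records and the helper names `fingerprint` and `deduplicate` are assumptions made for illustration, and a plain dictionary stands in for a key-value (NoSQL-style) store rather than any real database.

```python
import hashlib
import json

# Hypothetical records arriving from two different sources.
source_a = [{"user": "u1", "event": "login"}, {"user": "u2", "event": "purchase"}]
source_b = [{"event": "login", "user": "u1"}, {"user": "u3", "event": "login"}]


def fingerprint(record):
    """Return a stable hash of a record; sort_keys makes key order irrelevant."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(*sources):
    """Keep the first copy of each distinct record across all sources."""
    store = {}  # a plain dict stands in for a key-value (NoSQL-style) store
    for source in sources:
        for record in source:
            store.setdefault(fingerprint(record), record)
    return store


kv_store = deduplicate(source_a, source_b)
print(len(kv_store))
```

The `u1` login appears in both sources with its keys in a different order; hashing a canonical form of each record lets the two copies collapse into one entry.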
• PROCESSING THE DATA :
• BIG DATA TECHNOLOGIES AND TOOLS INCLUDE DATA STORAGE AND MANAGEMENT SYSTEMS LIKE
HADOOP, APACHE SPARK, AND NOSQL DATABASES, AS WELL AS DATA PROCESSING AND ANALYSIS TOOLS
SUCH AS PYTHON, R, AND SQL.
• MACHINE LEARNING ALGORITHMS AND ARTIFICIAL INTELLIGENCE TECHNIQUES ARE ALSO COMMONLY
USED TO ANALYSE BIG DATA AND EXTRACT INSIGHTS.
• CONNECT TO THE DATA SOURCES AND EXTRACT THE DATA FILES.
• SUBDIVIDE THE DATA INTO PARTITIONS WITH HADOOP MAPREDUCE.
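The partition, map, and reduce idea can be imitated in plain Python to show the flow of data. This is a single-process sketch of the MapReduce pattern, not the Hadoop API; the function names and the sample lines are assumptions chosen for the example.

```python
from collections import defaultdict


def partition(lines, n_parts):
    """Split the input into n_parts partitions (round-robin)."""
    parts = [[] for _ in range(n_parts)]
    for index, line in enumerate(lines):
        parts[index % n_parts].append(line)
    return parts


def map_phase(part):
    """Map: emit a (word, 1) pair for every word in one partition."""
    return [(word.lower(), 1) for line in part for word in line.split()]


def shuffle(mapped_parts):
    """Shuffle: group all emitted values by key across partitions."""
    grouped = defaultdict(list)
    for pairs in mapped_parts:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data is big", "data is growing", "big data tools"]
word_counts = reduce_phase(shuffle(map_phase(p) for p in partition(lines, 2)))
print(word_counts)
```

In a real cluster each partition would be mapped on a different node; here the partitions are processed one after another, which keeps the example runnable anywhere.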
WHY THE BIG DATA ?
• NOWADAYS SOCIAL MEDIA PLATFORMS GENERATE A HUGE AMOUNT OF DATA DAILY.
• THE NUMBER AND CAPACITY OF STORAGE DEVICES KEEP INCREASING.
• DATA IS AVAILABLE IN MANY DIFFERENT FORMATS.
• PROCESSING POWER KEEPS INCREASING.
• ORGANIZATIONS NEED GOOD AND ACCURATE DATA.
INTRODUCTION TO BIG DATA PLATFORM
• IT IS AN IT OR SOFTWARE SOLUTION THAT COMBINES SEVERAL TOOLS AND UTILITIES INTO A SINGLE SOLUTION.
• IT IS A PLATFORM WHERE UPCOMING CHANGES CAN BE INTEGRATED WITH YOUR EXISTING FEATURES AND
TOOLS.
• IT CHANGES THE NATURE OF SOFTWARE ENGINEERING, BECAUSE THE BUSINESS PROCESS (OR TECHNOLOGY)
MAY NEED TO BE REPLACED. THIS IS ALSO CALLED CHANGE MANAGEMENT.
• THE ORGANIZATION THAT DEVELOPS THE PLATFORM MUST MAKE IT DEPLOYABLE TO CUSTOMERS,
SO THAT CUSTOMERS CAN EASILY OPERATE IT AND MANAGE HUGE DATA.
• THE ARCHITECTURE OF A BIG DATA PLATFORM TYPICALLY CONSISTS OF SEVERAL LAYERS, INCLUDING :
• 1. DATA INGESTION : THIS LAYER IS RESPONSIBLE FOR COLLECTING AND BRINGING IN DATA FROM VARIOUS SOURCES, SUCH AS STRUCTURED AND
UNSTRUCTURED DATA, INTO THE PLATFORM.
• 2. DATA STORAGE : THIS LAYER STORES THE INGESTED DATA IN A DISTRIBUTED AND SCALABLE MANNER, TYPICALLY USING TECHNOLOGIES SUCH AS
HADOOP DISTRIBUTED FILE SYSTEM (HDFS), NOSQL DATABASES, AND CLOUD-BASED STORAGE SOLUTIONS.
• 3. DATA PROCESSING: THIS LAYER INVOLVES PROCESSING AND TRANSFORMING THE INGESTED DATA INTO A FORMAT THAT CAN BE EASILY ANALYSED
USING VARIOUS DATA PROCESSING TECHNOLOGIES SUCH AS APACHE SPARK, APACHE STORM, AND MAPREDUCE.
• 4. DATA ANALYSIS: THIS LAYER USES ADVANCED ANALYTICS TECHNIQUES SUCH AS MACHINE LEARNING, DATA MINING, AND NATURAL LANGUAGE
PROCESSING TO EXTRACT INSIGHTS AND KNOWLEDGE FROM THE PROCESSED DATA.
• 5. DATA VISUALIZATION: THIS LAYER PRESENTS THE ANALYSED DATA IN AN EASILY UNDERSTANDABLE AND INTERACTIVE FORMAT, TYPICALLY USING
VISUALIZATION TOOLS SUCH AS TABLEAU, QLIKVIEW, AND D3.JS
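To show how the five layers hand data to one another, here is a toy end-to-end sketch in Python. Everything in it is a stand-in chosen for illustration: the sales records are invented, an in-memory list plays the role of HDFS or a NoSQL store, and a text bar chart replaces a real visualization tool.

```python
import json
from collections import Counter


def ingest():
    """Ingestion: collect records from mixed sources (hard-coded, hypothetical)."""
    structured = [{"product": "phone", "amount": 300}, {"product": "laptop", "amount": 900}]
    semi_structured = json.loads('[{"product": "phone", "amount": 350, "coupon": "X1"}]')
    return structured + semi_structured


def store(records, storage):
    """Storage: append records to a list that stands in for HDFS or a NoSQL store."""
    storage.extend(records)
    return storage


def process(storage):
    """Processing: reduce each record to the fields needed for analysis."""
    return [(record["product"], record["amount"]) for record in storage]


def analyse(pairs):
    """Analysis: total sales per product."""
    totals = Counter()
    for product, amount in pairs:
        totals[product] += amount
    return dict(totals)


def visualise(totals, unit=100):
    """Visualization: a text bar chart, one '#' per `unit` of sales."""
    return [f"{name:<8}{'#' * (value // unit)} {value}" for name, value in sorted(totals.items())]


totals = analyse(process(store(ingest(), [])))
chart = visualise(totals)
print("\n".join(chart))
```

Each function only depends on the output of the layer before it, which is the property that lets a real platform swap one technology for another within a layer.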
CHALLENGES OF CONVENTIONAL SYSTEMS :
INTELLIGENT DATA ANALYSIS :
NATURE OF DATA :
ANALYTIC PROCESSES AND TOOLS
• ANALYTIC PROCESSES AND TOOLS ARE USED TO PROCESS, ANALYZE, AND VISUALIZE DATA IN ORDER TO DERIVE INSIGHTS AND
KNOWLEDGE FROM IT. THERE ARE SEVERAL KEY STEPS IN THE ANALYTIC PROCESS, WHICH INCLUDE:
• 1. DATA COLLECTION AND PREPARATION: THIS INVOLVES IDENTIFYING AND GATHERING RELEVANT DATA FROM VARIOUS SOURCES
AND PREPARING IT FOR ANALYSIS. THIS INCLUDES DATA CLEANING, TRANSFORMATION, AND NORMALIZATION.
• 2. DATA EXPLORATION AND VISUALIZATION: THIS INVOLVES USING VISUAL TOOLS AND TECHNIQUES TO EXPLORE AND GAIN
INSIGHTS FROM THE DATA. DATA VISUALIZATION TOOLS, SUCH AS CHARTS AND GRAPHS, CAN BE USED TO PRESENT DATA IN A
CLEAR AND UNDERSTANDABLE WAY.
• 3. DATA MODELING AND ANALYSIS: THIS INVOLVES USING STATISTICAL AND MACHINE LEARNING TECHNIQUES TO BUILD MODELS
AND ANALYZE THE DATA. THESE MODELS CAN BE USED TO MAKE PREDICTIONS AND IDENTIFY PATTERNS IN THE DATA.
• 4. INTERPRETATION AND COMMUNICATION: THIS INVOLVES INTERPRETING THE RESULTS OF THE ANALYSIS AND COMMUNICATING
THE INSIGHTS TO STAKEHOLDERS. THIS INCLUDES PRESENTING FINDINGS IN A CLEAR AND ACTIONABLE WAY.
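The four steps above can be walked through on a tiny data set. The sketch below uses only the Python standard library; the observations (advertising spend against sales) are invented, and the least-squares line is just one simple choice of model for illustration.

```python
from statistics import mean

# Hypothetical observations: (advertising spend, sales); None marks a missing value.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, 9.0)]

# 1. Preparation: clean out incomplete rows.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 1. Preparation (continued): min-max normalization of x to the range [0, 1].
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_min, x_max = min(xs), max(xs)
xs_norm = [(x - x_min) / (x_max - x_min) for x in xs]

# 2. Exploration: simple summary statistics.
summary = {"n": len(clean), "mean_x": mean(xs), "mean_y": mean(ys)}

# 3. Modeling: ordinary least squares fit of y = slope * x + intercept.
x_bar, y_bar = summary["mean_x"], summary["mean_y"]
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# 4. Interpretation: state the finding in plain words for stakeholders.
message = f"Each extra unit of spend is associated with about {slope:.1f} more units of sales."
print(summary, message)
```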
There are several tools and technologies used in the analytic process, including:
1. Statistical software: Statistical software, such as R and SAS, is used to perform data analysis
and build models.
2. Business intelligence tools: Business intelligence tools, such as Tableau and Power BI, are used to
visualize and analyze data.
3. Machine learning platforms: Machine learning platforms, such as TensorFlow and Scikit-learn, are used
to build and deploy machine learning models.
4. Data integration tools: Data integration tools, such as Apache Kafka and Apache NiFi, are used to
integrate data from various sources.
5. Cloud-based analytics platforms: Cloud-based analytics platforms, such as AWS and Google Cloud
Platform, provide scalable and cost-effective solutions for data analysis and processing.
ANALYSIS V/S REPORTING