Unit 1
Syllabus:
ITECH WORLD AKTU
Key Points:
• Outcome: Generates insights for strategic decisions in domains such as business, healthcare, and technology.
• Tools: Includes Python, R, Excel, and specialized tools such as Tableau and Power BI.
Example: A retail store uses data analytics to identify customer buying patterns
and optimize inventory management, ensuring popular products are always in stock.
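The retail example can be sketched in a few lines of Python. This is a minimal illustration, not a production inventory system: the sales log, the stock table, and the `restock_alerts` helper with its `top_n` and `min_stock` parameters are all hypothetical. It finds the best-selling products and flags those whose stock is running low.

```python
from collections import Counter

# Hypothetical sales log: one entry per item sold (illustrative data only).
sales = ["milk", "bread", "milk", "eggs", "milk", "bread", "butter"]
# Hypothetical current stock levels per product.
stock = {"milk": 2, "bread": 10, "eggs": 5, "butter": 1}

def restock_alerts(sales, stock, top_n=2, min_stock=3):
    """Return the top_n best-selling products whose stock is below min_stock."""
    counts = Counter(sales)                      # buying pattern: units sold per product
    popular = [product for product, _ in counts.most_common(top_n)]
    return [p for p in popular if stock.get(p, 0) < min_stock]

print(restock_alerts(sales, stock))  # ['milk']
```

A real store would also account for supplier lead times and demand forecasts; the fixed `min_stock` threshold here is a simplifying assumption.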
1. Social Data:
2. Machine-Generated Data:
• Sensors and IoT Devices: Data from devices like thermostats, smartwatches, and industrial sensors.
• Log Data: Records of system activities, such as server logs and application
usage.
• GPS Data: Location information generated by devices like smartphones and
vehicles.
• Telemetry Data: Remote data transmitted from devices, such as satellites
and drones.
3. Transactional Data:
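Machine-generated log data (the Log Data item above) is usually parsed before it can be analyzed. The sketch below assumes a hypothetical simplified line layout, `TIMESTAMP METHOD PATH STATUS`; real server formats (for example Apache or Nginx access logs) differ, so the regular expression `LINE_RE` is an illustrative assumption. It counts HTTP status codes and computes the share of server errors (5xx), skipping malformed lines.

```python
import re
from collections import Counter

# Hypothetical log layout assumed for this sketch: "TIMESTAMP METHOD PATH STATUS".
LINE_RE = re.compile(r"^(\S+) ([A-Z]+) (\S+) (\d{3})$")

def status_counts(lines):
    """Count HTTP status codes in machine-generated log lines; skip malformed lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(4)] += 1
    return counts

log_lines = [
    "2024-01-01T10:00:01 GET /home 200",
    "2024-01-01T10:00:02 GET /cart 500",
    "2024-01-01T10:00:03 POST /login 200",
    "corrupted line",
    "2024-01-01T10:00:04 GET /home 404",
]

counts = status_counts(log_lines)
# Share of parsed requests that ended in a server error (status 5xx).
server_error_rate = sum(v for k, v in counts.items() if k.startswith("5")) / sum(counts.values())
print(counts, server_error_rate)
```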
Example:
• A social media platform like Twitter generates vast amounts of social data from
tweets, hashtags, and mentions.
• Machine-generated data from GPS in delivery trucks helps optimize routes and
reduce costs.
• A retail store’s transactional data tracks customer purchases and identifies high-
demand products.
• Structured Data: Data that is organized in a tabular format with rows and
columns. It follows a fixed schema, making it easy to query and analyze.
• Semi-Structured Data: Data that does not have a rigid structure but contains
tags or markers to separate elements. It lies between structured and unstructured
data.
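The difference between the two forms can be seen by loading a small sample of each with Python's standard library. The customer records below are hypothetical: the CSV text has a fixed schema (every row has `id`, `name`, `age`), while the JSON records carry keys as tags but may include or omit fields from record to record.

```python
import csv
import io
import json

# Structured: every row has the same columns (fixed schema).
CSV_TEXT = "id,name,age\n1,Asha,34\n2,Ravi,29\n"

# Semi-structured: keys act as tags, but records need not share the same fields.
JSON_TEXT = '[{"id": 1, "name": "Asha", "tags": ["vip"]}, {"id": 2, "name": "Ravi", "email": "ravi@example.com"}]'

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
records = json.loads(JSON_TEXT)

# Structured data: the same field exists in every row, so a query is uniform.
ages = [int(r["age"]) for r in rows]
# Semi-structured data: each record must be checked for the field.
emails = [r.get("email") for r in records]

print(ages)    # [34, 29]
print(emails)  # [None, 'ravi@example.com']
```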
Comparison Table:
• Volume: Refers to the sheer amount of data generated. Modern data systems
must handle terabytes or even petabytes of data.
• Velocity: Refers to the speed at which data is generated and processed. Real-time
data processing is crucial for timely insights.
– Example: Stock market systems process millions of trades per second to provide real-time updates.
• Variety: Refers to the different types and formats of data, including structured,
semi-structured, and unstructured data.
• Veracity: Refers to the quality and reliability of the data. High veracity ensures
data accuracy, consistency, and trustworthiness.
– Example: Data from unreliable sources or with missing values can lead to
incorrect insights.
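Veracity problems such as missing or implausible values can be caught with simple checks before analysis. This is a minimal sketch with hypothetical sensor readings; the helper `quality_report` and the plausible temperature range passed to it are assumptions chosen for illustration.

```python
def quality_report(rows, field, low, high):
    """Classify each row's `field` as valid, missing, or outside the plausible range [low, high]."""
    report = {"valid": [], "missing": [], "out_of_range": []}
    for i, row in enumerate(rows):
        value = row.get(field)
        if value is None:                 # absent key or explicit None
            report["missing"].append(i)
        elif not low <= value <= high:    # physically implausible reading
            report["out_of_range"].append(i)
        else:
            report["valid"].append(i)
    return report

# Hypothetical temperature readings (degrees Celsius) from four sensors.
readings = [{"temp_c": 21.5}, {"temp_c": None}, {"temp_c": 999.0}, {"humidity": 40}]
report = quality_report(readings, "temp_c", low=-50, high=60)
print(report)
```

Rows listed under `missing` or `out_of_range` would be corrected, imputed, or excluded before they feed into any insight.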
Real-Life Scenario: Social media platforms like Twitter deal with high Volume
(millions of tweets daily), high Velocity (real-time updates), high Variety (text, images,
videos), and mixed Veracity (authentic and fake information).
Big data platforms are designed to handle data that traditional systems cannot efficiently manage. These platforms enable businesses and organizations to derive meaningful insights from large-scale and diverse data.
Key Features of Big Data Platforms:
• Hadoop:
• Spark:
• NoSQL Databases:
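Hadoop's MapReduce model splits a job into a map step, a shuffle that groups intermediate results by key, and a reduce step; Spark exposes similar map and reduce-by-key operations. The sketch below is a single-process Python simulation of that model using the classic word-count task. It does not use the Hadoop or Spark APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing the counts)."""
    return {key: sum(values) for key, values in groups.items()}

# Two input splits; on a cluster each would be mapped on a different node.
splits = ["big data needs big platforms", "Spark and Hadoop process big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # 3 2
```

On a real cluster the map calls run in parallel across machines and the shuffle moves data over the network; the logic of each phase stays the same.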
Example: A retail company uses data analytics to predict customer demand for
products, enabling them to stock inventory more efficiently.
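One of the simplest ways to predict demand is a moving average of recent sales. This sketch is an assumption-laden illustration, not the method any particular retailer uses: the weekly sales figures, the 3-week window, and the `safety_stock` buffer are hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Forecast next period's demand as the mean of the last `window` periods."""
    if window <= 0 or len(history) < window:
        raise ValueError("need a positive window and at least `window` periods of history")
    return sum(history[-window:]) / window

weekly_units = [120, 135, 128, 140, 150]  # hypothetical weekly unit sales
safety_stock = 20                         # hypothetical buffer against forecast error

forecast = moving_average_forecast(weekly_units)
order_up_to = forecast + safety_stock     # target stock level for next week
print(f"forecast {forecast:.1f} units; stock up to {order_up_to:.0f}")
```

A moving average reacts slowly to trends and ignores seasonality; it is shown here only because it makes the idea of "predicting demand from past sales" concrete.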
• Early Stages (Manual and Small Data): In the past, analytics was performed
manually with small datasets, often using spreadsheets or simple statistical tools.
• Relational Databases and SQL: With the rise of structured data, relational databases and SQL-based querying became more prevalent, offering better scalability for handling larger datasets.
• Big Data and Distributed Computing: The advent of big data technologies such as Hadoop and Spark allowed for the processing and analysis of massive datasets across distributed systems.
• Cloud Computing: Cloud-based platforms like AWS, Google Cloud, and Azure
have made scaling analytics infrastructure easier by providing on-demand resources,
reducing the need for physical hardware.
• Real-Time Data Analytics: Technologies such as Apache Kafka and stream processing frameworks have enabled data to be processed in real time, further enhancing scalability.
• Data Collection: Gathering raw data from various sources such as databases,
APIs, or sensors.
• Data Cleaning: Identifying and correcting errors or inconsistencies in the dataset
to improve the quality of the data.
• Data Exploration: Visualizing and summarizing data to understand patterns and
distributions.
• Model Building: Selecting and applying statistical or machine learning models
to predict or classify data.
• Evaluation and Interpretation: Evaluating the accuracy and effectiveness of
models, and interpreting the results for actionable insights.
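The five steps above can be walked through end to end on a tiny dataset. This is a minimal sketch, assuming hypothetical (ad spend, sales) pairs and a simple linear regression fitted by ordinary least squares as the model; any real project would choose the model to suit the problem.

```python
from statistics import mean

# 1. Data collection: hypothetical (ad_spend, sales) pairs, including one bad record.
raw = [(1.0, 3.1), (2.0, 4.9), (None, 6.0), (3.0, 7.2), (4.0, 8.8)]

# 2. Data cleaning: drop records with missing values.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 3. Data exploration: simple summaries of each variable.
xs, ys = zip(*clean)
x_bar, y_bar = mean(xs), mean(ys)
print("mean spend:", x_bar, "mean sales:", y_bar)

# 4. Model building: fit y = a + b*x by ordinary least squares.
b = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# 5. Evaluation: mean squared error (on training data here; use held-out data in practice).
mse = mean((y - (a + b * x)) ** 2 for x, y in clean)
print(f"sales ~ {a:.2f} + {b:.2f} * spend, MSE = {mse:.3f}")
```

Interpretation then turns the fitted slope into a statement a stakeholder can act on, for example how much sales tend to rise per unit of extra ad spend in this sample.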
Tools:
• Statistical Tools: R, Python (with libraries like Pandas, NumPy), SAS
• Machine Learning Frameworks: TensorFlow, Scikit-learn, Keras
• Big Data Tools: Hadoop, Apache Spark
• Data Visualization: Tableau, Power BI, Matplotlib (Python)
4 Analysis vs Reporting
The difference between analysis and reporting lies in their purpose and approach to data:
• Analysis: Involves digging deeper into data to identify trends, patterns, and correlations. It often requires complex statistical or machine learning methods.
• Reporting: Focuses on summarizing data into a readable format, such as charts,
tables, or dashboards, to provide stakeholders with easy-to-understand summaries.
Example: A report might display sales numbers for the last quarter, while analysis
might uncover reasons behind those numbers, such as customer buying behavior or market
conditions.
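The contrast can be made concrete with a few lines of Python. This sketch uses hypothetical monthly sales and discount figures: the report step only totals sales per quarter (what happened), while the analysis step computes a Pearson correlation to probe one possible reason (whether sales move with discounts). A strong correlation suggests a relationship worth investigating; it does not by itself prove that discounts caused the sales.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical monthly figures for two quarters (January to June).
sales = [100, 120, 90, 150, 160, 95]
discount_pct = [5, 10, 0, 15, 20, 0]

# Reporting: summarize what happened.
report = {"Q1": sum(sales[:3]), "Q2": sum(sales[3:])}

# Analysis: look for a possible reason behind the numbers.
r = pearson(discount_pct, sales)
print(report, f"r = {r:.2f}")
```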
• Apache Spark: A fast, in-memory data processing engine for big data analytics.
• Power BI: A powerful business analytics tool that allows users to visualize data
and share insights.
• Tableau: A data visualization tool that enables users to create interactive dashboards and visual reports.
• Python with Libraries: Libraries like Pandas, Matplotlib, and Scikit-learn enable
efficient data analysis and visualization.
• Healthcare: Analyzing patient data for better diagnosis, treatment plans, and
management of healthcare resources.
• Finance: Fraud detection, risk assessment, and portfolio optimization through the
analysis of financial data.
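A very simple form of fraud detection flags transactions that lie far from a customer's usual spending. The sketch below uses z-scores (distance from the mean in standard deviations) on hypothetical transaction amounts; production fraud systems use far richer features and models, so treat this as an illustration of the idea only.

```python
from statistics import mean, stdev

def flag_outliers(amounts, threshold=3.0):
    """Return indices of amounts more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:                      # all amounts identical: nothing stands out
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]

# Hypothetical card transactions for one customer.
amounts = [450, 520, 480, 510, 495, 470, 505, 9800]
print(flag_outliers(amounts, threshold=2.0))  # [7]
```

The lower threshold of 2.0 is deliberate: with only eight values, a single extreme amount inflates the standard deviation so much that no z-score can exceed about 2.47, so the common cutoff of 3.0 would miss it. Robust alternatives based on the median are often preferred for this reason.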
• Data Preparation: Collecting, cleaning, and transforming data into usable formats.
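A minimal sketch of data preparation, assuming hypothetical raw customer records whose income arrives as text. The hypothetical `prepare` helper removes exact duplicates, drops rows with a missing income, converts the text to numbers, and applies min-max scaling so the feature lies in [0, 1].

```python
def prepare(records):
    """Deduplicate, drop incomplete rows, convert income text to numbers, and min-max scale it."""
    seen, cleaned = set(), []
    for rec in records:
        key = (rec["customer_id"], rec["income"])
        if key in seen:                          # skip exact duplicate records
            continue
        seen.add(key)
        text = rec["income"].strip().replace(",", "")
        if not text:                             # skip rows with a missing income
            continue
        cleaned.append({"customer_id": rec["customer_id"], "income": float(text)})
    if not cleaned:
        return cleaned
    lo = min(r["income"] for r in cleaned)
    hi = max(r["income"] for r in cleaned)
    for r in cleaned:                            # scale to [0, 1]; a constant column maps to 0.0
        r["income_scaled"] = (r["income"] - lo) / (hi - lo) if hi > lo else 0.0
    return cleaned

raw = [
    {"customer_id": "C1", "income": "30,000"},
    {"customer_id": "C2", "income": " 50000 "},
    {"customer_id": "C2", "income": " 50000 "},  # duplicate
    {"customer_id": "C3", "income": ""},         # missing value
    {"customer_id": "C4", "income": "70000"},
]
prepared = prepare(raw)
print(prepared)
```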
• Optimizes Resource Usage: The lifecycle ensures efficient use of resources, such
as time, tools, and personnel. By organizing tasks in a structured way, projects are
completed more efficiently, avoiding wasted effort and resources.
• Improves Communication: Clear milestones and stages help teams stay aligned
and facilitate communication about the progress of the project. This clarity is
especially useful when different teams or departments are involved.
• Better Decision-Making: The lifecycle ensures that all steps are thoroughly executed, leading to high-quality insights. This improves decision-making by providing businesses with reliable and actionable data.
• Data Scientist:
• Data Engineer:
• Business Analyst:
– A business analyst bridges the gap between the technical team (data scientists
and engineers) and business stakeholders.
– They are responsible for understanding the business problem and translating
it into actionable data-driven solutions.
– Business analysts also interpret the results of data analysis and communicate
them in a way that is understandable for non-technical stakeholders.
– Example: A business analyst analyzes customer feedback data and interprets
the results to help the marketing team refine their targeting strategy.
• Project Manager:
1. Discovery:
2. Data Preparation:
3. Model Planning:
4. Model Building:
• Implement the selected models using tools like Python, R, or machine learning
libraries (e.g., Scikit-learn, TensorFlow).
• Train the model on the prepared dataset.
• Tune hyperparameters to improve model performance.
5. Communicating Results:
6. Operationalization:
• Deploy the model into a production environment for real-time analysis or batch
processing.
• Integrate the model with existing business systems (e.g., CRM, ERP).
• Monitor and maintain the model’s performance over time.
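Monitoring a deployed model often means comparing its predictions with outcomes observed later and raising an alert when accuracy drops. This is a minimal sketch under stated assumptions: the weekly batches are hypothetical, `baseline` stands for the accuracy measured at deployment, and the `tolerance` value is an arbitrary illustrative choice.

```python
def batch_accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need equal-length, non-empty lists")
    return sum(p == a for p, a in zip(predictions, actuals)) / len(predictions)

def monitor(batches, baseline, tolerance=0.05):
    """Return indices of batches whose accuracy fell more than `tolerance` below `baseline`."""
    alerts = []
    for i, (preds, actuals) in enumerate(batches):
        if batch_accuracy(preds, actuals) < baseline - tolerance:
            alerts.append(i)
    return alerts

# Hypothetical weekly batches of (model predictions, observed outcomes).
weekly = [
    ([1, 0, 1, 1], [1, 0, 1, 1]),  # accuracy 1.00
    ([1, 0, 0, 1], [1, 0, 1, 1]),  # accuracy 0.75
    ([0, 0, 0, 1], [1, 1, 1, 1]),  # accuracy 0.25
]
print(monitor(weekly, baseline=0.90))  # [1, 2]
```

An alert like this would typically trigger an investigation and, if the data has drifted, retraining of the model.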
Example: A retail company builds a model to predict customer churn and integrates
it into their CRM system.
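The Model Building phase (train the model, then tune hyperparameters) can be sketched for a churn model like the one in this example. The notes do not prescribe an algorithm, so this sketch swaps in k-nearest neighbours with one hypothetical feature (monthly logins) and hypothetical labelled data; the hyperparameter being tuned is `k`, chosen by accuracy on a separate validation set.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict a label for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that the k-NN model labels correctly."""
    return sum(knn_predict(train, x, k) == label for x, label in data) / len(data)

# Hypothetical (monthly_logins, churned) examples; (2, "no") is a deliberately noisy label.
train = [(1, "yes"), (2, "no"), (3, "yes"), (4, "yes"), (8, "no"), (9, "no"), (10, "no")]
valid = [(2.2, "yes"), (3.6, "yes"), (7, "no"), (9.5, "no")]

# "Training" k-NN just stores `train`; tuning picks the k with the best validation accuracy.
candidate_ks = [1, 3, 5]
scores = {k: accuracy(train, valid, k) for k in candidate_ks}
best_k = max(candidate_ks, key=scores.get)
print(scores, "best k:", best_k)
```

With `k = 1` the noisy training label misleads the prediction for 2.2 logins, while voting over 3 neighbours smooths it out; this is the kind of trade-off hyperparameter tuning is meant to resolve before the model is deployed to the CRM.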