DSBDA Unit 6
• Bar Chart: Represents categorical data with rectangular bars (e.g., comparing product sales).
• Line Chart: Displays trends over time (e.g., stock price over months).
• Box Plot: Visualizes spread and outliers in data (e.g., exam score analysis).
Common visualization tools:
• Power BI: A Microsoft product that connects data sources and builds dashboards.
• QlikView/Qlik Sense: Offers associative data models for dynamic dashboards and reports.
• Google Data Studio: A free Google tool for simple, shareable reports using Google data.
• Plotly: A Python and JavaScript library for interactive plots and dashboards.
• Matplotlib / Seaborn (Python): Libraries used for 2D plotting and statistical visualizations.
1. Bar Charts
• Purpose: Compare values across categories.
• Use Case: Comparing sales across regions or product categories.
2. Line Charts
• Purpose: Display trends over time.
• Use Case: Monthly revenue growth or stock price fluctuations.
3. Pie Charts
• Purpose: Show proportions of a whole.
• Use Case: Market share distribution among companies.
4. Histograms
• Purpose: Show the distribution of a single variable.
• Use Case: Frequency of scores in an exam.
5. Scatter Plots
• Purpose: Visualize the relationship between two numeric variables.
• Use Case: Analyzing the correlation between advertising spend and revenue.
6. Heatmaps
• Purpose: Show values in a matrix with color intensity.
• Use Case: Correlation matrix of features in a dataset.
7. Box Plots (Box-and-Whisker)
• Purpose: Summarize distribution using median, quartiles, and outliers.
• Use Case: Compare test scores across different classes.
8. Area Charts
• Purpose: Similar to line charts, but emphasizes the magnitude of change.
• Use Case: Visualizing the cumulative growth of users over time.
9. Tree Maps
• Purpose: Display hierarchical data as nested rectangles.
• Use Case: Visualizing budget allocation in an organization.
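Several of the chart types above can be produced with Matplotlib (one of the tools listed earlier). A minimal sketch with illustrative data, rendering a bar chart and a line chart side by side:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so no display window is needed
import matplotlib.pyplot as plt

# Bar chart: compare sales across regions (illustrative figures)
regions = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]

# Line chart: trend over time (illustrative figures)
months = ["Jan", "Feb", "Mar", "Apr", "May"]
revenue = [10, 14, 13, 18, 21]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(regions, sales)
ax1.set_title("Sales by region (bar)")
ax2.plot(months, revenue, marker="o")
ax2.set_title("Monthly revenue (line)")

# Save to an in-memory buffer instead of a file
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
```

Swapping `ax1.bar` for `ax1.pie`, `ax1.hist`, or `ax1.scatter` produces the other chart types listed above with the same pattern.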
5. Analytical Techniques Used in Big Data Visualization
Big data visualization involves representing massive, complex datasets visually to reveal trends, patterns,
and insights. Analytical techniques are essential to summarize, process, and present this data effectively.
Below are the key techniques used:
1. Descriptive Analytics
• Purpose: Summarizes past data to understand what happened.
• Example: Visualizing average monthly sales or website traffic trends.
• Tools: Bar charts, line charts, pie charts.
2. Diagnostic Analytics
• Purpose: Explores data to understand the reasons behind events or trends.
• Example: Analyzing why sales dropped in a region by comparing KPIs.
• Tools: Heatmaps, correlation matrices, drill-down charts.
3. Predictive Analytics
• Purpose: Uses historical data and machine learning to forecast future outcomes.
• Example: Forecasting product demand or customer churn.
• Tools: Time series charts, regression lines, prediction intervals.
4. Prescriptive Analytics
• Purpose: Recommends actions based on predictive insights and optimization.
• Example: Recommending pricing strategies based on customer segmentation.
• Tools: Decision trees, optimization dashboards.
5. Cluster Analysis
• Purpose: Groups similar data points together to identify patterns or segments.
• Example: Customer segmentation based on behavior.
• Tools: Cluster heatmaps, 3D scatter plots, dendrograms.
6. Anomaly Detection
• Purpose: Identifies unusual data points or outliers.
• Example: Detecting fraudulent transactions or spikes in sensor data.
• Tools: Line charts with threshold bands, box plots.
7. Geospatial Analysis
• Purpose: Analyzes data related to geographic locations.
• Example: Mapping customer density or delivery routes.
• Tools: Choropleth maps, geo heatmaps.
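The predictive-analytics idea above (fitting a trend to historical data and extending it) can be sketched with `statistics.linear_regression` from the Python standard library (Python 3.10+); the demand figures are illustrative:

```python
from statistics import linear_regression

# Historical monthly demand (month index -> units sold); illustrative numbers
months = [1, 2, 3, 4, 5]
demand = [3, 5, 7, 9, 11]  # grows by 2 units per month

# Least-squares fit: demand ~= slope * month + intercept
fit = linear_regression(months, demand)

# Forecast month 6 by extending the fitted trend line
forecast = fit.slope * 6 + fit.intercept
```

Plotting the fitted line alongside the observed points (a regression line on a time series chart) is exactly the visualization listed for predictive analytics.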
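Cluster analysis, as used for customer segmentation above, can be sketched with a tiny k-means loop. The points and the naive deterministic initialization below are illustrative, not a production clustering routine:

```python
import math

def kmeans(points, k, iterations=10):
    """Tiny k-means sketch: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its members."""
    centroids = [points[i] for i in range(k)]  # naive deterministic init
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave a centroid in place if it has no members
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters

# Two obvious "customer segments": low spenders and high spenders
points = [(1, 2), (1.5, 1.8), (2, 2.2), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
```

Plotting each cluster in a different color on a scatter plot gives the segment visualization described above.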
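A common baseline for the anomaly detection described above is flagging points that fall far from the mean; the sensor readings below are illustrative:

```python
from statistics import mean, stdev

# Sensor readings with one obvious spike (illustrative)
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 10.1]

mu = mean(readings)
sigma = stdev(readings)

# Flag points more than 2 standard deviations from the mean --
# this is the "threshold band" that the line charts above visualize
anomalies = [x for x in readings if abs(x - mu) > 2 * sigma]
```

Drawing the band mu ± 2*sigma on a line chart of the readings makes the flagged spike visible at a glance.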
2. Scatter Plot
• Description: A graph of plotted points that show the relationship between two variables.
• Use Cases:
o Identifying correlations or patterns between variables.
o Detecting clusters or outliers.
• Example: Relationship between hours studied and exam scores.
3. Histogram
• Description: A bar graph representing the frequency distribution of a dataset.
• Use Cases:
o Showing the distribution of a single variable.
o Understanding how data is spread (e.g., normal, skewed).
• Example: Distribution of ages in a customer dataset.
4. Density Plot
• Description: A smoothed version of a histogram that shows the probability density function of a
continuous variable.
• Use Cases:
o Comparing distributions between groups.
o Understanding data distribution in a continuous manner.
• Example: Comparing test scores between two student groups.
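The histogram described above boils down to binning values and counting per bin, which can be sketched in plain Python; the ages below are illustrative:

```python
from collections import Counter

# Ages from a customer dataset (illustrative)
ages = [22, 25, 31, 34, 35, 38, 41, 45, 47, 52, 55, 61]

# Bin into decades: 20-29, 30-39, ...
bins = Counter((age // 10) * 10 for age in ages)

# Crude text rendering of the frequency distribution
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```

A plotting library does the same binning internally; smoothing these counts with a kernel instead of hard bin edges is what produces the density plot described above.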
1. MapReduce
• Definition: A programming model used for processing and generating large datasets in a distributed
manner.
• Working:
o Map step: Converts input data into key-value pairs.
o Reduce step: Aggregates results based on keys.
• Use Case: Counting the number of occurrences of words in a large document.
• Advantages:
o Handles huge datasets efficiently.
o Fault-tolerant and scalable.
2. Pig
• Definition: A high-level platform that uses a scripting language (Pig Latin) to process large datasets.
• Components:
o Pig Latin: Language used to express data flows.
o Execution Engine: Converts Pig Latin into MapReduce jobs.
• Use Case: Data transformation tasks like filtering, joining, and grouping.
• Advantages:
o Easier to write than raw MapReduce.
o Suitable for ETL (Extract, Transform, Load) operations.
3. Hive
• Definition: A data warehouse system for Hadoop that allows querying of large datasets using a SQL-like language called HiveQL.
• Working: Hive queries are internally converted to MapReduce jobs.
• Use Case: Running SQL-like queries on big data stored in HDFS.
• Advantages:
o Ideal for users familiar with SQL.
o Schema flexibility and partitioning support.
o Good for summarization and analysis.
4. Apache Spark
• Definition: A fast, general-purpose big data processing engine that performs in-memory computing
for increased speed.
• Components:
o Spark Core: The base engine for large-scale computation.
o Spark SQL: SQL queries.
o Spark Streaming: Real-time data processing.
o MLlib: Machine learning library.
o GraphX: Graph processing.
• Use Case: Real-time data analysis, iterative machine learning tasks.
• Advantages:
o Faster than MapReduce due to in-memory processing.
o Supports multiple languages (Python, Scala, Java, R).
o Compatible with Hadoop and HDFS.
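The Map and Reduce steps of item 1 above can be sketched in plain Python for the word-count use case; this is a single-process illustration, not a distributed Hadoop job:

```python
from collections import defaultdict

def map_step(document):
    """Map: emit a (word, 1) key-value pair for every word in the input."""
    return [(word, 1) for word in document.lower().split()]

def reduce_step(pairs):
    """Shuffle + Reduce: group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the cat sat", "the dog sat", "the cat ran"]

# On a real cluster the map tasks run in parallel on different nodes
mapped = [pair for doc in docs for pair in map_step(doc)]
word_counts = reduce_step(mapped)
```

The fault tolerance and scalability come from the framework re-running failed map/reduce tasks on other nodes, not from the functions themselves.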
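HiveQL (item 3 above) is SQL-like, so its query style can be mimicked with Python's built-in sqlite3 on an in-memory table. This only illustrates the shape of a Hive summarization query; it does not run on Hadoop/HDFS, and the sales data is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 95.0), ("North", 80.0), ("East", 140.0)],
)

# A typical Hive-style aggregation: total sales per region
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

In Hive, the same GROUP BY query would be compiled into MapReduce jobs over files in HDFS rather than executed against a local database.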
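Spark's RDD API (item 4 above) chains transformations such as map, filter, and reduce over in-memory data. The plain-Python sketch below mimics that chaining style without requiring a Spark installation; in PySpark the same chain would be written against an RDD:

```python
from functools import reduce

# Not PySpark: a plain-Python sketch of the RDD-style chain
# rdd.map(...).filter(...).reduce(...) that Spark runs in memory.
numbers = list(range(1, 11))

squared = map(lambda x: x * x, numbers)        # like rdd.map
evens = filter(lambda x: x % 2 == 0, squared)  # like rdd.filter
total = reduce(lambda a, b: a + b, evens)      # like rdd.reduce
```

The key difference is that Spark evaluates such chains lazily and distributes them across a cluster, keeping intermediate results in memory, which is where its speed advantage over disk-based MapReduce comes from.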