
Big Data Questions Answers

The document outlines various data structures in Big Data, including structured, semi-structured, unstructured, graph, and key-value formats, emphasizing the need for flexibility in handling large volumes of data. It describes the Big Data ecosystem, detailing components for storage, processing, data ingestion, querying, analytics, and visualization. Additionally, it covers the data analytics lifecycle, K-means clustering, association rules, and the responsibilities of a data scientist.

Uploaded by

Ratnesh hirnaik
Copyright: © All Rights Reserved

Q1: Explain the different data structures in Big Data.

- Structured: Tabular data stored in relational databases (e.g., MySQL, Oracle).

- Semi-structured: Data with some organization but not strictly tabular (e.g., XML, JSON).

- Unstructured: Raw data with no predefined schema or data model (e.g., text, images, videos).

- Graph: Data modeled as nodes and edges (e.g., social networks).

- Key-Value: Data stored as key-value pairs (e.g., Redis, DynamoDB).

Big Data requires flexible data structures to store and process massive volumes efficiently.
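
As a rough illustration of the five forms listed above, the sketch below represents small pieces of data in each shape using plain Python objects. The names (rows, doc, review, follows, kv_store) and all values are invented for the example; a dict stands in for a key-value store such as Redis, and no real database is involved.

```python
import json

# Structured: fixed columns; every row has the same fields (like a SQL table).
columns = ("user_id", "name", "city")
rows = [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai")]

# Semi-structured: JSON with nested and optional fields.
doc = json.loads(
    '{"user_id": 1, "name": "Asha", "tags": ["prime"], "address": {"city": "Pune"}}'
)

# Unstructured: free text with no schema; meaning must be extracted later.
review = "Delivery was quick and the packaging was good."

# Graph: adjacency list mapping each node to the nodes it points to.
follows = {"Asha": ["Ravi"], "Ravi": ["Asha", "Meera"], "Meera": []}

# Key-value: values looked up only by key (hypothetical keys for illustration).
kv_store = {"session:42": "user_id=1", "cart:1": "bread,butter"}

# Each form is accessed differently:
city_from_table = rows[0][columns.index("city")]   # by column position
city_from_json = doc["address"]["city"]            # by nested path
words_in_review = len(review.split())              # by parsing raw text
ravi_follows = follows["Ravi"]                     # by following edges
cart = kv_store["cart:1"]                          # by exact key
```

The point of the comparison is that the access pattern, not just the storage format, changes with the structure.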

Q2: Explain the Big Data ecosystem.

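- Storage: HDFS (distributed file system) and Amazon S3 (cloud object storage).

- Processing: Hadoop MapReduce for batch jobs and Spark for in-memory distributed processing.

- Data Ingestion: Apache Flume for log data, Sqoop for bulk transfers from relational databases, and Kafka for streaming events.

- Querying: Hive and Impala for SQL-style queries; Pig for dataflow scripts over large datasets.

- Analytics: Spark MLlib and Apache Mahout for machine learning at scale.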

- Visualization: Tableau, Power BI for interpreting results.

The ecosystem supports the collection, storage, processing, and analysis of Big Data.
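
To make the processing layer a little more concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps that Hadoop-style batch jobs are built on, using word count as the task. It is plain Python, not Hadoop or Spark code; the function names and input lines are assumptions made for illustration, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data needs big storage", "data needs processing"])
```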

Q3: Explain the Discovery phase of the data analytics lifecycle.

- Identify business goals and problems.

- Understand data sources and availability.

- Define analytics objectives and success metrics.

- Form hypotheses and assumptions.

- Prepare a project plan and timeline.

This phase ensures objectives are clear before data preparation.

Q4: Write a short note on K-means.

- K-means is an unsupervised machine learning algorithm.

- It clusters data into K groups based on the similarity of their features.

- Starts with K random centroids, then assigns each point to its nearest centroid.

- Updates each centroid to the mean of the points assigned to it.

- Repeats the two steps and stops when the centroids stabilize or the maximum number of iterations is reached.

- Applications: customer segmentation, image compression.
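
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names, the fixed seed, the choice to pick initial centroids from the data points, and the sample points are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stabilized
            break
        centroids = new_centroids
    # clusters holds the result of the last assignment step.
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two groups are well separated, so the three points near (1, 1) end up in one cluster and the three near (9, 9) in the other. On real data the result can depend on the initial centroids, which is why libraries usually run several initializations.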

Q5: Explain Association Rule.

- Association rules describe how items or variables tend to occur together in a dataset, written as X -> Y.

- Used in market basket analysis (e.g., bread -> butter: customers who buy bread also tend to buy butter).

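- Metrics:

  - Support: Fraction of transactions that contain the itemset.

  - Confidence: For a rule X -> Y, the fraction of transactions containing X that also contain Y, i.e. support(X and Y) / support(X).

  - Lift: Confidence of X -> Y divided by support(Y); a value above 1 means X and Y occur together more often than expected if they were independent.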

- Algorithm: Apriori, which finds frequent itemsets level by level and then derives rules from them.

- Helps in decision-making and recommendations.
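
The three metrics can be computed directly from a list of transactions. The sketch below uses an invented five-basket dataset and hypothetical helper names (support, confidence, lift); it only evaluates a single rule and does not implement Apriori itself.

```python
# Toy market-basket data, invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "eggs"},
    {"bread", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing x, the fraction that also contain y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Confidence of x -> y relative to how common y is overall."""
    return confidence(x, y, transactions) / support(y, transactions)

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
rule_lift = lift({"bread"}, {"butter"}, transactions)
```

For the rule bread -> butter on this data, support is 3/5, confidence is about 0.75, and lift is about 1.25, so butter shows up with bread more often than its overall frequency would suggest.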

Q6: Explain responsibilities of a Data Scientist.

- Collect and clean large datasets.

- Analyze data for meaningful patterns.

- Build predictive models using ML.

- Communicate findings via visualizations.

- Collaborate with stakeholders.

- Monitor and optimize models.

- Stay updated with new data science tools.
