Msbte UT 1 QB Answers
4 Marks:-
Q.1 Explain data science.
Data science is an interdisciplinary field that involves
using various techniques, algorithms, and systems to
analyze and interpret large sets of data to derive
insights and make informed decisions.
Here are 8 key points about data science:
1. Data Collection and Cleaning: Before any
analysis, data needs to be gathered from different
sources and cleaned to ensure accuracy. Raw data
often contains errors, missing values, or
inconsistencies that need to be addressed.
2. Exploratory Data Analysis (EDA): EDA involves
visualizing and summarizing data to understand
patterns, distributions, and relationships. This step
helps data scientists to uncover hidden insights
and decide on further analysis.
3. Statistical Analysis: Data science relies heavily
on statistics to make inferences, test hypotheses,
and understand the likelihood of certain events or
outcomes. This includes tools like regression
analysis, probability, and hypothesis testing.
4. Machine Learning: Machine learning (ML)
algorithms allow computers to learn from data,
identifying patterns without being explicitly
programmed. Common tasks include
classification, regression, and clustering; decision
trees are one widely used algorithm.
5. Big Data Technologies: Handling vast amounts
of data requires specialized tools like Hadoop,
Spark, and cloud computing resources. These tools
allow for processing, storing, and analyzing data
that exceeds the capacity of a single traditional machine.
6. Data Visualization: Visual representations of
data, such as graphs, charts, and dashboards, are
used to communicate findings clearly. Visualization
helps stakeholders easily interpret complex data
insights.
7. Predictive Modeling: One of the primary goals of
data science is to predict future outcomes based
on historical data. This involves building models
that can forecast trends, behaviors, or risks with a
certain level of confidence.
8. Communication and Decision-Making: Data
science isn't just about analyzing data; it's also
about communicating the findings to non-technical
stakeholders. Data scientists must be able to
explain their results clearly and help guide
business decisions.
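Points 1, 3, and 7 above can be illustrated with a minimal sketch. It is only one possible approach: the records are made-up (advertising spend, sales) pairs, the cleaning policy simply drops incomplete records, and the helper names clean, fit_line, and predict are hypothetical, not taken from any library.

```python
from statistics import mean

# Made-up raw records: (advertising spend, sales). None marks a missing value.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, None), (5.0, 11.0)]

def clean(records):
    """Data cleaning: drop any record with a missing field (one simple policy)."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def fit_line(points):
    """Statistical analysis: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

data = clean(raw)
slope, intercept = fit_line(data)

def predict(x):
    """Predictive modeling: estimate sales for a new spend value."""
    return slope * x + intercept

print(len(data), slope, intercept, predict(6.0))
```

In real projects the same steps are usually done with libraries such as pandas and scikit-learn; plain Python is used here only to keep the sketch self-contained.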
Q.2 Explain the analytics flow of big data.
The analytics flow for big data involves several key
stages:
a. Data Collection: Gather data from diverse sources
like sensors, social media, and logs.
b. Data Storage: Store large datasets in scalable
solutions like Hadoop or NoSQL databases.
c. Data Cleaning and Preprocessing: Clean and
transform data to ensure quality and usability.
d. Data Analysis: Conduct exploratory analysis and
summarize trends or patterns using descriptive
analytics.
e. Modeling and Machine Learning: Apply machine
learning techniques to build models that predict or
classify data.
f. Data Visualization and Reporting: Visualize insights
through dashboards and reports for better
decision-making.
g. Deployment and Integration: Deploy models into
production and integrate insights into business
systems.
h. Monitoring and Maintenance: Continuously monitor
models and data pipelines for accuracy and
performance.
Following these stages in order turns raw big data
into actionable insights and keeps the deployed
models reliable over time.
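The stages above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the sensor readings are invented, an in-memory list stands in for Hadoop or a NoSQL store, a fixed threshold stands in for a trained model, and all function names are hypothetical. Deployment (stage g) is not modeled.

```python
from statistics import mean

def collect():
    # a. Collection: stand-in for sensors, social media, or logs (invented data).
    return [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": None},
            {"sensor": "s1", "temp": 22.5}, {"sensor": "s2", "temp": 30.0}]

def store(records, storage):
    # b. Storage: an in-memory list stands in for Hadoop or a NoSQL database.
    storage.extend(records)
    return storage

def preprocess(records):
    # c. Cleaning: drop readings with a missing temperature.
    return [r for r in records if r["temp"] is not None]

def analyze(records):
    # d. Descriptive analytics: average temperature per sensor.
    by_sensor = {}
    for r in records:
        by_sensor.setdefault(r["sensor"], []).append(r["temp"])
    return {s: mean(v) for s, v in by_sensor.items()}

def classify(summary, threshold=25.0):
    # e. Modeling: a fixed-threshold rule used as a placeholder for a model.
    return {s: ("hot" if avg > threshold else "normal") for s, avg in summary.items()}

def report(labels):
    # f. Reporting: one text line per sensor.
    return [f"{s}: {label}" for s, label in sorted(labels.items())]

def monitor(raw, cleaned):
    # h. Monitoring: fraction of records dropped during cleaning.
    return 1 - len(cleaned) / len(raw)

storage = []
stored = store(collect(), storage)
cleaned = preprocess(stored)
summary = analyze(cleaned)
labels = classify(summary)
lines = report(labels)
drop_ratio = monitor(stored, cleaned)
print(lines, drop_ratio)
```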
Q.3 Explain the data collection process in big
data analytics with an example.
Big data collection for analytics involves these key
steps:
1. Identify & Select: Define project goals and choose
relevant data sources (e.g., databases, social
media).
2. Acquire: Gather data using methods like web
scraping, APIs, or data streaming.
3. Store: Use distributed systems (e.g., Hadoop) for
secure and efficient storage.
4. Preprocess: Clean and transform data, handling
missing values and inconsistencies.
5. Govern: Implement policies for data access,
security, and privacy, ensuring compliance.
Example (retail): A company that wants to analyze
customer behavior collects data from its
databases, website, and social media, stores it in
Hadoop, cleans it, and then analyzes it in
compliance with privacy regulations.
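The five steps can be sketched in a few lines for the retail case. Everything here is an assumption made for illustration: the SOURCES dictionary replaces real databases, web APIs, and social media feeds, a plain list replaces Hadoop, and hashing the customer e-mail is just one simple way to pseudonymize identifiers (on its own it does not make data anonymous).

```python
import hashlib

# 1. Identify & Select: hypothetical sources with invented records.
SOURCES = {
    "database": [{"customer": "A@Example.com ", "event": "purchase"}],
    "website": [{"customer": "a@example.com", "event": "page_view"},
                {"customer": None, "event": "page_view"}],
    "social_media": [{"customer": "b@example.com", "event": "mention"}],
}

def acquire(sources):
    # 2. Acquire: tag each record with the source it came from.
    for name, records in sources.items():
        for record in records:
            yield {**record, "source": name}

def preprocess(records):
    # 4. Preprocess: drop records with no customer, normalize e-mail format.
    cleaned = []
    for r in records:
        if r["customer"] is None:
            continue
        cleaned.append({**r, "customer": r["customer"].strip().lower()})
    return cleaned

def govern(records):
    # 5. Govern: replace the e-mail with a truncated SHA-256 digest.
    out = []
    for r in records:
        digest = hashlib.sha256(r["customer"].encode("utf-8")).hexdigest()[:12]
        out.append({**r, "customer": digest})
    return out

# 3. Store: a list stands in for a distributed store such as Hadoop.
stored = govern(preprocess(acquire(SOURCES)))
for row in stored:
    print(row)
```

After normalization, the database and website records for the same customer receive the same pseudonym, so behavior can still be linked across sources without keeping the raw e-mail address.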