Data Science Overview

The document provides an overview of Data Science, defining it as an interdisciplinary field that extracts insights from data and listing applications such as predictive analytics and fraud detection. It outlines the Data Science project life cycle, which includes stages from problem definition to monitoring. Additionally, it covers web scraping, web crawling, text mining, and sentiment analysis, along with R programming examples for data extraction and database interaction.

Uploaded by Rae
Copyright © All Rights Reserved

Below are answers to the following questions, each written to suit a 5-mark exam question:

1. Explain the term Data Science. Write down its applications.


Answer:
Data Science is an interdisciplinary field that uses scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It
combines statistics, computer science, and domain expertise.
Applications:

Predictive analytics in healthcare (disease prediction)

Customer behavior analysis in marketing

Fraud detection in finance

Recommendation systems (e.g., Netflix, Amazon)

Autonomous driving systems

2. Explain the life cycle of a Data Science project.


Answer:
The Data Science life cycle includes the following stages:

1. Problem Definition: Understand the business problem.

2. Data Collection: Gather relevant data from various sources.

3. Data Cleaning: Handle missing values, outliers, and inconsistencies.

4. Data Exploration: Analyze patterns, trends, and relationships.

5. Feature Engineering: Create relevant variables for modeling.

6. Model Building: Apply machine learning algorithms.

7. Evaluation: Assess model performance using metrics.

8. Deployment: Implement the model in production.

9. Monitoring and Maintenance: Ensure model accuracy over time.
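As a rough illustration, the nine stages can be traced in a few lines of R on the built-in mtcars dataset. This is a minimal sketch, not a production pipeline: the "business problem" (predicting mpg), the engineered power_to_weight feature, the 25% hold-out split, and the check_drift threshold are all illustrative assumptions.

```r
# Sketch of the Data Science life cycle on R's built-in mtcars data.
# Assumption: the "business problem" is predicting fuel economy (mpg).

# 1. Problem definition: predict mpg from car characteristics.
target <- "mpg"

# 2. Data collection: load a built-in dataset instead of an external source.
car_data <- mtcars

# 3. Data cleaning: keep only complete rows (mtcars has no missing values).
car_data <- car_data[complete.cases(car_data), ]

# 4. Data exploration: correlation of every variable with the target.
cor_with_target <- sort(cor(car_data)[, target], decreasing = TRUE)
print(round(cor_with_target, 2))

# 5. Feature engineering: horsepower per 1000 lbs of weight (illustrative feature).
car_data$power_to_weight <- car_data$hp / car_data$wt

# 6. Model building: hold out 25% of rows, fit a linear model on the rest.
set.seed(42)
test_idx <- sample(nrow(car_data), size = round(0.25 * nrow(car_data)))
train <- car_data[-test_idx, ]
test  <- car_data[test_idx, ]
model <- lm(mpg ~ wt + power_to_weight, data = train)

# 7. Evaluation: root-mean-square error on the held-out rows.
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
cat("Test RMSE:", round(rmse, 2), "\n")

# 8. Deployment: wrap feature engineering + model in one scoring function.
score <- function(new_cars) {
  new_cars$power_to_weight <- new_cars$hp / new_cars$wt
  predict(model, newdata = new_cars)
}

# 9. Monitoring: flag the model when live error exceeds the evaluation
#    baseline by a chosen factor (the 1.5 tolerance is a hypothetical choice).
check_drift <- function(actual, predicted, baseline = rmse, tolerance = 1.5) {
  live_rmse <- sqrt(mean((actual - predicted)^2))
  live_rmse > tolerance * baseline  # TRUE suggests retraining may be needed
}
```

Keeping feature engineering inside score() is one way to make sure the deployed model sees inputs prepared exactly as they were during training.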

3. Describe web scraping and web crawling.


Answer:
Web Scraping is the process of extracting specific data from web pages using tools or scripts, for example product prices or customer reviews.
Web Crawling is the automated process of systematically browsing the web by following links from page to page, typically so that search engines can index the content.
Difference:

Scraping focuses on data extraction, while crawling focuses on link traversal.

Scraping targets specific pages; crawling traverses many pages, often across multiple sites.
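The difference can be seen in a small base-R sketch over a hypothetical, in-memory "website" (the site vector and its pages are invented for illustration). crawl() only follows links breadth-first and records which pages it visited, while scrape_prices() pulls one specific field out of a single page. Regular expressions are used here only because the toy pages are tiny and fixed; real scrapers should use an HTML parser such as rvest.

```r
# Hypothetical in-memory site: page path -> HTML content.
site <- c(
  "/index"    = '<a href="/products">Products</a> <a href="/about">About</a>',
  "/products" = '<a href="/item1">Item 1</a> <span class="price">$19.99</span>',
  "/item1"    = '<h2>Item 1</h2> <span class="price">$19.99</span> <a href="/index">Home</a>',
  "/about"    = '<p>About us</p>'
)

# Crawling: breadth-first link traversal, visiting each known page once.
crawl <- function(start, pages) {
  queue <- start
  visited <- character(0)
  while (length(queue) > 0) {
    page <- queue[1]
    queue <- queue[-1]
    if (page %in% visited || !(page %in% names(pages))) next
    visited <- c(visited, page)
    html <- pages[[page]]
    hrefs <- regmatches(html, gregexpr('href="[^"]+"', html))[[1]]
    links <- gsub('^href="|"$', "", hrefs)   # strip href=" and closing quote
    queue <- c(queue, setdiff(links, visited))
  }
  visited
}

# Scraping: extract one field (prices) from a single page.
scrape_prices <- function(html) {
  spans <- regmatches(html, gregexpr('<span class="price">[^<]+</span>', html))[[1]]
  as.numeric(gsub('<[^>]+>|\\$', "", spans))  # drop tags and the $ sign
}

crawl("/index", site)
# returns "/index" "/products" "/about" "/item1"
scrape_prices(site[["/item1"]])
# returns 19.99
```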

4. Write an R program to demonstrate web scraping.


Answer:

# Load the required library
library(rvest)

# URL of the webpage to scrape
url <- "https://example.com"

# Download and parse the HTML content
webpage <- read_html(url)

# Extract specific data, e.g., the text of all <h2> headings
headings <- html_text(html_nodes(webpage, "h2"), trim = TRUE)

# Print the extracted headings
print(headings)
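The same extraction steps can be tried offline by parsing a literal HTML string instead of a live URL. This sketch assumes rvest 1.0 or later is installed (for minimal_html(), html_elements() and html_text2()); the page content, the #prices table id and the link are hypothetical. Besides headings, it shows two other common scraping targets: an HTML table and link attributes.

```r
library(rvest)

# Hypothetical HTML standing in for a downloaded page (no network needed).
page <- minimal_html('
  <h2>Laptops</h2>
  <table id="prices">
    <tr><th>Product</th><th>Price</th></tr>
    <tr><td>Model A</td><td>999</td></tr>
    <tr><td>Model B</td><td>1299</td></tr>
  </table>
  <a href="/reviews">Reviews</a>
')

# Text of every <h2> heading
headings <- html_text2(html_elements(page, "h2"))

# The table selected by its id, converted to a data frame
price_table <- html_table(html_element(page, "#prices"))

# The href attribute of every link
links <- html_attr(html_elements(page, "a"), "href")

print(headings)
print(price_table)
print(links)
```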

5. Write an R program to collect data from a table in a MySQL database.


Answer:

# Load the required packages (RMySQL implements the DBI interface)
library(DBI)
library(RMySQL)

# Connect to the MySQL database (replace the credentials with your own)
conn <- dbConnect(MySQL(), user = 'root', password = 'password', dbname = 'your_db',
                  host = 'localhost')

# Fetch all rows from the table into a data frame
data <- dbGetQuery(conn, "SELECT * FROM your_table")

# Print the data
print(data)

# Close the connection
dbDisconnect(conn)
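To try the same DBI calls without a MySQL server, one option is to swap in an in-memory SQLite database. This is a sketch assuming the DBI and RSQLite packages are installed; the sample rows are hypothetical. It also shows a filtered query with a ? placeholder bound through params, which avoids pasting values into SQL. Parameter binding works with RSQLite and RMariaDB; check whether your driver supports it before relying on it with the older RMySQL.

```r
library(DBI)

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample table so there is something to read.
dbWriteTable(conn, "your_table", data.frame(
  id    = 1:4,
  name  = c("Asha", "Ben", "Chen", "Dina"),
  score = c(82, 67, 91, 74)
))

# Fetch the whole table into a data frame.
all_rows <- dbGetQuery(conn, "SELECT * FROM your_table")

# Fetch a filtered subset; the ? placeholder is bound safely via params.
high_scores <- dbGetQuery(conn,
  "SELECT name, score FROM your_table WHERE score >= ? ORDER BY score DESC",
  params = list(75))

print(high_scores)

# Always release the connection when done.
dbDisconnect(conn)
```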

6. Write an R program to create a table in a MySQL database and insert data into it.


Answer:

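# Load the required packages
library(DBI)
library(RMySQL)

# Connect to the database (replace the credentials with your own)
conn <- dbConnect(MySQL(), user = 'root', password = 'password', dbname = 'your_db',
                  host = 'localhost')

# Create the table if it does not already exist.
# dbExecute() runs statements that return no rows and frees the result automatically.
dbExecute(conn, "CREATE TABLE IF NOT EXISTS students (id INT, name VARCHAR(50))")

# Insert data
dbExecute(conn, "INSERT INTO students VALUES (1, 'John Doe'), (2, 'Jane Doe')")

# Disconnect
dbDisconnect(conn)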
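A self-contained variant can again use an in-memory SQLite database in place of MySQL, assuming DBI and RSQLite are installed. It adds a primary key to the students table as an illustrative design choice, inserts rows through bound parameters rather than a hand-written VALUES list, and appends a whole data frame with dbAppendTable(). The extra student row is hypothetical.

```r
library(DBI)

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create the table only if it does not already exist.
dbExecute(conn,
  "CREATE TABLE IF NOT EXISTS students (id INT PRIMARY KEY, name VARCHAR(50))")

# Insert several rows with bound parameters (one execution per row).
dbExecute(conn,
  "INSERT INTO students (id, name) VALUES (?, ?)",
  params = list(c(1L, 2L), c("John Doe", "Jane Doe")))

# Append a whole data frame in one call.
dbAppendTable(conn, "students", data.frame(id = 3L, name = "Alex Roe"))

# Read the table back to confirm the inserts.
students <- dbReadTable(conn, "students")
print(students)

dbDisconnect(conn)
```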

7. Write a short note on text mining.


Answer:
Text mining is the process of deriving meaningful information from text. It involves steps like
text preprocessing, tokenization, removing stop words, stemming, and applying NLP
techniques to extract patterns. Applications include spam detection, sentiment analysis, and
document categorization.
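The preprocessing steps named above can be sketched in base R. The sample documents and the short stop-word list are hypothetical, and the stemmer is a deliberately naive suffix stripper standing in for a real algorithm such as Porter stemming (note that it reduces "mining" to "min"). The result is a term-frequency table, a common starting point for further analysis.

```r
# Hypothetical sample documents.
docs <- c(
  "Text mining extracts useful patterns from text.",
  "Mining the reviews revealed recurring complaints about delivery.",
  "The delivery was late, and the reviews were mostly negative."
)

# Small illustrative stop-word list (real projects use a much fuller list).
stop_words <- c("the", "a", "an", "and", "was", "were", "from", "about")

# Naive stemmer: strip a trailing "ing", "ed" or "s" (not Porter stemming).
stem <- function(words) sub("(ing|ed|s)$", "", words)

# Preprocessing: lowercase, then tokenize on runs of non-letters.
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))

# Remove empty tokens and stop words.
tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]

# Stemming, then count how often each stem occurs.
stems <- stem(tokens)
term_freq <- sort(table(stems), decreasing = TRUE)
print(head(term_freq, 5))
```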

8. Write a short note on sentiment analysis.


Answer:
Sentiment analysis is a technique in text mining that determines the emotional tone behind
a body of text. It classifies text as positive, negative, or neutral. It's widely used in social
media monitoring, product review analysis, and customer feedback evaluation.
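One simple way to classify text as positive, negative, or neutral is lexicon-based scoring: add up the scores of known words and look at the sign of the total. The sketch below is base R with a tiny hypothetical lexicon; production systems use larger curated lexicons or trained models, and this version ignores negation ("not good" would still count as positive).

```r
# Tiny hypothetical sentiment lexicon: word -> score.
lexicon <- c(good = 1, great = 2, love = 2, excellent = 2,
             bad = -1, poor = -1, late = -1, terrible = -2, hate = -2)

sentiment <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  score <- sum(lexicon[words], na.rm = TRUE)  # words not in the lexicon add 0
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

reviews <- c("I love this phone, the camera is great!",
             "Terrible battery and the delivery was late.",
             "The box arrived on Tuesday.")
sapply(reviews, sentiment, USE.NAMES = FALSE)
# returns "positive" "negative" "neutral"
```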
