Unit 1 Ds
The main goal of data science is to uncover patterns, extract meaningful information,
and generate actionable insights from data to aid in decision-making, solve
problems, and drive innovation. Data scientists utilize a range of tools, techniques,
and programming languages to collect, clean, analyze, and visualize data. They
apply statistical and machine learning models to make predictions, build data-driven
models, and uncover hidden patterns within the data.
5. Big Data: Large and complex datasets that cannot be easily managed,
processed, or analyzed using traditional data processing techniques. Big data
often involves high-volume, high-velocity, and high-variety data.
ACET
UNIT 1 DATA SCIENCE
10. Deep Learning: A subfield of machine learning that utilizes artificial neural
networks with multiple layers to learn and extract hierarchical representations
of data, often used for tasks such as image and speech recognition.
13. Clustering: A data exploration technique that involves grouping similar data
points or objects together based on their inherent similarities or patterns.
These are just a few examples of data science terminology. The field of data science
is vast and continuously evolving, so there are many more terms and concepts to
explore.
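As an illustration of the clustering term above, here is a minimal sketch of k-means on
one-dimensional data. The notes do not name a specific algorithm, so k-means is an
assumption, and the function name `kmeans_1d`, the sample ages, and the starting
centroids are all made up for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around the nearest centroid (a basic k-means loop)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


ages = [18, 19, 21, 22, 45, 47, 50, 52]  # made-up sample data
centroids, clusters = kmeans_1d(ages, centroids=[18, 52])
print(centroids)
print(clusters)
```

With this sample the younger and older ages end up in separate groups. In practice the
result depends on the starting centroids and on the number of clusters chosen.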
1. Problem Definition: The first step is to clearly define the problem or question
you want to address. This involves understanding the business context,
identifying the objectives, and formulating a well-defined problem statement.
2. Data Acquisition: In this step, you gather the relevant data necessary to
solve the problem. This can involve obtaining data from various sources such
as databases, APIs, web scraping, or other data collection methods.
3. Data Cleaning and Preprocessing: Raw data often contains errors, missing
values, outliers, or inconsistencies. In this step, you clean and preprocess the
data to ensure its quality and suitability for analysis. This may include tasks
like handling missing values, removing duplicates, standardizing formats, and
transforming variables.
4. Exploratory Data Analysis (EDA): EDA involves exploring and analyzing the
data to gain insights and understand its characteristics. This can include
summarizing the data, visualizing distributions, identifying patterns, and
7. Model Evaluation: Once the model is trained, you evaluate its performance
using appropriate evaluation metrics. This helps you assess how well the
model is performing and whether it meets the desired objectives. You may
need to fine-tune the model parameters or try different algorithms to improve
its performance.
8. Model Deployment: After the model has been evaluated and meets the
desired criteria, it can be deployed into a production environment. This step
involves integrating the model into an application, setting up data pipelines,
and ensuring the model can handle new data in real-time.
The data science process is often iterative: feedback and insights gained at one step
frequently send you back to revisit decisions made in earlier steps. This iterative
nature allows for continuous improvement and refinement of the analysis.
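The cleaning, training, and evaluation steps described above can be sketched in a few
lines of plain Python. This is only a toy walk-through under stated assumptions: the
records are invented, the model is a simple least-squares line, and the helper names
(`clean`, `fit_line`, `mean_absolute_error`) are hypothetical rather than part of any
library.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, exam_score); None marks a missing value.
raw = [(1, 52), (2, 54), (2, 54), (3, None), (4, 58), (5, 60), (6, 62), (7, 64)]


def clean(records):
    """Drop exact duplicates and rows with a missing score, preserving order."""
    seen, out = set(), []
    for row in records:
        if row in seen or row[1] is None:
            continue
        seen.add(row)
        out.append(row)
    return out


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def mean_absolute_error(rows, slope, intercept):
    """Average absolute gap between actual and predicted scores."""
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


data = clean(raw)                      # data cleaning and preprocessing
train, test = data[:4], data[4:]       # simple hold-out split
slope, intercept = fit_line(train)     # model training
mae = mean_absolute_error(test, slope, intercept)  # model evaluation
print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={mae:.2f}")
```

Evaluating on rows the model never saw during training is what makes the error figure
meaningful. A real project would typically use a library and a randomized split.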
1. Programming Languages:
4. Data Visualization:
Git: Git is a widely used version control system that helps manage
code and track changes. It allows collaboration among team members,
facilitates code sharing, and helps maintain project integrity.
8. Cloud Platforms:
Types of Data
Qualitative, or categorical, data cannot be measured or counted as numbers. These data
are sorted by category rather than by number, which is why they are also called
categorical data. They can include audio, images, symbols, or text.
Colours
Nominal Data
Nominal data label variables without any order or quantitative value. Hair color can be
considered nominal data, because one color cannot be ranked above or below another.
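Because nominal values have no order, the useful operations are counting and testing
for equality. The sketch below, using made-up hair-color labels, finds the most common
category and builds a one-hot encoding, a common way to present labels to a model
without implying a ranking.

```python
from collections import Counter

hair_colors = ["black", "brown", "black", "blonde", "brown", "black"]  # made-up sample

# Counting is meaningful for nominal data; sorting by "size" is not.
counts = Counter(hair_colors)
mode, mode_count = counts.most_common(1)[0]
print(f"most common: {mode} ({mode_count} times)")

# One-hot encoding: one 0/1 column per category. The alphabetical column order
# is only for reproducibility and does not rank the colors.
categories = sorted(counts)
one_hot = [[int(value == c) for c in categories] for value in hair_colors]
print(categories)
print(one_hot[0])
```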
Ordinal Data
Ordinal data have a natural order: each value occupies a position on a scale, such as a
customer satisfaction or happiness rating. The gaps between positions are not
necessarily equal, so values can be ranked and compared, but arithmetic such as
addition or averaging is not meaningful.
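A short sketch of what an ordered scale allows: sorting and taking a median by rank,
without treating the labels as numbers to be averaged. The five-point scale and the
responses are assumptions invented for the example.

```python
from statistics import median_low

# Hypothetical satisfaction scale, listed from lowest to highest.
SCALE = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: position for position, label in enumerate(SCALE)}

responses = ["satisfied", "neutral", "very satisfied", "dissatisfied", "satisfied"]

# Ordering is meaningful: sort the responses by their position on the scale.
ordered = sorted(responses, key=RANK.__getitem__)

# The median only needs the order, not the distances between levels.
median_label = SCALE[median_low([RANK[r] for r in responses])]

print(ordered)
print("median response:", median_label)
```

`median_low` is used so the result is always one of the observed ranks, which maps
cleanly back to a label.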
Quantitative Data
Discrete Data
The term discrete means distinct or separate. Discrete data take integer (whole-number)
values. The total number of students in a class is an example of discrete data. These
data can't be broken into decimal or fractional values.
Discrete data are countable, and their values cannot be subdivided. They are
represented mainly by a bar graph, number line, or frequency table.
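A frequency table for discrete data can be built by counting how often each whole-number
value occurs. The sketch below uses a made-up list of sibling counts and prints a
text-only bar next to each row.

```python
from collections import Counter

siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]  # made-up counts, one per student

freq = Counter(siblings)

print("value | count | bar")
for value in sorted(freq):
    # Each '#' stands for one observation, giving a rough text bar graph.
    print(f"{value:>5} | {freq[value]:>5} | {'#' * freq[value]}")
```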
Continuous Data
Continuous data are measured rather than counted and can include fractional values.
Examples are the height of a person, the length of an object, etc. Continuous data
represent information that can be divided into ever smaller units. A continuous
variable can take any value within a range.
Height of a person
Speed of a vehicle
Time taken to finish the work
Wi-Fi Frequency
Market share price
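Since a continuous variable can take any value in a range, individual values rarely
repeat, so they are usually grouped into intervals (bins) before counting. A minimal
sketch with made-up heights and an assumed bin width of 10 cm:

```python
heights_cm = [151.2, 158.7, 162.4, 165.0, 167.9, 171.3, 174.8, 179.5, 183.1]  # made-up
BIN_WIDTH = 10  # assumed interval width in cm


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the interval that contains value."""
    return int(value // width) * width


histogram = {}
for h in heights_cm:
    start = bin_start(h)
    histogram[start] = histogram.get(start, 0) + 1

for start in sorted(histogram):
    print(f"{start}-{start + BIN_WIDTH} cm: {'#' * histogram[start]}")
```

The choice of bin width is a judgment call: wider bins smooth the picture, narrower
bins show more detail.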
Data science has a wide range of applications across various industries and
domains. Here are some common and prominent applications of data science:
5. Financial Analysis: In finance, data science is used for fraud detection, credit
risk assessment, algorithmic trading, portfolio optimization, and customer
segmentation for personalized financial services.
7. Social Media Analysis: Data science is used to analyze social media data,
understand user sentiment, track trends, and identify influencers, which helps
businesses with their marketing and reputation management.
11. Image and Speech Recognition: Data science techniques are employed to
develop image recognition systems used in self-driving cars, security
surveillance, and medical imaging, as well as speech recognition applications
like virtual assistants.
12. Sports Analytics: In sports, data science is used for player performance
analysis, game strategy optimization, injury prediction, and fan engagement.