Lecture 2-Quick Overview of Data Science

Uploaded by

BHAWANI KUMARI
Copyright
© All Rights Reserved

Quick Overview of Data Science

By Dr. Aditya Bhardwaj

[email protected]

Big Data Analytics and Business Intelligence (CSET/CMCA-580)


Data Growth Over the Years

The need for data science arises from the growing volume, variety, and velocity of data
generated across industries.
Data Storage Units
What Happens on the Internet in One Minute?
What is Data Science?
• Data science combines math and
statistics, specialized programming,
advanced analytics, artificial intelligence
(AI) and machine learning with specific
subject matter expertise to uncover
actionable insights hidden in an
organization’s data.

• Data science is a multidisciplinary field
focused on discovering actionable
insights from large sets of raw
(unstructured) and structured data.
• These insights can be used to guide
decision making and strategic
But, Why Do We Need Data Science?
Data Engineer vs Data Scientist vs Data Analyst

1. Data Engineers:
• Role: Data engineers focus on building and maintaining the infrastructure that allows data to
be collected, stored, and accessed. They design and implement data pipelines, manage data
warehouses, and ensure data quality and reliability.
• Use of Big Data Tools: They use big data tools to process and transform large datasets.
Common tools include Hadoop, Spark, Kafka, and cloud services such as AWS, Azure, and
Google Cloud. Data engineers set up the architecture that allows for efficient storage and
retrieval of big data.
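
To make the idea of a data pipeline concrete, here is a minimal extract-transform-load sketch using only the Python standard library. The sample CSV, the `readings` table, and the function names are hypothetical and chosen for illustration; a production pipeline would more likely use the tools listed above (Spark, Kafka, a cloud warehouse) than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: two of the four rows are unusable.
RAW_CSV = """sensor_id,reading
a1,20.5
a2,
a1,21.0
a3,not_a_number
"""


def extract(text):
    """Read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Keep only rows whose reading parses as a number (a basic quality check)."""
    clean = []
    for row in rows:
        try:
            clean.append((row["sensor_id"], float(row["reading"])))
        except ValueError:
            continue  # drop empty or malformed readings
    return clean


def load(records, conn):
    """Store the cleaned records in a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(row_count)  # number of rows that passed the quality check
```

Keeping the three stages as separate functions mirrors how larger pipelines are organised: each stage can be tested, replaced, or scaled independently.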

2. Data Scientists:
• Role: Data scientists analyze and interpret complex data to provide actionable insights. They
build and validate predictive models, perform statistical analyses, and apply machine learning
algorithms.
• Use of Big Data Tools: Data scientists use big data analytics tools to manipulate and analyze
large datasets. They use tools like Hadoop, Spark, TensorFlow, and various machine learning
libraries in Python or R. They may also use data visualization tools to present findings. Their
focus is on deriving insights and making predictions from data.
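
As a small illustration of what "building a predictive model" can mean, the sketch below fits a straight line to past values by ordinary least squares and uses it to forecast the next value. The monthly sales figures and the `fit_line` helper are made up for this example; real projects would typically rely on a machine learning library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: sales for months 1 to 5 (deliberately exactly linear).
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(slope, intercept, forecast)
```

Validating such a model would mean checking its predictions against data that was not used for fitting, which this short sketch leaves out.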
3. Data Analysts:
• Role: Data analysts focus on interpreting data and generating reports to support
decision-making. They often work with structured data and are involved in data visualisation,
dashboard creation, and reporting.
• Use of Big Data Tools: Data analysts may use big data tools to access and query large
datasets, especially if their role involves working with large volumes of data. They commonly
use tools like SQL, Excel, Tableau, and Power BI for data analysis and visualisation.
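
The kind of summary report an analyst produces with SQL can be sketched as follows. The `orders` table and its values are invented for illustration, and SQLite is used only because it ships with Python; the same `GROUP BY` query would run with minor changes on most SQL engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("North", 120.0),
        ("South", 80.0),
        ("North", 30.0),
        ("South", 20.0),
        ("East", 50.0),
    ],
)

# Summarise past sales per region, largest total first.
report = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()

for region, total in report:
    print(f"{region}: {total}")
```

A result like this is what would then be fed into a dashboard or chart in a tool such as Tableau or Power BI.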

Quiz: Data Engineer vs Data Scientist vs Data Analyst
1. Analyze data to identify patterns and trends to predict future
outcomes.
2. Analyze data to summarize the past in visual form.
3. Prepare the solution that data scientists use for their work.

Sol.
1. Data Scientist
2. Data Analyst
3. Data Engineer

Real life usage of Data Science
• Air Conditioner: Indoor Air Quality Monitoring: Some advanced air
conditioners incorporate sensors to measure indoor air quality parameters
like humidity, CO2 levels, and pollutants. Data science algorithms can
analyze this data to provide insights and recommendations for improving
indoor air quality.

• Smart TV: Voice and Gesture Recognition: Smart TVs equipped with voice
or gesture recognition features use machine learning algorithms to
interpret user commands accurately and provide a seamless user
experience.

• Content Search and Organization: Data science techniques can be utilized
to improve content search algorithms, making it easier for users to find and
organize their favorite shows, movies, and streaming services.

Data on its own is useless unless you can make sense of it!
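
For the indoor air quality item in the list above, a very simple form of "analyze sensor data and recommend an action" can be sketched with averages and thresholds. The limits used here (1000 ppm CO2, 30 to 60 percent humidity) and the `recommend` function are assumptions made for this example, not values taken from the lecture or from any particular product.

```python
from statistics import mean

# Assumed comfort limits, for illustration only.
CO2_LIMIT_PPM = 1000
HUMIDITY_RANGE = (30, 60)  # percent relative humidity


def recommend(co2_readings, humidity_readings):
    """Turn recent sensor readings into a list of suggested actions."""
    tips = []
    if mean(co2_readings) > CO2_LIMIT_PPM:
        tips.append("increase ventilation")
    avg_humidity = mean(humidity_readings)
    if avg_humidity < HUMIDITY_RANGE[0]:
        tips.append("add humidity")
    elif avg_humidity > HUMIDITY_RANGE[1]:
        tips.append("dehumidify")
    return tips


stuffy_room = recommend([900, 1100, 1300], [65, 70, 72])
fresh_room = recommend([600, 650], [40, 45])
print(stuffy_room, fresh_room)
```

A real device would likely learn its thresholds from usage data rather than hard-code them, but the flow from readings to recommendation is the same.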
Future of Data Science
• Image recognition: As a company accumulates more data, its models become more accurate. Consider a
self-driving car such as a Tesla: how does it detect the road? When many people drive the same route
repeatedly, the system's picture of that road becomes more precise, which makes the drive more
comfortable for the next person on the same route.

• Health care advancements: With a larger patient database, the health care system can recognize
deficiencies quickly, which helps the government respond sooner to oncoming health crises.

• Weather forecasting: With enough historical data and powerful analysis tools, predicting oncoming
storms becomes more reliable, which could save hundreds of lives and minimize property loss.

• Fraud detection: With algorithms and AI tools in place, fraudulent transactions can be flagged as they
occur, and an AI system built for that purpose can also shut the activity down.
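
One basic way to flag a suspicious transaction is to measure how far its amount lies from a customer's usual spending. The sketch below uses a z-score rule; the transaction history, the threshold of 2.0, and the `flag_outliers` helper are hypothetical, and production fraud systems generally combine many more signals than the amount alone.

```python
from statistics import mean, stdev


def flag_outliers(amounts, threshold=2.0):
    """Return amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card history: seven ordinary purchases and one unusual one.
history = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = flag_outliers(history)
print(suspicious)
```

Because a single extreme value also inflates the mean and standard deviation, z-scores on small samples are a blunt instrument; median-based measures are a common, more robust alternative.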

• Gaming: Video games are now on par with sports. As more data is collected, the user experience can
be personalized to a player's habits, likes, and dislikes.

• Logistics: AI systems are already advanced; Google Maps, for example, tells us which route to take or
avoid because of traffic. Such systems can become more capable and help with other problems, such as
road accidents.

• Recommendation systems: The entertainment industry has already benefited from all the data collection they
Big Data Analytics Cloud Services
1. Amazon
• Tools:
  • Amazon Redshift
  • Amazon EMR (Elastic MapReduce)
  • AWS Lambda
  • Amazon Kinesis
  • Amazon Athena
2. Google
• Tools:
  • Google BigQuery
  • Google Cloud Dataflow
  • Google Cloud Dataproc
  • Google Cloud Pub/Sub
  • Google AI Platform
3. Microsoft
• Tools:
  • Azure Synapse Analytics
  • Azure HDInsight
  • Azure Databricks
  • Azure Stream Analytics
  • Azure Data Lake Storage
Thank You
