Introduction to Data Science (en)
HCM
FACULTY OF INFORMATION TECHNOLOGY
Chapter 1
Introduction to Data Science.
Careers in Data Science
Contents
1. Introduction to Data Science
2. Application of Data Science
3. Job Roles in the World of Data Science
4. Development trends
5. Demand for IT human resources and career opportunities for IT
6. Computer Ethics
The Roots of Data Science
What is data science?
Data Science Lifecycle
The data science lifecycle revolves around
using machine learning and various
analytics strategies to generate insights
and predictions from information to
achieve the goals of a commercial
enterprise.
Data and Data Science
Applications of Data Science
Data science is now applied in almost every industry, from e-commerce to healthcare and
transportation.
Source: DataFlair
Data Science in Banking
Banking is one of the largest application areas of Data Science. It helps banks keep up with the
competition and provide better services to their customers.
Data Science in banking plays a crucial role in various banking activities like fraud detection,
developing recommendation engines, providing efficient customer support services, etc.
Data Science in Finance
Data science applications in the manufacturing industry
In the manufacturing industry, Data Science mainly provides insights to maximize profits, minimize risks,
and evaluate productivity, with the aim of giving customers, suppliers, and business partners more
flexibility and increasing companies' scale, speed of development, and competitiveness.
Data Science in Transport
Data Science in Healthcare
Drug Discovery: Today, the pharmaceutical industry depends on
data, using patient information to develop models and improve the drug
discovery process.
Predictions about the future of Data Science
With cloud deployment and data analytics, data science has made it easy to access data through serverless technology. More data scientists focus on using the hybrid cloud to solve complex business concerns at a faster pace. Natural Language Processing (NLP), Artificial Intelligence (AI), IoT, and ML algorithms, in conjunction with data science, have been helping businesses handle huge datasets and empower human-machine interactions.
1. The tasks of data scientists hired to augment business processes could be automated in the near future.
2. Data science will incorporate concepts from various fields like sociology and psychology; it will soon become interdisciplinary.
3. Social media and other online platforms will become the source for the collection of more data.
5. Data science is moving into an era of becoming a team activity. The question is not only how to create a model, but what you would use it for once you build it. Data science will also grow more conscious of increased cybersecurity threats.
6. Data scientists will face a growing prevalence of cloud computing.
7. Data scientists' jobs will become more operationalized, with advanced tools to capture their workflows and train enterprises on their best practices.
8. Coding and AI skills will become more essential, and data scientists will need to be more business-minded.
Job Roles in the World of Data Science
Data Scientist
Data scientists are professionals responsible for collecting, analyzing, and interpreting large amounts of data.
The data scientist's major role is to deliver effective, actionable business insights based on their study of the data. This can be achieved by identifying patterns, anomalies, or trends in the data to implement the best practices an organization can follow for profitable business decision-making.
The data scientist's roles and responsibilities are as follows:
- Identifying the necessary data sets and variables for the analysis
- Collecting large data sets from various sources
- Performing predictive analysis
- Searching for patterns and trends in data that might impact the business's trajectory
- Using data visualization tools and approaches to create charts and dashboards
- Collaborating with IT and business teams
Data Scientist Skills
- Understanding of linear algebra and multivariable calculus
- Programming knowledge in Python, R, Scala
- Knowledge of probability and statistics
- Advanced hands-on experience with AI/ML development frameworks
- Understanding of relational database management systems and SQL
- Knowledge of NLP algorithms and approaches
- Excellent analytics and visualization skills
- Experience in Deep Learning frameworks (e.g., TensorFlow)
- Strong communication and presentation skills
Data Analyst
Data Analysts are responsible for collecting massive amounts of data and for preparing, transforming, managing, processing, and visualizing the data for business growth.
They often deal with big data (structured, unstructured, and semi-structured) to generate reports, identify patterns, gain valuable insights, and produce visualizations easily deciphered by stakeholders and non-technical business users.
Roles and Responsibilities:
- Conduct surveys to collect raw data
- Employ automated techniques to extract data from primary and secondary data sources
- Analyze data and present it in the form of graphs and reports
- Use statistical methodologies and procedures to make reports
- Work with online database systems
- Improve data collection and quality procedures in collaboration with the rest of the team
Data Analyst Skills:
The essential skills to become a data analyst are:
- Programming language skills in R/Python
- Knowledge of relational database systems and proficiency with SQL
- Experience with data analysis and extraction from many sources
- Experience with Tableau, distributed computing, and massive data collections
Data Engineer
Data engineers are responsible for developing,
constructing, and managing data pipelines. They create
and test scalable big data ecosystems for businesses so
that most data scientists and machine learning
engineers may run their algorithms on reliable and
well-optimized data platforms. Data engineers also
process collected data in batches and match its format
to the stored data.
Roles and Responsibilities:
The day-to-day tasks of a data engineer are as follows:
- Using data to identify hidden patterns and predict trends
- Creating reports and providing updates to stakeholders based on data analytics
- Developing technological solutions in collaboration with data architects to increase data accessibility and consumption
- Evaluating company requirements and providing solutions appropriately
- Creating dashboards and tools for business users based on analysis by data analysts and data scientists
Key Skills:
- Knowledge of at least one programming language, such as Python
- Understanding of data modeling for both big data and data warehousing
- Experience with Big Data tools (Hadoop stack such as HDFS, MapReduce, Hive, Pig, etc.)
- Ability to write, analyze, and debug SQL queries
- Building and maintaining data pipelines
- Solid understanding of ETL (Extract, Transform, Load) tools, NoSQL, Apache Spark, and relational DBMS
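As a rough illustration of the batch processing and ETL work described for data engineers, here is a minimal sketch of an extract, transform, load step using only the Python standard library. The sample batch, the `orders` table, and the function names are hypothetical and chosen for illustration; a production pipeline would normally use dedicated tools such as those listed in the key skills.

```python
import csv
import io
import sqlite3

# Hypothetical raw batch, as it might arrive from an upstream system.
RAW_BATCH = """order_id,amount,country
1, 19.90 ,vn
2,5.00,US
3,not_a_number,vn
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: match the stored format and drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((int(row["order_id"]), amount, row["country"].strip().upper()))
    return clean


def load(conn, rows):
    """Load: insert the cleaned batch into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
load(conn, transform(extract(RAW_BATCH)))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 19.9, 'VN'), (2, 5.0, 'US')]
```

The malformed third row is dropped during the transform step, and the country codes are normalized so that the batch matches the format of the stored data.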
Data Architect
Data Analytics Manager
The data analytics manager coordinates the various
tasks that their team must complete for a big data
project. Tasks may include researching and developing
effective data collection techniques, evaluating data,
and offering solutions to a firm. They also help data
science professionals to execute projects on time.
Roles and Responsibilities:
The day-to-day responsibilities of the data analytics manager are as follows:
- Developing techniques for data analysis
- Analytics solution research and implementation
- Ensure the quality of all data analytics activities
- Develop technologies and methods that convert raw data into valuable business insights
- Stay up to date with industry news and trends
- Manage the team of data analysts
Key Skills
- Programming language proficiency (R, Python, Java, etc.)
- Understanding of SQL and NoSQL database systems
- Knowledge of data visualization tools and operating systems
- Data mining
- Experience with machine learning
- Data Architecture
Business Analyst
Business analysts' tasks differ slightly from those of
other job roles. While they have an excellent grasp of
how data-oriented technologies function and manage
enormous amounts of data, they also filter the high-
quality data from the low-quality data. In other words,
they show how Big Data may be connected to
actionable business insights for a company's success.
2. Cloud Computing
Cloud computing is the delivery of computing
services—including servers, storage, databases,
networking, software, analytics, and intelligence—
over the Internet (“the cloud”) to offer faster
innovation, flexible resources, and economies of scale.
Types of cloud services: IaaS, PaaS, serverless, and
SaaS
Development trends
3. Big Data Analytics
Big data analytics is the use of advanced analytic techniques on very
large, diverse datasets that include structured, semi-structured, and
unstructured data from different sources and in sizes ranging from
terabytes to zettabytes.
4. Blockchain
A blockchain is essentially a distributed database of
records, or public ledger, of all transactions or digital
events that have been executed and shared among
participating parties. Each transaction in the public
ledger is verified by consensus of a majority of the
participants in the system. Once entered, information
is designed to be extremely difficult to erase or alter,
so the blockchain contains a verifiable record of every
transaction ever made.
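To make the idea of a tamper-evident ledger concrete, here is a minimal sketch of a hash-linked chain of records in Python. It is a toy example under simplifying assumptions: it runs on a single machine and does not model the majority consensus between participants. The function names and sample transactions are hypothetical.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
    return True


ledger = []
append_block(ledger, "Alice pays Bob 5")
append_block(ledger, "Bob pays Carol 2")
print(is_valid(ledger))  # True

ledger[0]["data"] = "Alice pays Bob 500"  # tamper with an old record
print(is_valid(ledger))  # False
```

Because each block stores the hash of the block before it, changing an old record breaks the check unless every later block is recomputed, which is what makes silent edits easy to detect.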
5. Cyber Security
Demand for IT human resources
Vietnam has a promising IT workforce in both quality and quantity
→ An advantage as an ideal software outsourcing destination
source: TopDev
Career Opportunities for IT
IT Revolution
IT has begun to affect (in both good and bad ways) community life, family life, human relationships,...
• IT has altered many aspects of life - in banking and commerce, work and employment, medical care, national defense, transportation and entertainment,...
• New Technology → New Risks
Issues of using computers
Privacy and personal information
• Theft of information
• Inadvertent leakage of information
• Being tracked, followed, watched (sophisticated tools for surveillance and data analysis - GPS tracking device, Eagle eye)
• Invisible information gathering → Secondary use
• Computer profiling
• Selling your customer data
Freedom of Speech
• Fake and malicious information
• Anonymous, offensive speech
• E-mail spam
• Spyware
• Phishing
• ....
Intellectual Property
• What is intellectual property?
• The rights of creative workers in literary, artistic, industrial and scientific fields which can be protected either by copyright, trademarks, patents, etc.
• Creative work: books, articles, plays, songs, works of art, movies, and software.
• Challenges of New Technologies
• The Soft copy!
• Storage media, Scanners.
Computer Crime
• Hacking
• Online scams
• Auction Fraud
• Forgery
• Identity theft and fraud
• Phishing
• Vishing
• Pharming
• Using job-hunting sites.
• Trojan horse
Computer Ethics
• Computer ethics is a new branch of ethics that is growing and changing rapidly as computer technology also grows and develops.
• Computer ethics identifies and analyzes the impacts of information technology upon human values like health, opportunity, freedom, privacy,... (Moor)
Ten Commandments of Ethics
1. Not use a computer to harm other people.
2. Not interfere with other people's computer work.
3. Not snoop around in other people's computer files
4. Not use a computer to steal
5. Not use a computer to bear false witness
6. Not copy or use proprietary software for which you have not paid (without permission)
7. Not use other people's computer resources without authorization or proper compensation
8. Not appropriate other people's intellectual output
9. Think about the social consequences of the program you are writing or the system you are designing
10. Always use a computer in ways that ensure consideration and respect for your fellow humans
Q&A