
Introduction to Datascience (en)

The document provides an overview of data science, its applications across various industries, and the roles and responsibilities of professionals in the field. It highlights the importance of data science in sectors like banking, finance, healthcare, and e-commerce, and discusses the skills required for different job roles such as data scientist, data analyst, and machine learning engineer. Additionally, the document outlines future trends in data science and the growing demand for IT human resources.

Uploaded by

vqd2k6
Copyright © All Rights Reserved

HO CHI MINH CITY UNIVERSITY OF TRANSPORT (TRƯỜNG ĐẠI HỌC GIAO THÔNG VẬN TẢI TP. HCM)
FACULTY OF INFORMATION TECHNOLOGY

Chapter 1
Introduction to Data Science
Careers in Data Science

Contents
1. Introduction to Data Science
2. Application of Data Science
3. Job Roles in the World of Data Science
4. Development trends
5. Demand for IT human resources and career opportunities for IT
6. Computer Ethics

The Roots of Data Science

What is data science?

Data science is the study of data to extract meaningful insights for business.
It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data.
This analysis helps data scientists ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results.
Data Science Components: Statistics, Domain expertise, Data engineering, Visualization, Advanced computing, Mathematics, Machine learning

Data Science Lifecycle
The data science lifecycle revolves around using machine learning and various analytics strategies to generate insights and predictions from data in order to achieve the goals of a commercial enterprise.

The complete methodology includes several steps, such as data cleaning, data preparation, modeling, and model evaluation.

This is a lengthy process that can take considerable time to complete.
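The lifecycle steps can be illustrated with a deliberately tiny, standard-library-only Python sketch. It cleans a hypothetical dataset (dropping records with missing values), prepares a train/test split, fits a one-variable least-squares line as the "model", and evaluates it with mean absolute error. The data, the 75/25 split and the choice of model are assumptions made for illustration, not part of the original slides.

```python
"""Toy walk-through of the data science lifecycle (illustrative only)."""

# Hypothetical raw records: (advertising spend, sales); None marks a missing value.
RAW = [(1.0, 3.1), (2.0, 4.9), (None, 6.0), (3.0, 7.2), (4.0, None),
       (5.0, 11.1), (6.0, 12.8), (7.0, 15.2), (8.0, 16.9)]


def clean(records):
    """Data cleaning: drop every record that has a missing field."""
    return [r for r in records if None not in r]


def split(records, test_ratio=0.25):
    """Data preparation: first part for training, the rest for testing."""
    cut = int(len(records) * (1 - test_ratio))
    return records[:cut], records[cut:]


def fit_line(train):
    """Modeling: ordinary least squares for y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_absolute_error(model, test):
    """Model evaluation: average absolute gap between prediction and truth."""
    a, b = model
    return sum(abs((a + b * x) - y) for x, y in test) / len(test)


if __name__ == "__main__":
    train, test = split(clean(RAW))
    model = fit_line(train)
    print(f"intercept={model[0]:.2f} slope={model[1]:.2f}")
    print(f"MAE on test set={mean_absolute_error(model, test):.2f}")
```

In a real project each step would be far richer (outlier handling, feature engineering, cross-validation), and the loop usually repeats as evaluation reveals problems.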

Data and Data Science

Data and Data Science

Applications of Data Science
Data science has become a key term in almost every industry, from e-commerce to healthcare and transportation.

Source: DataFlair
Applications of Data Science

Source: DataFlair
Data Science in Banking

Banking is one of the biggest application areas of data science. It helps banks keep up with the competition and provide better services to their customers.
Data science plays a crucial role in various banking activities, such as fraud detection, recommendation engines, and efficient customer support services.
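Fraud detection is often introduced through simple anomaly scoring before moving on to learned models. The sketch below is a hypothetical illustration, not a production method: it flags incoming transactions whose amount lies more than a chosen number of standard deviations from a customer's historical mean (a z-score rule). The threshold of 3 and the sample amounts are assumptions.

```python
import statistics

# Hypothetical past transaction amounts for one customer.
PAST = [42.0, 38.5, 51.0, 45.2, 40.0, 47.8, 39.9, 44.1]


def flag_suspicious(history, new_amounts, threshold=3.0):
    """Return the new amounts whose z-score against past spending exceeds threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs >= 2 values
    if stdev == 0:
        # Perfectly uniform history: any different amount is unusual.
        return [a for a in new_amounts if a != mean]
    return [a for a in new_amounts if abs(a - mean) / stdev > threshold]


if __name__ == "__main__":
    print(flag_suspicious(PAST, [43.0, 49.5, 980.0]))  # -> [980.0]
```

Scoring new amounts against history (rather than including them in the statistics) keeps a single large fraud from inflating the standard deviation and hiding itself.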

Data Science in Finance

Data lies at the core of any business. However, this data is of no use if we do not know how to extract information from it and how to apply that information to solve our problems.

Data science in finance opens a number of doors for the financial industry to reshape its business. It helps the financial sector with risk management, fraud detection, improved personalization, and much more.

Data science applications in the manufacturing industry

In the manufacturing industry, data science mainly provides insights to maximize profits, minimize risks and evaluate productivity. The aim is to give customers, suppliers and business partners more flexibility, and to increase the scale and speed of development as well as the competitiveness of companies.
Data Science in Transport

- Data science is actively helping to create a safer driving environment for drivers.
- Data science has been expanding the range of transport options with the introduction of self-driving cars.
- In freight transport, data science helps optimize delivery routes and allocate resources efficiently.
- Companies (such as Uber and Grab) can optimize service prices and provide better experiences to their customers by using data science prediction tools to forecast parameters such as weather, shipping capacity, and customer demand.
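Delivery route optimization is commonly modeled as a shortest-path problem on a road graph. As a minimal sketch (the graph, node names and travel times are hypothetical), Dijkstra's algorithm with a binary heap finds the fastest route from a depot to a customer:

```python
import heapq

# Hypothetical road network: node -> list of (neighbor, travel time in minutes).
ROADS = {
    "Depot": [("A", 7), ("B", 9), ("C", 14)],
    "A": [("B", 10), ("D", 15)],
    "B": [("C", 2), ("D", 11)],
    "C": [("Customer", 9)],
    "D": [("Customer", 6)],
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm for graphs with non-negative edge weights.

    Returns (total_minutes, path), or (inf, []) if goal is unreachable.
    """
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, minutes in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + minutes, neighbor, path + [neighbor]))
    return float("inf"), []


if __name__ == "__main__":
    print(shortest_route(ROADS, "Depot", "Customer"))
    # -> (20, ['Depot', 'B', 'C', 'Customer'])
```

Real fleet routing adds vehicle capacities, time windows and many stops (vehicle routing problems), but single-route shortest paths remain a basic building block.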

Data Science in Healthcare

Predictive analysis: Predictive analytics models generate predictions about a patient's condition. They analyze correlations between symptoms, behaviors and diseases and make relevant predictions, making it possible to identify potential risks before they materialize.

Medical image analysis: Medical imaging is the healthcare industry's most popular application of data science. With the advent of deep learning, it is now possible to detect abnormalities in the human body from X-ray, MRI and CT scan images and help doctors develop successful treatment options.
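To make "analyzing correlations between symptoms, behaviors and diseases" concrete, here is a hypothetical sketch that computes the Pearson correlation coefficient between a behavior indicator and a diagnosis indicator across patients. The data is invented, and a correlation on its own does not establish causation; real predictive models combine many such signals.

```python
import math

# Hypothetical patients: 1 = smoker / diagnosed, 0 = not.
SMOKER = [1, 1, 0, 0, 1, 0, 1, 0]
DIAGNOSED = [1, 1, 0, 0, 0, 0, 1, 0]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(f"r = {pearson(SMOKER, DIAGNOSED):.2f}")
```

Values near +1 indicate the two indicators tend to occur together, values near 0 indicate no linear relationship, and values near -1 indicate they tend to exclude each other.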

Data Science in Healthcare
Drug Discovery: Today, the pharmaceutical industry relies significantly on data science to deliver better medicines to patients. Companies collect data and use patient information to develop models that improve the drug discovery process.

Genomics: Before the development of data analysis techniques, genomic research was a time-consuming task. Data science in healthcare has made it much easier to study DNA sequences and discover the relationship between their features and genetically transmitted diseases.

Monitoring patient health: Some patients and clinicians use IoT devices as heart rate and temperature monitors. Data science in healthcare collects and analyzes this data and creates analytical tools that enable doctors to monitor a patient's blood pressure, circadian cycle, and calorie intake.
Data Science in E-Commerce
E-commerce and retail industries have benefited hugely from data science.
Uses of data science in e-commerce:
- Predicting customer churn
- Customer sentiment analysis
- Demand forecasting
- Inventory management
- Market basket analysis
- Warranty analytics
- Fraud detection
- Recommendation systems
- Price optimization
- Improvement of personalized customer service
Data science in the e-commerce sector is arguably one of the most promising technologies today. The e-commerce market is booming, and data science makes it possible to understand and influence consumer shopping habits, which benefits e-commerce companies by improving their marketing strategies and profitability.
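Market basket analysis looks for items that are frequently bought together. A minimal sketch of its two standard measures, support and confidence, for item pairs is shown below. The baskets and the 0.4 minimum support are made-up values; a real system would typically run an algorithm such as Apriori or FP-Growth over much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
ORDERS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]


def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence of a -> b)} for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support >= min_support:
            # Confidence: share of baskets containing the first item that also contain the second.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


if __name__ == "__main__":
    for (a, b), (sup, conf) in sorted(pair_rules(ORDERS).items()):
        print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}")
```

With this data, every basket containing butter also contains bread, so the rule butter -> bread has confidence 1.0; a retailer might use such rules for product placement or bundle offers.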

Predictions about the future of Data Science
With cloud deployment and data analytics, data science has made data easy to access through serverless technology. More data scientists are using the hybrid cloud to solve complex business problems at a faster pace. Natural Language Processing (NLP), Artificial Intelligence (AI), IoT, and ML algorithms, in conjunction with data science, have been helping businesses process huge datasets and enabling better human-machine interaction.
1. Tasks of data scientists hired to augment business processes could be automated in the near future.
2. Data science will incorporate concepts from fields such as sociology and psychology; it will become even more interdisciplinary.
3. Social media and other online platforms will become a major source of data.
4. Data science will help businesses predict consumer behavior.
5. Data science is becoming a team activity. The focus is shifting from simply creating a model to how the model will be used once it is built. Data science will also grow more conscious of increasing cybersecurity threats.
6. Data scientists will face the growing prevalence of cloud computing.
7. Data scientists' jobs will become more operationalized, with advanced tools that capture their workflows and train the enterprise on their best practices.
8. Coding and AI skills will become more essential, and data scientists will need to be more business-minded.

Job Roles in the World of Data Science

Data Scientist
Data scientists are professionals responsible for collecting, analyzing, and interpreting large amounts of data.
The data scientist's major role is to deliver effective, actionable business insights based on their study of the data. This can be achieved by identifying patterns, anomalies, or trends in the data to determine the best practices an organization can follow for profitable business decision-making.
The data scientist's roles and responsibilities are as follows:
- Identifying the necessary data sets and variables for the analysis
- Collecting large data sets from various sources
- Performing predictive analysis
- Searching for patterns and trends in data that might impact the business's trajectory
- Using data visualization tools and approaches to create charts and dashboards
- Collaborating with IT and business teams

Data Scientist Skills:
- Understanding of linear algebra and multivariable calculus
- Programming knowledge in Python, R, and Scala
- Knowledge of probability and statistics
- Advanced hands-on experience with AI/ML development frameworks
- Understanding of relational database management systems and SQL
- Knowledge of NLP algorithms and approaches
- Excellent analytics and visualization skills
- Experience with deep learning frameworks (e.g., TensorFlow)
- Strong communication and presentation skills

Data Analyst
Data analysts are responsible for collecting massive amounts of data and for preparing, transforming, managing, processing, and visualizing that data for business growth.
They often deal with big data (structured, unstructured, and semi-structured) to generate reports that identify patterns, gain valuable insights, and produce visualizations easily understood by stakeholders and non-technical business users.
Roles and Responsibilities:
- Conduct surveys to collect raw data.
- Employ automated techniques to extract data from primary and secondary data sources.
- Analyze data and present it in the form of graphs and reports.
- Use statistical methodologies and procedures to make reports.
- Work with online database systems.
- Improve data collection and quality procedures in collaboration with the rest of the team.

Data Analyst Skills:
The essential skills to become a data analyst are:
- Programming language skills in R/Python
- Knowledge of relational database systems and proficiency with SQL
- Experience with data analysis and extraction from many sources
- Experience with Tableau
- Experience with distributed computing and massive data collections
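Much day-to-day analyst work consists of aggregating records into a report. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real reporting database; the orders table, its columns and the sample rows are hypothetical.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("North", 120.0), ("South", 80.0), ("North", 200.0), ("East", 50.0), ("South", 40.0)],
    )
    return conn


def revenue_by_region(conn):
    """Order count, total and average order value per region, largest total first."""
    query = """
        SELECT region, COUNT(*) AS orders, SUM(amount) AS total, AVG(amount) AS average
        FROM orders
        GROUP BY region
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for region, orders, total, average in revenue_by_region(conn):
        print(f"{region:<6} orders={orders} total={total:.1f} avg={average:.1f}")
    conn.close()
```

The same GROUP BY query would typically feed a chart or dashboard in a tool such as Tableau.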
Data Engineer
Data engineers are responsible for developing, constructing, and managing data pipelines. They create and test scalable big data ecosystems for businesses so that data scientists and machine learning engineers can run their algorithms on reliable and well-optimized data platforms. Data engineers also process collected data in batches and match its format to the stored data.
Roles and Responsibilities:
The day-to-day tasks of a data engineer are as follows:
- Using data to identify hidden patterns and predict trends
- Creating reports and providing updates to stakeholders based on data analytics
- Developing technological solutions in collaboration with data architects to increase data accessibility and consumption
- Evaluating company requirements and providing appropriate solutions
- Creating dashboards and tools for business users based on analysis by data analysts and data scientists
- Building and maintaining data pipelines

Key Skills:
- Knowledge of at least one programming language, such as Python
- Understanding of data modeling for both big data and data warehousing
- Experience with Big Data tools (the Hadoop stack, such as HDFS, MapReduce, Hive, Pig, etc.)
- Ability to write, analyze, and debug SQL queries
- Solid understanding of ETL (Extract, Transform, Load) tools, NoSQL, Apache Spark, and relational DBMSs
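A batch ETL job can be sketched in a few lines: extract rows from a CSV source, transform them (normalize fields and reject malformed rows), and load them into a relational store. The CSV content, the customers table and the validation rule below are hypothetical, and SQLite only stands in for a warehouse; production pipelines usually run on dedicated tools such as Apache Spark rather than a single script.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """customer_id,country,signup_date,spend
1, vn ,2024-01-05,120.5
2,US,2024-01-07,not_available
3,Vn,2024-02-11,80
"""


def extract(text):
    """Extract: parse CSV text into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and uppercase country codes, cast spend, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            spend = float(row["spend"])
        except ValueError:
            continue  # reject rows whose spend is not numeric
        clean.append((int(row["customer_id"]), row["country"].strip().upper(),
                      row["signup_date"], spend))
    return clean


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS customers (
        customer_id INTEGER PRIMARY KEY, country TEXT, signup_date TEXT, spend REAL)""")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute("SELECT * FROM customers").fetchall())
```

Keeping extract, transform and load as separate functions makes each stage testable on its own, which matters once pipelines are scheduled to run unattended.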
Data Architect

A data architect develops the systems and tools that data scientists, analysts, machine learning engineers, and artificial intelligence experts use. They understand these professionals' requirements, design systems, deploy new architectures, and stay ahead of regulatory compliance requirements. Data architects are also responsible for design patterns, data modeling, service-oriented integration, and business intelligence. They frequently collaborate with data scientists and IT professionals to achieve the company's data strategy goals.
Roles and Responsibilities:
- Creating and implementing an overall data strategy according to the requirements of the business/organization
- Regularly auditing the performance of data management systems and making changes to improve them
- Working with cross-functional teams and stakeholders to ensure that database systems run smoothly
- Explaining complex technical issues to the department's non-technical staff
- Ensuring the accessibility and accuracy of data acquired by data analysts and data scientists

Key Skills:
- Solid understanding of programming languages like Java, Python, R, or SQL
- Experience with data warehouses, data governance, and big data analytics
- Excellent command of visualization tools (for example, Tableau)
- Understanding of data retention principles and practices
- Data flow and integration automation
Machine Learning Engineer

An ML engineer is a highly experienced programmer who focuses on creating AI systems, undertaking research, and designing algorithms for predictive analysis. They do this using big datasets acquired by data analysts and data scientists. They develop highly efficient machine learning models to help data scientists assess, analyze, and organize large amounts of data.
Roles and Responsibilities:
- Conducting research on machine learning algorithms
- Designing, building, and testing machine learning systems
- Examining and presenting data to facilitate understanding
- Enhancing existing ML frameworks and libraries
- Improving the performance of ML models by changing various parameters

Key Skills:
- Strong mathematical and statistical foundation
- A solid grasp of natural language processing
- Deep expertise in technologies like Python, Java, SQL, Scala, or C++
- Querying and processing data sets, building regression models, and creating and testing hypotheses
- Understanding of various machine learning algorithms
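Improving a model "by changing various parameters" is usually called hyperparameter tuning. As a small, hypothetical illustration, the sketch below tries several values of k for a k-nearest-neighbors classifier and keeps the one with the best accuracy on a held-out validation set. The one-feature data points are invented; libraries such as scikit-learn offer more complete tools (for example, cross-validated grid search).

```python
from collections import Counter

# Hypothetical 1-D training data: (feature, label); 2.0 is a deliberately noisy label.
TRAIN = [(1.0, "low"), (1.5, "low"), (2.0, "high"), (2.5, "low"), (3.0, "low"),
         (7.0, "high"), (7.5, "high"), (8.0, "high"), (8.5, "high"), (9.0, "high")]
VALIDATION = [(2.1, "low"), (1.2, "low"), (8.2, "high"), (2.9, "low"), (6.0, "high")]


def knn_predict(train, x, k):
    """Classify x by majority vote among the k closest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points classified correctly with this k."""
    correct = sum(knn_predict(train, x, k) == y for x, y in validation)
    return correct / len(validation)


def tune_k(train, validation, candidates):
    """Return (best_k, best_accuracy); ties keep the earlier candidate."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        acc = accuracy(train, validation, k)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


if __name__ == "__main__":
    print(tune_k(TRAIN, VALIDATION, [1, 3, 5]))
```

Here k = 1 is misled by the single noisy training point, while k = 3 votes it down; choosing k on data the model was not trained on is what makes the comparison meaningful.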
Database Administrator
A database administrator manages the databases of an organization. They are responsible for continuously monitoring the database to guarantee efficient functioning, data security, user access, and database permissions. They are also responsible for data availability: performing frequent backups, retrieving data when necessary, and testing databases to ensure their reliable operation.
Roles and Responsibilities:
The day-to-day responsibilities of a database administrator include:
- Work on the design and development of databases
- Maintain and safeguard sensitive business data in collaboration with the IT security team
- Build database software to store and manage data
- Data archiving
- Work in collaboration with programmers, project managers, and other team members
- Make essential data available and accessible using cloud servers

Key Skills:
- Excellent knowledge of SQL
- Understanding of database backup, recovery, security, and design
- Proficiency with at least one database management system, such as IBM DB2, Oracle, Microsoft SQL Server, or MySQL
- Solid problem-solving and analytical skills
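Frequent backups are one of the duties listed above. Using Python's built-in sqlite3 module as a stand-in for a production DBMS (an assumption: the actual tools differ for Oracle, SQL Server, MySQL and DB2), a minimal backup-and-verify routine might look like this. The accounts table and its rows are hypothetical.

```python
import sqlite3


def backup_and_verify(source, target, table):
    """Copy the whole source database into target, then compare row counts."""
    source.backup(target)  # sqlite3.Connection.backup, available since Python 3.7
    count_sql = f"SELECT COUNT(*) FROM {table}"  # table name must come from trusted code
    src_rows = source.execute(count_sql).fetchone()[0]
    dst_rows = target.execute(count_sql).fetchone()[0]
    return src_rows == dst_rows, dst_rows


if __name__ == "__main__":
    live = sqlite3.connect(":memory:")  # stands in for the production database
    live.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    live.executemany("INSERT INTO accounts (owner) VALUES (?)", [("An",), ("Binh",), ("Chi",)])
    live.commit()
    backup = sqlite3.connect(":memory:")  # in practice, a file on separate storage
    print(backup_and_verify(live, backup, "accounts"))  # -> (True, 3)
```

A row-count check is only a basic sanity test; a fuller verification would also restore the backup and run integrity checks on it.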

Data Analytics Manager
The data analytics manager coordinates the various tasks that their team must complete for a big data project. Tasks may include researching and developing effective data collection techniques, evaluating data, and offering solutions to a firm. They also help data science professionals execute projects on time.
Roles and Responsibilities:
The day-to-day responsibilities of the data analytics manager are as follows:
- Developing techniques for data analysis
- Researching and implementing analytics solutions
- Ensuring the quality of all data analytics activities
- Developing technologies and methods that convert raw data into valuable business insights
- Staying up to date with industry news and trends
- Managing the team of data analysts

Key Skills:
- Programming language proficiency (R, Python, Java, etc.)
- Understanding of SQL and NoSQL database systems
- Knowledge of data visualization tools and operating systems
- Data mining
- Experience with machine learning
- Data architecture
Business Analyst
Business analysts' tasks differ slightly from those of other job roles. Besides having an excellent grasp of how data-oriented technologies work and handling enormous amounts of data, they separate high-quality data from low-quality data. In other words, they show how big data can be connected to actionable business insights for a company's success.

Roles and Responsibilities:
- Evaluate, create, and implement various innovative technologies and processes
- Work to enhance existing business procedures
- Understand the organization's operations
- Carry out a thorough analysis of the firm, identifying issues, opportunities, and solutions
- Perform tasks such as data mining, data cleansing, and preparing reports
- Budget analysis
- Execute quality assurance
- Share significant discoveries and ideas with the product team

Key Skills:
- Excellent skills in programming languages like Python, SAS, R, and Java
- Knowledge of business
- Understanding of data visualization tools such as Power BI and Tableau
- Expertise in tools such as MS PowerPoint and MS Excel for documentation purposes
- Excellent critical thinking, problem-solving, and decision-making skills
- Knowledge of statistics and probability
- Familiarity with SQL, BPMN, and Microsoft Visio
Business Intelligence (BI) Analyst
Business intelligence analysts help a company make better decisions by leveraging data. They work with business teams to understand the client's requirements, design BI products such as dashboards and infographics, explain to stakeholders how to use a business intelligence tool, and even assist in developing BI products.
Roles and Responsibilities:
- Analyze similar products, markets, or trends to prepare a market strategy
- Use business intelligence tools and data to monitor both existing and future customers
- Assemble business insights from a range of sources, such as company data, public data, and market and industry reports
- Implement better business processes to improve sales of existing products
- Report data findings to the team of stakeholders

Key Skills:
- Advanced SQL skills
- Programming skills in Python or R
- Excellent command of data visualization tools such as Power BI and Tableau
- Strong reporting skills to present data insights
- Familiarity with cloud technology
- Data mining and analysis
Natural Language Processing (NLP) Engineer

An NLP engineer applies machine learning principles to enable computers to understand textual data. A data science professional specializing in NLP creates software that comprehends human language or other natural language data.
Roles and Responsibilities:
- Design and develop NLP applications
- Perform statistical analysis and improve machine learning models
- Create NLP systems according to the client's requirements
- Select and use appropriate tools for NLP projects
- Establish suitable datasets for language learning
- Work with AI voice recognition and speech patterns

Key Skills:
- Excellent programming skills in Python, Java, or R
- Solid understanding of text representation methods, statistics, and algorithms
- Knowledge of machine learning frameworks and libraries
- Familiarity with Big Data frameworks
- Semantic and syntactic parsing
- Familiarity with CI/CD workflows
- Excellent skills in statistical analysis, text classification, and clustering
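Text classification, one of the NLP skills above, can be illustrated with a tiny multinomial Naive Bayes classifier over bag-of-words counts. This is a hypothetical teaching sketch with made-up training sentences and naive whitespace tokenization; practical systems use proper tokenizers, far more data, and established libraries.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled customer reviews.
TRAINING = [
    ("great product fast delivery", "positive"),
    ("love it great quality", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("slow delivery and bad support", "negative"),
]


def tokenize(text):
    return text.lower().split()


def train_nb(examples):
    """Count word frequencies per label and documents per label."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Pick the label with the highest log-probability, using Laplace (add-one) smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # prior
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_nb(TRAINING)
    print(predict_nb(model, "great quality"))
    print(predict_nb(model, "bad slow support"))
```

Working in log-probabilities avoids numeric underflow on long texts, and add-one smoothing keeps an unseen word from zeroing out a whole class.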
Statistician

Statisticians are responsible for extracting valuable insights from data and have a solid foundation in statistical theories, techniques, and data structures. They gather, organize, analyze, and evaluate data.
Roles and Responsibilities:
- Data collection, analysis, and interpretation
- Using statistical methodologies/tools for data analysis, evaluating outcomes, and forecasting trends
- Creating and managing databases with statistical software such as SPSS, SAS, or Stata
- Documenting procedures and staying up to date on technical advances in statistical analysis
- Utilizing statistical tools, algorithms, and computations to analyze data
- Presenting statistical findings using charts, tables, and graphs

Key Skills:
- Excellent knowledge of R, Python, SQL, and MATLAB
- Expertise in statistical theories, machine learning methods, and database management models
- Proficiency with statistical software, such as SPSS
- Ability to understand data and analyze trends, as well as prepare industry outlooks and projections
- Ability to communicate with other departments to coordinate data collection
- Expertise in company operations and industry knowledge
Data Storyteller

Data storytellers visualize data, create reports, search for narratives that best characterize the data, and design innovative methods to convey that narrative.
Data storytelling is a creative job role that falls between data analysis and human-centered communication. Data storytellers narrow the data down to a particular feature, evaluate its behavior, and create a story that helps others better understand business trends.
Roles and Responsibilities:
- Illustrate data, create reports, discover narratives that best represent the facts, and design innovative methods to communicate that story
- Help a company determine the stories that can be conveyed using data: reporting, explanatory, prediction, causation, and correlation stories
- Use narratives to build engaging stories for the business
- Present the outcomes of the data analysis clearly and concisely

Key Skills:
- Excellent data visualization skills
- Expertise in tools such as Microsoft Excel and PowerPoint
- Knowledge of BI and design tools
- Communication skills
- Ability to craft a meaningful narrative from a given dataset
Data Mining Expert

A data mining professional identifies and extracts patterns from massive amounts of data, transforming enormous amounts of raw data into valuable insights. This process of identifying correlations and anomalies is critical in predictive analytics and machine learning algorithms.

Roles and Responsibilities:
- Create and implement data queries in response to business user requirements
- Create data models and methods for mining production databases in collaboration with data owners and department managers
- Provide and implement quality assurance best practices for data mining/analysis services across the enterprise
- Create, develop, and manage change control and testing methods for data model updates
- Identify the network components necessary to provide data access, data consistency, and integrity
- Create routines that help end users make the best possible use of data mining technologies

Key Skills:
- In-depth knowledge of data mining models, structures, theories, concepts, and methods
- Technical knowledge of relational database systems
- Knowledge of data management strategies
- Hands-on database optimization and troubleshooting knowledge
- Understanding of data preparation, processing, categorization, and forecasting
- Understanding of appropriate data privacy procedures and legislation
- Thorough understanding of data modeling technologies
Data Science Manager
The data science manager is responsible for helping the company use data, working with a team of data scientists and engineers to give management valuable direction and insight for making well-informed decisions. They also use data to provide the strategic direction and knowledge needed to identify and answer significant business problems.

Roles and Responsibilities:
- Lead the team of data scientists
- Apply in-depth knowledge of data science methodologies and libraries to solve business challenges
- Manage large-scale projects that make use of data transformation and machine learning models
- Monitor the team for the completion of all project deliverables
- Develop new projects and enhance current processes for the whole data science team in collaboration with the management team
- Ensure data quality across the organization
- Provide suggestions for developing plans, initiatives, strategies, policies, and budgets

Key Skills:
- Solid understanding of product/web analytics tools and methodologies, including Python, machine learning, and the Google Cloud Data platform
- Expertise in using big data technologies and programming languages such as Python, Java, and C/C++
- Broad understanding of advanced statistical, machine learning, and/or data mining methods
- Excellent communication, team building, and leadership skills
- Advanced quantitative and analytical skills
Artificial Intelligence Engineer
Artificial intelligence (AI) engineers develop, test, and deploy AI models using machine learning methods such as random forest, logistic regression, and linear regression. They are also responsible for developing and implementing AI development and production infrastructure.
The responsibilities of an AI engineer are:
- Build AI models to draw business insights.
- Create infrastructure for data intake and data transformation.
- Convert ML models into application programming interfaces (APIs) so that other applications can use them.
- Perform analysis and evaluate data to support and optimize the organization's decision-making process.
- Collaborate across teams to aid in the implementation of AI and best practices.

Key Skills:
- Excellent programming skills in Python, R, Java, C++, etc.
- Knowledge of linear algebra, probability, and statistics
- Understanding of big data technologies such as Hadoop, Cassandra, and MongoDB
- Knowledge of artificial intelligence frameworks such as PyTorch, Theano, TensorFlow, and Caffe
- Expertise with machine learning and deep learning algorithms
- Excellent communication and problem-solving skills
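Logistic regression, one of the methods named above, can be trained with plain batch gradient descent. The sketch below is a minimal, dependency-free illustration on invented one-feature data (hypothetical hours of product usage versus whether a customer renewed); the learning rate and epoch count are arbitrary choices. A function like `predict_proba` is the kind of callable that could later be wrapped behind an API endpoint, but no particular web framework is assumed here.

```python
import math

HOURS = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]  # hypothetical feature
RENEWED = [0, 0, 0, 1, 0, 1, 1, 1]               # hypothetical labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(model, x):
    """Probability of the positive class for one input."""
    w, b = model
    return sigmoid(w * x + b)


if __name__ == "__main__":
    model = train_logistic(HOURS, RENEWED)
    print(f"P(renew | 1 h)   = {predict_proba(model, 1.0):.2f}")
    print(f"P(renew | 4.5 h) = {predict_proba(model, 4.5):.2f}")
```

Because the sample labels overlap (2.0 hours renewed, 3.0 hours did not), the weights converge to finite values instead of growing without bound.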
Development trends
1. Artificial Intelligence/Machine Learning
Artificial intelligence leverages computers and
machines to mimic the problem-solving and decision-
making capabilities of the human mind.
Example: Chatbots use AI to understand customer
problems faster and provide more efficient answers.
Recommendation engines can provide automated
recommendations for TV shows based on users’
viewing habits.
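A recommendation engine of this kind can be sketched with item-to-item cosine similarity over a user-by-show ratings matrix. The users, shows and ratings below are hypothetical, and real services use far richer signals; the sketch only shows the core idea of recommending shows that are rated similarly to the ones a user already watched.

```python
import math

# Hypothetical ratings: user -> {show: rating 1-5}.
RATINGS = {
    "an":   {"Drama A": 5, "Drama B": 4, "Comedy C": 1},
    "binh": {"Drama A": 4, "Drama B": 5, "Documentary D": 2},
    "chi":  {"Comedy C": 5, "Comedy E": 4, "Drama A": 1},
    "dung": {"Comedy C": 4, "Comedy E": 5, "Documentary D": 3},
}


def item_vector(show):
    """Ratings of one show by every user, in a fixed user order (0 = not watched)."""
    return [RATINGS[user].get(show, 0) for user in sorted(RATINGS)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user, top_n=1):
    """Score each unseen show by its similarity to the user's rated shows, weighted by rating."""
    seen = RATINGS[user]
    all_shows = {show for ratings in RATINGS.values() for show in ratings}
    scores = {
        candidate: sum(
            rating * cosine(item_vector(candidate), item_vector(show))
            for show, rating in seen.items()
        )
        for candidate in all_shows - seen.keys()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("an"))
```

Item-based similarity is often preferred over comparing users directly because item-to-item relationships tend to change more slowly than individual users' tastes.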

2. Cloud Computing
Cloud computing is the delivery of computing
services—including servers, storage, databases,
networking, software, analytics, and intelligence—
over the Internet (“the cloud”) to offer faster
innovation, flexible resources, and economies of scale.
Types of cloud services: IaaS, PaaS, serverless, and
SaaS

Development trends
3. Big Data Analytics
The use of advanced analytic techniques against very
large, diverse datasets that include structured, semi-
structured and unstructured data, from different
sources, and in different sizes from terabytes to
zettabytes.
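Big data frameworks such as Hadoop MapReduce split analysis into a map phase and a reduce phase so that the work can run in parallel across many machines. The single-process sketch below only imitates that programming model for a word count (it is not Hadoop's API); the document chunks are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical chunks of a much larger dataset (on a cluster, one chunk per worker).
DOCUMENTS = [
    "data science uses big data",
    "big data needs big tools",
    "science needs data",
]


def map_phase(chunk):
    """Map: turn one chunk of text into partial word counts."""
    return Counter(chunk.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts; merging is associative, so order does not matter."""
    return left + right


def word_count(chunks):
    partials = map(map_phase, chunks)  # each call is independent and could run in parallel
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    print(word_count(DOCUMENTS).most_common(2))
```

The key design point is that map calls share nothing and the reduce step is associative, which is what allows a framework to distribute both phases.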

4. Blockchain
A blockchain is essentially a distributed database of records, or a public ledger of all transactions or digital events that have been executed and shared among participating parties. Each transaction in the public ledger is verified by consensus of a majority of the participants in the system. Once entered, information can never be erased, so the blockchain contains a certain and verifiable record of every single transaction ever made.
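The tamper-evidence described above comes from linking blocks through cryptographic hashes. The following is a simplified sketch using SHA-256 from Python's hashlib; it deliberately omits consensus, networking and proof-of-work, and the transaction strings are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of the block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(transactions):
    """Each block stores its data plus the hash of the previous block."""
    chain = []
    prev = "0" * 64  # placeholder hash for the first (genesis) block
    for index, data in enumerate(transactions):
        block = {"index": index, "data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain):
    """Editing any block breaks the prev_hash link stored in the block after it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


if __name__ == "__main__":
    ledger = build_chain(["A pays B 10", "B pays C 4", "C pays A 1"])
    print(is_valid(ledger))               # True
    ledger[0]["data"] = "A pays B 1000"   # tamper with history
    print(is_valid(ledger))               # False
```

To hide the edit, an attacker would have to recompute every later block; in a real network, consensus among participants is what makes that impractical.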

Development trends
5. Cyber Security

Cyber security is the application of technologies, processes, and controls to protect systems, networks, programs, devices and data from cyber attacks.

6. Internet of Things (IoT)
IoT describes the network of physical objects ("things") that are embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet. These devices range from ordinary household objects to sophisticated industrial tools.

Demand for IT human resources
Vietnam has a promising IT workforce in both quality and quantity
→ This makes it an attractive software outsourcing destination

Demand for IT human resources

Demand for IT human resources

Demand for IT human resources

Source: TopDev
Career Opportunities for IT

IT Revolution
IT has begun to affect (in both good and bad ways) community life, family life, human relationships, ...
• IT has altered many aspects of life: banking and commerce, work and employment, medical care, national defense, transportation and entertainment, ...
• New Technology → New Risks

Issues of Computer Use
Privacy and personal information
• Theft of information
• Inadvertent leakage of information
• Being tracked, followed, watched (sophisticated tools for surveillance and data analysis, e.g. GPS tracking devices, Eagle eye)
• Invisible information gathering → Secondary use
• Computer profiling
• Selling customer data
Freedom of Speech
• Fake and malicious information
• Anonymous, offensive speech
• E-mail spam
• Spyware
• Phishing
• ...

Intellectual Property
• What is intellectual property?
• The rights of creators in literary, artistic, industrial and scientific fields, which can be protected by copyright, trademarks, patents, etc.
• Creative work: books, articles, plays, songs, works of art, movies, and software.
• Challenges of new technologies
• The soft copy!
• Storage media, scanners.

Computer Crime
• Hacking
• Online scams
• Auction fraud
• Forgery
• Identity theft and fraud
• Phishing
• Vishing
• Pharming
• Fraud using job-hunting sites
• Trojan horse
Computer Ethics
• Computer ethics is a new branch of ethics that is growing and changing rapidly as computer technology grows and develops.
• Computer ethics identifies and analyzes the impacts of information technology on human values such as health, opportunity, freedom, privacy, ... (Moor)

Ten Commandments of Computer Ethics
1. Do not use a computer to harm other people.
2. Do not interfere with other people's computer work.
3. Do not snoop around in other people's computer files.
4. Do not use a computer to steal.
5. Do not use a computer to bear false witness.
6. Do not copy or use proprietary software for which you have not paid (without permission).
7. Do not use other people's computer resources without authorization or proper compensation.
8. Do not appropriate other people's intellectual output.
9. Think about the social consequences of the program you are writing or the system you are designing.
10. Always use a computer in ways that ensure consideration and respect for your fellow humans.
Q&A
