Introduction To Data Science What Is Data Science?
● Data is the oil of today’s world. With the right tools, technologies, and
algorithms, we can convert data into a distinct business advantage.
● Data Science can help you detect fraud using advanced machine learning
algorithms.
● It helps you prevent significant monetary losses.
● It allows you to build intelligent capabilities into machines.
● You can perform sentiment analysis to gauge customer brand loyalty.
● It enables you to make better and faster decisions.
● It helps you recommend the right product to the right customer to enhance
your business.
Fig. 1 Evolution of Data Sciences
Statistics:
Statistics is the most critical component of Data Science fundamentals. It is the
science of collecting and analyzing numerical data in large quantities to extract
useful insights.
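As a small illustration of turning raw numbers into summary insights, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The `order_totals` sample and the `summarize` helper are hypothetical and exist only for this example.

```python
import statistics

# Hypothetical sample: daily order totals in dollars (made up for illustration).
order_totals = [120.0, 135.5, 98.0, 143.0, 110.5, 127.0]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(order_totals)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean and median describe the typical value, while the standard deviation describes how widely the values spread around it.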
Visualization:
Visualization techniques help you present huge amounts of data as easy-to-understand,
digestible visuals.
Machine Learning:
Machine Learning explores the building and study of algorithms that learn from data
to make predictions about unseen or future data.
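To make "learning to predict unseen data" concrete, here is a minimal sketch that fits a straight line by ordinary least squares and then predicts a value that was not in the training data. The `spend` and `units` figures and the `fit_line` and `predict` helpers are assumptions made up for this example; real projects would normally use a dedicated library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: advertising spend versus units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, units)


def predict(x):
    """Predict units sold for a spend level the model has not seen."""
    return slope * x + intercept


print(predict(5.0))
```

The model is learned only from the four known points, yet it can estimate the outcome for a new input such as a spend of 5.0.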
Deep Learning:
Deep Learning is a newer area of machine learning research in which the algorithm
itself selects the analysis model to follow, typically by learning layered
representations with neural networks.
1. Discovery:
The Discovery step involves acquiring data from all the identified internal and
external sources that can help you answer the business question.
2. Data Preparation:
Data can have many inconsistencies, such as missing values, blank columns, or an
incorrect data format, which need to be cleaned. You need to process, explore, and
condition the data before modeling. The cleaner your data, the better your predictions.
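The cleaning steps described above can be sketched as follows. This is a minimal illustration, not a general-purpose cleaner: the `raw_rows` records, the column names, and the choice to fill missing ages with the mean are all assumptions made for the example.

```python
# Hypothetical raw records with missing values, a blank column, and mixed formats.
raw_rows = [
    {"customer": "Ann", "age": "34", "spend": "120.50", "notes": ""},
    {"customer": "Bob", "age": "", "spend": "80", "notes": ""},
    {"customer": "Cy", "age": "n/a", "spend": "$95.25", "notes": ""},
]


def to_number(text):
    """Parse a numeric string; return None if it is missing or malformed."""
    cleaned = text.strip().lstrip("$")
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean(rows):
    # Drop columns that are blank in every row.
    keep = [col for col in rows[0] if any(row[col].strip() for row in rows)]
    # Convert every column except the customer name to a number (or None).
    parsed = [
        {col: (row[col] if col == "customer" else to_number(row[col])) for col in keep}
        for row in rows
    ]
    # Fill missing ages with the mean of the known ages (one simple choice).
    known_ages = [r["age"] for r in parsed if r["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages)
    for r in parsed:
        if r["age"] is None:
            r["age"] = mean_age
    return parsed


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Mean imputation is only one option; depending on the data, dropping incomplete rows or using a median may be more appropriate.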
3. Model Planning:
In this stage, you need to determine the methods and techniques used to identify the
relationships between input variables. Model planning is performed using different
statistical formulas and visualization tools. SQL Analysis Services, R, and SAS/ACCESS
are some of the tools used for this purpose.
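One common statistical formula for examining the relation between two input variables is the Pearson correlation coefficient. The sketch below computes it for each pair of variables in a small, made-up `features` table; the variable names and values are hypothetical, and the tools named above offer equivalent built-in functions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical input variables for five customers.
features = {
    "age": [25, 32, 47, 51, 62],
    "income": [30, 42, 58, 64, 75],
    "visits": [9, 7, 6, 3, 2],
}

# Print the correlation for every pair of variables.
names = list(features)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(features[first], features[second])
        print(f"{first} vs {second}: {r:.3f}")
```

Values near 1 or -1 indicate a strong linear relation, which can guide which variables to include in the model.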
4. Model Building:
In this step, the actual model building process starts. Here, data scientists split
the dataset into training and testing sets. Techniques like association, classification,
and clustering are applied to the training dataset. The model, once prepared, is tested
against the “testing” dataset.
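The train-then-test workflow described above can be sketched with a very simple classifier. The example below uses a nearest-centroid rule, chosen here only because it is short; the labeled `data` points and the helpers `split`, `train_centroids`, and `classify` are hypothetical.

```python
import random

# Hypothetical labeled data: (feature_1, feature_2) -> class label.
data = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"),
    ((1.3, 0.9), "low"), ((1.1, 1.4), "low"),
    ((4.0, 4.2), "high"), ((3.8, 4.5), "high"),
    ((4.3, 3.9), "high"), ((4.1, 4.4), "high"),
]


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_centroids(rows):
    """Compute the mean point (centroid) of each class in the training set."""
    sums = {}
    for (x, y), label in rows:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def classify(centroids, point):
    """Assign the point to the class whose centroid is closest."""
    def squared_distance(label):
        cx, cy = centroids[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=squared_distance)


train, test = split(data)
model = train_centroids(train)

# Evaluate only on rows the model never saw during training.
correct = sum(classify(model, point) == label for point, label in test)
accuracy = correct / len(test)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Keeping the testing rows out of training is what makes the accuracy figure a fair estimate of how the model behaves on new data.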
5. Operationalize:
In this stage, you deliver the final baselined model with reports, code, and technical
documents. The model is deployed into a real-time production environment after
thorough testing.
6. Communicate Results:
In this stage, the key findings are communicated to all stakeholders. This helps you
decide if the project results are a success or a failure based on the inputs from the
model.
● Data Scientist
● Data Engineer
● Data Analyst
● Statistician
● Data Architect
● Data Admin
● Business Analyst
● Data/Analytics Manager
Data Scientist:
Data Engineer:
Role: A data engineer works with large amounts of data. They develop, construct,
test, and maintain architectures such as large-scale processing systems and
databases.
Languages: SQL, Hive, R, SAS, Matlab, Python, Java, Ruby, C++, and Perl
Data Analyst:
Role: A data analyst is responsible for mining vast amounts of data. They look for
relationships, patterns, and trends in the data, and then deliver compelling reports
and visualizations that support the most viable business decisions.
Statistician:
Role: The statistician collects, analyzes, and interprets qualitative and quantitative
data using statistical theories and methods.
Data Administrator:
Role: A data admin should ensure that the database is accessible to all relevant
users. They also ensure that it is performing correctly and keep it safe from hacking.
Business Analyst:
Data Analysis: R, Spark, Python, and SAS
Data Warehousing: Hadoop, SQL, Hive
Data Visualization: R, Tableau, Raw
Machine Learning: Spark, Azure ML Studio, Mahout
Internet Search:
Google Search uses Data Science technology to return a specific result within a
fraction of a second.
Recommendation Systems:
Image & Speech Recognition:
Speech recognition systems like Siri, Google Assistant, and Alexa run on Data
Science techniques. Moreover, Facebook recognizes your friends when you upload a
photo with them, with the help of Data Science.
Gaming world:
EA Sports, Sony, and Nintendo use Data Science technology to enhance the gaming
experience. Games are now developed using Machine Learning techniques, and they
can update themselves as you move to higher levels.
Online Price Comparison:
PriceRunner, Junglee, and Shopzilla rely on Data Science mechanisms. Here, data is
fetched from the relevant websites using APIs.
Summary
● Data Science is the area of study that involves extracting insights from vast
amounts of data by using various scientific methods, algorithms, and processes.
● Statistics, Visualization, Deep Learning, and Machine Learning are important
Data Science concepts.
● Data Science Process goes through Discovery, Data Preparation, Model
Planning, Model Building, Operationalize, and Communicate Results.
● Important Data Science job roles are: 1) Data Scientist 2) Data Engineer 3)
Data Analyst 4) Statistician 5) Data Architect 6) Data Admin 7) Business
Analyst 8) Data/Analytics Manager.
● R, SQL, Python, and SAS are essential Data Science tools.
● Business Intelligence looks backward at what has already happened, while Data
Science looks forward and makes predictions.
● Important applications of Data Science are: 1) Internet Search 2)
Recommendation Systems 3) Image & Speech Recognition 4) Gaming world 5)
Online Price Comparison.
● The high variety of information and data is the biggest challenge of Data Science
technology.