
Unit-5 Data Science Notes


Unit 5 Notes

Applications of Data Science:


Ten applications that build upon the concepts of data science:

1. Fraud and Risk Detection
2. Healthcare
3. Internet Search
4. Targeted Advertising
5. Website Recommendations
6. Advanced Image Recognition
7. Speech Recognition
8. Airline Route Planning
9. Gaming
10. Augmented Reality

1. Fraud and Risk Detection

The earliest applications of data science were in finance. Companies were fed up with bad debts
and losses every year. However, they had a lot of data that was collected during the
initial paperwork when sanctioning loans. They decided to bring in data scientists to
rescue them from these losses.

Over the years, banking companies learned to divide and conquer data via customer profiling,
past expenditures, and other essential variables to analyze the probabilities of risk and default.
It also helped them to push their banking products based on each customer’s purchasing
power.
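
As a rough illustration of how profile variables can be turned into a probability of default, here is a minimal sketch using a logistic function. The variable names, weights, and bias are hypothetical values chosen for this example; a real model would learn them from historical loan data.

```python
import math

# Hypothetical weights for illustration only; a real model would learn these
# from historical loan data.
WEIGHTS = {"debt_to_income": 3.0, "missed_payments": 0.8, "years_as_customer": -0.2}
BIAS = -2.0


def default_probability(profile):
    """Map a customer profile to a probability of default between 0 and 1."""
    score = BIAS + sum(WEIGHTS[name] * profile[name] for name in WEIGHTS)
    # The logistic function squashes any real-valued score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Two made-up customer profiles.
low_risk = {"debt_to_income": 0.1, "missed_payments": 0, "years_as_customer": 8}
high_risk = {"debt_to_income": 0.6, "missed_payments": 3, "years_as_customer": 1}

print(round(default_probability(low_risk), 3))
print(round(default_probability(high_risk), 3))
```

A bank could then compare such a probability against a threshold of its choosing before sanctioning a loan.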

2. Healthcare

The healthcare sector, especially, receives great benefits from data science applications.

a. Medical Image Analysis

Procedures such as tumor detection, artery stenosis detection, and organ delineation employ various
methods and frameworks, such as MapReduce, to find optimal parameters for tasks like lung texture
classification. They apply machine learning methods such as support vector machines (SVM),
content-based medical image indexing, and wavelet analysis for solid texture classification.

b. Genetics & Genomics

Data science applications also enable an advanced level of treatment personalization through
research in genetics and genomics. The goal is to understand the impact of DNA on our
health and to find individual biological connections between genetics, diseases, and drug
response. Data science techniques allow different kinds of data to be integrated with genomic
data in disease research, which provides a deeper understanding of how genetics affects
reactions to particular drugs and diseases. As soon as we acquire reliable personal genome data,
we will achieve a deeper understanding of human DNA. Advanced genetic risk
prediction will be a major step towards more individualized care.

c. Drug Development

The drug discovery process is highly complicated and involves many disciplines. The greatest
ideas are often held back by billions of tests and by huge financial and time costs. On average,
it takes twelve years to make an official submission.

Data science applications and machine learning algorithms simplify and shorten this process,
adding a perspective to each step, from the initial screening of drug compounds to the prediction
of the success rate based on biological factors. Such algorithms can forecast how a
compound will act in the body using advanced mathematical modeling and simulations instead
of lab experiments. The idea behind computational drug discovery is to build
computer simulations that model a biologically relevant network, which simplifies predicting
future outcomes with high accuracy.

d. Virtual assistance for patients and customer support

Optimization of the clinical process builds upon the idea that in many cases it is not
actually necessary for patients to visit doctors in person. A mobile application can give a more
effective solution by bringing the doctor to the patient instead.

AI-powered mobile apps can provide basic healthcare support, usually as chatbots. You
simply describe your symptoms or ask questions, and then receive key information about your
medical condition derived from a wide network linking symptoms to causes. Apps can remind
you to take your medicine on time and, if necessary, book an appointment with a doctor.

This approach promotes a healthy lifestyle by encouraging patients to make healthy decisions,
saves them time waiting in line for an appointment, and allows doctors to focus on more critical
cases.

Popular applications of this kind include Your.MD and Ada.

3. Internet Search

This is probably the first thing that comes to mind when you think of data science
applications.

When we speak of search, we think ‘Google’. But there are many other search engines,
such as Yahoo, Bing, Ask, and AOL. All these search engines (including Google) make use
of data science algorithms to deliver the best result for a search query in a fraction of
a second. Consider that Google processes more than 20 petabytes of data every day.

Had there been no data science, Google wouldn’t have been the ‘Google’ we know today.

4. Targeted Advertising

If you thought search was the biggest of all data science applications, here is a
challenger: the entire digital marketing spectrum. From the display banners on various
websites to the digital billboards at airports, almost all of them are chosen using data
science algorithms.

This is why digital ads achieve a much higher CTR (click-through rate)
than traditional advertisements: they can be targeted based on a user’s past behavior.

It is also why you might see ads for data science training programs while I see an ad
for apparel in the same place at the same time.
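
CTR here is the click-through rate, commonly computed as clicks divided by impressions. The sketch below shows that arithmetic; the campaign names and counts are invented purely for illustration and are not measurements.

```python
def click_through_rate(clicks, impressions):
    """Return clicks per impression, or 0.0 when the ad was never shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Invented counts for two hypothetical campaigns.
campaigns = {
    "targeted_banner": {"clicks": 30, "impressions": 1000},
    "untargeted_banner": {"clicks": 5, "impressions": 1000},
}

for name, counts in campaigns.items():
    rate = click_through_rate(counts["clicks"], counts["impressions"])
    print(f"{name}: {rate:.1%}")
```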

5. Website Recommendations

Aren’t we all used to the suggestions about similar products on Amazon? They not only help
you find relevant products among the billions available but also add a lot to
the user experience.

Many companies have eagerly used recommendation engines to promote their products in accordance with
users’ interests and the relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDb, and many more use such systems to improve the user experience. The
recommendations are based on a user’s previous search results.
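
One simple way to base suggestions on past behavior is to compare a user's history with the histories of other users. The sketch below is an assumption-laden toy, not how any of the companies named above actually work: the users, products, and the choice of Jaccard similarity are all hypothetical.

```python
from collections import defaultdict

# Hypothetical browsing history: user -> set of products viewed.
HISTORY = {
    "alice": {"laptop", "mouse", "keyboard"},
    "bob": {"laptop", "mouse"},
    "carol": {"mouse", "keyboard", "monitor"},
}


def recommend(user, history, top_n=2):
    """Suggest unseen items, weighted by how similar other users are to this one."""
    seen = history[user]
    scores = defaultdict(float)
    for other, items in history.items():
        if other == user:
            continue
        # Jaccard similarity: shared items divided by all items of both users.
        similarity = len(seen & items) / len(seen | items)
        for item in items - seen:
            scores[item] += similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("bob", HISTORY))
```

Here bob has not seen the keyboard or the monitor; the keyboard ranks first because both similar users viewed it.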


6. Advanced Image Recognition

You upload a picture with friends on Facebook and you start getting suggestions to tag your
friends. This automatic tag suggestion feature uses a face recognition algorithm.

In an update on this work, Facebook outlined the additional progress it has made in this area,
making specific note of its advances in image recognition accuracy and capacity:

“We’ve witnessed massive advances in image classification (what is in the image?) as well as
object detection (where are the objects?), but this is just the beginning of understanding the
most relevant visual content of any image or video. Recently we’ve been designing
techniques that identify and segment each and every object in an image, a key capability that
will enable entirely new applications.”

In addition, Google lets you search for images by uploading them. It
uses image recognition and provides related search results.

7. Speech Recognition

Some of the best examples of speech recognition products are Google Voice, Siri, and Cortana.
With speech recognition, you can still send a message even when you aren’t in a position to type:
simply speak the message and it will be converted to text. At
times, however, you will find that speech recognition doesn’t perform accurately.

8. Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except for a few airline service
providers, companies are struggling to maintain their occupancy ratio and operating profits.
The sharp rise in fuel prices and the need to offer heavy discounts to customers have
made the situation worse. It wasn’t long before airline companies started using data science
to identify strategic areas of improvement. Using data science, airline companies
can now:

1. Predict flight delays
2. Decide which class of airplanes to buy
3. Decide whether to fly directly to the destination or make a stop in between (for example, a
flight from New Delhi to New York can take a direct route or
halt in another country)
4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data
science to change their way of working.


9. Gaming

Games are now designed using machine learning algorithms that improve or upgrade
themselves as the player moves up to a higher level. In motion gaming, too, your computer opponent
analyzes your previous moves and shapes its game accordingly. EA Sports,
Zynga, Sony, Nintendo, and Activision-Blizzard have taken the gaming experience to the next level using
data science.

10. Augmented Reality

This is the last of the data science applications, and the one that seems most
exciting for the future.

Data science and virtual reality are related, considering that a VR headset combines
computing knowledge, algorithms, and data to provide you with the best viewing
experience. A very small step towards this is the highly popular game Pokemon GO, which gives
players the ability to walk around and look at Pokemon on walls, streets, and other things that aren’t really
there. The creators of this game used the data from Ingress, the previous app from the same
company, to choose the locations of the Pokemon and gyms.

Recent trends in various data collection and analysis techniques

It is important to stay up to date with the data science trends that can help
your business grow. Here are the top 10 data science trends for this decade.

1. Predictive analysis
For a business to prosper, it is critical to know what the future might look like. This is exactly
where predictive analysis comes into play. Organizations rely on their customers to a large
extent, so being able to understand customer behavior helps in making better decisions.
Predictive analysis is one of the smartest ways to devise strategies for targeting
customers, both to retain existing ones and to win new ones.
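
As a small illustration of predicting "what the future might look like", the sketch below fits a straight line to past values with ordinary least squares and extrapolates one step ahead. The monthly purchase counts are made up for the example, and a straight-line trend is only one of many possible predictive models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: purchases by one customer segment over five months.
months = [1, 2, 3, 4, 5]
purchases = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, purchases)
forecast_month_6 = slope * 6 + intercept
print(forecast_month_6)
```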
2. Machine learning
Over the years, we have seen how much automation has transformed the world. This is why
machine learning has gained importance like never before. The coming years will see more
automation, and with it a sharp rise in the number of organizations adopting machine learning.

3. IoT
Gone are the days when IoT was thought to have limited
applications. Today, our smartphones can
control appliances such as TVs and air conditioners. All of this is possible because of IoT. Google Assistant
is yet another remarkable innovation in the area of IoT. It is therefore no surprise that companies are looking for ways to
invest in this technology, which indicates how rapidly the
IoT industry is likely to grow in the days ahead.

4. Blockchain
Cryptocurrencies such as Bitcoin and Litecoin have become the talk of the
world. All of these currencies employ blockchain technology. With the world showing keen
interest in this field, blockchain is likely to see far-reaching adoption in the coming years.

5. Edge computing
Edge computing is known for faster processing of information and for reducing
latency, cost, and traffic. These features are why organizations are not
willing to sideline this option. Edge computing is particularly well suited to real-time
applications. The coming years could see a considerable
shift from traditional methods to edge computing.

6. DataOps
Let’s face reality: the data pipeline has become more complex and thus requires even
more integration and governance tools. This is where DataOps comes in. It covers tasks from
collection to preparation to analysis, along with automated testing and
delivery, with the aim of improving data quality and analysis. This trend will
continue for years to come.

7. Artificial Intelligence
Be it a small enterprise or a tech giant, all of them rely on AI in one way or another.
Complex tasks are less of a concern now that we can rely on AI to handle them.
The reduction in errors is another strong reason why AI stands apart. Now that
we rely on AI so much, there is no going back.

8. Data visualization
This is one of the prominent trends we can count on, because organizations
are moving their conventional data warehouses to the cloud.

9. Better user experience
The importance a company gives to user experience speaks volumes about its success.
This is why companies are leaving no stone unturned in providing the best
possible user experience, be it in the form of chatbots, personal assistants, or AI-driven
tools.

10. Data governance
This is yet another area that is gaining a lot of importance. Numerous companies are
still struggling to comply with rules and regulations. It is critical not just to comply with
them but also to understand their impact on present and future operations. Data
scientists who have sound knowledge of all of this are the need of the hour.

These trends give a clearer picture of which data science strategies need to be implemented to
retain your customers and take your business to new heights.

Application development methods used in data science
Agile Data Science is an approach to data science centered around web application
development. It asserts that the most effective output of the data science process suitable for
effecting change in an organization is the web application. It asserts that application
development is a fundamental skill of a data scientist. Therefore, doing data science becomes
about building applications that describe the applied research process: rapid prototyping,
exploratory data analysis, interactive visualization, and applied machine learning.

Agile software methods have become the de facto way software is delivered today. There is
a range of fully developed methodologies, such as Scrum, that give a framework within
which good software can be built in small increments. There have been some attempts to
apply agile software methods to data science, but these have had unsatisfactory results. There
is a fundamental difference between delivering production software and delivering actionable insights
as artifacts of an agile process. The need for insights to be actionable creates an element of
uncertainty around the artifacts of data science: they might be “complete” in a software
sense, and yet lack any value because they don’t yield real, actionable insights. As data
scientist Daniel Tunkelang says, “The world of actionable insights is necessarily looser than
the world of software engineering.” Scrum and other agile software methodologies don’t
handle this uncertainty well. Simply put: agile software doesn’t make Agile Data Science.
This is the motivation for Agile Data Science: to provide a new methodology suited to the
uncertainty of data science, along with a guide on how to apply it that demonstrates the
principles in real software.

Agile Data Science - Introduction


Agile data science is an approach to using data science with agile methodology for web
application development. It focuses on the output of the data science process that is suitable for
effecting change in an organization. Data science includes building applications that describe the
research process with analysis, interactive visualization, and now applied machine learning as
well.
The major goal of agile data science is to document and guide exploratory data analysis to
discover and follow the critical path to a compelling product.
Agile data science is organized around the following set of principles:
1. Continuous Iteration
This process involves continuous iteration in creating tables, charts, reports, and predictions.
Building predictive models requires many iterations of feature engineering, with extraction
and production of insight.
2. Intermediate Output
This is the running list of outputs generated. Even failed experiments have
output. Tracking the output of every iteration helps create better output in the next iteration.
3. Prototype Experiments
Prototype experiments involve assigning tasks and generating output as per the experiments.
In a given task, we must iterate to achieve insight, and these iterations are best described
as experiments.
4. Integration of data
The software development life cycle includes different phases, with data essential for:
• customers
• developers, and
• the business
The integration of data paves the way for better prospects and outputs.
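
The first three principles can be pictured as a loop that runs one experiment per iteration and records every outcome, including failures. The following is a minimal sketch under stated assumptions: `run_iterations` and `toy_evaluate` are hypothetical helpers written for this example, and the scoring rule is a stand-in for real model evaluation.

```python
def run_iterations(feature_sets, evaluate):
    """Run one experiment per feature set and log every outcome, failures included."""
    log = []
    for number, features in enumerate(feature_sets, start=1):
        entry = {"iteration": number, "features": features, "score": None, "error": None}
        try:
            entry["score"] = evaluate(features)
        except ValueError as exc:
            # A failed experiment still produces output worth tracking.
            entry["error"] = str(exc)
        log.append(entry)
    return log


def toy_evaluate(features):
    """Hypothetical stand-in for model evaluation: more features, higher score."""
    if not features:
        raise ValueError("no features supplied")
    return len(features) / 10


log = run_iterations([["age"], [], ["age", "income"]], toy_evaluate)
successful = [entry for entry in log if entry["score"] is not None]
best = max(successful, key=lambda entry: entry["score"])
print(best["iteration"], best["features"])
```

Keeping the whole log, rather than only the best result, is what lets the next iteration build on earlier intermediate output.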

Pyramid data value

The data-value pyramid describes the layers needed for “Agile data science” development.
It starts with a collection of records based on the requirements, and with the plumbing of individual
records. Charts are created after cleaning and aggregation of data; the aggregated data
can be used for data visualization. Reports are generated with proper structure, metadata, and
tags of data. The second layer of the pyramid from the top is prediction analysis. The
prediction layer is where more value is created, but creating good predictions
depends on feature engineering.
The topmost layer involves actions, where the value of data is driven effectively. The best
illustration of this layer is “Artificial Intelligence”.
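
The layers of the pyramid (records, charts, reports, predictions, actions) can be sketched as successive transformations of the same data. The page-view records and the naive "prediction" below are hypothetical and exist only to show how each layer builds on the one beneath it.

```python
from collections import Counter

# Layer 1: records (hypothetical page-view events).
records = [
    {"user": "u1", "page": "pricing"},
    {"user": "u2", "page": "pricing"},
    {"user": "u1", "page": "docs"},
    {"user": "u3", "page": "pricing"},
]

# Layer 2: charts are drawn from cleaned, aggregated data.
views_per_page = Counter(record["page"] for record in records)

# Layer 3: reports add structure and metadata around the aggregates.
report = {"title": "Page views", "rows": sorted(views_per_page.items())}

# Layer 4: predictions (a deliberately naive one: the most viewed page so far).
predicted_top_page = views_per_page.most_common(1)[0][0]

# Layer 5: actions driven by the prediction.
action = f"feature '{predicted_top_page}' on the home page"
print(report["rows"], "->", action)
```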

Agile Data Science - Methodology Concepts


In this section, we will focus on the software development life cycle concepts called
“agile”. The Agile software development methodology helps in building software
incrementally, in short iterations of 1 to 4 weeks, so that development stays aligned with
changing business requirements.
There are 12 principles that describe the Agile methodology in detail:
1. Satisfaction of customers
The highest priority is to satisfy customers by meeting their requirements through early and
continuous delivery of valuable software.
2. Welcoming new changes
Changes are acceptable during software development. Agile processes are designed to harness
change for the customer’s competitive advantage.
3. Delivery
Working software is delivered to clients within a span of one to four weeks.
4. Collaboration
Business analysts, quality analysts, and developers must work together during the entire life
cycle of the project.
5. Motivation
Projects should be built around a team of motivated individuals. The project should provide an
environment that supports individual team members.
6. Personal conversation
Face-to-face conversation is the most efficient and effective method of conveying information
to and within a development team.
7. Measuring progress
Working software is the key measure of the progress of the project and of software
development.
8. Maintaining constant pace
The agile process focuses on sustainable development. The business, the developers, and the users
should be able to maintain a constant pace with the project.
9. Monitoring
Regular attention to technical excellence and good design is mandatory to
enhance agility.
10. Simplicity
The agile process keeps everything simple; maximizing the amount of work that does not
need to be done is essential.
11. Self-organized teams
An agile team should be self-organized and independent; the best architectures,
requirements, and designs emerge from self-organized teams.
12. Review the work
It is important to review the work at regular intervals so that the team can reflect on how the
work is progressing. Reviewing each module on a timely basis will improve performance.
Daily Stand-up
The daily stand-up is the daily status meeting among the team members. It provides updates
on the software development and is also used to address obstacles to project
development.
The daily stand-up is a mandatory practice, no matter how an agile team is set up or where
its office is located.
The features of a daily stand-up are as follows:
• The daily stand-up should last roughly 15 minutes and should not run
longer.
• The stand-up should include discussion of status updates.
• Participants usually stand, with the intention of ending the meeting quickly.
User Story
A story is usually a requirement, formulated in a few sentences of simple language, that
should be completed within an iteration. A user story should meet the following
criteria:
• All the related code should be checked in.
• Unit test cases should exist for the specified iteration.
• All the acceptance test cases should be defined.
• The product owner should give acceptance while the story is being defined.


What is Scrum?
Scrum can be considered a subset of agile methodology. It is a lightweight process and
includes the following features:
• It is a process framework, which includes a set of practices that need to be followed in
a consistent order. The best illustration of Scrum is working in iterations, or sprints.
• It is a “lightweight” process, meaning that the process is kept as small as possible to
maximize the productive output in the given duration.
The Scrum process is distinguished from other methodologies
of the traditional agile approach by its structure. It is divided into the following three categories:
• Roles
• Artifacts
• Time Boxes
Roles define the team members and their responsibilities throughout the process. The Scrum
Team consists of the following three roles:
• Scrum Master
• Product Owner
• Team
The Scrum artifacts provide key information that each member should be aware of. The
information includes details of the product, activities planned, and activities completed. The
artifacts defined in the Scrum framework are as follows:
• Product backlog
• Sprint backlog
• Burn down chart
• Increment
Time boxes are the fixed-length iterations for which user stories are planned. These user stories help
in describing the product features, which form part of the Scrum artifacts. The product backlog
is a list of user stories. These user stories are prioritized and forwarded to the user meetings
to decide which ones should be taken up.
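
To make the artifacts concrete, the sketch below models a product backlog as a list of user stories, picks a sprint backlog by priority within a capacity, and computes the remaining-work values a burn-down chart would plot. The `UserStory` fields, the story-point estimates, and the greedy selection rule are assumptions made for this illustration; Scrum itself does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    title: str
    priority: int  # 1 = highest priority (assumed convention)
    points: int    # estimated effort in story points


def plan_sprint(product_backlog, capacity):
    """Take stories in priority order while they still fit in the sprint capacity."""
    sprint_backlog = []
    remaining = capacity
    for story in sorted(product_backlog, key=lambda s: s.priority):
        if story.points <= remaining:
            sprint_backlog.append(story)
            remaining -= story.points
    return sprint_backlog


def burn_down(total_points, completed_per_day):
    """Return remaining work at the start and after each day of the sprint."""
    remaining = total_points
    chart = [remaining]
    for done in completed_per_day:
        remaining = max(remaining - done, 0)
        chart.append(remaining)
    return chart


product_backlog = [
    UserStory("Export report", priority=2, points=5),
    UserStory("Login", priority=1, points=3),
    UserStory("Dashboard", priority=3, points=8),
]

sprint_backlog = plan_sprint(product_backlog, capacity=10)
total = sum(story.points for story in sprint_backlog)
print([story.title for story in sprint_backlog])
print(burn_down(total, completed_per_day=[3, 0, 5]))
```

With a capacity of 10 points, the two highest-priority stories fit and the third is left in the product backlog for a later sprint.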
Why Scrum Master?
The Scrum Master interacts with every member of the team. Let us now see how the
Scrum Master interacts with other teams and resources.
Product Owner
The Scrum Master interacts with the product owner in the following ways:
• Finding techniques to achieve an effective product backlog of user stories and to manage
it.
• Helping the team understand the need for clear and concise product backlog items.
• Product planning within a specific environment.
• Ensuring that the product owner knows how to increase the value of the product.
• Facilitating Scrum events as and when required.
Scrum Team
The Scrum Master interacts with the team in several ways:
• Coaching the organization in its Scrum adoption.
• Planning Scrum implementations for the specific organization.
• Helping employees and stakeholders understand the requirements and phases of
product development.
• Working with the Scrum Masters of other teams to increase the effectiveness of the application
of Scrum in the specified team.
Organization
The Scrum Master interacts with the organization in several ways. A few are mentioned below:
• Coaching the Scrum team in self-organization and
cross-functionality.
• Coaching the organization and teams in areas where Scrum is not yet fully adopted
or not accepted.
Benefits of Scrum
Scrum helps customers, team members, and stakeholders collaborate. It combines a time-boxed
approach with continuous feedback from the product owner, ensuring that the product is in
working condition. Scrum provides benefits to the different roles in a project.
Customer
Sprints, or iterations, are kept short; user stories are designed
according to priority and taken up at sprint planning. This ensures that customer
requirements are fulfilled with every sprint delivery. If they are not, the requirements are noted
and planned for a later sprint.
Organization
With the help of Scrum and Scrum Masters, the organization can focus on the effort required for
the development of user stories, thus reducing work overload and avoiding rework. This
also helps in maintaining the efficiency of the development team and customer satisfaction.
This approach also helps in increasing market potential.
Product Managers
The main responsibility of product managers is to ensure that the quality of the product is
maintained. With the help of Scrum Masters, it becomes easy to facilitate work, gather quick
responses, and absorb changes, if any. Product managers also verify that the designed product
is aligned with the customer requirements in every sprint.
Development Team
Because sprints are time-boxed and kept short, the development team
stays motivated to see its work reflected and delivered properly. The working
product grows with every iteration, or “sprint”. The
user stories designed for every sprint reflect customer priorities, adding more
value to the iteration.
Conclusion
Scrum is an efficient framework within which you can develop software as a team. It is
designed entirely on agile principles. The Scrum Master is there to help and cooperate with the
Scrum team in every possible way, acting like a personal trainer who helps you stick to the
designed plan and perform all the activities as per the plan. The authority of the Scrum Master
should never extend beyond the process. The Scrum Master should be capable of managing
every situation.

Agile Data Science - Data Science Process
“Data science is the blend of data inference, algorithm development and technology in order
to solve analytically complex problems”.

Data science is an interdisciplinary field encompassing scientific methods, processes, and
systems. It includes categories such as machine learning and math and statistics knowledge,
together with traditional research, and it combines hacking skills with substantive
expertise. Data science draws principles from mathematics, statistics, information science,
computer science, data mining, and predictive analysis.
The different roles that form part of the data science team are described below:
Customers
Customers are the people who use the product. Their interest determines the success of the project,
and their feedback is very valuable in data science.
Business Development
This team signs up early customers, either firsthand or through the creation of
landing pages and promotions. The business development team delivers the value of the product.
Product Managers
Product managers recognize the importance of creating the best product, one that is valuable in the market.

Interaction designers
They focus on designing interactions around data models so that users find appropriate value.

Data scientists
Data scientists explore and transform the data in new ways to create and publish new features.
They also combine data from diverse sources to create new value, and they play an
important role in creating visualizations with researchers, engineers, and web developers.
Researchers
As the name suggests, researchers are involved in research activities. They solve complicated
problems that data scientists cannot. These problems demand intense focus and time
with machine learning and statistics.
Adapting to Change
All members of the data science team are required to adapt to new changes and work on the
basis of requirements. Several changes should be made when adopting agile methodology with
data science:
• Choosing generalists over specialists.
• Preferring small teams over large teams.
• Using high-level tools and platforms.
• Continuous and iterative sharing of intermediate work.
Note
In an agile data science team, a small team of generalists uses high-level tools that are
scalable and refines data through iterations into increasingly higher states of value.
Consider the following examples related to the work of data science team members:
• Designers deliver CSS.
• Web developers build entire applications and understand the user experience and interface
design.
• Data scientists work on both research and building web services, including web
applications.
• Researchers work in the code base, which displays the intermediate results of their work.
• Product managers try to identify and understand the flaws in all the related areas.
For complete information refer to the following link:
https://www.tutorialspoint.com/agile_data_science/agile_data_science_quick_guide.htm
