Data Science Essay
The term data science was first used by academic statisticians to position the discipline with
respect to big data, data analysis, and broader trends. These early discussions emphasized
mathematical foundations and the new statistical methods made possible by an abundance of
data. From this academic origin, industrial data science emerged, driven by new technology and
the ability to extract business value from data. Data science is the child of statistics and computer
science. While it has inherited some of their methods and thinking, it also seeks to blend them,
refocus them, and develop them to address the context and needs of modern scientific data
analysis.
Our ability to collect, store, and access large volumes of information is accelerating at
unprecedented rates with better sensor technologies, more powerful computing platforms, and
greater online connectivity. With the growing size of data, there has been a simultaneous
revolution in the computational and statistical methods for processing and analyzing data,
collectively referred to as the field of data science. These advances have had long-lasting
impacts on the way we sense, communicate, and make decisions, a trend that is only expected to
grow in the foreseeable future.
Data science focuses on exploiting the modern deluge of data for prediction, exploration,
understanding, and intervention. It emphasizes the value and necessity of approximation and
simplification. It values effective communication of the results of a data analysis and of the
understanding about the world that we glean from it. It prioritizes an understanding of
optimization algorithms and the transparent management of the inevitable tradeoff between accuracy
and speed. It promotes domain-specific analyses, where data scientists and domain experts work
together to balance appropriate assumptions with computationally efficient methods.
All datasets involve uncertainty. There may be uncertainty about how they were collected, how
they were measured, or the process that created them. Statistical modeling helps quantify and
reason about uncertainties in a systematic way. It provides tools and theory that guide the
inferences and predictions for specific problems and real data. Statistics relates to data science
through multiple statistical subfields. The art of data science is to understand how to apply these
tools in the context of a specific dataset and for answering specific scientific questions. Data
science is an essential driving force in today’s digital world. In almost all areas of the economy,
large amounts of data are collected and generated.
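As one concrete, purely illustrative way of quantifying uncertainty, the short sketch below, which assumes NumPy rather than any tool named in this essay, computes a bootstrap confidence interval for a sample mean:

```python
# Minimal sketch: bootstrap confidence interval for a sample mean (assumes NumPy).
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=200)  # stand-in for a real dataset

# Resample the data with replacement many times and record each resample's mean.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5000)
])

# The 2.5th and 97.5th percentiles give an approximate 95% confidence interval,
# one systematic way of expressing uncertainty about the estimated mean.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean: {data.mean():.2f}, 95% bootstrap CI: ({low:.2f}, {high:.2f})")
```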
Recently, data-driven methods have also found their way into various parts of the natural
sciences and humanities. The task of data science is to gain knowledge from ever bigger volumes
of data, which represents added value for the respective area. This requires not only the
development of efficient algorithms, but also a basic understanding of the interpretability and
reliability of results. Artificial intelligence and data science rely on sets of rules and algorithms
that are provided with some initial data, known as training data. From this training data, an
algorithm learns a model, in effect a formula, and applies that same formula to testing data to
generate results. There are thousands of such algorithms, and much of data science is built on
them.
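To make the training/testing workflow concrete, here is a minimal sketch, assuming scikit-learn and a built-in example dataset (neither of which the essay prescribes), that fits a model on training data and applies it to held-out testing data:

```python
# Minimal sketch of the train/test workflow described above (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small example dataset and split it into training and testing portions.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The algorithm "learns a formula" (model parameters) from the training data...
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# ...and applies that same formula to the testing data to generate results.
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))
```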
The report acknowledges that “investments are needed to expand the current pipeline of support
to the field of data science;” as society is increasingly infused with data, librarians will have a
crucial role in the future development of the data science ecosystem. Data science exists on a
spectrum, from work that requires deep statistical and software engineering skills to work
focused on advocacy, policy development, data management planning, and evidence-based
decision making. To help enterprises prepare for the future of data science, the following are
five key factors shaping the data science industry.
Making data actionable for data science: Poorly prepared data is one of the biggest obstacles
to data science success. To accelerate data science projects and reduce failures, CIOs and
CDOs must focus on improving data quality and on providing data science teams with data
that is relevant to the projects at hand and is actionable.
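As a hedged illustration of what making data actionable can involve in practice, the sketch below, assuming pandas and invented column names, runs a few basic data-quality checks before data is handed to a data science team:

```python
# Minimal sketch of basic data-quality checks (assumes pandas; columns are illustrative only).
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, None],
    "revenue": [100.0, None, 250.0, 75.0, 310.0],
})

# Report missing values and duplicate rows, two common sources of "poorly prepared" data.
print("Missing values per column:\n", df.isna().sum())
print("Duplicate rows:", df.duplicated().sum())

# One simple remediation: drop duplicates and rows missing required fields.
clean = df.drop_duplicates().dropna(subset=["customer_id", "revenue"])
print("Rows retained after cleaning:", len(clean))
```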
Shortage of data science talent: While data science remains one of the areas of highest growth
for new graduates, the need far surpasses available supply. The solution is to continue to
accelerate hiring, while also looking at alternative means of accelerating the data science process
and democratizing access to data science for other skilled professionals in areas like BI and
analytics. This is where automation in data science can have the biggest impact.
Accelerating "time to value": Data science is an iterative process. It involves creating a
"hypothesis" and then testing it. This back and forth approach involves a number of experts
ranging from data scientists to subject matter experts and data analysts. Enterprises must find
ways of accelerating the data science process to make this "try, test, repeat" cycle faster and
more predictable.
Transparency for business users: One of the biggest barriers to adoption for data science
applications is the lack of trust on the part of business users. While machine learning models can
be very useful, many business users don't trust processes that they don't understand. Data science
must find ways of making ML models easier to explain to business users and easier for business
users to trust.
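One way to make a model easier to explain, sketched here under the assumption that scikit-learn is available (the text names no particular tool), is permutation importance, which ranks the inputs a model actually relies on in terms a business user can follow:

```python
# Minimal sketch: explaining a model via permutation importance
# (assumes scikit-learn; the dataset and model choice are illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance measures how much the test score drops when a feature is
# shuffled, giving a plain ranking of which inputs the model actually relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(data.feature_names, result.importances_mean),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```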
Improving operationalization: One of the other barriers to the growth of data science adoption
is how hard it can be to operationalize. Models often work well in the lab but don't work as
well in production environments. Even when models are deployed successfully, continuing
growth and changes in production data can negatively impact models over time. This means that
having an effective way of "fine tuning" ML models even after they are in production is a critical
part of the process.
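A minimal sketch of this kind of post-deployment fine-tuning, with illustrative thresholds and function names that are assumptions rather than anything the text specifies, might compare recent production accuracy to a baseline and refit the model when it drifts:

```python
# Minimal sketch: monitoring a deployed model and retraining when performance degrades
# (illustrative thresholds and names; the essay does not prescribe any framework).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.90   # accuracy the model achieved at deployment time (assumed)
DRIFT_TOLERANCE = 0.05     # how much degradation we accept before retraining (assumed)

def monitor_and_maybe_retrain(model, X_recent, y_recent):
    """Compare recent production accuracy to the baseline and retrain if it has drifted."""
    current_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    if current_accuracy < BASELINE_ACCURACY - DRIFT_TOLERANCE:
        # Changes in production data have hurt the model, so "fine-tune" it by
        # refitting on the most recent labeled data.
        model = LogisticRegression(max_iter=1000).fit(X_recent, y_recent)
    return model, current_accuracy
```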