Data Science CLASS 12 INVESTIGATORY PROJECT


OBJECTIVE:

To explore, sort and analyse large volumes of data from various sources in order to take
advantage of that data and reach conclusions that optimize business processes or support
decision making. To develop and implement data science project plans, ensuring that projects
are completed on time, within budget, and to quality standards.

INTRODUCTION:

Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI) and machine learning with specific subject matter
expertise to uncover actionable insights hidden in an organization’s data. These insights can
be used to guide decision making and strategic planning.

The accelerating volume of data sources, and subsequently data, has made data
science one of the fastest growing fields across every industry. Organizations are
increasingly reliant on data scientists to interpret data and provide actionable
recommendations to improve business outcomes. The data science lifecycle involves various
roles, tools, and processes, which enable analysts to glean actionable insights.

HISTORY OF DATA SCIENCE:

While the term data science is not new, the meanings and connotations have
changed over time. The word first appeared in the ’60s as an alternative name for statistics.
In the late ’90s, computer science professionals formalized the term. A proposed definition
for data science saw it as a separate field with three aspects: data design, collection, and
analysis. It still took another decade for the term to be used outside of academia.

DIFFERENT STAGES OF A DATA SCIENCE PROJECT:

Typically, a data science project undergoes the following stages:

 Data ingestion
 Data storage and data processing
 Data analysis
 Communicate

 Data ingestion: The lifecycle begins with data collection, which gathers both raw
structured and unstructured data from all relevant sources using a variety of methods.
Data sources can include structured data, such as customer data, along with unstructured
data like log files, video, audio, pictures, the Internet of Things (IoT), social media,
and more.

 Data storage and data processing: Since data can have different formats and
structures, companies need to consider different storage systems based on the type of
data that needs to be captured. Data management teams help to set standards around
data storage and structure, which facilitate workflows around analytics, machine
learning and deep learning models. This stage includes cleaning data, deduplicating,
transforming and combining the data using ETL (extract, transform, load) jobs or
other data integration technologies. This data preparation is essential for promoting
data quality before loading into a data warehouse, data lake, or other repository.

 Data analysis: Here, data scientists conduct exploratory data analysis to examine
biases, patterns, ranges, and distributions of values within the data. This analysis also
allows analysts to determine the data’s relevance for use within modelling efforts for
predictive analytics, machine learning, and/or deep learning. Depending on a model’s
accuracy, organizations can come to rely on these insights for business decision
making, allowing them to scale those decisions across the business.

 Communicate: Finally, insights are presented as reports and other data visualizations
that make the insights, and their impact on the business, easier for business analysts and
other decision-makers to understand. A data science programming language such as R
or Python includes components for generating visualizations; alternatively, data
scientists can use dedicated visualization tools.
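
The four stages above can be sketched as a very small pipeline. This is only an
illustration under stated assumptions: the sample data, the column names and the function
names (ingest, clean, analyse, report) are all hypothetical, and a real project would read
from files or databases and would usually use dedicated libraries.

```python
import csv
import io
import statistics

# Hypothetical raw export: note the duplicate row and the missing amount.
RAW_CSV = """customer,region,amount
Asha,North,120
Ben,South,80
Asha,North,120
Chitra,North,
Dev,South,100
"""


def ingest(text):
    """Data ingestion: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Data processing: drop duplicate rows and rows with a missing amount,
    then convert each amount from text to a number (the 'transform' step)."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["customer"], row["region"], row["amount"])
        if key in seen or not row["amount"]:
            continue
        seen.add(key)
        cleaned.append({**row, "amount": float(row["amount"])})
    return cleaned


def analyse(rows):
    """Data analysis: summarise the range and centre of the amounts."""
    amounts = [row["amount"] for row in rows]
    return {
        "count": len(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "mean": statistics.mean(amounts),
    }


def report(summary):
    """Communicate: turn the summary into a short plain-text report."""
    return "\n".join(f"{name}: {value}" for name, value in summary.items())


rows = clean(ingest(RAW_CSV))
summary = analyse(rows)
print(report(summary))
```

Keeping each stage in its own function mirrors the lifecycle: the cleaning step can be
changed without touching the analysis or the report.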

DATA SCIENCE – THE PROMISING FIELD:

Data science continues to evolve as one of the most promising and in-demand career
paths for skilled professionals. Today, successful data professionals understand they must
advance past the traditional skills of analysing large amounts of data, data mining, and
programming. To uncover useful intelligence for their organizations, data scientists
must master the full spectrum of the data science life cycle and possess a level of
flexibility and understanding to maximize returns at each phase of the process.

The image represents the five stages of the data science life cycle: Capture (data
acquisition, data entry, signal reception, data extraction); Maintain (data warehousing, data
cleansing, data staging, data processing, data architecture); Process (data mining,
clustering/classification, data modelling, data summarization); Analyse
(exploratory/confirmatory, predictive analysis, regression, text mining, qualitative
analysis); Communicate (data reporting, data visualization, business intelligence, decision
making).

WHY IS DATA SCIENCE IMPORTANT?

Data science is important because it combines tools, methods, and
technology to generate meaning from data. Modern organizations are inundated with data;
there is a proliferation of devices that can automatically collect and store information. Online
systems and payment portals capture more data in the fields of e-commerce, medicine,
finance, and every other aspect of human life. We have text, audio, video, and image data
available in vast quantities.

FUTURE OF DATA SCIENCE:

Artificial intelligence and machine learning innovations have made data
processing faster and more efficient. Industry demand has created an ecosystem of courses,
degrees, and job positions within the field of data science. Because of the cross-functional
skillset and expertise required, data science shows strong projected growth over the coming
decades.

WHAT DOES A DATA SCIENTIST DO?

Data scientists are typically curious and results-oriented, with exceptional
industry-specific knowledge and communication skills that allow them to explain highly
technical results to their non-technical counterparts. They possess a strong quantitative
background in statistics and linear algebra as well as programming knowledge, with a focus
on data warehousing, mining, and modelling to build and analyse algorithms.

They also use key technical tools and skills, including:

 R
 Python
 Apache Hadoop
 MapReduce
 Apache Spark
 NoSQL databases
 Cloud computing

Data scientists examine which questions need answering and where to
find the related data. They have business acumen and analytical skills as well as the ability to
mine, clean, and present data. Businesses use data scientists to source, manage, and analyse
large amounts of unstructured data. Data scientists also leverage machine learning
techniques to model information and interpret results effectively, a skill that differentiates
them from data analysts. Results are then synthesized and communicated to key stakeholders
to drive strategic decision making in the organization.

CONCLUSION:

Data science is a dynamic and rapidly evolving field that plays a pivotal role in our
data-driven world. It combines a variety of skills including statistics, programming, domain
knowledge and data visualization to extract valuable insights from vast and complex data
sets. It’s clear that the power of data science is here to stay and its potential is limited only
by our imagination and innovation.

QUESTIONNAIRE:

1. What is meant by data science?

ANS: Data science is an interdisciplinary field that encompasses the extraction of
valuable knowledge and insights from data. By analysing data, data scientists can identify
patterns, trends, and correlations that help businesses make informed decisions and solve
complex problems.

2. What are the key components of the data science process?

ANS: The key components of the data science process are data collection, data
cleaning and pre-processing, exploratory data analysis, modelling, evaluation, and
deployment, with each step contributing to the overall goal of deriving meaningful
insights.

3. How is Python useful?

ANS: Python is widely recognized as an exceptionally advantageous programming
language due to its versatility and simplicity. Its extensive range of applications and
associated benefits have established it as a preferred choice among developers.

4. What is Supervised Learning?

ANS: Supervised learning is a machine learning approach in which an algorithm learns from
labelled training data to make predictions or classify new, unseen data.
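
As an illustration, a minimal sketch of supervised learning is a nearest-neighbour
classifier, which labels a new point with the label of the closest training example. The
training data, the labels and the function name predict are made up for this example;
nearest neighbour is only one of many supervised methods.

```python
import math

# Hypothetical labelled training data: (hours studied, hours slept) -> result.
TRAINING = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def predict(point, training=TRAINING):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(
        training, key=lambda example: math.dist(example[0], point)
    )
    return nearest_label


# New, unseen points are classified from the labelled examples.
print(predict((7.0, 6.0)))
print(predict((1.5, 5.0)))
```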

5. What is Unsupervised Learning?

ANS: Unsupervised learning is a machine learning approach wherein an algorithm uncovers
patterns and structures within unlabelled data.
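
As an illustration, the sketch below groups unlabelled numbers into two clusters with a
simplified one-dimensional version of k-means (k = 2). The sample ages and the function
name two_means are hypothetical, and the sketch assumes the data contains at least two
different values.

```python
import statistics


def two_means(values, rounds=10):
    """Split numbers into two groups by repeatedly moving two centres
    to the mean of the values nearest to each of them.

    Assumes `values` contains at least two different numbers.
    """
    low, high = min(values), max(values)
    for _ in range(rounds):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = statistics.mean(group_low), statistics.mean(group_high)
    return sorted(group_low), sorted(group_high)


# No labels are given; the two groups emerge from the data alone.
ages = [21, 23, 22, 64, 61, 66]
younger, older = two_means(ages)
print(younger, older)
```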

6. What is a confusion matrix?

ANS: The confusion matrix is a table that is used to evaluate the performance of a
classification model. It tabulates the actual values against the predicted values; for a
two-class problem this is a 2×2 matrix.
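
As an illustration, the sketch below counts the four cells of a 2×2 confusion matrix for a
two-class problem: true positives (TP), false positives (FP), false negatives (FN) and
true negatives (TN). The labels "spam" and "ham", the sample predictions and the function
name confusion_matrix are made up for this example.

```python
def confusion_matrix(actual, predicted, positive="spam"):
    """Count the four cells of a 2x2 confusion matrix."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


# Hypothetical actual and predicted labels for six messages.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

matrix = confusion_matrix(actual, predicted)
# Accuracy is the share of predictions that match the actual values.
accuracy = (matrix["TP"] + matrix["TN"]) / len(actual)

print("                 predicted spam  predicted ham")
print(f"actual spam      {matrix['TP']:>14}  {matrix['FN']:>13}")
print(f"actual ham       {matrix['FP']:>14}  {matrix['TN']:>13}")
print(f"accuracy: {accuracy:.2f}")
```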

BIBLIOGRAPHY:

1. https://www.ibm.com
2. https://aws.amazon.com
3. https://www.coursera.org
4. https://www.geeksforgeeks.org
