Data Science CLASS 12 INVESTIGATORY PROJECT
OBJECTIVE:
To explore, sort, and analyse large volumes of data from various sources in order to take advantage of them and reach conclusions that optimize business processes or support decision-making. To develop and implement data science project plans, ensuring that projects are completed on time, within budget, and to quality standards.
INTRODUCTION:
The accelerating volume of data sources, and subsequently data, has made data science one of the fastest-growing fields across every industry. Organizations are increasingly reliant on data scientists to interpret data and provide actionable recommendations to improve business outcomes. The data science lifecycle involves various roles, tools, and processes, which enable analysts to glean actionable insights.
HISTORY OF DATA SCIENCE:
While the term data science is not new, the meanings and connotations have
changed over time. The word first appeared in the ’60s as an alternative name for statistics.
In the late ’90s, computer science professionals formalized the term. A proposed definition
for data science saw it as a separate field with three aspects: data design, collection, and
analysis. It still took another decade for the term to be used outside of academia.
DATA SCIENCE LIFECYCLE:
The data science lifecycle is generally described in terms of the following stages:
Data ingestion
Data storage and data processing
Data analysis
Communicate
Data ingestion: The lifecycle begins with collecting raw data, both structured and unstructured, from all relevant sources using a variety of methods. Data sources can include structured data, such as customer data, along with unstructured data like log files, video, audio, pictures, the Internet of Things (IoT), social media, and more. (A brief Python sketch covering all four stages follows this list of stages.)
Data storage and data processing: Since data can have different formats and
structures, companies need to consider different storage systems based on the type of
data that needs to be captured. Data management teams help to set standards around
data storage and structure, which facilitate workflows around analytics, machine
learning and deep learning models. This stage includes cleaning data, deduplicating,
transforming and combining the data using ETL (extract, transform, load) jobs or
other data integration technologies. This data preparation is essential for promoting
data quality before loading into a data warehouse, data lake, or other repository.
Data analysis: Here, data scientists conduct an exploratory data analysis to examine biases, patterns, ranges, and distributions of values within the data. This also allows analysts to determine the data’s relevance for use within modelling efforts for predictive analytics, machine learning, and/or deep learning. Depending on a model’s accuracy, organizations can come to rely on these insights for business decision-making, allowing those insights to be applied at greater scale.
Communicate: Finally, insights are presented as reports and other data visualizations that make the insights, and their impact on the business, easier for business analysts and other decision-makers to understand. A data science programming language such as R or Python includes components for generating visualizations; alternatively, data scientists can use dedicated visualization tools.
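As a hedged illustration of these four stages, the minimal Python sketch below ingests a small dataset, cleans and transforms it, runs a quick exploratory analysis, and produces a simple visualization. The file name (orders.csv) and column names (region, revenue) are hypothetical placeholders, not part of any real project described above.

# Minimal sketch of the four lifecycle stages using pandas and matplotlib.
# File and column names are hypothetical; substitute real data sources.
import pandas as pd
import matplotlib.pyplot as plt

# 1. Data ingestion: collect raw data from a source (here, a CSV file).
orders = pd.read_csv("orders.csv")

# 2. Data storage and processing: a tiny ETL step that cleans and transforms.
orders = orders.drop_duplicates()
orders["revenue"] = pd.to_numeric(orders["revenue"], errors="coerce")
orders = orders.dropna(subset=["revenue"])

# 3. Data analysis: explore ranges, distributions, and simple patterns.
print(orders.describe())
print(orders.groupby("region")["revenue"].mean())

# 4. Communicate: visualize the result for decision-makers.
orders.groupby("region")["revenue"].sum().plot(kind="bar", title="Revenue by region")
plt.tight_layout()
plt.savefig("revenue_by_region.png")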
Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals. Today, successful data professionals understand that they must advance past the traditional skills of analysing large amounts of data, data mining, and programming. To uncover useful intelligence for their organizations, data scientists must master the full spectrum of the data science life cycle and possess a level of flexibility and understanding to maximize returns at each phase of the process.
The image represents the five stages of the data science life cycle: Capture, (data
acquisition, data entry, signal reception, data extraction); Maintain (data warehousing, data
cleansing, data staging, data processing, data architecture); Process (data mining,
clustering/classification, data modelling, data
summarization); Analyse (exploratory/confirmatory, predictive analysis, regression, text
mining, qualitative analysis); Communicate (data reporting, data visualization, business
intelligence, decision making).
Commonly used data science tools and technologies include:
R
Python
Apache Hadoop
MapReduce
Apache Spark
NoSQL databases
Cloud computing
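Of the tools listed above, Apache Spark is often used from Python through PySpark. The sketch below is a hedged illustration only; the file name (sales.csv) and column names (region, revenue) are assumptions made for the example.

# Hypothetical sketch: summarizing a dataset with Apache Spark (PySpark).
# The input file and column names are placeholders for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-summary").getOrCreate()

# Read a CSV file into a distributed DataFrame
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Group and aggregate the data across the cluster
sales.groupBy("region").sum("revenue").show()

spark.stop()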
CONCLUSION:
Data science is a dynamic and rapidly evolving field that plays a pivotal role in our data-driven world. It combines a variety of skills, including statistics, programming, domain knowledge, and data visualization, to extract valuable insights from vast and complex data sets. It’s clear that the power of data science is here to stay, and its potential is limited only by our imagination and innovation.
QUESTIONNAIRE:
Q: What are the key components of the data science process?
ANS: The key components of the data science process are data collection, data cleaning and pre-processing, exploratory data analysis, modelling, evaluation, and deployment, with each step contributing to the overall goal of deriving meaningful insights.
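To make the modelling and evaluation steps in this answer concrete, the short scikit-learn sketch below trains and scores a simple classifier. The iris dataset and the logistic regression model are stand-ins chosen for illustration, not part of the project above.

# Hypothetical sketch of the modelling and evaluation steps with scikit-learn.
# The iris dataset and logistic regression model are stand-ins for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out a test set so evaluation reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Modelling: fit a simple classifier on the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluation: measure accuracy before any deployment decision
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))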
BIBLIOGRAPHY:
1. https://www.ibm.com
2. https://aws.amazon.com
3. https://www.coursera.org
4. https://www.geeksforgeeks.org