DataScience Industry
PRATYUSH MOHANTY
Regd.no: 2221289054
Seminar Report on
Data Science in Industry
Submitted in Partial Fulfillment of
The Requirement for the 6th Sem. Seminar
B. Tech
In
Computer Science & Technology
Submitted by
Pratyush Mohanty
Regd.no: 2101289332
CERTIFICATE
This is to certify that this Seminar Report on the topic entitled "Data Science and Analytics",
which is submitted by Pratyush Mohanty bearing Registration No.: 2221289054 in partial
fulfillment of the requirement for the award of the degree of B. Tech in Computer Science &
Technology of Biju Patnaik University of Technology, Odisha, is a record of the candidate's
own work carried out by him under my supervision.
ABSTRACT
This seminar offers a comprehensive introduction to the fundamentals of data science and
analytics, which stand at the forefront of modern decision-making, offering powerful
tools and methodologies to extract insights from complex datasets. This abstract provides a
comprehensive overview of the field, beginning with an exploration of data collection techniques
and preprocessing methods. From traditional surveys to advanced sensor technologies and web
scraping, data collection plays a foundational role in acquiring the raw material for analysis.
Subsequently, the abstract delves into the critical phase of exploratory data analysis (EDA),
where visualizations and statistical techniques are employed to uncover patterns, trends, and
anomalies within the data. EDA serves as a crucial stepping stone towards understanding the
data's underlying structure, guiding subsequent analysis and modeling efforts.
Moving forward, the abstract navigates through the diverse landscape of machine learning
algorithms and techniques, ranging from supervised to unsupervised learning methods. These
algorithms enable predictive modeling, classification, clustering, and recommendation systems,
empowering organizations to make data-driven decisions and derive actionable insights. Model
evaluation and validation techniques are explored, emphasizing the importance of assessing
model performance and ensuring generalization to unseen data. Additionally, the abstract
addresses the challenges and considerations involved in deploying machine learning models into
production environments, such as scalability, performance optimization, and continuous
monitoring.
Ethical considerations in data science and analytics are also examined, highlighting the
importance of responsible data practices and societal impact. With the growing influence of data-
driven decision-making, ethical frameworks and guidelines play a crucial role in ensuring
fairness, transparency, and accountability in data use. Lastly, the abstract looks towards the
future of the field, exploring emerging trends such as explainable AI, automated machine
learning, and interdisciplinary collaborations. These trends promise to drive further innovation
and address complex challenges, shaping the trajectory of data science and analytics in the years
to come. Through a synthesis of theoretical concepts, practical applications, and prospects, this
abstract offers a holistic understanding of the transformative potential of data science and
analytics in today's data-centric world.
Place: Bhubaneswar
Date: March 23, 2024 Pratyush Mohanty
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my supervisor, Mr. Krusna Lenka,
as well as the head of our department, who gave me the opportunity to present
this seminar on the topic “Data Science in Industry”. Preparing it led me to do
a great deal of research and to learn many new things. I am also thankful to
all the faculty members of our department who helped me understand Data Science in
Industry better.
Place: Bhubaneswar
Date: March 23, 2024 Pratyush Mohanty
Introduction
Overview of Data Science in Industry
1.1 What is Data Science?
Data science is a multidisciplinary field that combines statistics, computer
science, domain expertise, and data analysis techniques to extract valuable
insights from structured and unstructured data. At its core, data science
focuses on transforming raw data into actionable insights that can drive
decision-making across various industries.
The Early Stages: In the early 2000s, data science was often limited
to statistical modeling and business intelligence (BI) tools, primarily
used for reporting and descriptive analytics. Companies collected
historical data and used it for simple dashboards and reports to
understand business performance retrospectively.
The Rise of Big Data (2010s): As the volume, variety, and velocity
of data increased, industries began adopting big data technologies to
handle the exponential growth in data. Technologies like Hadoop,
Apache Spark, and cloud computing platforms enabled companies to
store and process vast amounts of data at scale, in some cases in
near real time. This led to the rise
of predictive and prescriptive analytics, with data science models being
applied to forecast trends, optimize processes, and automate decision-
making.
Modern Era (2020s): Today, data science plays an integral role in
industries such as manufacturing, healthcare, retail, finance, and more.
With advancements in artificial intelligence (AI) and machine learning,
companies are moving towards more sophisticated data-driven
strategies. These include predictive maintenance in manufacturing,
personalized medicine in healthcare, fraud detection in finance, and
autonomous vehicles in transportation.
Big Data Technologies: Tools like Apache Hadoop and Apache Spark
are critical for processing large datasets. Cloud platforms like AWS,
Google Cloud, and Microsoft Azure provide scalable computing power
and storage solutions.
Machine Learning: Algorithms such as decision trees, random
forests, neural networks, and support vector machines (SVM) help
identify patterns, make predictions, and optimize decision-making
processes.
Data Visualization Tools: Tools such as Tableau, Power BI, and
Python libraries like Matplotlib and Seaborn allow businesses to create
intuitive visualizations that help non-technical stakeholders understand
complex data.
One of the primary reasons data science is crucial to modern business is its
ability to enable data-driven decision-making. In today’s highly competitive
and fast-paced global market, businesses that base decisions on data rather
than intuition or guesswork are more likely to succeed.
Real-Time Insights: Data science provides businesses with real-time
insights into customer behavior, market trends, and operational
inefficiencies. Companies can leverage these insights to make
informed decisions quickly, allowing them to react to changing market
conditions more effectively.
Predictive Capabilities: With predictive analytics, businesses can
forecast future trends, customer needs, and potential risks. For
instance, retailers use predictive models to anticipate demand spikes,
while manufacturers use predictive maintenance to avoid equipment
breakdowns and reduce downtime.
The rise of the Industrial Internet of Things (IIoT) has greatly facilitated
predictive maintenance by enabling machines to be embedded with sensors
that continuously capture and transmit data, such as temperature, vibration,
pressure, and speed. By leveraging data science techniques like Random
Forests and Neural Networks, industries can harness this data to predict
when and where maintenance should be conducted.
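The idea of turning a sensor stream into a maintenance signal can be sketched in a few lines. The example below is a deliberately simpler stand-in for the Random Forest and Neural Network models mentioned above: it flags a reading when it deviates strongly from a rolling baseline of recent readings (a z-score test). The function name, the window size, the threshold, and the vibration values are all assumptions made for illustration, not details from any real deployment.

```python
from statistics import mean, stdev

def maintenance_alerts(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from their recent baseline.

    readings:  sensor values (e.g. vibration amplitude) in time order.
    window:    number of preceding readings that form the baseline.
    threshold: z-score above which a reading is flagged.
    """
    alerts = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: the z-score is undefined, so skip
        if abs(readings[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts

# Hypothetical vibration readings; the jump at index 8 mimics a developing fault.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.51, 0.50, 0.95, 0.97]
print(maintenance_alerts(vibration))
```

A production system would typically learn from labelled failure histories across many sensors at once; this sketch only shows the basic step of comparing live readings against expected behavior.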
General Electric (GE) is a global industrial leader that has pioneered the use
of predictive maintenance across its business units, particularly in aviation
and energy. GE's aviation division focuses on the maintenance of jet engines,
while its energy division applies predictive maintenance to gas turbines used
in power generation.
Impact:
Algorithms Used:
2.2 Caterpillar
Impact:
Caterpillar has reduced downtime by over 20% by using predictive
analytics to schedule maintenance at optimal times.
Maintenance costs have decreased as machinery components are
serviced just before they wear out, avoiding unnecessary
replacements.
The company has also extended the lifespan of its heavy equipment,
leading to long-term cost savings for customers.
Algorithms Used:
Healthcare Industry
Personalized Medicine in Healthcare
1. Introduction to Precision Health
Advantages in Healthcare:
Advantages in Healthcare:
Watson for Oncology is one of the key offerings of IBM Watson Health and
is used to assist oncologists in developing personalized cancer treatment
plans. The AI system integrates data from medical literature, clinical trials,
and individual patient records, helping doctors identify treatments based on
the latest evidence and tailored to a patient's specific genomic profile.
How it Works:
Results:
How it Works:
Results:
Retail Industry
Demand Forecasting
In retail, Time Series Analysis helps predict future sales based on past
trends. Retailers can analyze daily, weekly, or monthly sales patterns,
account for seasonality, and adjust inventory or promotional activities
accordingly.
ARIMA (AutoRegressive Integrated Moving Average) is one of the most widely
used models for time series forecasting. It differences consecutive
observations to remove trends (integrated component), then models the
differenced series as a linear function of its own past values
(autoregressive component) and of past forecast errors (moving average
component).
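To make the components concrete, here is a minimal sketch of the simplest useful case, ARIMA(1,1,0): one round of differencing and one autoregressive term, with no moving average term. It is fitted by ordinary least squares in plain Python. The function names and the weekly sales figures are hypothetical, and a real project would more likely rely on an established forecasting library than on hand-written fitting code.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0): an AR(1) model on first differences, by least squares.

    Returns (c, phi) for the model  d_t = c + phi * d_(t-1),
    where d_t = y_t - y_(t-1) is the first difference of the series.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    x = diffs[:-1]  # lagged differences d_(t-1)
    y = diffs[1:]   # current differences d_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: no AR signal
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, c, phi, steps):
    """Roll the fitted model forward, undoing the differencing at each step."""
    last_value = series[-1]
    last_diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # autoregressive step on the difference
        last_value += last_diff          # integrate back to the original scale
        predictions.append(last_value)
    return predictions

# Hypothetical weekly unit sales for one product.
weekly_sales = [120, 126, 131, 139, 144, 152, 157]
c, phi = fit_arima_110(weekly_sales)
print(forecast(weekly_sales, c, phi, steps=3))
```

For a perfectly linear series such as 10, 12, 14, 16 the differences are constant, so the sketch falls back to phi = 0 and simply extends the trend.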
Advantages of LSTM:
Results:
Customer Personalization
Advantages:
3. Case Studies
3.1 Netflix
Netflix, a global streaming service, relies heavily on customer
personalization to recommend content to its users. Netflix’s recommendation
engine is designed to predict what shows or movies a user is likely to enjoy
based on their past viewing history, preferences, and behavior.
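One common way to predict what a user may enjoy from past viewing history is item-based collaborative filtering. The sketch below illustrates that general technique only; it is not a description of Netflix's actual engine, and the user names, titles, and ratings are invented for the example. Each unseen title is scored by how similar its rating pattern is to the titles the user has already rated, weighted by those ratings.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=2):
    """Rank titles the user has not rated by similarity-weighted ratings.

    ratings: dict mapping user -> {title: rating}; a missing title means unwatched.
    """
    users = sorted(ratings)
    titles = sorted({t for r in ratings.values() for t in r})
    # One vector per title: its rating from every user (0 if unwatched).
    vectors = {t: [ratings[u].get(t, 0) for u in users] for t in titles}
    seen = ratings[user]
    scores = {}
    for candidate in titles:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[t]) * r for t, r in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical viewing data.
ratings = {
    "ana": {"Drama A": 5, "Drama B": 4},
    "ben": {"Drama A": 5, "Drama B": 5, "Drama C": 4},
    "cy":  {"Comedy X": 5, "Comedy Y": 4},
    "dee": {"Comedy X": 4, "Comedy Y": 5, "Drama C": 1},
}
print(recommend(ratings, "ana", top_n=1))
```

Here "ana" has only rated dramas, so the title whose audience overlaps with hers ranks ahead of the comedies.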
Results:
3.2 Alibaba
Results:
Price Optimization
Data-driven pricing models leverage historical sales data, market trends, and
competitor pricing to forecast the impact of price changes on demand. These
models can identify the most profitable price points for different customer
segments and adjust prices in real time.
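The step from historical sales data to a profitable price point can be illustrated with a small sketch. It assumes, purely for illustration, that demand falls linearly with price; it fits that line by least squares and then solves for the price that maximizes profit. The function names, the unit cost, and the price and quantity history are hypothetical, and real pricing models also account for competitors, seasonality, and customer segments.

```python
def fit_linear_demand(prices, quantities):
    """Least-squares fit of  quantity = intercept + slope * price."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_q = sum(quantities) / n
    sxy = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
    sxx = sum((p - mean_p) ** 2 for p in prices)
    slope = sxy / sxx
    intercept = mean_q - slope * mean_p
    return intercept, slope

def profit_maximizing_price(intercept, slope, unit_cost):
    """Price that maximizes (price - unit_cost) * (intercept + slope * price).

    Setting the derivative to zero gives
    price = (slope * unit_cost - intercept) / (2 * slope), valid for slope < 0.
    """
    if slope >= 0:
        raise ValueError("demand must fall as price rises")
    return (slope * unit_cost - intercept) / (2 * slope)

# Hypothetical history: units sold per week at each tested price.
prices = [10, 20, 30, 40]
units = [80, 60, 40, 20]
intercept, slope = fit_linear_demand(prices, units)
print(profit_maximizing_price(intercept, slope, unit_cost=10))
```

With this toy history the fitted demand curve is quantity = 100 - 2 * price, so a unit cost of 10 leads to a profit-maximizing price of 30.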
Walmart, one of the largest global retailers, uses data-driven pricing models
to optimize prices across its extensive product range. Walmart collects
massive amounts of sales data, competitor pricing information, and
customer behavior data to make real-time pricing decisions.
Results:
Finance Industry
Fraud Detection
Results:
High-frequency trading (HFT) firms analyze large sets of historical and
real-time market data using statistical and machine learning models. These
models detect price trends, market inefficiencies, and arbitrage
opportunities that can be exploited for profit. The key advantage of HFT
lies in its ability to process vast amounts of data and execute trades in
real time, often before human traders can react
to market changes.
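As a toy illustration of detecting a price trend from a stream of prices, the sketch below uses a moving-average crossover: a signal is emitted when a short-term average crosses a long-term one. This is a textbook indicator chosen for brevity and is far simpler and slower than anything a real HFT firm would run; the window lengths and the price series are assumptions.

```python
def crossover_signals(prices, short=3, long=5):
    """Return (index, action) pairs where the short average crosses the long one.

    'buy'  : short average moves from at-or-below to above the long average.
    'sell' : short average moves from above to at-or-below the long average.
    """
    signals = []
    previous = None
    for i in range(long - 1, len(prices)):
        short_ma = sum(prices[i - short + 1:i + 1]) / short
        long_ma = sum(prices[i - long + 1:i + 1]) / long
        state = "above" if short_ma > long_ma else "below"
        if previous is not None and state != previous:
            signals.append((i, "buy" if state == "above" else "sell"))
        previous = state
    return signals

# Hypothetical price ticks: a dip followed by a recovery.
ticks = [10, 10, 10, 10, 10, 9, 8, 9, 11, 13, 14]
print(crossover_signals(ticks))
```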
2. Techniques in Algorithmic Trading
Results:
Consistent Profits: Renaissance Technologies’ algorithmic trading
strategies have allowed it to consistently generate profits, even in
highly volatile markets.
Data-Driven Decisions: By relying on data science and machine
learning, Renaissance Technologies can execute trades based on data-
driven insights, reducing reliance on human intuition.
Predictive models for credit risk utilize both traditional financial data (such as
loan amounts, income, and credit history) and alternative data (such as
social behavior and purchasing patterns) to predict the likelihood of a
borrower defaulting. These models help lenders make better decisions,
reducing the risk of bad loans and optimizing interest rates based on risk.
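A default-probability model of this kind is often introduced through logistic regression. The following is a minimal sketch under stated assumptions: two made-up features (debt-to-income ratio and number of late payments), a tiny invented dataset, and plain batch gradient descent. It is not the scoring method of any particular lender.

```python
from math import exp

def predict_default_probability(weights, features):
    """Logistic model: sigmoid of bias plus weighted sum of the features."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + exp(-z))

def train_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grads = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            error = predict_default_probability(weights, x) - y
            grads[0] += error
            for j, xj in enumerate(x):
                grads[j + 1] += error * xj
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grads)]
    return weights

# Hypothetical borrowers: [debt-to-income ratio, number of late payments].
borrowers = [[0.1, 0], [0.2, 0], [0.3, 1], [0.6, 2], [0.7, 3], [0.8, 4]]
defaulted = [0, 0, 0, 1, 1, 1]
weights = train_logistic(borrowers, defaulted)
print(round(predict_default_probability(weights, [0.9, 5]), 2))
print(round(predict_default_probability(weights, [0.1, 0]), 2))
```

The predicted probability can then feed a lending decision or a risk-based interest rate, which is where the fairness concerns discussed later in this report become relevant.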
Results:
The future of data science is heavily intertwined with the rise of artificial
intelligence (AI) and automation. As industries increasingly rely on data-
driven decision-making, the role of AI and automation in streamlining data
science processes is becoming more critical. Data science tools and
platforms are evolving to automate many aspects of the data pipeline, from
data collection and preprocessing to model selection and deployment.
Data Privacy: The collection and use of vast amounts of personal data
pose significant risks to privacy. Companies need to ensure that
customer data is handled responsibly and in compliance with data
protection regulations like GDPR (General Data Protection Regulation)
and CCPA (California Consumer Privacy Act).
Bias and Fairness: Machine learning models can unintentionally
perpetuate or amplify biases present in the training data. For example,
biased credit scoring models could unfairly disadvantage certain
demographic groups. Addressing bias and ensuring fairness is a crucial
ethical responsibility in data science, particularly in industries like
finance, healthcare, and criminal justice.
Transparency and Explainability: The growing use of complex
models, such as deep learning, has raised concerns about the black-
box nature of AI. Decision-makers need to understand how models
arrive at their predictions, especially in high-stakes applications like
loan approvals, medical diagnoses, or criminal sentencing. Model
transparency and explainability are essential for trust and
accountability.
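The bias concern above can be checked with a simple measurement. The sketch below compares approval rates across groups and reports the ratio of the lowest rate to the highest, a basic form of demographic parity check. The group labels and decisions are invented, and a single ratio is only a starting point for a fairness review, not a complete audit.

```python
def approval_rates(decisions, groups):
    """Share of positive decisions (1 = approved) within each group."""
    totals, approved = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + decision
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions, groups):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    rates = approval_rates(decisions, groups)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0

# Hypothetical loan decisions for applicants from two demographic groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5
print(approval_rates(decisions, groups))
print(disparate_impact_ratio(decisions, groups))
```

A ratio far below 1.0, as in this toy data, indicates that one group is approved much less often and that the model and its training data deserve closer inspection.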
Data Quality
Data Security
2. Model Interpretability
As machine learning models grow more complex, particularly with the rise of
deep learning and ensemble methods, the challenge of model
interpretability becomes more pronounced. In industries where decision-
makers must trust and understand model predictions—such as finance,
healthcare, and law enforcement—interpretability is critical for regulatory
compliance, transparency, and user confidence.
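One model-agnostic way to approach interpretability is permutation importance: shuffle one input feature at a time and measure how much the model's accuracy drops. The sketch below is a minimal illustration of that idea; the stand-in "black box" model, the feature values, and the function names are assumptions made for the example.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Stand-in "black box": it secretly relies on feature 0 only.
def black_box(row):
    return int(row[0] > 0.5)

rows = [[0.9, 3], [0.8, 7], [0.7, 1], [0.6, 5],
        [0.4, 2], [0.3, 8], [0.2, 4], [0.1, 6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_importance(black_box, rows, labels))
```

Shuffling feature 1 never changes the predictions, so its importance is zero, while shuffling feature 0 hurts accuracy. This gives decision-makers a first indication of what a model actually depends on without opening it up.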
Interpretability Challenges:
Industries Affected:
1. Predictive Maintenance
2. Healthcare
3. Retail
Companies like Amazon and Netflix use models such as ARIMA and
LSTM for demand prediction, together with recommendation engines,
leading to improved inventory management and personalized
customer experiences.
Walmart leverages machine learning to optimize pricing strategies,
driving profitability and competitiveness.
4. Finance
Challenges