Final Report Sushil
An Internship report on
DATA SCIENCE
Submitted By
Sushil Meher
[ Roll No : 21134501032 ]
[ B.Tech (C.S.E) VIIth ]
Session 2024-2025
STUDENT DECLARATION
I, Sushil Meher, hereby declare that the industrial training project report on Data
Science at Unified Mentor is my original work. I affirm that I have undertaken
this project with integrity and in accordance with the academic and ethical
standards set forth by Unified Mentor and any relevant professional guidelines.
To the best of my knowledge and belief, the work presented in this report is
authentic, and any contributions or ideas of others are properly cited and
acknowledged. I have not used any sources, texts, or materials without giving
appropriate credit to the authors or sources.
ACKNOWLEDGEMENT
ABOUT ORGANIZATION: UNIFIED MENTOR
ABSTRACT
The area of Machine Learning deals with the design of programs that can learn
rules from data, adapt to changes, and improve performance with experience. In
addition to being one of the initial dreams of Computer Science, Machine
Learning has become crucial as computers are expected to solve increasingly
complex problems and become more integrated into our daily lives. This is a
hard problem, since making a machine learn from its computational tasks
requires work at several levels, and complexities and ambiguities arise at each of
those levels.
This report therefore studies how machine learning works and which methods it
uses, discusses the projects implemented during the training and their
applications, and reviews the present and future status of machine learning.
CONTENTS
1. INTRODUCTION
1.1 TECHNICAL TRAINING PLATFORM
2. DATA SCIENCE WITH AI AND ML
3. HARDWARE AND SOFTWARE REQUIREMENT
4. TOOLS
5. PYTHON
6. STATISTICS
7. GRAPHS
8. FINAL PROJECT
9. CONCLUSION
10. REFERENCES
CHAPTER 1: INTRODUCTION
Training in data science with artificial intelligence (AI) and machine learning
(ML) is an exciting and dynamic field that equips individuals with the skills to
extract valuable insights, automate decision-making processes, and unlock the
potential of data. This training encompasses a wide array of knowledge and
practical expertise.
1.1 TECHNICAL TRAINING PLATFORM
VS Code
Visual Studio Code (VS Code) is increasingly important in data science due to
its versatility, ease of use, and extensive extension ecosystem. Data scientists
can leverage VS Code for several critical tasks. It supports various programming
languages commonly used in data science, such as Python and R, making it a
unified environment for coding, data manipulation, and analysis. VS Code's
extensions enable integration with Jupyter notebooks, version control systems,
and data visualization libraries. It offers a streamlined interface for writing code,
running experiments, and collaborating with teams, making it an invaluable tool
for data scientists seeking efficiency and productivity in their workflow.
Jupyter Notebook
Kaggle
CHAPTER 2: DATA SCIENCE WITH AI AND ML
Data Science, Artificial Intelligence (AI), and Machine Learning (ML) are
transformative fields that have revolutionized how businesses, organizations,
and researchers analyze and extract insights from data. In this introduction, we'll
explore the fundamental concepts and their interplay in these domains.
A typical data science project moves through several key stages:
● Data Collection: Gathering relevant data from various sources, such as
databases, APIs, and sensors.
● Data Cleaning and Preprocessing: Ensuring data is accurate, complete, and
ready for analysis.
● Exploratory Data Analysis (EDA): Examining data visually and statistically
to discover patterns.
● Feature Engineering: Selecting or creating relevant variables for analysis.
● Machine Learning: Building predictive models and making data-driven
decisions.
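The five stages above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the student records and the field names study_hours and marks are hypothetical, the data is held in memory instead of coming from a database or API, the derived studied_a_lot feature is an invented example, and the "model" is a simple least-squares line rather than any particular library's estimator.

```python
# Hypothetical end-to-end sketch of the five workflow stages (stdlib only).
from statistics import fmean, median


def collect():
    """Data Collection: stand-in for reading from a database, API, or sensor."""
    return [
        {"student": "A", "study_hours": "2", "marks": "55"},
        {"student": "B", "study_hours": "4", "marks": "65"},
        {"student": "C", "study_hours": "", "marks": "70"},  # missing value
        {"student": "D", "study_hours": "6", "marks": "75"},
        {"student": "E", "study_hours": "8", "marks": "85"},
    ]


def clean(rows):
    """Data Cleaning: drop rows with missing fields and convert text to numbers."""
    cleaned = []
    for row in rows:
        if row["study_hours"] == "" or row["marks"] == "":
            continue
        cleaned.append({"hours": float(row["study_hours"]),
                        "marks": float(row["marks"])})
    return cleaned


def explore(rows):
    """EDA: simple summary statistics of the marks column."""
    marks = [r["marks"] for r in rows]
    return {"count": len(marks), "mean": fmean(marks), "median": median(marks)}


def engineer(rows):
    """Feature Engineering: add a hypothetical binary feature."""
    return [dict(r, studied_a_lot=r["hours"] >= 5) for r in rows]


def fit_line(rows):
    """Machine Learning: least-squares line, marks = slope * hours + intercept."""
    xs = [r["hours"] for r in rows]
    ys = [r["marks"] for r in rows]
    x_bar, y_bar = fmean(xs), fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


rows = engineer(clean(collect()))
summary = explore(rows)
slope, intercept = fit_line(rows)
print(summary)
print(f"predicted marks for 5 hours: {slope * 5 + intercept:.1f}")  # 70.0
```

Keeping each stage in its own function mirrors the list: any stage (for example, clean) can be swapped for a real implementation without touching the others.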
3
CHAPTER 3: HARDWARE AND SOFTWARE REQUIREMENT
HARDWARE REQUIRED:
SOFTWARE REQUIRED:
CHAPTER 4: TOOLS
4.1 Introduction
Fundamental tools in data science serve as the cornerstone for various tasks in
data analysis and machine learning. Python, a versatile programming language,
is the linchpin of the data science toolkit. It's complemented by Jupyter
Notebook, an interactive environment perfect for data exploration and
documentation. Pandas, a robust library, takes center stage in data manipulation
and analysis, particularly suited for structured data. Data visualization is
achieved through Matplotlib, a versatile plotting library, and Seaborn, which
simplifies creating appealing statistical graphics. For machine learning
endeavors, Scikit-Learn provides essential algorithms and tools, making it
accessible for beginners and powerful for experts. Version control is essential,
and Git is the industry standard for tracking code changes and collaboration.
These core tools empower data scientists to clean, explore, and analyze data, as
well as develop machine learning models. While more specialized tools may be
necessary for certain projects, these foundational tools remain indispensable and
are the starting point for anyone venturing into the field of data science.
4.2 Features
CHAPTER 5: PYTHON
5.1 Introduction
Operators
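Python's operators fall into a few families: arithmetic, comparison, logical, membership, and assignment. The short sketch below evaluates one example of each on two sample values, a and b, which are arbitrary numbers chosen only for illustration.

```python
# One example from each operator family; a and b are arbitrary sample values.
a, b = 17, 5

arithmetic = {
    "a + b": a + b,    # addition        -> 22
    "a - b": a - b,    # subtraction     -> 12
    "a * b": a * b,    # multiplication  -> 85
    "a / b": a / b,    # true division   -> 3.4
    "a // b": a // b,  # floor division  -> 3
    "a % b": a % b,    # remainder       -> 2
    "a ** 2": a ** 2,  # exponentiation  -> 289
}
comparison = {"a > b": a > b, "a == b": a == b, "a != b": a != b}
logical = {"a > 0 and b > 0": a > 0 and b > 0, "not a > b": not a > b}
membership = {"3 in [1, 2, 3]": 3 in [1, 2, 3]}

count = 0
count += 3  # augmented assignment, same as count = count + 3

for group in (arithmetic, comparison, logical, membership):
    for expression, value in group.items():
        print(f"{expression:>16} -> {value}")
print("count ->", count)
```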
Data Types
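The core built-in data types can be listed with one sample value each; type(value).__name__ reports the type's name, and functions such as int(), float() and str() convert between types. The sample values are illustrative only.

```python
# Python's core built-in data types, each with an illustrative sample value.
samples = {
    "int": 42,
    "float": 3.14,
    "str": "data science",
    "bool": True,
    "list": [1, 2, 3],           # ordered, mutable
    "tuple": (1, 2, 3),          # ordered, immutable
    "dict": {"name": "Alice"},   # key -> value mapping
    "set": {1, 2, 3},            # unordered, unique items
    "NoneType": None,            # absence of a value
}

for value in samples.values():
    # type(value).__name__ gives the type's name as a string, e.g. "int"
    print(f"{value!r:>20} is of type {type(value).__name__}")

# Converting between types (type casting)
print(int("10") + 5)      # 15
print(float(7))           # 7.0
print(str(2024) + "-25")  # 2024-25
```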
Variables:
Variables are used to store data. In Python, you can create a variable by
assigning a value to a name, like x = 10.
Conditional Statements:
Conditional statements allow you to make decisions in your code using if, elif
(else if), and else.
For example:
x = 10
if x > 10:
    print("x is greater than 10")
elif x == 10:
    print("x is equal to 10")
else:
    print("x is less than 10")
Loops:
Python supports for and while loops. For example, the following for loop
prints the numbers 0 to 4:
for i in range(5):
    print(i)
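A while loop, by contrast, repeats its body for as long as its condition stays true, so the loop variable must be updated inside the body. A minimal sketch that prints 0 to 4 and also keeps a running total (the total is only there to show accumulation):

```python
# A while loop keeps running as long as its condition is True.
x = 0
total = 0
while x < 5:
    print(x)      # prints 0, 1, 2, 3, 4 on separate lines
    total += x
    x += 1        # update the counter, otherwise the loop never ends
print("sum:", total)  # sum: 10
```

A for loop suits a known number of repetitions (range(5)); a while loop suits cases where the stopping point depends on a condition computed inside the loop.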
Functions:
Functions allow you to group code into reusable blocks. You can define a
function using the def keyword. For example:
def greet(name):
    return f"Hello, {name}!"

message = greet("Alice")
print(message)  # Hello, Alice!
CHAPTER 6: STATISTICS
Statistics plays a vital role in various fields, including science, economics, and
social sciences, enabling researchers and analysts to make informed decisions,
test hypotheses, and build predictive models based on empirical evidence and
data patterns. In data science, statistics forms the basis for deriving actionable
insights from large datasets.
➢ Bayesian Statistics (Optional): Deals with uncertainty and probabilistic
modeling.
➢ Time Series Analysis (If applicable): Models and forecasts time-dependent
data.
➢ Statistical Software: Proficiency in R or Python for analysis.
Mode
The mode is the value that occurs most frequently in a data series. It is robust:
adding a couple of new values generally does not change it much.
Code
import pandas as pd
data = pd.read_csv("Mode.csv")  # read data from the CSV file
print(data.head())  # print the first five rows
mode_data = data['Subject'].mode()  # mode(s) of the Subject column
print(mode_data)
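The same idea can be checked without a CSV file using Python's standard statistics module. In this sketch the marks are hypothetical values; it also demonstrates the robustness claim above by appending two new values. A series can have several modes: statistics.multimode returns all of them, and pandas' Series.mode() likewise returns every mode as a Series, which is why the pandas code prints a Series rather than a single number.

```python
# Stdlib sketch of the mode on hypothetical exam marks (no CSV file needed).
from statistics import mode, multimode

marks = [70, 85, 60, 85, 90, 85, 75]
print(mode(marks))         # 85 (it occurs three times)

# Adding a couple of new values leaves the mode unchanged.
extended = marks + [100, 55]
print(mode(extended))      # 85

# A series can have several modes; multimode returns all of them.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```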
Mean
The sum of all values divided by the number of values.
import pandas as pd
data = pd.read_csv("mean.csv")  # read data from the CSV file
print(data.head())  # print the first five rows
mean_data = data['Overallmarks'].mean()  # mean of the Overallmarks column
print(mean_data)
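The mean is computed as (x1 + x2 + ... + xn) / n. The sketch below does this by hand and with the standard statistics module on a hypothetical list standing in for the Overallmarks column. Unlike the mode, the mean is pulled noticeably by a single extreme value, as the last line shows.

```python
# The arithmetic mean: sum of the values divided by how many there are.
from statistics import mean

overall_marks = [72, 65, 88, 91, 54]   # hypothetical Overallmarks values

manual_mean = sum(overall_marks) / len(overall_marks)
print(manual_mean)          # 74.0
print(mean(overall_marks))  # same value via the statistics module

# One extreme value (an outlier of 0) drags the mean down noticeably.
print(mean(overall_marks + [0]))
```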
Median
The middle value of the sorted data set; when the number of values is even, it is
the average of the two middle values.
import pandas as pd
data = pd.read_csv("data.csv")  # read data from the CSV file
print(data.head())  # print the first five rows
median_data = data['Overallmarks'].median()  # median of the Overallmarks column
print(median_data)
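To make the odd/even rule concrete, the sketch below implements the median by hand (manual_median is a hypothetical helper written for illustration) and compares it with statistics.median on hypothetical marks.

```python
# Median: sort the values and take the middle one; for an even count,
# average the two middle values.
from statistics import median


def manual_median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values


odd = [54, 91, 65, 88, 72]   # hypothetical marks, 5 values
even = odd + [100]           # 6 values
print(manual_median(odd), median(odd))    # 72 72
print(manual_median(even), median(even))  # 80.0 80.0
```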
CHAPTER 7: GRAPHS
Graphs in data science refer to visual representations of data that help analysts
and data scientists better understand patterns, relationships, and insights within
datasets. They are powerful tools for data exploration, communication, and
analysis.
7.1 Types of Graphs
Here are some common types of graphs used in data science:
➢ Bar Charts: Bar charts are used to display and compare categorical data.
They represent categories on one axis and the corresponding values on the other,
typically using vertical or horizontal bars.
➢ Histograms: Histograms are used to visualize the distribution of continuous
data. They group data into bins and display the frequency or density of values
within each bin.
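➢ Scatter Plots: Scatter plots display individual data points on two axes to show
the relationship between two numerical variables.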
➢ Line Charts:
Line charts display data points as connected lines, often used to show trends or
changes in data over time.
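Before a histogram can be drawn, the data must be grouped into bins and counted. The stdlib sketch below shows that binning step with equal-width bins; histogram_counts is a hypothetical helper and the marks are made-up values. Plotting libraries such as Matplotlib (matplotlib.pyplot.hist) perform this counting internally and then draw one bar per bin.

```python
# Minimal sketch of the binning step behind a histogram (stdlib only).
def histogram_counts(values, num_bins):
    """Split [min, max] into equal-width bins and count the values in each."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


marks = [45, 52, 58, 61, 64, 67, 70, 73, 78, 85, 91, 98]  # hypothetical data
edges, counts = histogram_counts(marks, num_bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f} | {'#' * count}")
```

The clamp puts the maximum value into the last bin, which matches NumPy's convention that the last bin includes its right edge.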
CHAPTER 8: FINAL PROJECT
SNAPSHOT:
Helper Function:
Gradient Descent and Normal Equation:
Regularization Parameter:
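The project code itself is not reproduced in this report. As a hedged illustration only, assuming the project fitted a regularized linear regression and that "Normal function" refers to the closed-form normal equation, the sketch below fits one-feature linear regression with an L2 penalty in two ways and checks that they agree. The function names, the data, the learning rate lr and the iteration count are all hypothetical choices for this small example. The cost is J(w, b) = (1/2m) [sum of (w x + b - y)^2 + lam w^2], where lam is the regularization parameter and the intercept b is not penalized.

```python
# Hypothetical sketch, not the project's code: one-feature linear regression
# with an L2 (ridge) penalty, fitted two ways. lam is the regularization parameter.
from statistics import fmean


def cost(w, b, xs, ys, lam):
    """J(w, b) = (1/2m) * [sum((w*x + b - y)^2) + lam * w^2]."""
    m = len(xs)
    squared_errors = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))
    return (squared_errors + lam * w ** 2) / (2 * m)


def gradient_descent(xs, ys, lam=1.0, lr=0.05, iterations=5000):
    """Repeatedly step w and b against the gradient of J."""
    m = len(xs)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (sum(e * x for e, x in zip(errors, xs)) + lam * w) / m
        grad_b = sum(errors) / m
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def normal_equation(xs, ys, lam=1.0):
    """Closed-form minimiser of J: set both partial derivatives to zero."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)   # the penalty shrinks the slope towards 0
    b = y_bar - w * x_bar   # the intercept is not penalised
    return w, b


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0.0, 1.0, 10.0):
    w_gd, b_gd = gradient_descent(xs, ys, lam=lam)
    w_ne, b_ne = normal_equation(xs, ys, lam=lam)
    print(f"lam={lam:>4}: GD w={w_gd:.4f} b={b_gd:.4f} | NE w={w_ne:.4f} b={b_ne:.4f}")
```

Gradient descent needs a learning rate and an iteration budget but scales to many features; the normal equation is exact in one step. A larger lam shrinks the fitted slope, which is the usual way the regularization parameter trades fit for simplicity.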
Output:
Output:
CONCLUSION
In conclusion, training in data science equips individuals with valuable skills to
extract insights and make informed decisions from data. It encompasses a wide
range of topics, including data collection, cleaning, analysis, machine learning,
and data visualization. A well-rounded data science education often involves
learning programming languages like Python, mastering essential libraries and
tools, and gaining expertise in statistical and machine learning techniques.
Data science training also emphasizes the importance of critical thinking and
problem-solving, as well as effective communication of findings to stakeholders.
As the demand for data-driven insights continues to grow across industries, data
science training provides a pathway to exciting and rewarding career
opportunities.
REFERENCES
➢ https://www.kaggle.com/learn/overview
➢ https://www.edx.org/micromasters/data-science
➢ https://www.fast.ai/
➢ https://towardsdatascience.com/
➢ https://www.youtube.com/user/joshstarmer
➢ https://github.com/josephmisiti/awesome-machine-learning
➢ https://github.com/campusx-official/book-recommender-system/commit/678c7ab5a67adfcafaadf5b2924e4d04acafe9ac#diff5983284b94671de74632c367234334917d7e2de10e4be9c255afb37e33f5352e
➢ https://www.youtube.com/user/sentdex
➢ https://www.coursera.org/specializations/deep-learning
➢ https://github.com/ChristosChristofidis/awesome-deep-learning
➢ https://youtu.be/1YoD0fg3_EM?feature=shared