
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

School of Engineering and Technology


Hemvati Nandan Bahuguna Garhwal University
(A Central University) Srinagar Garhwal, Uttarakhand 249161

An Internship report on
DATA SCIENCE

Submitted By

Sushil Meher
[ Roll No : 21134501032 ]
[ B.Tech (C.S.E) VIIth ]

Under the supervision and guidance of


Dr. Prem Nath
Professor, Dept. of Computer Science & Engineering, School of Engineering and Technology

Conducted at ‘UNIFIED MENTOR’

In partial fulfilment of the requirements for the award of the degree of


Bachelor of Technology

Session 2024-2025

STUDENT DECLARATION

I, Sushil Meher, hereby declare that the industrial training project report on Data
Science at Unified Mentor is my original work. I affirm that I have undertaken
this project with integrity and in accordance with the academic and ethical
standards set forth by Unified Mentor and any relevant professional guidelines.

To the best of my knowledge and belief, the work presented in this report is
authentic, and any contributions or ideas of others are properly cited and
acknowledged. I have not used any sources, texts, or materials without giving
appropriate credit to the authors or sources.

In the event that my work is found to be in violation of academic or ethical
standards, I am willing to accept any consequences or actions deemed appropriate
by Unified Mentor or relevant authorities.

By signing this declaration, I affirm my commitment to the principles of honesty
and integrity in academic and professional endeavors.

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to Unified Mentor for providing
me with the invaluable opportunity to expand my knowledge and skills during
my internship. Their guidance and support throughout this project have been
instrumental in shaping my understanding and approach to this report.

I am deeply thankful to my parents for their unwavering support and
encouragement during the course of this project. Their patience in helping me
refine my work and eliminate unnecessary elements has been truly appreciated.

A special thanks to my friends, whose assistance in organizing and structuring
my work ensured that the report was well-presented and coherent until its
completion. I would also like to acknowledge Microsoft for developing the
powerful MS Word tool, which significantly contributed to the accuracy and
quality of my work by providing a platform to draft and finalize my report
efficiently.

Finally, I express my heartfelt thanks to the Almighty for granting me the
strength and perseverance to complete this report on time.

ABOUT ORGANIZATION: UNIFIED MENTOR

Unified Mentor is a prominent technology firm specializing in cutting-edge IT
solutions and web development services. Established with a vision of innovation
and excellence, Unified Mentor has become a trusted partner for businesses
seeking to navigate the complexities of the digital era.

The organization is committed to providing state-of-the-art solutions across a
range of IT domains, including web development, cybersecurity, cloud
computing, data analytics, the Internet of Things (IoT), and application
development. Unified Mentor emphasizes a client-centric approach, tailoring its
services to meet the unique needs and objectives of its partners. Unified Mentor
also undertakes comprehensive capacity-building initiatives aimed at empowering
businesses with robust IT infrastructure and expertise. Through its end-to-end IT
services, the organization ensures the seamless integration of technology into
diverse business operations, enhancing productivity and growth.

The firm's dedicated team of professionals is driven by a passion for excellence,
consistently achieving high client satisfaction and fostering long-term
collaborations. With a focus on innovation and security, Unified Mentor also
excels in delivering secure solutions that safeguard sensitive data, ensuring
compliance with the latest cybersecurity standards.

Guided by its mission to empower businesses through innovative IT solutions,
Unified Mentor continues to evolve in response to the rapidly changing digital
landscape, building a future where technology serves as a key enabler of success
and progress.

ABSTRACT

Present-day computer applications require the representation of huge amounts of
complex knowledge and data in programs, and thus require tremendous amounts
of work. Our ability to program computers falls short of the demand for
applications. If computers are endowed with the ability to learn, then our
burden of programming them is eased, or at least reduced. This is particularly
true for developing expert systems, where the "bottleneck" is extracting the
expert's knowledge and feeding that knowledge to the computer. Present-day
computer programs in general (with the exception of some machine learning
programs) cannot correct their own errors, improve from past mistakes, or
learn to perform a new task by analogy to a previously seen task. In contrast,
human beings are capable of all of the above. Machine learning aims to produce
smarter computers capable of such intelligent behavior.

The area of machine learning deals with the design of programs that can learn
rules from data, adapt to changes, and improve performance with experience. In
addition to being one of the initial dreams of computer science, machine
learning has become crucial as computers are expected to solve increasingly
complex problems and become more integrated into our daily lives. This is a
hard problem, since making a machine learn from its computational tasks
requires work at several levels, and complexities and ambiguities arise at each of
those levels.

This report therefore examines how machine learning takes place, surveys its
methods, discusses the projects implemented during training and their
applications, and considers the present and future status of machine learning.

CONTENTS

1. INTRODUCTION
   1.1 TECHNICAL TRAINING PLATFORM
2. DATA SCIENCE WITH AI AND ML
3. HARDWARE AND SOFTWARE REQUIREMENT
4. TOOLS
5. PYTHON
6. STATISTICS
7. GRAPHS
8. FINAL PROJECT
9. CONCLUSION
10. REFERENCES
CHAPTER 1: INTRODUCTION

Training in data science with artificial intelligence (AI) and machine learning
(ML) is an exciting and dynamic field that equips individuals with the skills to
extract valuable insights, automate decision-making processes, and unlock the
potential of data. This training encompasses a wide array of knowledge and
practical expertise.

Data science involves collecting, cleaning, and analyzing data to derive
actionable insights. AI and ML are closely related disciplines that focus on
creating algorithms and models capable of learning patterns from data and
making predictions or decisions. These technologies are used in diverse
applications, from recommendation systems in e-commerce to predictive
maintenance in manufacturing.

A comprehensive data science, AI, and ML training program typically covers
fundamental concepts such as data manipulation, statistical analysis, and
programming in languages like Python. It then delves into advanced topics like
deep learning, natural language processing, and reinforcement learning.

Hands-on experience is vital in this training, involving real-world projects,
model development, and evaluation. Familiarity with popular libraries and tools
like TensorFlow, scikit-learn, and Jupyter notebooks is essential.

Moreover, understanding ethical implications, such as bias and fairness, is
increasingly crucial in data science with AI and ML. Interdisciplinary skills in
communication and domain knowledge enhance the effectiveness of data-driven
solutions.

TECHNICAL TRAINING PLATFORM

VS Code

Visual Studio Code (VS Code) is increasingly important in data science due to
its versatility, ease of use, and extensive extension ecosystem. Data scientists
can leverage VS Code for several critical tasks. It supports various programming
languages commonly used in data science, such as Python and R, making it a
unified environment for coding, data manipulation, and analysis. VS Code's
extensions enable integration with Jupyter notebooks, version control systems,
and data visualization libraries. It offers a streamlined interface for writing code,
running experiments, and collaborating with teams, making it an invaluable tool
for data scientists seeking efficiency and productivity in their workflow.

Jupyter Notebook

Jupyter Notebook is indispensable in data science due to its interactive and
collaborative nature. It provides an interactive environment where data scientists
can blend code, data, visualizations, and narrative explanations seamlessly. This
flexibility is crucial for exploratory data analysis, modeling, and sharing
insights. Jupyter Notebook supports various programming languages, with
Python being the most popular. It facilitates reproducibility by allowing
researchers to document their workflow step by step. Moreover, it's instrumental
in education, enabling instructors to teach data science concepts effectively. Its
ability to create and share interactive reports makes it a vital tool for data
scientists, researchers, and educators across various domains.

Kaggle

Kaggle is a pivotal platform in the data science community, offering several
vital contributions. It hosts data science competitions that push the boundaries of
innovation, allowing practitioners to apply their skills to real-world challenges.
Kaggle Kernels provide a collaborative environment for code sharing and
learning. Datasets and notebooks shared by the community facilitate knowledge
sharing and learning. Its Learn section provides extensive resources and courses
on data science topics. For those entering the field, Kaggle serves as a practical,
hands-on learning playground, while for experienced practitioners, it's a hub for
showcasing expertise and collaborating on impactful projects.

CHAPTER 2: DATA SCIENCE WITH AI AND ML

Data Science, Artificial Intelligence (AI), and Machine Learning (ML) are
transformative fields that have revolutionized how businesses, organizations,
and researchers analyze and extract insights from data. In this introduction, we'll
explore the fundamental concepts and their interplay in these domains.

1. Data Science: Data Science is an interdisciplinary field that combines
domain knowledge, statistics, programming, and data analysis to extract
valuable insights and knowledge from data. It involves collecting, cleaning, and
transforming data, followed by the application of various techniques to uncover
patterns, trends, and correlations.

2. Machine Learning (ML): Machine Learning is a subset of AI that focuses
on creating algorithms and models that enable computers to learn from data and
make predictions or decisions without explicit programming. ML algorithms are
categorized into supervised, unsupervised, and reinforcement learning,
depending on the learning process.

3. Artificial Intelligence (AI): Artificial Intelligence is a broader field that
encompasses the development of intelligent agents capable of performing tasks
that typically require human intelligence. AI can involve rule-based systems,
expert systems, natural language processing, and computer vision in addition to
Machine Learning.

2.1 Several Aspects of Data Science

Before starting any data science project, one should keep several aspects in mind:
● Data Collection: Gathering relevant data from various sources, such as
databases, APIs, and sensors.
● Data Cleaning and Preprocessing: Ensuring data is accurate, complete, and
ready for analysis.
● Exploratory Data Analysis (EDA): Examining data visually and statistically
to discover patterns.
● Feature Engineering: Selecting or creating relevant variables for analysis.
● Machine Learning: Building predictive models and making data-driven
decisions.
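The aspects above can be sketched end to end on a toy dataset. The data and the straight-line "model" below are invented purely for illustration; a real project would typically use pandas and scikit-learn instead of hand-rolled formulas.

```python
import statistics

# Toy "collected" data: hours studied vs. exam score (values invented)
raw = [(1, 35), (2, 50), (None, 60), (4, 70), (5, 90)]

# Cleaning/preprocessing: drop records with missing values
clean = [(h, s) for h, s in raw if h is not None]
hours = [h for h, _ in clean]
scores = [s for _, s in clean]

# Exploratory analysis: a simple summary statistic
print(statistics.mean(scores))                  # 61.25

# Feature selection + machine learning stand-in:
# fit a least-squares line, score = a * hours + b
n = len(clean)
a = (n * sum(h * s for h, s in clean) - sum(hours) * sum(scores)) / (
    n * sum(h * h for h in hours) - sum(hours) ** 2
)
b = statistics.mean(scores) - a * statistics.mean(hours)

# Data-driven decision: predicted score for 3 hours of study
print(a * 3 + b)                                # 61.25
```

Even on five rows, the same pipeline shape appears: collect, clean, summarize, fit, predict.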

CHAPTER 3: HARDWARE AND SOFTWARE
REQUIREMENT

HARDWARE REQUIRED:

1. Pentium 4, Windows XP/Windows 7: The minimum system requirements
for learning data science include a Pentium 4 processor, Windows XP or
Windows 7, and 256 MB of RAM. These specifications provide basic
functionality but may limit performance for more advanced data science
tasks.
2. 256 MB RAM: A minimum of 256 MB of RAM is required for basic
data science learning, providing enough memory for running lightweight
applications and simple data processing tasks, though performance may
be constrained for more resource-intensive operations.

SOFTWARE REQUIRED:

1. Windows XP/7: These older operating systems provide a basic
environment for learning data science, though they may have limitations
in running modern software and handling large datasets.
2. VSCode: Visual Studio Code is a lightweight, versatile IDE that supports
multiple programming languages and extensions, making it ideal for data
science development and Python programming.
3. Jupyter Notebook: Jupyter Notebook is an interactive environment for
writing and running Python code, ideal for data analysis, visualization,
and creating reproducible research.
4. Matplotlib: Matplotlib is a powerful Python library for creating static,
animated, and interactive visualizations, commonly used in data science
for plotting charts and graphs.
5. IDE: An Integrated Development Environment (IDE) provides essential
tools like code editing, debugging, and execution, streamlining the coding
workflow for data science tasks.

CHAPTER 4: TOOLS

4.1 Introduction

Fundamental tools in data science serve as the cornerstone for various tasks in
data analysis and machine learning. Python, a versatile programming language,
is the linchpin of the data science toolkit. It's complemented by Jupyter
Notebook, an interactive environment perfect for data exploration and
documentation. Pandas, a robust library, takes center stage in data manipulation
and analysis, particularly suited for structured data. Data visualization is
achieved through Matplotlib, a versatile plotting library, and Seaborn, which
simplifies creating appealing statistical graphics. For machine learning
endeavors, Scikit-Learn provides essential algorithms and tools, making it
accessible for beginners and powerful for experts. Version control is essential,
and Git is the industry standard for tracking code changes and collaboration.
These core tools empower data scientists to clean, explore, and analyze data, as
well as develop machine learning models. While more specialized tools may be
necessary for certain projects, these foundational tools remain indispensable and
are the starting point for anyone venturing into the field of data science.

4.2 Features

Data science involves collecting and analyzing data to derive insights,
employing machine learning for predictions, and using visualization to
communicate results.

CHAPTER 5: PYTHON

5.1 Introduction

Python is a versatile and widely used programming language in the field of data
science. Its rich ecosystem of libraries and tools, such as NumPy, Pandas,
Matplotlib, Seaborn, Scikit-Learn, and more, makes it a popular choice for data
analysis, manipulation, visualization, and machine learning. Python's simplicity
and readability make it accessible to both beginners and experienced data
scientists, enabling them to work with structured and unstructured data, develop
predictive models, and create informative data visualizations. Python's strong
community support, extensive documentation, and active development continue
to drive its prominence as the go-to language for data science projects.

5.2 Why Python?

➢ Rich Ecosystem: Python boasts a vast ecosystem of libraries and frameworks
tailored for data manipulation, analysis, and machine learning, such as Pandas,
NumPy, Matplotlib, and Scikit-Learn.
➢ Ease of Learning: Python's clean and straightforward syntax makes it
accessible to beginners, allowing for a smooth learning curve.
➢ Cross-Platform Compatibility: Python runs on various platforms, making it
versatile for different operating systems.
➢ Integration: Python seamlessly integrates with other languages like C and
Java, making it a preferred choice for integrating data science solutions into
existing applications.
➢ Open Source: Python is open source, making it cost-effective and allowing
for extensive customization and collaboration.

Operators

Python supports a variety of operators, including arithmetic (+, -, *, /),
comparison (==, !=, <, >, <=, >=), logical (and, or, not), assignment (=), and
more.
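A quick, self-contained illustration of these operator families on two concrete values:

```python
# Arithmetic, comparison, logical, and assignment operators
x, y = 7, 3
print(x + y, x - y, x * y)    # 10 4 21
print(x / y)                  # true division always yields a float
print(x > y, x == y, x != y)  # True False True
print(x > 0 and y > 0)        # True
x += 1                        # augmented assignment
print(x)                      # 8
```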

Data Types

Python has several built-in data types, such as:

● int: for integers (e.g., 5)
● float: for floating-point numbers (e.g., 3.14)
● str: for strings (e.g., "Hello, World!")
● bool: for Boolean values (True or False)
● list: for ordered, mutable sequences
● tuple: for ordered, immutable sequences
● dict: for key-value mappings
● set: for unordered collections of unique elements
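One literal of each type listed above, plus a small demonstration of the mutable/immutable distinction between lists and tuples:

```python
# One sample value per built-in type
samples = {
    "int": 5,
    "float": 3.14,
    "str": "Hello, World!",
    "bool": True,
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "dict": {"a": 1},
    "set": {1, 2, 3},
}
for name, value in samples.items():
    print(name, type(value).__name__)

samples["list"].append(4)      # lists are mutable: this succeeds
try:
    samples["tuple"][0] = 9    # tuples are immutable: this raises
except TypeError:
    print("tuples are immutable")
```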

Variables:

Variables are used to store data. In Python, you can create a variable by
assigning a value to a name, like x = 10.

Conditional Statements:

Conditional statements allow you to make decisions in your code using if, elif
(else if), and else.

For example:

if x > 10:
    print("x is greater than 10")
elif x == 10:
    print("x is equal to 10")
else:
    print("x is less than 10")

Loops:

Python supports for and while loops. For example:

for i in range(5):
    print(i)

x = 0
while x < 5:
    print(x)
    x += 1

Functions:

Functions allow you to group code into reusable blocks. You can define a
function using the def keyword. For example:

def greet(name):
    return f"Hello, {name}!"

message = greet("Alice")
print(message)
CHAPTER 6: STATISTICS

6.1 What is Statistics?

Statistics is a branch of mathematics focused on collecting, organizing,
analyzing, interpreting, and presenting data. It encompasses a range of
techniques for summarizing data, assessing relationships, and drawing
meaningful conclusions from information.

Statistics plays a vital role in various fields, including science, economics, and
the social sciences, enabling researchers and analysts to make informed
decisions, test hypotheses, and build predictive models based on empirical
evidence and data patterns. In data science, statistics forms the basis for
deriving actionable insights from large datasets.

6.2 Role of Statistics


➢ Descriptive Statistics: Summarizes data using measures like the mean,
median, and variance.
➢ Inferential Statistics: Makes predictions and inferences about populations
based on samples.
➢ Probability Distributions: Models data characteristics using distributions
like the normal and binomial.
➢ Hypothesis Testing: Determines if observed differences are statistically
significant.
➢ Regression Analysis: Models relationships between variables and makes
predictions.
➢ ANOVA: Compares group means, useful in experiments.
➢ Non-parametric Statistics: Suitable for non-standard data distributions.
➢ Bayesian Statistics: Deals with uncertainty and probabilistic modeling.
➢ Time Series Analysis: Models and forecasts time-dependent data.
➢ Statistical Software: Proficiency in R or Python for analysis.
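The first two roles can be illustrated with Python's built-in statistics module; the sample values below are invented for illustration.

```python
import statistics

# An invented sample of exam scores
sample = [62, 71, 58, 75, 69, 64, 73, 68]

# Descriptive statistics: summarize this sample
print(statistics.mean(sample))       # central tendency
print(statistics.median(sample))
print(statistics.variance(sample))   # sample variance (n - 1 denominator)

# Inferential flavor: the standard error of the mean, the quantity
# a confidence interval for the population mean would be built on
sem = statistics.stdev(sample) / len(sample) ** 0.5
print(sem)
```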

6.3 Descriptive Statistics

Mode

The mode is the value that occurs most frequently in a data series. It is robust
and is not generally affected much by the addition of a couple of new values.

Code:
import pandas as pd
data = pd.read_csv("Mode.csv")          # read data from the CSV file
data.head()                             # show the first five rows
mode_data = data['Subject'].mode()      # mode of the Subject column
print(mode_data)

Mean

The mean is the average of all values in a data series.

import pandas as pd
data = pd.read_csv("mean.csv")          # read data from the CSV file
data.head()                             # show the first five rows
mean_data = data['Overallmarks'].mean()     # mean of the Overallmarks column
print(mean_data)

Median

The median is the middle value of a sorted data set.

import pandas as pd
data = pd.read_csv("data.csv")          # read data from the CSV file
data.head()                             # show the first five rows
median_data = data['Overallmarks'].median()  # median of the Overallmarks column
print(median_data)
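The snippets above depend on CSV files that are not included with this report. A self-contained equivalent, keeping the same column names but using an invented in-memory DataFrame, is:

```python
import pandas as pd

# Stand-in for the missing CSV files (values invented)
data = pd.DataFrame({
    "Subject": ["Math", "Physics", "Math", "Chemistry", "Math"],
    "Overallmarks": [70, 65, 80, 75, 90],
})

print(data["Subject"].mode()[0])     # most frequent subject
print(data["Overallmarks"].mean())   # average marks
print(data["Overallmarks"].median()) # middle value
```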

6.4 Probability Distributions

In data science, a probability distribution is a mathematical representation of
how likely different values or outcomes are in a dataset or random process.
These distributions describe the inherent uncertainty in data and help make
data-driven decisions. Common distributions include the normal, binomial, and
Poisson distributions, each with its unique characteristics.

Parameters, such as the mean and standard deviation, define these distributions'
shapes. Data scientists use probability distributions to model data, conduct
hypothesis testing, make predictions, and simulate scenarios. The Central Limit
Theorem is crucial, as it asserts that the sample mean of sufficiently large
samples follows a normal distribution, underpinning many statistical techniques
and analyses in data science. Understanding probability distributions is essential
for harnessing the power of data.
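The Central Limit Theorem can be checked empirically with NumPy: draw from a skewed distribution, then look at how the means of many samples behave. The sample sizes and seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# A decidedly non-normal distribution: exponential with mean 1
population = rng.exponential(scale=1.0, size=100_000)
print(population.mean())   # close to 1.0

# Central Limit Theorem in action: means of 2000 samples of size 50
# cluster approximately normally around the population mean
samples = rng.exponential(scale=1.0, size=(2000, 50))
sample_means = samples.mean(axis=1)

print(sample_means.mean())                     # also close to 1.0
print(sample_means.std() < population.std())   # True: far less spread
```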

CHAPTER 7: GRAPHS

7.1 What is a Graph?

Graphs in data science refer to visual representations of data that help analysts
and data scientists better understand patterns, relationships, and insights within
datasets. They are powerful tools for data exploration, communication, and
analysis.

7.2 Types of Graphs

Here are some common types of graphs used in data science:

➢ Bar Charts: Bar charts are used to display and compare categorical data.
They represent categories on one axis and the corresponding values on the other,
typically using vertical or horizontal bars.

➢ Histograms: Histograms are used to visualize the distribution of continuous
data. They group data into bins and display the frequency or density of values
within each bin.

➢ Scatter Plots: Scatter plots show individual data points as dots on a
two-dimensional plane. They are used to visualize the relationship between two
continuous variables and identify patterns or correlations.

➢ Line Charts: Line charts display data points as connected lines, often used to
show trends or changes in data over time.

➢ Pie Charts: Pie charts represent parts of a whole, where each slice
corresponds to a percentage of the total. They are used to visualize the
composition of a dataset.
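All five chart types can be produced with Matplotlib. The sketch below uses invented data and writes the figure to a PNG file instead of opening a window:

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 3, figsize=(11, 6))

axes[0, 0].bar(["A", "B", "C"], [3, 7, 5])           # bar: categorical comparison
axes[0, 0].set_title("Bar chart")

axes[0, 1].hist(rng.normal(size=500), bins=20)       # histogram: distribution
axes[0, 1].set_title("Histogram")

x = rng.uniform(size=100)                            # scatter: two variables
axes[0, 2].scatter(x, 2 * x + rng.normal(scale=0.1, size=100), s=10)
axes[0, 2].set_title("Scatter plot")

axes[1, 0].plot(range(12), np.cumsum(rng.normal(size=12)))  # line: trend
axes[1, 0].set_title("Line chart")

axes[1, 1].pie([40, 30, 20, 10], labels=["W", "X", "Y", "Z"])  # pie: parts of a whole
axes[1, 1].set_title("Pie chart")

axes[1, 2].axis("off")                               # unused panel
fig.tight_layout()
fig.savefig("graph_types.png")
```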

CHAPTER 8: FINAL PROJECT

SNAPSHOTS:

Reading the CSV file, mean, and standard deviation:
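The original screenshot for this step is not reproduced above. A minimal sketch of what it likely showed, with an invented file name ("project.csv") and invented columns standing in for the project's actual data, is:

```python
import numpy as np
import pandas as pd

# The project's real CSV is not included; generate a stand-in so the
# pipeline is runnable (feature/target names and values are invented)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=200)
pd.DataFrame({"feature": x, "target": y}).to_csv("project.csv", index=False)

data = pd.read_csv("project.csv")

# Standardize the feature with its mean and standard deviation:
# gradient descent converges much better on centered, scaled inputs
mean, std = data["feature"].mean(), data["feature"].std()
data["feature"] = (data["feature"] - mean) / std
print(data["feature"].mean(), data["feature"].std())  # ~0 and 1
```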

Splitting data into training and testing sets:
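Since the screenshot is not reproduced, here is a from-scratch sketch of the split step; the 80/20 ratio and the dataset size are assumptions, and a real project might simply call scikit-learn's train_test_split.

```python
import numpy as np

# Shuffle row indices, then hold out 20% for testing
rng = np.random.default_rng(1)
n = 200
idx = rng.permutation(n)
cut = int(0.8 * n)
train_idx, test_idx = idx[:cut], idx[cut:]
print(len(train_idx), len(test_idx))   # 160 40
```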

Helper Function:

Gradient Descent and Normal Equation:
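The screenshot of this step is not reproduced, so the sketch below shows both fitting methods on synthetic data; the true coefficients (3 and 5), the learning rate, and the iteration count are all invented for illustration. The point is that batch gradient descent converges to the same weights the normal equation gives in closed form.

```python
import numpy as np

# Synthetic single-feature regression data, true line: y = 3x + 5
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(scale=0.5, size=200)
X = np.column_stack([np.ones_like(x), x])    # prepend an intercept column

# Normal equation: closed-form least squares, w = (X^T X)^-1 X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error
w = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = 2 / len(y) * X.T @ (X @ w - y)    # gradient of MSE
    w -= lr * grad

print(np.round(w_closed, 2))   # approximately [5, 3]
print(np.round(w, 2))          # same weights, found iteratively
```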

Regularization Parameter:
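The regularization value used in the project is not visible here, so this sketch just shows the effect of varying it in ridge (L2) regression; the data and lambda values are invented. The closed form becomes w = (X^T X + lambda * I)^-1 X^T y, with the intercept left unpenalized.

```python
import numpy as np

# Synthetic data, true line: y = 3x + 5
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 5.0 + rng.normal(scale=0.5, size=50)
X = np.column_stack([np.ones_like(x), x])

def ridge(lam):
    # L2 penalty on the slope only; the intercept is not shrunk
    penalty = lam * np.diag([0.0, 1.0])
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# A larger lambda shrinks the slope toward zero
for lam in (0.0, 10.0, 1000.0):
    print(lam, ridge(lam)[1])
```

With this setup the fitted slope equals Sxy / (Sxx + lambda) for the centered data, so it decreases strictly as lambda grows.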

Output:

Output:

CHAPTER 9: CONCLUSION

In conclusion, training in data science equips individuals with valuable skills to
extract insights and make informed decisions from data. It encompasses a wide
range of topics, including data collection, cleaning, analysis, machine learning,
and data visualization. A well-rounded data science education often involves
learning programming languages like Python, mastering essential libraries and
tools, and gaining expertise in statistical and machine learning techniques.

Data science training also emphasizes the importance of critical thinking and
problem-solving, as well as effective communication of findings to stakeholders.
As the demand for data-driven insights continues to grow across industries, data
science training provides a pathway to exciting and rewarding career
opportunities.

Ultimately, data science training is an ongoing journey, as the field evolves
rapidly with emerging technologies and new data challenges. Continuous
learning and staying up to date with the latest developments are key to success
in this dynamic and high-demand profession.

REFERENCES

➢ https://www.kaggle.com/learn/overview
➢ https://www.edx.org/micromasters/data-science
➢ https://www.fast.ai/
➢ https://towardsdatascience.com/
➢ https://www.youtube.com/user/joshstarmer
➢ https://github.com/josephmisiti/awesome-machine-learning
➢ https://github.com/campusx-official/book-recommender-system/commit/678c7ab5a67adfcafaadf5b2924e4d04acafe9ac#diff5983284b94671de74632c367234334917d7e2de10e4be9c255afb37e33f5352e
➢ https://www.youtube.com/user/sentdex
➢ https://www.coursera.org/specializations/deep-learning
➢ https://github.com/ChristosChristofidis/awesome-deep-learning
➢ https://youtu.be/1YoD0fg3_EM?feature=shared
