Internship Report
07.06.2023 to 23.06.2023
Submitted by
HARIPRIYA.R (721721104030)
Vision of the Institute
To progress to become a center of excellence in Engineering and
Technology through creative and innovative practices in teaching-learning
and promoting research and development to produce globally competitive
and employable professionals who are psychologically strong and
emotionally balanced with social perception and professional ethics.
Department Mission
To provide quality education and promote research in the field of Electronics
and Communication Engineering and thereby rendering continuous service
to the society by imbibing leadership skills and moral values in the students.
INTRODUCTION:
Industrial training is very useful for acquiring knowledge from the various departments of a company; it also builds skills such as management and an understanding of how a firm is organized. The training creates job opportunities, since each of us gains a preference when applying for jobs. The training is therefore useful academically and has personally helped in various ways. For many companies, the process of recruiting and hiring is a drain on resources. One solution: appeal to tomorrow's staff members when they're looking for internships, and all you have to do is choose the best of the bunch when it comes time to hire.
COMPANY PROFILE:
Customers across the world who would like to start a business in these areas begin with a minimum viable product, where we act as technology partners. We provide a full range of support along our customers' journey to success. Our commitment to the work comes from our team's sense of responsibility, which leads to success. We follow a result-driven approach to measure the effectiveness of our projects, which helps us learn from our mistakes.
About the Training Program
DATA SCIENCE
Weekly Overview of In-Plant / Internship Training Activities
Date         Day        Topics
09.06.2023   Friday     Mathematical Functions, Arithmetic Operations, Sort, Search, Counting Functions
10.06.2023   Saturday   Introduction about Pandas
12.06.2023   Monday     Pandas – Series, Data Frames, Basic Functionality, Reindexing, Iteration, Sorting
13.06.2023   Tuesday    Pandas – Working with Text Data
Detailed Description of Every Module in the Training Activity
Day 1 – Introduction to Data Science, AI, ML
Day 2 – NumPy Introduction, ndarray
NumPy introduction:
NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely. NumPy stands for Numerical Python. In Python we have lists that serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray; it provides a lot of supporting functions that make working with ndarray very easy. Arrays are very frequently used in data science, where speed and resources are very important.
import numpy as np

# Creating a rank 1 array
arr = np.array([1, 2, 3])
print("Array with Rank 1: \n", arr)

# Creating a rank 2 array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("Array with Rank 2: \n", arr2)

# Creating an array from a passed tuple
arr3 = np.array((1, 3, 2))
print("Array created using passed tuple: ", arr3)

Output:
Array with Rank 1:
[1 2 3]
Array with Rank 2:
[[1 2 3]
 [4 5 6]]
Array created using passed tuple: [1 3 2]
Day 3 – Mathematical Functions, Sort, Search
Mathematical Functions:
Sorting: Sorting refers to arranging data in a particular format. A sorting algorithm specifies the way to arrange data in a particular order. The most common orders are numerical or lexicographical. In NumPy, we can perform various sorting operations using functions provided in the library such as sort, lexsort, argsort, etc.
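A minimal sketch of sort and argsort, with made-up example data:

import numpy as np

a = np.array([3, 1, 2])
print(np.sort(a))     # sorted copy of the array: [1 2 3]
print(np.argsort(a))  # indices that would sort the array: [1 2 0]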
Searching:
Searching is an operation or a technique that helps find the place of a given element or value in a list. Any search is said to be successful or unsuccessful depending upon whether the element being searched for is found or not. In NumPy, we can perform various searching operations using functions provided in the library such as argmax, argmin, nanargmax, etc.
numpy.argmax(): This function returns the indices of the max element of the array along a particular axis.
# Python program illustrating the working of argmax()
import numpy as geek

# Working on a 2D array
array = geek.arange(12).reshape(3, 4)
print("INPUT ARRAY : \n", array)

# Index of the maximum element in the flattened array
print("\nMax element : ", geek.argmax(array))

Output :
INPUT ARRAY :
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Max element : 11
Day 4 – Introduction about Pandas
Pandas:
Pandas is a Python library used for working with data sets. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
Day 5 – Pandas: Series, Data Frames, Basic Functionality
Series: A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A Pandas Series is essentially a single column in an Excel sheet. Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.
Example:
import pandas as pd
import numpy as np

# Creating an empty Series
ser = pd.Series()
print(ser)

# Creating a Series from a simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)
Output:
Series([], dtype: float64)
0 g
1 e
2 e
3 k
4 s
dtype: object
Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In other words, data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
Creating a DataFrame:
import pandas as pd

# Calling DataFrame constructor
df = pd.DataFrame()
print(df)

# List of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling DataFrame constructor on a list
df = pd.DataFrame(lst)
print(df)
Output:
Empty DataFrame
Columns: []
Index: []

        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks
Day 6 – Working with Text Data
Text data types
There are two ways to store text data in pandas: object-dtype NumPy arrays and the dedicated StringDtype extension type.
Prior to pandas 1.0, object dtype was the only option. This was unfortunate for many reasons:
1. You can accidentally store a mixture of strings and non-strings in an object dtype
array. It’s better to have a dedicated dtype.
2. object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There
isn’t a clear way to select just text while excluding non-text but still object-dtype
columns.
3. When reading code, the contents of an object dtype array is less clear than 'string'.
Currently, the performance of object dtype arrays of strings and arrays.StringArray are
about the same. We expect future enhancements to significantly increase the performance and
lower the memory overhead of StringArray.
For backwards-compatibility, object dtype remains the default type we infer a list of strings to:
In [1]: pd.Series(["a", "b", "c"])
Out[1]:
0 a
1 b
2 c
dtype: object
To explicitly request string dtype, specify the dtype:
In [2]: pd.Series(["a", "b", "c"], dtype="string")
Out[2]:
0 a
1 b
2 c
dtype: string
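As a small illustration of point 2 above, with the dedicated string dtype the text columns can be selected directly; the two-column frame here is made up:

import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["a", "b"], dtype="string"),
    "count": [1, 2],
})

# Selects only the dedicated string column, not arbitrary object columns
print(df.select_dtypes(include="string"))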
Day 7 – Aggregations, GroupBy
Aggregation in data mining is the process of finding, collecting, and presenting data in a summarized format to perform statistical analysis of business schemes or of human patterns. When data is collected from various datasets, it is crucial to gather accurate data to produce significant results. Data aggregation can help in making prudent decisions in marketing, finance, product pricing, etc. Aggregated data groups are replaced with statistical summaries. Aggregated data stored in a data warehouse can help solve analytical problems, which in turn reduces the time needed to answer queries over the datasets.
In the aggregation process, interval arithmetic operations on fuzzy numbers are used. The presented FFTA approach can be applied to evaluate the failure probabilities of system components for which statistical data are inadequate or unavailable.
The groupby is one of the most frequently used Pandas functions in data analysis. It is used for grouping the data points (i.e. rows) based on the distinct values in one or more columns (a short sketch follows the list below). This day also gave an overview of the main categories of clustering:
Partitional clustering
Hierarchical clustering
Density-based clustering
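A minimal groupby sketch, assuming a small made-up sales table (the column names are illustrative, not from the training material):

import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100, 200, 150, 50],
})

# Group rows by the distinct values in "region" and sum each group
print(sales.groupby("region")["amount"].sum())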
Day 8 – Introduction to Data Analytics
Most companies are collecting loads of data all the time—but, in its
raw form, this data doesn’t really mean anything. This is where
data analytics comes in. Data analytics is the process of analyzing
raw data in order to draw out meaningful, actionable insights, which
are then used to inform and drive smart business decisions.
Day 9 – Read Data from CSV, JSON, XLSX, and Database
Python Data File Formats – Python JSON
JSON stands for JavaScript Object Notation and is an open-standard file format. While it holds attribute-value pairs and array data types, it uses human-readable text for this. This Python data file format is language-independent, and we can use it in asynchronous browser-server communication. The extension for a Python JSON file is .json.
Here's the JSON file in Python we will use for the demo:
{
  "ID": ["1", "2", "3", "4", "5"],
  "Title": ["Dog with a Blog", "Liv and Maddie", "Girl Meets World", "KC Undercover", "Austin and Ally"],
  "Timing": ["17:30-18:00", "18:00-18:30", "18:30-19:00", "19:00-19:30", "19:30-20:00"],
  "Genre": ["Comedy", "Comedy", "Comedy", "Comedy", "Comedy"],
  "Rating": ["4.7", "6.3", "7.2", "6.1", "6"]
}
We save this as schedule.json on the Desktop.
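A minimal sketch of loading this file with pandas, assuming schedule.json is in the working directory:

import pandas as pd

# Reads the column-oriented JSON shown above into a DataFrame
df = pd.read_json("schedule.json")
print(df)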
Python Data File Formats – Python XLS
The extension for an Excel spreadsheet is .xlsx. This proves useful
for data science; we create a workbook with two sheets in Microsoft
Excel.
Sheet 1 – (screenshot of the sample workbook)
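A minimal sketch of reading one sheet of such a workbook with pandas; the file name and sheet name here are assumptions, and the openpyxl engine must be installed for .xlsx files:

import pandas as pd

# File and sheet names are assumed for illustration
df = pd.read_excel("schedule.xlsx", sheet_name="Sheet1")
print(df.head())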
DAY 10 – Linear Regression
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.
The concept is to draw a line through all the plotted data points. The line is positioned in a way that minimizes the distance to all of the data points. In the accompanying scatter plot, the red dashed lines represent the distances from the data points to the drawn mathematical function.
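A minimal fitted-line sketch with scikit-learn, using made-up data (not from the training material):

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: y is roughly 2x + 1
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(x, y)
print(model.coef_, model.intercept_)  # slope and intercept of the fitted line
print(model.predict([[6]]))           # prediction for a new independent-variable value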
DAY 11 – Polynomial Regression, Classification
Polynomial regression is a kind of linear regression in which the relationship between the dependent and independent variables Y and X is modeled as an nth-degree polynomial. This is done to look for the best way of drawing a line through the data points.
Equation of the Polynomial Regression Model:
Any linear equation is a polynomial regression that has a degree of 1. The very common and usual equation used to define the regression is:
y = mx + b
In this equation, m is the slope and b is the y-intercept. One can equally write this as f(x) = c0 + c1x, where c1 is the slope and c0 is the y-intercept.
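A minimal polynomial-fit sketch using NumPy; the data points and the chosen degree are made up for illustration:

import numpy as np

# Made-up data points
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.0, 3.5, 9.0, 20.0, 38.0, 66.0])

# Fit a degree-3 polynomial y = c3*x^3 + c2*x^2 + c1*x + c0
coeffs = np.polyfit(x, y, deg=3)
poly = np.poly1d(coeffs)
print(poly(7))  # evaluate the fitted curve at a new point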
Classification:
The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories.
In a classification algorithm, a discrete output function (y) is mapped to an input variable (x):
y = f(x), where y is the categorical output
Binary Classifier:
If the classification problem has only two possible outcomes, then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier:
If a classification problem has more than two outcomes, then it is called a Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
Types of ML Classification Algorithms:
Classification algorithms can be divided into mainly two categories (a short sketch follows the list):
o Linear Models
   o Logistic Regression
   o Support Vector Machines
o Non-linear Models
   o K-Nearest Neighbours
   o Kernel SVM
   o Naïve Bayes
   o Decision Tree Classification
   o Random Forest Classification
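A minimal binary-classification sketch using one of the linear models listed above (logistic regression), on made-up one-dimensional data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: the label is 1 when the feature value is large
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2], [11]]))  # expected: [0 1]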
DAY 12 – Random Forest, Support Vector Machine Algorithm
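A minimal sketch of fitting the two algorithms named above on scikit-learn's built-in iris dataset (the dataset and parameter values are illustrative choices, not from the training material):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random Forest: an ensemble of decision trees with majority voting
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print("Random Forest accuracy:", rf.score(X_test, y_test))

# Support Vector Machine with an RBF kernel
svm = SVC(kernel='rbf').fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))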
DAY 13 – K-Nearest Neighbour, K-Means
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.
Fitting the K-NN classifier to the training data:
Now we will fit the K-NN classifier to the training data. To do this we will import the KNeighborsClassifier class of the sklearn.neighbors library. After importing the class, we will create the classifier object. The parameters of this class will be:
o metric='minkowski': This is the default parameter and it decides the distance between the points.
o p=2: It is equivalent to the standard Euclidean metric.

# Fitting the K-NN classifier to the training set
# (x_train and y_train come from the earlier preprocessing step)
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)
Output:
Out[10]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
         metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
K-Means Algorithm
K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2 there will be two clusters, for K=3 there will be three clusters, and so on. It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of similar properties.
import matplotlib.pyplot as mtp

# Plotting the elbow graph (assumes wcss_list holds the WCSS values for k = 1..10)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()
Step 3: Training the K-means algorithm on the training dataset

# Training the K-means model on the dataset
# (x holds the feature matrix prepared earlier)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)
DAY 14 – Decision Tree, Confusion Matrix
1. Information Gain:
   Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
2. Gini Index:
   Gini Index = 1 − ∑j Pj²
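A small worked sketch of the Gini index formula above, with made-up class proportions:

# Gini index for a node with two evenly split classes (made-up proportions)
p = [0.5, 0.5]
gini = 1 - sum(pj ** 2 for pj in p)
print(gini)  # 0.5, the maximum impurity for a two-class node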
Confusion Matrix:
The confusion matrix is a matrix used to determine the performance of classification models for a given set of test data. It can only be determined if the true values for the test data are known. The matrix itself is easy to understand, but the related terminology may be confusing. Since it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix.
Calculations using Confusion Matrix:
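A minimal sketch of the calculation with scikit-learn, using made-up true and predicted labels:

from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))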
DAY 15 – NLP (Natural Language Processing)
Natural language processing (NLP) refers to the branch of computer science—
and more specifically, the branch of artificial intelligence or AI—concerned with
giving computers the ability to understand text and spoken words in much the
same way human beings can.
Components of NLP
There are two components of NLP:
Natural Language Understanding (NLU) − Understanding involves mapping a given input in natural language into useful representations and analyzing different aspects of the language.
Natural Language Generation (NLG) − It is the process of producing meaningful phrases and sentences from an internal representation. It involves −
Text planning − It includes retrieving the relevant content from the knowledge base.
Sentence planning − It includes choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
Text Realization − It is mapping the sentence plan into sentence structure.
The NLU is harder than the NLG.
Conclusion & Feedback
My internship in the field of data science has been an
incredible experience that has deepened my
understanding and passion for this rapidly evolving field. I
gained practical exposure and applied theoretical
knowledge to real-world projects, strengthening my skills
in statistical analysis, machine learning algorithms, and
data visualization. Working on challenging projects allowed
me to extract meaningful insights using advanced
analytics techniques. I also learned about the ethical
considerations in data science and developed strong
communication and collaboration skills. This internship has
solidified my career aspirations in data science and
provided a foundation for future endeavors. I am grateful
to all who supported and mentored me throughout this
journey.