
An

Internship Report
On

Artificial Intelligence Virtual Internship


Submitted To
Bhilai Institute of Technology, Durg
in partial fulfillment of requirement for the award of degree
of
Bachelor of Technology
in

Information Technology

By

Abhishek Dewangan

(300103322301)

BHILAI INSTITUTE OF TECHNOLOGY DURG,


CHHATTISGARH (INDIA)
DEPARTMENT OF INFORMATION TECHNOLOGY

Session: 2021-2025
DECLARATION BY THE CANDIDATE

I hereby declare that the Industrial Internship report entitled “Cognizant- Artificial
Intelligence” submitted by me to Bhilai Institute of Technology, Durg in partial
fulfilment of the requirement for the award of the degree of Bachelor of Technology in
Information Technology Engineering is a record of bonafide industrial training
undertaken by me under the supervision of Smith, Cognizant. I further declare that the
work reported in this report has not been submitted and will not be submitted, either
in part or in full, for the award of any other degree or diploma in this institute or any
other institute or university.

Signature of the student:


Name of student: Abhishek Dewangan
University roll no: 300103322301
Enrollment no.: BH8068
ACKNOWLEDGEMENT

I would like to express my sincere gratitude and appreciation to everyone who has
helped me during my internship. First and foremost, I would like to thank Smith for
providing me with the opportunity to intern at Cognizant. Their support, guidance,
and encouragement have been instrumental in my learning and growth.

I would also like to thank my colleagues, who have welcomed me with open arms and
have been incredibly supportive throughout my internship. Their willingness to share
their knowledge and expertise has been invaluable.

Furthermore, I would like to express my appreciation to Smith, who has been my mentor
during my internship. Their feedback and constructive criticism have helped me improve
my skills and work more efficiently.

Finally, I would like to thank my family and friends for their unwavering support and
encouragement throughout my internship. Their constant motivation has helped me
stay focused and achieve my goals.

Name of student: Abhishek Dewangan

University roll no: 300103322301

Enrollment no.: BH8068


ABSTRACT

The AI Cognizant Virtual Internship plays a crucial role in bridging the gap between
academic knowledge and practical industry application, particularly for students and
professionals aspiring to enter the field of Artificial Intelligence. Traditional education
often emphasizes theoretical concepts, which, while essential, do not always equip
individuals with the hands-on skills needed in the industry. This internship addresses this
gap by providing participants with real-world projects that allow them to apply AI
techniques and tools to solve practical problems, deepening their understanding and
enhancing their practical skill set.

A significant benefit of the internship is the opportunity for participants to build a portfolio
of projects that demonstrates their ability to implement AI concepts in real-world
scenarios. This portfolio serves as a tangible asset in the competitive job market, setting
participants apart from others who may lack practical experience. Employers value this
hands-on experience, as it shows that candidates not only possess theoretical knowledge
but also the capability to apply it effectively in professional settings.

In addition to technical skills, the internship helps participants develop essential
competencies such as project management, teamwork, and communication. By engaging
with real-world challenges, participants gain insights into the practical application of AI
across various industries, while also honing their ability to collaborate and communicate
effectively. This comprehensive experience makes them well-rounded candidates who are
better prepared to meet the demands of the AI industry, significantly boosting their
competitiveness in the job market.
TABLE OF CONTENTS

1. About the organization
2. Application of the gained knowledge in/during the training
3. Comparison of competency levels before and after the training
4. Learnt during training
5. Objectives
6. Technologies used
7. Purpose and importance
8. Area and scope
References
Conclusion

ABOUT THE ORGANIZATION

Cognizant is a prominent global IT services and consulting company, renowned for its
ability to help organizations navigate and thrive in the digital era. Founded in 1994 as an
in-house technology unit of Dun & Bradstreet, Cognizant has since evolved into a Fortune
500 company, listed among the world’s most admired and fastest-growing firms. The
company’s rapid ascent in the tech industry can be attributed to its strong focus on
innovation, customer-centricity, and its ability to leverage emerging technologies to deliver
measurable business outcomes.

With its headquarters in Teaneck, New Jersey, Cognizant operates in over 40 countries,
employing more than 300,000 professionals worldwide. The company’s global presence
and vast workforce enable it to serve a diverse client base that spans various industries,
including healthcare, financial services, insurance, manufacturing, retail, and
communications. This extensive industry expertise allows Cognizant to offer tailored
solutions that address the specific challenges and opportunities within each sector.

Cognizant’s service offerings are comprehensive, encompassing digital strategy, technology
consulting, IT infrastructure, and business process services. The company is particularly
well-known for its digital transformation initiatives, where it partners with clients to
modernize their operations, enhance their customer experiences, and drive innovation.
This is achieved through a deep understanding of the client’s business combined with
expertise in key technologies such as artificial intelligence (AI), machine learning, cloud
computing, big data analytics, Internet of Things (IoT), and cybersecurity.

One of Cognizant’s core strengths lies in its ability to integrate digital solutions with
traditional IT systems, creating seamless and scalable platforms that enable businesses to
operate more efficiently and competitively. The company’s focus on digital transformation
is not just about adopting new technologies but also about reimagining business models,
optimizing processes, and fostering a culture of continuous innovation.
Cognizant’s client-centric approach is a cornerstone of its business philosophy. The
company emphasizes close collaboration with clients, aiming to understand their unique
challenges and deliver solutions that are not only effective but also aligned with their
strategic goals. This commitment to client success has earned Cognizant a reputation for
reliability and excellence, leading to long-term partnerships with many of the world’s
leading organizations.

Beyond its commercial success, Cognizant is also dedicated to corporate social
responsibility (CSR) and sustainability. The company’s CSR initiatives focus on areas such
as education, workforce development, environmental sustainability, and community
engagement. Through its Cognizant Foundation and other philanthropic efforts, the
company supports numerous programs aimed at promoting digital literacy, STEM
education, and social equity. Cognizant’s environmental initiatives are designed to reduce
the company’s carbon footprint and promote sustainable business practices across its
global operations.

Cognizant also places a strong emphasis on ethical business conduct and governance. The
company adheres to strict ethical standards in all its operations and is committed to
maintaining transparency and accountability in its business practices. This ethical
framework not only strengthens Cognizant’s corporate reputation but also ensures that it
operates in a manner that is responsible and respectful of all its stakeholders.
APPLICATION OF THE GAINED KNOWLEDGE IN/DURING THE
TRAINING

● One of the foundational areas covered is Machine Learning Algorithms, where
participants learn to understand and implement techniques such as supervised learning,
unsupervised learning, regression, and classification. These algorithms form the
backbone of AI, enabling systems to learn from data, make predictions, and improve
over time. Participants gain hands-on experience in applying these algorithms to
different types of data, understanding how to select the appropriate method based
on the problem at hand (a short illustrative sketch follows this list).

● Data Preprocessing is another critical area of focus during the training.
Participants learn techniques for cleaning, normalizing, and transforming raw data,
which is a crucial step before feeding it into machine learning models. This involves
handling missing values, encoding categorical variables, scaling numerical features,
and more. Effective data preprocessing is vital because the quality of data directly
impacts the performance of AI models. By mastering these techniques, participants
ensure that the data used for modeling is in its most optimal form.

● In addition to data preparation, Model Evaluation is a key component of the
training. Participants learn various methods to assess the performance of machine
learning models, using metrics such as accuracy, precision, recall, and F1-score.
Understanding these metrics allows participants to critically evaluate the
effectiveness of their models and make informed decisions about improvements.
This evaluation process is crucial for ensuring that the models not only perform
well on training data but also generalize effectively to new, unseen data.
● Python Programming is also a significant focus of the training, as Python is one of
the most widely used programming languages in AI. Participants learn to write
efficient and optimized code, utilizing powerful libraries such as NumPy, pandas,
and scikit-learn. These libraries are essential for tasks ranging from numerical
computations and data manipulation to implementing machine learning algorithms.
The training emphasizes best practices in Python programming, ensuring that
participants can develop scalable and maintainable AI applications.

● Finally, Project Management skills are integrated into the training, teaching
participants how to develop and manage AI projects from inception to deployment.
This includes planning the project timeline, managing resources, collaborating with
team members, and adhering to quality standards. Effective project management is
crucial in ensuring that AI projects are completed on time, within scope, and meet
the desired objectives. By gaining these skills, participants are better prepared to
take on leadership roles in AI projects, ensuring their successful execution in a real-
world setting.
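
A minimal, self-contained sketch of how these pieces fit together, using scikit-learn (the tiny
toy DataFrame and its column names are invented purely for illustration and are not part of
the internship material):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical dataset: two numeric features and a binary target column 'label'
data = pd.DataFrame({
    'feature_1': [1.0, 2.5, 3.1, 4.7, 5.2, 6.8, 7.4, 8.9],
    'feature_2': [10, 20, 15, 30, 25, 40, 35, 50],
    'label':     [0, 0, 0, 0, 1, 1, 1, 1],
})
X = data.drop(columns=['label'])
y = data['label']

# Data preprocessing: split, then scale the features so they contribute comparably
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Supervised learning: fit a simple classifier
model = LogisticRegression()
model.fit(X_train, y_train)

# Model evaluation: the metrics discussed above
y_pred = model.predict(X_test)
print('accuracy :', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred, zero_division=0))
print('recall   :', recall_score(y_test, y_pred, zero_division=0))
print('f1-score :', f1_score(y_test, y_pred, zero_division=0))

The same split-scale-fit-evaluate pattern carries over to the larger models and datasets used
later in the training.
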
COMPARISON OF COMPETENCY LEVELS BEFORE AND AFTER THE
TRAINING

Comparing competency levels before and after the data analyst internship program can
effectively highlight the growth and development achieved during the training. Below are some
key areas where this comparison can be made:

● Data Analyst Knowledge: Prior to the training, my understanding of data analysis was
primarily confined to the basic concepts covered in the academic curriculum. However,
after completing the program, I have gained a much deeper understanding of Python, its
functionalities, and how to effectively implement and develop applications using the
platform.

● Technical Skills: Before the training, my experience with programming languages and
tools used in data analysis, such as Python, Power BI, Tableau, and R, was limited. After
the internship, I have become proficient in these technologies and have acquired hands-
on experience in developing data analysis applications.

● Problem-Solving Skills: The internship involved tackling real-world problems and
challenges commonly encountered in data analysis. Before the training, my experience
with solving complex problems was limited. Now, I am much more skilled at
problem-solving and have developed a stronger understanding of how to approach and
resolve complex issues.

● Collaboration and Communication Skills: In real-world data analysis projects,
collaboration and communication skills are crucial. Before the training, my experience in
working within a team and effectively communicating was limited. After completing the
internship, I am now confident in my ability to collaborate with others and communicate
ideas and solutions effectively.
LEARNT DURING TRAINING

Exploratory Data Analysis

import pandas as pd
import seaborn as sns

# Load the sample sales data and drop the redundant index column
df = pd.read_csv('/content/sample_sales_data (1).csv')
df.drop(columns=['Unnamed: 0'], inplace=True)
df.head()

The dataset contains the following columns:

Index(['transaction_id', 'timestamp', 'product_id', 'category',
       'customer_type', 'unit_price', 'quantity', 'total', 'payment_type'],
      dtype='object')

Distributions and Plots

def plot_continuous_distribution(data: pd.DataFrame = None, column: str = None, height: int = 8):
    # Histogram with a KDE overlay for a single numeric column
    _ = sns.displot(data, x=column, kde=True, height=height,
                    aspect=height/5).set(title=f'Distribution of {column}')

def get_unique_values(data, column):
    # Report how many distinct values a column has, and how often each occurs
    num_unique_values = len(data[column].unique())
    value_counts = data[column].value_counts()
    print(f"Column: {column} has {num_unique_values} unique values\n")
    print(value_counts)

def plot_categorical_distribution(data: pd.DataFrame = None, column: str = None,
                                  height: int = 8, aspect: int = 2):
    # Bar chart of category frequencies
    _ = sns.catplot(data=data, x=column, kind='count', height=height,
                    aspect=aspect).set(title=f'Distribution of {column}')

def correlation_plot(data: pd.DataFrame = None):
    # Use the DataFrame passed in (not the global df) and return the styled matrix
    corr = data.corr(numeric_only=True)
    return corr.style.background_gradient(cmap='coolwarm')

plot_continuous_distribution(df, 'unit_price')
plot_continuous_distribution(df, 'quantity')
plot_continuous_distribution(df, 'total')

Handling Missing Data


Start by examining your dataset for any missing values. It's essential to identify where data might
be missing and decide how to handle it. Depending on the situation, you might drop rows or
columns with missing data or fill in the gaps with appropriate values, such as the mean or median
for numerical data or the mode for categorical data.
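
As a brief illustration on the sales DataFrame loaded earlier (assuming, purely for the
example, that some of these columns contain gaps):

# Count missing values per column
print(df.isnull().sum())

# Option 1: drop rows that are missing critical fields
df = df.dropna(subset=['unit_price', 'quantity'])

# Option 2: fill numeric gaps with the median, categorical gaps with the mode
df['total'] = df['total'].fillna(df['total'].median())
df['payment_type'] = df['payment_type'].fillna(df['payment_type'].mode()[0])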

Outlier Detection
Outliers are data points that deviate significantly from other observations and can potentially
distort your analysis. You can identify outliers by visualizing the data or calculating statistical
measures. Once detected, you can decide whether to remove them, transform them, or keep them
based on their relevance to your analysis.
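
For example, a simple interquartile-range (IQR) rule, shown here as a sketch on the
unit_price column, flags points that sit far outside the typical range:

# Flag values more than 1.5 * IQR outside the middle 50% as potential outliers
q1 = df['unit_price'].quantile(0.25)
q3 = df['unit_price'].quantile(0.75)
iqr = q3 - q1
outliers = df[(df['unit_price'] < q1 - 1.5 * iqr) | (df['unit_price'] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers in unit_price")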

Feature Engineering
This involves creating new features or modifying existing ones to better capture the underlying
patterns in your data. For instance, you might derive new features such as the day of the week or
month from a timestamp or create interaction terms between existing variables to enhance
predictive models.
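
A small sketch of this idea on the sales data (it assumes the timestamp column is parsed to
datetime first; revenue_per_unit is an illustrative derived feature, not one from the
internship material):

df['timestamp'] = pd.to_datetime(df['timestamp'])
df['day_of_week'] = df['timestamp'].dt.dayofweek   # 0 = Monday ... 6 = Sunday
df['month'] = df['timestamp'].dt.month

# An illustrative derived/interaction feature
df['revenue_per_unit'] = df['total'] / df['quantity']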

Categorical Data Analysis


For categorical variables, it's essential to explore their distribution and relationship with other
variables. Visualizing how different categories are distributed or how they relate to continuous
variables can reveal important insights. Additionally, encoding categorical data into numerical form
can be a critical step if you plan to use machine learning models.
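
Encoding the categorical columns into numerical form could, for instance, use one-hot
encoding (a sketch with pandas):

# One-hot encode the categorical columns before feeding the data to a model
encoded = pd.get_dummies(df, columns=['category', 'customer_type', 'payment_type'])
print(encoded.columns)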
Correlation Analysis
Exploring correlations between numerical variables helps to understand relationships within the
data. A correlation matrix, often visualized with a heatmap, can quickly show which variables are
positively or negatively correlated. This can be useful for feature selection or identifying potential
multicollinearity in regression models.
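
A minimal version of the heatmap described here, restricted to the numeric columns
(numeric_only assumes a reasonably recent pandas version):

import matplotlib.pyplot as plt

corr = df.corr(numeric_only=True)             # correlation matrix of the numeric columns
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()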

Pairwise Plotting
Pairwise plotting (or pair plots) allows you to visualize relationships between multiple pairs of
variables at once. This is particularly useful when exploring interactions and correlations between
variables, helping you spot trends, clusters, and potential outliers in the data.
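
For instance, seaborn builds this grid in a single call (a sketch on the numeric sales
columns):

import seaborn as sns

sns.pairplot(df[['unit_price', 'quantity', 'total']])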

Data Normalization and Scaling


If your data includes features with varying scales, normalization or scaling might be necessary,
especially before using distance-based algorithms like K-Nearest Neighbors (KNN) or clustering.
Normalization ensures that each feature contributes equally to the analysis, improving model
performance and interpretability.
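
As a sketch, scikit-learn's scalers handle this in a few lines (MinMaxScaler rescales to
[0, 1]; StandardScaler gives zero mean and unit variance):

from sklearn.preprocessing import MinMaxScaler, StandardScaler

numeric_cols = ['unit_price', 'quantity', 'total']
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])
# alternatively: df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])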

Summary Statistics
Generating summary statistics gives a quick overview of the central tendency, variability, and
distribution of your data. This can include measures like mean, median, standard deviation, and
quartiles, providing insights into the overall structure and behavior of your dataset.
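
In pandas this amounts to a single call:

# Count, mean, std, min, quartiles and max for every numeric column
print(df.describe())

# Include categorical columns as well (counts, unique values, most frequent value)
print(df.describe(include='all'))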

By incorporating these additional steps, your EDA will be more comprehensive, leading to
deeper insights into the data and a stronger starting point for modeling.
# method to build the plot
from bokeh.plotting import figure  # assumed import: Bokeh provides the figure used below

def get_plot(stock_1, stock_2, date, value):
    stock_1 = dataset[dataset['symbol'] == stock_1]
    stock_2 = dataset[dataset['symbol'] == stock_2]

    stock_1_name = stock_1['symbol'].unique()[0]
    stock_1_range = stock_1[(stock_1['short_date'] >= date[0]) &
                            (stock_1['short_date'] <= date[1])]
    stock_2_name = stock_2['symbol'].unique()[0]
    stock_2_range = stock_2[(stock_2['short_date'] >= date[0]) &
                            (stock_2['short_date'] <= date[1])]

    plot = figure(title='Stock prices',
                  x_axis_label='Date',
                  x_range=stock_1_range['short_date'],
                  y_axis_label='Price in $USD',
                  plot_width=800, plot_height=500)
    plot.xaxis.major_label_orientation = 1
    plot.grid.grid_line_alpha = 0.3

    if value == 'open-close':
        add_candle_plot(plot, stock_1_name, stock_1_range, 'blue')
        add_candle_plot(plot, stock_2_name, stock_2_range, 'orange')

    if value == 'volume':
        plot.line(stock_1_range['short_date'], stock_1_range['volume'],
                  legend_label=stock_1_name, muted_alpha=0.2)
        plot.line(stock_2_range['short_date'], stock_2_range['volume'],
                  legend_label=stock_2_name, muted_alpha=0.2,
                  line_color='orange')

    plot.legend.click_policy = "mute"
    return plot
def add_candle_plot(plot, stock_name, stock_range, color):
    inc_1 = stock_range.close > stock_range.open
    dec_1 = stock_range.open > stock_range.close
    w = 0.5

    plot.segment(stock_range['short_date'], stock_range['high'],
                 stock_range['short_date'], stock_range['low'],
                 color="grey")

    plot.vbar(stock_range['short_date'][inc_1], w,
              stock_range['high'][inc_1], stock_range['close'][inc_1],
              fill_color="green", line_color="black",
              legend_label=('Mean price of ' + stock_name), muted_alpha=0.2)

    plot.vbar(stock_range['short_date'][dec_1], w,
              stock_range['high'][dec_1], stock_range['close'][dec_1],
              fill_color="red", line_color="black",
              legend_label=('Mean price of ' + stock_name), muted_alpha=0.2)

    stock_mean_val = stock_range[['high', 'low']].mean(axis=1)
    plot.line(stock_range['short_date'], stock_mean_val,
              legend_label=('Mean price of ' + stock_name),
              muted_alpha=0.2, line_color=color, alpha=0.5)

plot_categorical_distribution(df, 'category')

plot_categorical_distribution(df, 'customer_type')

plot_categorical_distribution(df, 'payment_type')
get_plot Method
● Purpose: This method generates a stock price plot for two selected stocks within a specified
date range, allowing for visualization of either the open-close price movement or the volume
of trades.
● Parameters:
1. stock_1, stock_2: Symbols of the two stocks to be compared.

2. date: A tuple representing the start and end dates for the data range.
3. value: Determines what aspect of the stock data to visualize (either 'open-close' for
candlestick charts or 'volume' for trade volume).
● Process:
1. Data Filtering: It filters the dataset to obtain data relevant to the selected stocks
within the specified date range.
2. Plot Initialization: A plot is created with appropriate labels, dimensions, and grid
settings.
3. Plotting:

■ If value is 'open-close', the add_candle_plot function is called to add candlestick plots for both stocks.
■ If value is 'volume', it plots the trade volume for each stock over time using line plots.
4. Interactivity: The legend allows users to mute and unmute the plot lines, enhancing
interactivity.

add_candle_plot Method

● Purpose: This function adds candlestick plots to the main plot for the specified stock,
indicating the opening, closing, high, and low prices.
● Parameters:
1. plot: The main plot object where the candlestick charts will be added.

2. stock_name: The name of the stock to display in the legend.

3. stock_range: The filtered data range for the stock.

4. color: The color used for the mean price line.

● Process:
1. Candlestick Calculation: Determines whether the stock closed higher or lower
than it opened (inc_1 and dec_1).
2. Drawing Candles:

■ Segments: Vertical lines representing the range between high and low prices.
■ Bars: Vertical bars filled with green or red, depending on whether the stock
price increased or decreased.
3. Mean Price Line: A line representing the average of the high and low prices is
drawn over the candlestick plot.

plot_categorical_distribution Function

● Purpose: This function visualizes the distribution of a categorical variable using a bar plot.
● Parameters:
○ data: The DataFrame containing the data.
○ column: The categorical column to be visualized.
○ height, aspect: Dimensions of the plot.
● Process:
○ It generates a bar plot showing the frequency of each category in the specified
column. This helps understand the distribution and prevalence of different
categories within the dataset.

Example Plots

● Category Distribution: The plot_categorical_distribution(df, 'category') will show the
distribution of different product categories within the dataset.
● Customer Type Distribution: The plot_categorical_distribution(df, 'customer_type') will
illustrate how different types of customers are represented.
● Payment Type Distribution: The plot_categorical_distribution(df, 'payment_type') will
display the frequency of different payment methods used by customers.
# Count plot of the category column
sns.countplot(data=df, y='category').set(title='Distribution of Category')

# Heatmap of correlations between the numeric columns
sns.heatmap(df.corr(numeric_only=True))
Data Merging

Parameters:
    data (pd.DataFrame): The input DataFrame.
    column (str): The name of the column containing timestamp data.

Returns:
    A modified DataFrame with each timestamp truncated to the start of its hour.

from datetime import datetime  # needed for strptime below

def convert_timestamp_to_hourly(data: pd.DataFrame = None, column: str = None):
    dummy = data.copy()
    new_ts = dummy[column].tolist()
    # Format each timestamp as 'YYYY-MM-DD HH:00:00', then parse it back to datetime
    new_ts = [i.strftime('%Y-%m-%d %H:00:00') for i in new_ts]
    new_ts = [datetime.strptime(i, '%Y-%m-%d %H:00:00') for i in new_ts]
    dummy[column] = new_ts
    return dummy

sales_df = convert_timestamp_to_hourly(sales_df, 'timestamp')
stock_df = convert_timestamp_to_hourly(stock_df, 'timestamp')
temp_df = convert_timestamp_to_hourly(temp_df, 'timestamp')

This step ensures that all timestamps in the sales, stock, and temperature data are
consistent and aligned on an hourly basis.

sales_agg = sales_df.groupby(['timestamp', 'product_id']).agg({'quantity': 'sum'}).reset_index()

This step consolidates the sales data, making it easier to analyze and model trends on an
hourly basis.

1. Timestamp Conversion: Align timestamps across datasets to the nearest hour for
consistency, making it easier to merge data from different sources.
2. Data Aggregation: Summarize sales data by aggregating quantities sold per hour
and product, simplifying analysis and trend detection.
3. Merging DataFrames: Combine sales, stock, and temperature datasets based on the
aligned timestamp to create a unified dataset for comprehensive analysis (see the
sketch after this list).
4. Handling Missing Data: Address any missing values post-merge using techniques
like forward-filling, interpolation, or dropping rows.
5. Final Analysis: Use the combined dataset to explore trends, perform correlation
analysis, and create visualizations such as time-series plots and heatmaps to derive
actionable insights.
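
A minimal sketch of steps 3 and 4, assuming the hourly sales_agg, stock_df and temp_df
DataFrames prepared above share a timestamp column (the join keys and the fill strategy
here are illustrative assumptions):

# Merge sales, stock and temperature data on the aligned hourly timestamp
merged_df = sales_agg.merge(stock_df, on=['timestamp', 'product_id'], how='left')
merged_df = merged_df.merge(temp_df, on='timestamp', how='left')

# Handle missing values introduced by the merge
merged_df['quantity'] = merged_df['quantity'].fillna(0)   # hours with no recorded sales
merged_df = merged_df.ffill()                             # forward-fill any remaining gaps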

Feature Engineering

merged_df['timestamp_day_of_month'] = merged_df['timestamp'].dt.day
merged_df['timestamp_day_of_week'] = merged_df['timestamp'].dt.dayofweek
merged_df['timestamp_hour'] = merged_df['timestamp'].dt.hour
merged_df.drop(columns=['timestamp'], inplace=True)
merged_df.head()

These time-based features help the model capture temporal patterns in the data, which
could be important for forecasting sales and stock levels.

Modeling

Model Setup:
Target Variable (y): estimated_stock_pct (percentage of estimated stock).

Features (X): All other columns in merged_df.

X = merged_df.drop(columns=['estimated_stock_pct'])
y = merged_df['estimated_stock_pct']
accuracy = []

Cross-Validation and Model Training:


A Random Forest Regressor is used to predict the target variable. The data is split into
training and testing sets, and the model is evaluated using Mean Absolute Error (MAE)
across K folds.

# Assumed imports (defined earlier in the notebook):
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# K (number of folds) and split (training fraction) are defined elsewhere,
# e.g. K = 10 and split = 0.75

for fold in range(0, K):

    # Instantiate algorithm
    model = RandomForestRegressor()
    scaler = StandardScaler()

    # Create training and test samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=split,
                                                        random_state=42)

    # Scale X data; we scale the data because it helps the algorithm to converge
    # and helps the algorithm to not be greedy with large values
    scaler.fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Train model
    trained_model = model.fit(X_train, y_train)

    # Generate predictions on test sample
    y_pred = trained_model.predict(X_test)

    # Compute accuracy, using mean absolute error
    mae = mean_absolute_error(y_true=y_test, y_pred=y_pred)
    accuracy.append(mae)
    print(f"Fold {fold + 1}: MAE = {mae:.3f}")

print(f"Average MAE: {(sum(accuracy) / len(accuracy)):.2f}")


Model: A RandomForestRegressor is instantiated for each fold. This model is an ensemble
method that combines the predictions of multiple decision trees to improve accuracy and
robustness.

Scaler: StandardScaler is initialized to standardize the feature data by removing the mean
and scaling to unit variance. This ensures that the model's learning process is not biased by
features with larger scales.

Feature Importance Plot:

The relative importance of each feature is visualized to understand which variables
contribute most to the model’s predictions.

# Assumed imports: numpy and matplotlib are used below
import numpy as np
import matplotlib.pyplot as plt

features = [i.split("__")[0] for i in X.columns]
importances = model.feature_importances_
indices = np.argsort(importances)

fig, ax = plt.subplots(figsize=(10, 20))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='y', align='center')
plt.yticks(range(len(indices)), [features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()


Identifying important features helps in refining the model and provides insights into
which factors most influence the estimated stock levels.
OBJECTIVES

● Specialized Training: Participants receive in-depth training in specialized areas
such as natural language processing (NLP), computer vision, and deep learning,
allowing them to gain expertise in niche domains within AI.

● Tool Proficiency: The program ensures that participants become proficient in
essential tools like TensorFlow, Keras, PyTorch, and other AI frameworks, enabling
them to build and deploy sophisticated models.

● Problem-Solving Techniques: Interns are taught advanced problem-solving
techniques, such as feature engineering, hyperparameter tuning, and model
optimization, which are critical for improving model performance and accuracy.

● Hands-On Projects: Interns work on hands-on projects that require them to design,
develop, and deploy AI models, simulating real-world scenarios and challenges.

● End-to-End Implementation: Participants gain experience in the entire AI project
lifecycle, from data collection and preprocessing to model development, evaluation,
and deployment.

● Exposure to Diverse Use Cases: The internship provides exposure to a variety of
AI applications across different industries, such as healthcare, finance, and
e-commerce, allowing interns to understand the versatility of AI solutions.
TECHNOLOGIES USED

Programming Language: Python

Python is the cornerstone of AI and data science, recognized for its simplicity,
readability, and extensive ecosystem of libraries and frameworks. As the primary
programming language used in the AI Cognizant Virtual Internship, Python provides
a versatile platform for developing everything from simple scripts to complex
machine learning models. Its syntax is user-friendly, which makes it accessible to
beginners, while its vast array of libraries enables the handling of advanced AI tasks.
Python's extensive support for various data types, powerful in-built functions, and
ease of integration with other technologies make it the ideal language for AI
development.

Libraries and Frameworks

● NumPy: This fundamental library is essential for numerical computing in Python.
NumPy supports large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions to operate on these arrays.

● pandas: pandas is a powerful library used for data manipulation and analysis.
It provides data structures like DataFrames.
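
A two-line taste of what working with these libraries looks like (an illustrative sketch,
not code taken from the internship itself):

import numpy as np
import pandas as pd

prices = np.array([10.5, 12.0, 9.75])                      # NumPy array with vectorized math
catalog = pd.DataFrame({'product': ['A', 'B', 'C'],        # pandas DataFrame for tabular data
                        'price_with_tax': prices * 1.18})  # apply a factor to every price at once
print(catalog)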
Development Tools

● Jupyter Notebook: Jupyter Notebook is an open-source web application that
allows participants to create and share documents that contain live code,
equations, visualizations, and narrative text. It’s widely used in data science
for its ability to combine code execution with rich text and visual outputs.
This tool is particularly valuable for iterating on ideas, documenting the
modeling process, and presenting results in an interactive format.

Data Visualization Tools

● Matplotlib and Seaborn: While already mentioned as libraries, it’s worth
emphasizing their role as data visualization tools. Matplotlib provides a solid
foundation for creating a wide range of static, animated, and interactive
visualizations.

Collaboration Tools

● GitHub: Beyond its role in code versioning, GitHub serves as a central hub for
project collaboration. Participants can use GitHub to share their code
repositories with team members, track issues, manage pull requests, and
collaborate on code development.
PURPOSE AND IMPORTANCE

The AI Cognizant Virtual Internship plays a pivotal role in closing the gap between
theoretical knowledge acquired in academic settings and the practical skills required in the
AI industry. For students and professionals aspiring to transition into AI roles, this
internship provides an invaluable platform to gain hands-on experience that is often
missing from traditional educational programs. While academic courses typically focus on
foundational theories, mathematical principles, and basic programming, they may not fully
prepare individuals for the complexities and challenges encountered in real-world AI
projects. This is where the AI Cognizant Virtual Internship becomes essential, offering a
structured environment where participants can apply what they’ve learned in a practical
context.

Through this internship, participants engage in a series of carefully designed projects that
mimic the kinds of challenges they will face in the industry. These projects cover a wide
range of AI applications, from machine learning and data analysis to natural language
processing and computer vision. By working on these projects, interns learn to navigate the
entire AI development lifecycle, from data preprocessing and model selection to
deployment and performance evaluation. This practical experience is crucial not only for
reinforcing theoretical knowledge but also for developing a deeper understanding of how
AI solutions are implemented in real-world scenarios. Participants learn to deal with the
nuances of real data, such as handling missing values, dealing with imbalanced datasets,
and optimizing models for performance and scalability.

In summary, the AI Cognizant Virtual Internship is more than just a learning experience—
it’s a transformative journey that equips participants with the skills, confidence, and
practical experience needed to excel in AI roles. It bridges the gap between academia and
industry, ensuring that participants are not only knowledgeable but also industry-ready.
This combination of theoretical understanding and practical application makes the
internship an essential step for anyone looking to succeed in the rapidly evolving field of AI.
AREA AND SCOPE

The scope of the AI Cognizant Virtual Internship is expansive, designed to provide
participants with a comprehensive understanding of artificial intelligence and its many
applications. This internship is meticulously structured to cover various critical aspects of
AI, making it an ideal program for those aspiring to build careers in AI, data science, and
software engineering.

Data Analysis
One of the foundational elements of the internship is data analysis, a crucial skill in AI that
involves extracting meaningful insights from complex datasets. Participants are introduced
to advanced data manipulation techniques using tools like pandas and NumPy, enabling
them to handle large datasets efficiently.

Machine Learning
The machine learning component of the internship is designed to immerse participants in
the core concepts and techniques of this rapidly growing field. Participants implement
various machine learning algorithms, ranging from simple linear regression to more
complex models like decision trees, support vector machines, and neural networks.

Project Management
Beyond technical skills, the internship also emphasizes the importance of project
management in the context of AI. Participants learn to manage AI projects from inception
to deployment, ensuring that they are delivered on time and meet quality standards.
REFERENCES

https://github.com/openlists/PythonResources

https://github.com/showcases/data-visualization

https://github.com/microsoft/ML-For-Beginners

https://github.com/PyGithub/PyGithub
CONCLUSION

In conclusion, the cross-validation process implemented using the RandomForestRegressor model
is a comprehensive approach to evaluating and fine-tuning the model's performance. By
systematically splitting the dataset into multiple training and testing subsets across K folds, the
model is trained and tested repeatedly on different portions of the data. This method ensures that
the model is not just overfitting to a single subset but is learning patterns that generalize well to
various data points.

The use of StandardScaler to normalize the features before training further enhances the model's
ability to converge efficiently, ensuring that all features contribute equally to the learning process.
This is particularly important in machine learning, where features with larger magnitudes can
otherwise disproportionately influence the model's predictions.

After training, the model's predictions are evaluated using the Mean Absolute Error (MAE) metric,
which provides a clear indication of the average error in the model's predictions. The MAE is
calculated for each fold, and the final average MAE across all folds offers a reliable measure of the
model's overall accuracy and performance.

By adopting this cross-validation approach, the code ensures that the RandomForestRegressor
model is robust, accurate, and capable of generalizing well to new, unseen data. This thorough
evaluation process ultimately leads to a more reliable and effective model, making it well-suited for
real-world applications where consistent and accurate predictions are critical.
