Summer Internship Report
On
BACHELOR OF TECHNOLOGY
In
Submitted by
2021-2025
NRI INSTITUTE OF TECHNOLOGY
(Approved by AICTE, Affiliated to the JNTUK, Kakinada, A.P)
Visadala (P), Medikonduru (M), Guntur-522438
CERTIFICATE
This is to certify that the internship work embodied in this report, entitled “Short-Term Internship on
Data Analysis”, was carried out during May & June 2024 by Syed Tasleem Afroze (21KP1A44A9) of the
Data Science department, NRI Institute of Technology, Guntur, in partial fulfilment of the B. Tech degree
to be awarded by JNTU Kakinada. This internship work has been carried out to the satisfaction of the
department.
COPY OF INTERNSHIP OFFER LETTER
COPY OF INTERNSHIP COMPLETION CERTIFICATE
Acknowledgment
An industrial attachment cannot be completed without significant help from others. First, I
gratefully acknowledge the help and support of my parents, teachers, employers, friends and
others, whose support has been invaluable to me. I would like to thank the following people for
their contribution to this industrial attachment.
Abstract
During my Data Analysis Internship at Cognifyz Technologies from April to June 2024, I worked
independently from home on various data analysis tasks using Python. My responsibilities
included data cleaning, preprocessing, statistical analysis, and visualization. I applied Python
libraries such as Pandas, NumPy, Matplotlib, and Seaborn to manage large datasets and derive
actionable insights.
The internship provided me with an opportunity to refine my technical skills and gain practical
experience. Working on my own required strong self-discipline and effective time management while
tackling individual projects without direct supervision, and it strengthened my communication and
initiative in a remote work setting. It also provided valuable insights into the role of a data
analyst.
Table of Contents:
1. Introduction
2. Problem Statement
3. Social Relevance of the Project
4. Training Description
5. Code
6. Analysis
7. Requirements
8. Conclusion
List of Figures:
1. Bar chart on top three most common cuisines
2. Bar chart on percentage of restaurants serving each cuisine
3. Pie chart on online delivery availability
4. Cluster map of restaurants in the United Kingdom
5. Data flow diagram
1. INTRODUCTION
Cognifyz Technologies specializes in the dynamic field of data science and excels in delivering impactful
projects and solutions. The company offers a wide range of products and services, including artificial
intelligence (AI), machine learning (ML), and data analytics tools. Additionally, Cognifyz Technologies
provides training programs.
The internship was conducted remotely, which allowed me to work from home rather than at a physical
facility. Despite the remote nature of the work, I engaged closely with the company’s digital infrastructure
and utilized various online tools for task management and communication.
My main responsibilities during the internship were:
1. Data Cleaning and Preprocessing: I handled large datasets by performing tasks such as
removing duplicates, filling in missing values, and normalizing data to ensure accuracy and
consistency.
2. Statistical Analysis: I applied statistical techniques to analyze data trends, correlations, and
patterns, which contributed to deriving meaningful insights and supporting data-driven decision-
making.
3. Data Visualization: I created visual representations of data using Python libraries like Matplotlib
and Seaborn. These visualizations helped in effectively communicating complex data insights.
4. Independent Project Work: As the internship was conducted remotely, I worked independently
on assigned projects, which required a strong emphasis on self-motivation and effective time
management.
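The cleaning steps in item 1 (removing duplicates, filling missing values, normalizing) can be sketched in pandas as below. This is a minimal illustration on a few hypothetical rows, not the exact code used during the internship; the column names, median imputation, and min-max scaling are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the column names are assumptions modeled on
# the restaurant dataset described in this report.
df = pd.DataFrame({
    "Restaurant Name": ["A", "A", "B", "C"],
    "Cuisines": ["Italian", "Italian", None, "Chinese, Thai"],
    "Average Cost for two": [800, 800, 1200, np.nan],
    "Aggregate rating": [4.1, 4.1, 3.5, 0.0],
})

# 1. Remove exact duplicate rows (the two "A" rows are identical).
df = df.drop_duplicates().reset_index(drop=True)

# 2. Fill missing values: a placeholder label for text, the median for numbers.
df["Cuisines"] = df["Cuisines"].fillna("Unknown")
median_cost = df["Average Cost for two"].median()
df["Average Cost for two"] = df["Average Cost for two"].fillna(median_cost)

# 3. Min-max normalize cost into [0, 1] (assumes max > min).
cost = df["Average Cost for two"]
df["Cost (normalized)"] = (cost - cost.min()) / (cost.max() - cost.min())

print(df)
```

The median is used here because it is less affected by a few very expensive restaurants than the mean; other imputation choices are equally valid depending on the column.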
This internship allowed me to apply data analysis principles to a real-world dataset, enhancing my skills
in handling diverse data types and generating actionable insights. The experience improved my technical
proficiency with Python and provided a deeper understanding of data-driven decision-making processes.
2. PROBLEM STATEMENT
To perform a comprehensive analysis of a dataset of global cuisines, which includes detailed attributes
such as restaurant location, address, cost, ratings, availability of online delivery, and more. The analysis
covers the following tasks:
1. Top Cuisines:
• Determine the top three most common cuisines and the percentage of restaurants serving each.
2. City Analysis:
• Identify the city with the most restaurants and the city with the highest average rating.
3. Price Range Distribution:
• Create a histogram or bar chart to visualize the distribution of price ranges among
restaurants.
4. Online Delivery:
• Compare the average ratings of restaurants that offer online delivery versus those that do
not.
5. Restaurant Ratings:
• Analyze the distribution of aggregate ratings and determine the most common rating range.
6. Cuisine Combination:
• Determine if certain cuisine combinations tend to have higher ratings.
7. Geographic Analysis:
• Plot the locations of restaurants on a map using longitude and latitude coordinates.
8. Restaurant Chains:
• Identify restaurant chains and analyze their ratings and popularity.
These tasks required a thorough approach to data cleaning, statistical analysis, and visualization
to uncover meaningful patterns and trends. The analysis aimed to address challenges related to data
quality and insight extraction.
3. SOCIAL RELEVANCE OF THE PROJECT
The analysis of a global cuisine dataset carries significant social relevance across various dimensions:
• Enhanced Consumer Choices: By identifying popular cuisines, cities with high-quality dining
options, and price range distributions, the project helps consumers make informed dining
decisions. Knowledge about restaurant ratings and online delivery options also supports
better-informed choices.
• Trend Identification: Understanding the popularity of certain cuisines and restaurant chains can
guide consumers towards trending dining experiences, potentially enhancing their satisfaction.
2. Economic Impact:
• Business Strategy and Development: Insights into restaurant density, pricing, and geographic
distribution can assist restaurant owners and businesses in making strategic decisions regarding
new openings, marketing strategies, and menu offerings. This can drive economic growth.
• Support for Local Businesses: Highlighting city-specific restaurant data and online delivery
options can help local businesses improve their services and compete more effectively.
• Urban Planning: Geographic analysis reveals patterns in restaurant distribution and clustering,
which can inform urban planning and development. Planners and developers can use these insights
to enhance infrastructure.
• Community Engagement: Understanding where restaurants are concentrated can help local
governments and community organizations plan events and support local dining initiatives.
• Promotion of Culinary Diversity: By analyzing and visualizing global cuisine data, the project
highlights the rich diversity of culinary offerings. This can foster greater cultural awareness.
• Cultural Exchange: Insights into popular cuisine combinations and ratings can encourage
cultural exchange and understanding, as consumers are exposed to diverse culinary traditions.
Overall, the social relevance of this project lies in its ability to provide valuable insights that benefit
consumers and businesses, promote economic and cultural development, and contribute to a better
understanding of dining trends.
4. TRAINING DESCRIPTION
Project Overview: I worked on a comprehensive project analyzing a global cuisine dataset. This dataset
included key attributes such as restaurant location, address, cost, ratings, and online delivery options. The
primary goal was to extract actionable insights and deliver a detailed analysis that could support decision-making.
Project Goals:
1. Identify Popular Cuisines: Determine the most common cuisines and their prevalence in the
dataset.
2. Analyze City-Specific Data: Assess restaurant distribution and average ratings across different
cities.
3. Examine Price Range Distribution: Visualize and analyze the distribution of restaurant price
ranges.
4. Evaluate Online Delivery Impact: Determine the prevalence of online delivery options and compare
the average ratings of restaurants with and without online delivery.
5. Analyze Restaurant Ratings: Investigate rating distributions and identify trends in restaurant
ratings.
6. Explore Cuisine Combinations: Identify common cuisine combinations and their impact on
ratings.
7. Conduct Geographic Analysis: Map restaurant locations to identify patterns and clusters.
8. Identify Restaurant Chains: Recognize and analyze the presence and performance of restaurant
chains.
Approach and Methods:
1. Data Preparation:
o Data Integration: Merged various data sources to create a comprehensive dataset for
analysis.
2. Analysis Techniques:
o Descriptive Statistics: Used Python libraries such as Pandas and NumPy to calculate summary
statistics such as counts, averages, and percentages.
o Data Visualization: Employed tools like Matplotlib and Seaborn to create histograms,
bar charts, and geospatial maps to visualize data trends and distributions.
3. Task Execution:
o Top Cuisines: Identified and quantified the most common cuisines and calculated their
market share.
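o City Analysis: Analyzed restaurant density and average ratings by city, determining the city
with the most restaurants and the city with the highest average rating.
o Online Delivery: Calculated the percentage of restaurants offering online delivery and
compared the average ratings of restaurants with and without it.
o Restaurant Ratings: Analyzed rating distributions to find the most common rating range.
o Geographic Analysis: Mapped restaurant locations to identify patterns and clusters in
specific areas.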
o Restaurant Chains: Identified chain restaurants and analyzed their ratings and
popularity.
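The chain analysis above can be approached in several ways. One minimal pandas sketch is shown below on hypothetical rows; it assumes `Restaurant Name`, `Aggregate rating`, and `Votes` columns and treats any name that appears at more than one outlet as a chain, which is a simplifying assumption rather than the method used in the internship.

```python
import pandas as pd

# Hypothetical rows; "Restaurant Name", "Aggregate rating" and "Votes"
# are assumed column names.
df = pd.DataFrame({
    "Restaurant Name": ["Chain A", "Chain B", "Chain A",
                        "Solo Bistro", "Chain B", "Chain A"],
    "Aggregate rating": [2.9, 3.5, 3.1, 4.0, 3.7, 3.0],
    "Votes": [40, 210, 55, 120, 190, 35],
})

# Assumption: a name that appears at more than one outlet is a chain.
outlet_counts = df["Restaurant Name"].value_counts()
chain_names = outlet_counts[outlet_counts > 1].index

# Per-chain outlet count, mean rating, and total votes (a popularity proxy).
chains = (
    df[df["Restaurant Name"].isin(chain_names)]
    .groupby("Restaurant Name")
    .agg(
        outlets=("Votes", "size"),
        avg_rating=("Aggregate rating", "mean"),
        total_votes=("Votes", "sum"),
    )
    .sort_values("outlets", ascending=False)
)
print(chains)
```

Sorting by `outlets` surfaces the largest chains first, while `avg_rating` and `total_votes` give simple measures of rating and popularity for each one.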
Accomplishments:
1. Successfully cleaned and integrated a large dataset, ensuring high-quality data for analysis.
2. Developed a series of visualizations and statistical analyses that provided clear insights into
the data and supported trend identification.
3. Delivered actionable insights into cuisine popularity, pricing, and online delivery impact.
4. Created a comprehensive final report that detailed findings and supported recommendations with
data-driven evidence.
Figures and Tables:
• Pie chart on online delivery availability
Data Flow
Figure 5: Data flow diagram with five analysis branches: Top Cuisines, City Analysis, Price Range,
Online Delivery, and Ratings Analysis.
5. CODE/TASKS:
#LEVEL-1
#TASK-1: Top Cuisines
Determine the top three most common cuisines in the dataset. Calculate the percentage of restaurants
that serve each of the top cuisines.
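A minimal pandas sketch of this task follows, run on hypothetical rows. It assumes a `Cuisines` column holding comma-separated cuisine names; it is one possible approach, not necessarily the code behind the recorded output.

```python
import pandas as pd

# Hypothetical rows; "Cuisines" is assumed to hold comma-separated names.
df = pd.DataFrame({
    "Cuisines": [
        "North Indian, Chinese",
        "North Indian",
        "Fast Food, Chinese",
        "North Indian, Cafe",
        "Cafe",
        None,
    ]
})

# One row per (restaurant, cuisine) pair, with whitespace trimmed.
cuisines = (
    df["Cuisines"]
    .dropna()
    .str.split(",")
    .explode()
    .str.strip()
)

# Number of restaurants listing each cuisine; keep the three largest.
top3 = cuisines.value_counts().head(3)

# Share of all restaurants (including those with no cuisine) serving each one.
top3_pct = (top3 / len(df) * 100).round(2)
print(top3_pct)
```

Because a restaurant can list several cuisines, these percentages can add up to more than 100%.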
OUTPUT:
#TASK-2: City Analysis
Identify the city with the highest number of restaurants in the dataset. Calculate the average rating for
restaurants in each city. Determine the city with the highest average rating.
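One way to express the city analysis in pandas is sketched below. The `City` and `Aggregate rating` column names and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows; "City" and "Aggregate rating" are assumed column names.
df = pd.DataFrame({
    "City": ["New Delhi", "New Delhi", "New Delhi",
             "Gurgaon", "Gurgaon", "London"],
    "Aggregate rating": [3.8, 4.2, 3.0, 4.0, 3.6, 4.5],
})

# City with the highest number of restaurants.
busiest_city = df["City"].value_counts().idxmax()

# Average rating per city, and the city with the highest average.
avg_rating = df.groupby("City")["Aggregate rating"].mean().round(2)
best_city = avg_rating.idxmax()

print(busiest_city, best_city)
print(avg_rating)
```

In this sample, a city with a single restaurant tops the average-rating ranking, so requiring a minimum number of restaurants per city before comparing averages can give a fairer result.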
OUTPUT:
#TASK-3: Price Range Distribution
Create a histogram or bar chart to visualize the distribution of price ranges among the restaurants.
Calculate the percentage of restaurants in each price range category.
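The counting part of this task can be sketched as follows, assuming a `Price range` column with integer categories (1 to 4 here); the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; "Price range" is assumed to be an integer category (1-4).
df = pd.DataFrame({"Price range": [1, 1, 2, 2, 2, 3, 4, 1, 2, 1]})

# Restaurant count per category, in category order.
range_counts = df["Price range"].value_counts().sort_index()

# Percentage of restaurants in each category.
range_pct = (range_counts / len(df) * 100).round(2)
print(range_pct)
```

With Matplotlib installed, `range_counts.plot(kind="bar")` draws the corresponding bar chart.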
OUTPUT:
#TASK-4: Online Delivery
Determine the percentage of restaurants that offer online delivery. Compare the average ratings of
restaurants with and without online delivery.
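A minimal sketch of the online delivery comparison, assuming a `Has Online delivery` column containing "Yes"/"No" values and an `Aggregate rating` column (both assumptions; the rows are hypothetical):

```python
import pandas as pd

# Hypothetical rows; "Has Online delivery" is assumed to hold "Yes"/"No".
df = pd.DataFrame({
    "Has Online delivery": ["Yes", "No", "No", "Yes", "No"],
    "Aggregate rating": [4.0, 3.0, 3.5, 3.6, 2.5],
})

# Percentage of restaurants that offer online delivery.
delivery_pct = round(df["Has Online delivery"].eq("Yes").mean() * 100, 2)

# Mean rating for restaurants with ("Yes") and without ("No") online delivery.
rating_by_delivery = df.groupby("Has Online delivery")["Aggregate rating"].mean()

print(f"Online delivery: {delivery_pct}%")
print(rating_by_delivery)
```

Taking the mean of a boolean mask gives the fraction of `True` values directly, which avoids a separate count-and-divide step.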
OUTPUT:
#LEVEL-2
#TASK-1: Restaurant Ratings
Analyze the distribution of aggregate ratings and determine the most common rating range.
Calculate the average number of votes received by restaurants.
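The rating-distribution task can be sketched with `pd.cut`, assuming ratings on a 0 to 5 scale in an `Aggregate rating` column and a `Votes` column; the half-point bin width and the rows are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; "Aggregate rating" (0-5) and "Votes" are assumed columns.
df = pd.DataFrame({
    "Aggregate rating": [3.1, 3.4, 3.2, 4.1, 2.8, 3.9, 3.3, 4.6],
    "Votes": [120, 45, 300, 890, 10, 410, 75, 1500],
})

# Half-point bins up to 5.0; include_lowest=True lets the first bin also hold 0.
bins = np.arange(0, 5.5, 0.5)
rating_bins = pd.cut(df["Aggregate rating"], bins=bins, include_lowest=True)

# Most common rating range and the average number of votes.
most_common_range = rating_bins.value_counts().idxmax()
avg_votes = df["Votes"].mean()

print("Most common rating range:", most_common_range)
print("Average votes:", avg_votes)
```

Each bin is right-closed, so a rating of exactly 3.5 falls in the 3.0 to 3.5 range rather than the next one.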
OUTPUT:
#TASK-2: Cuisine Combination
Identify the most common combinations of cuisines in the dataset. Determine if certain cuisine
combinations tend to have higher ratings.
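A possible pandas sketch for cuisine combinations follows. It assumes comma-separated `Cuisines` values and treats "A, B" and "B, A" as the same combination; the minimum count of 2 is an arbitrary threshold chosen for the hypothetical rows.

```python
import pandas as pd

# Hypothetical rows; "Cuisines" holds comma-separated names (assumed format).
df = pd.DataFrame({
    "Cuisines": [
        "North Indian, Chinese", "Chinese, North Indian", "North Indian",
        "Italian, Pizza", "Pizza, Italian", "North Indian", "Cafe",
    ],
    "Aggregate rating": [3.2, 3.6, 2.9, 4.3, 4.1, 3.1, 3.8],
})

# Sort the names inside each entry so "A, B" and "B, A" count as one combination.
df = df.dropna(subset=["Cuisines"]).assign(
    Combination=lambda d: d["Cuisines"].str.split(",").apply(
        lambda items: ", ".join(sorted(item.strip() for item in items))
    )
)

# How often each combination occurs and its mean rating.
combo_stats = (
    df.groupby("Combination")["Aggregate rating"]
    .agg(["count", "mean"])
    .sort_values("count", ascending=False)
)

# Skip one-off combinations before ranking by rating (threshold is arbitrary).
frequent = combo_stats[combo_stats["count"] >= 2]
best_combo = frequent["mean"].idxmax()
print(combo_stats)
print("Highest-rated frequent combination:", best_combo)
```

Filtering out rare combinations keeps a single highly rated restaurant from dominating the comparison.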
OUTPUT:
#TASK-3: Geographic Analysis
Plot the locations of restaurants on a map using longitude and latitude coordinates. Identify any patterns
or clusters of restaurants in specific areas.
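Finding clusters can be approximated without a mapping library by snapping coordinates to a grid and counting restaurants per cell. This grid-count approach is a swapped-in, simplified technique for illustration, not the method used in the report; the `Latitude`/`Longitude` column names, the 0.1-degree cell size (roughly 11 km north-south), and treating (0, 0) as a missing location are all assumptions.

```python
import pandas as pd

# Hypothetical coordinates; "Latitude"/"Longitude" are assumed column names.
df = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D", "E"],
    "Latitude": [51.5074, 51.5120, 51.5033, 53.4808, 51.5155],
    "Longitude": [-0.1278, -0.1300, -0.1195, -2.2426, -0.1410],
})

# Keep rows with coordinates; treat (0, 0) as a missing-location placeholder.
coords = df.dropna(subset=["Latitude", "Longitude"])
coords = coords[(coords["Latitude"] != 0) | (coords["Longitude"] != 0)]

# Snap points to a 0.1-degree grid and count restaurants per cell; the
# densest cells point to clusters.
cell_counts = (
    coords.groupby([coords["Latitude"].round(1), coords["Longitude"].round(1)])
    .size()
    .sort_values(ascending=False)
)
print(cell_counts)
```

An interactive cluster map like Figure 4 would additionally need a mapping library (for example, folium with its `MarkerCluster` plugin) to draw the same points on a base map.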
OUTPUT:
#TASK-4: Votes Analysis
Identify the restaurants with the highest and lowest number of votes. Analyze if there is a correlation
between the number of votes and the rating of a restaurant.
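A minimal sketch of the votes analysis, assuming `Restaurant Name`, `Votes`, and `Aggregate rating` columns and hypothetical rows. It reports both Pearson and Spearman correlation; Spearman is computed here as Pearson correlation on ranks to avoid an extra dependency.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions.
df = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D", "E"],
    "Votes": [5, 120, 450, 980, 2100],
    "Aggregate rating": [2.9, 3.4, 3.9, 4.2, 4.7],
})

# Restaurants with the most and fewest votes.
most_voted = df.loc[df["Votes"].idxmax(), "Restaurant Name"]
least_voted = df.loc[df["Votes"].idxmin(), "Restaurant Name"]

# Pearson measures linear association; Spearman (Pearson on ranks) captures
# any monotonic relationship and is less sensitive to very large vote counts.
pearson = df["Votes"].corr(df["Aggregate rating"])
spearman = df["Votes"].rank().corr(df["Aggregate rating"].rank())

print(most_voted, least_voted)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

Vote counts are usually heavily skewed, so comparing the two coefficients helps show whether a weak Pearson value hides a consistent monotonic trend.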
OUTPUT:
6. ANALYSIS
Technical Skills:
• Data Analysis: The training provided hands-on experience with data cleaning, transformation,
and analysis using Python. I became proficient in using libraries such as Pandas, NumPy, and
Matplotlib for processing and visualizing data. This experience enhanced my technical abilities,
particularly in producing clear visualizations.
• Analytical Thinking: Identifying top cuisines, analyzing city-specific restaurant
data, and evaluating online delivery impacts required critical thinking and problem-solving
skills. This helped me understand how to approach complex datasets and extract actionable insights.
• Geospatial Analysis: Mapping restaurant locations and identifying patterns involved learning
new tools and techniques for geospatial analysis, enhancing my ability to work with geographic data.
Non-Technical Skills:
• Project Management: The internship required managing multiple tasks. I learned how to
prioritize tasks, manage time effectively, and keep track of project milestones.
• Independence and Initiative: Given that the internship was conducted individually, I learned
to work independently, take initiative, and manage my own workflow without relying on a
team. This experience has prepared me for roles that require self-direction and responsibility.
• Problem-Solving and Critical Thinking: Addressing challenges such as data inconsistencies taught
me to think critically about the best methods to handle and analyze data.
Performance Analysis:
Strengths:
• Analytical Accuracy: Applied careful analysis to derive meaningful insights. For example,
I accurately identified the top cuisines and analyzed their market prevalence, which demonstrated
my ability to work with large datasets and apply analytical techniques.
• Visualization Skills: Created clear visualizations, such as histograms and geospatial maps,
that communicated data trends and insights effectively.
• Independence: Managed the entire project lifecycle independently, from data collection to final
reporting. This autonomy showed my capability to handle projects with minimal supervision.
Areas for Improvement:
• Advanced Techniques: The project highlighted the need to deepen my understanding of more
advanced statistical and machine learning techniques. For future projects, I would focus on
learning and applying these advanced methods.
• Proficiency with Data Tools and Software: Expanding knowledge and experience with
various data analysis tools and software beyond Python, such as SQL databases.
• Collaboration and Teamwork: Strengthening collaboration and teamwork skills, particularly
learning to use collaboration tools effectively and developing strategies for communication and
coordination in remote settings.
7. REQUIREMENTS
https://www.anaconda.com/products/distribution
8. CONCLUSION
The summer training at Cognifyz Technologies was highly beneficial in advancing my technical and
professional skills. It provided valuable hands-on experience, improved my communication skills, and
increased my confidence in handling data analysis
projects. Addressing educational gaps and expanding the scope of the training program could further
enhance the learning experience and better prepare future interns for advanced roles in the field.
Technical Advancement:
• Data Analysis Skills: The training strengthened my skills in data cleaning, transformation, and
visualization using Python. I gained practical experience with libraries such as Pandas, NumPy, and
Matplotlib, which are essential tools for handling large datasets.
• Practical Application: Working on tasks such as identifying top cuisines, analyzing city-specific
restaurant data, and mapping geographic patterns allowed me to apply and refine my analytical skills.
This experience deepened my understanding of how data analysis applies to real-world problems.
• Learning New Tools: The training introduced me to new tools and methodologies, such as
geospatial analysis and data visualization techniques, expanding my proficiency beyond basic
data analysis.
Organizational Insight:
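• Understanding the Company: Working with Cognifyz Technologies provided insight into how an
organization operates in data science and analytics, which highlighted the importance of individual
contributions within a structured yet flexible framework. This experience underscored the value of
aligning personal work with organizational goals.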
• Work Independence: The individual nature of the internship emphasized the ability to work
autonomously, manage projects independently, and take initiative without constant supervision,
which is essential in a professional setting.
Benefits Acquired:
• Practical Experience: Gained hands-on experience in handling and analyzing real-world data.
• Increased Confidence: Gained confidence in managing and executing data analysis projects.
• Professional Growth: Acquired a deeper understanding of the data science field and its
applications, enriching my knowledge and preparing me for advanced roles in engineering and
analytics.
• Educational Gaps: Although the training provided a solid foundation, there was limited
exposure to advanced analytical techniques such as machine learning and big data analytics.
• Tool Proficiency: While the training involved basic tools and techniques, there was a lack of
exposure to other data analysis and visualization tools, such as SQL databases, Tableau, or
cloud-based platforms.
• Expand Tool Coverage: Introducing a broader range of tools and platforms for data analysis
and visualization could help interns gain experience with industry-standard technologies.
• Interactive Learning: Providing opportunities for interactive and collaborative projects could
enhance the learning experience, particularly for developing teamwork and communication
skills.