Summer Internship Report

Uploaded by

syedtasleem07

INTERNSHIP

On

DATA ANALYSIS USING PYTHON


An internship report submitted in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY

In

COMPUTER SCIENCE ENGINEERING-DATA SCIENCE

Submitted by

SYED TASLEEM AFROZE

Under the Esteemed Guidance of

Mr. KOTESWARA RAO

Department of Computer Science Engineering-DATA SCIENCE

NRI INSTITUTE OF TECHNOLOGY

(Approved by AICTE, affiliated to JNTU, Kakinada)

VISADALA (P.O), MEDIKONDURU MANDAL,

GUNTUR-522 438 ANDHRA PRADESH.

2021-2025

NRI INSTITUTE OF TECHNOLOGY
(Approved by AICTE, Affiliated to the JNTUK, Kakinada, A.P)
Visadala (P), Medikonduru (M), Guntur-522438

CERTIFICATE
This is to certify that the internship work embodied in this report, entitled
“Short-Term Internship on Data Analysis”, was carried out during May and June 2024
by Syed Tasleem Afroze (21KP1A44A9) of the Data Science department, NRI
Institute of Technology, Guntur, in partial fulfilment of the B.Tech degree to be
awarded by JNTU Kakinada. This internship work has been carried out to the
satisfaction of the department.

Place: NRIIT, Visadala


Date:

Signature of the Internal Guide Signature of the HOD

Signature of the External

COPY OF INTERNSHIP OFFER LETTER

COPY OF INTERNSHIP COMPLETION CERTIFICATE

Acknowledgment

An industrial attachment cannot be completed without significant help from others. First, I
gratefully acknowledge the help and support of my parents, teachers, employers, friends, and
others, whose support has been invaluable to me. I would like to thank the following people for
their contribution to this industrial attachment.

Abstract

During my Data Analysis Internship at Cognifyz Technologies from April to June 2024, I worked

independently from home on various data analysis tasks using Python. My responsibilities

included data cleaning, preprocessing, statistical analysis, and visualization. I applied Python

libraries such as Pandas, NumPy, Matplotlib, and Seaborn to manage large datasets and derive

actionable insights.

The internship provided me with an opportunity to refine my technical skills and gain practical

experience in data manipulation and visualization. Working remotely allowed me to develop

strong self-discipline and effective time management while tackling individual projects without

direct team interaction.

From a non-technical perspective, the experience highlighted the significance of clear

communication and initiative in a remote work setting. It also provided valuable insights into how

to approach complex data challenges independently, enhancing my overall capability as a data

analyst.

Table of Contents:
1. Introduction
2. Problem Statement
3. Social Relevance of the Project
4. Training Description
5. Code
6. Analysis
7. Requirements
8. Conclusion

List of Figures:
1. Bar chart on top three most common cuisines
2. Bar chart on percentage of restaurants serving each cuisine
3. Pie chart on online delivery availability
4. Cluster map of restaurants in the United Kingdom
5. Data flow diagram

1. INTRODUCTION

I completed my Data Analysis Internship with Cognifyz Technologies. Cognifyz Technologies

specializes in the dynamic field of data science and excels in delivering impactful projects and solutions.

The company offers a wide range of products and services, including artificial intelligence (AI), machine

learning (ML), and data analytics tools. Additionally, Cognifyz Technologies provides training programs

designed to enhance skills and knowledge in these advanced areas.

The internship was conducted remotely, which allowed me to work from home rather than at a physical

facility. Despite the remote nature of the work, I engaged closely with the company’s digital infrastructure

and utilized various online tools for task management and communication.

In my role as a Data Analysis Intern, I was responsible for:

1. Data Cleaning and Preprocessing: I handled large datasets by performing tasks such as

removing duplicates, filling in missing values, and normalizing data to ensure accuracy and

consistency.

2. Statistical Analysis: I applied statistical techniques to analyze data trends, correlations, and

patterns, which contributed to deriving meaningful insights and supporting data-driven decision-

making.

3. Data Visualization: I created visual representations of data using Python libraries like Matplotlib

and Seaborn. These visualizations helped in effectively communicating complex data insights.

4. Independent Project Work: As the internship was conducted remotely, I worked independently

on assigned projects, which required a strong emphasis on self-motivation and effective time

management.

This internship allowed me to apply data analysis principles to a real-world dataset, enhancing my skills

in handling diverse data types and generating actionable insights. The experience improved my technical

proficiency with Python and provided a deeper understanding of data-driven decision-making processes.

2. PROBLEM STATEMENT

The objective was to perform a comprehensive analysis of a dataset of global cuisines, which includes
detailed attributes such as restaurant location, address, cost, ratings, and availability of online delivery.
The analysis aimed to address several key questions and tasks:

1. Top Cuisines:

• Determine the top three most common cuisines in the dataset.

• Calculate the percentage of restaurants serving each of these top cuisines.

2. City Analysis:

• Identify the city with the highest number of restaurants.

• Calculate the average rating for restaurants in each city.

• Determine the city with the highest average rating.

3. Price Range Distribution:

• Create a histogram or bar chart to visualize the distribution of price ranges among

restaurants.

• Calculate the percentage of restaurants in each price range category.

4. Online Delivery:

• Determine the percentage of restaurants offering online delivery.

• Compare the average ratings of restaurants that offer online delivery versus those that do

not.

5. Restaurant Ratings:

• Analyze the distribution of aggregate ratings and determine the most common rating range.

• Calculate the average number of votes received by restaurants.

6. Cuisine Combination:

• Identify the most common combinations of cuisines present in the dataset.

• Determine if certain cuisine combinations tend to have higher ratings.

7. Geographic Analysis:

• Plot the locations of restaurants on a map using longitude and latitude coordinates.

• Identify any patterns or clusters of restaurants in specific geographic areas.

8. Restaurant Chains:

• Identify any restaurant chains present in the dataset.

• Analyze the ratings and popularity of different restaurant chains.

These tasks required a thorough approach to data cleaning, statistical analysis, and visualization

to provide actionable insights and enhance understanding of restaurant characteristics and

trends. The analysis aimed to address challenges related to data quality, insight extraction, and

effective communication of findings through visualizations and reports.
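
The data cleaning that these tasks depend on could be sketched as follows. This is a minimal illustration with pandas, not the code used in the internship: the column names (`Restaurant Name`, `City`, `Cuisines`, `Aggregate rating`), the sample rows, and the choice of the median as a fill value are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset and its column names may differ.
raw = pd.DataFrame({
    "Restaurant Name": ["Alpha", "Alpha", "Beta", "Gamma"],
    "City": [" london", " london", "Paris", "Paris "],
    "Cuisines": ["Indian", "Indian", None, "French"],
    "Aggregate rating": [4.0, 4.0, 3.5, None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows, fill missing values, and normalize city names."""
    out = df.drop_duplicates().copy()
    # Normalize city names: trim whitespace and use title case.
    out["City"] = out["City"].str.strip().str.title()
    # Mark missing cuisines with an explicit placeholder label.
    out["Cuisines"] = out["Cuisines"].fillna("Unknown")
    # Fill missing ratings with the median of the known ratings (an assumption).
    median_rating = out["Aggregate rating"].median()
    out["Aggregate rating"] = out["Aggregate rating"].fillna(median_rating)
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Filling with the median keeps a single extreme rating from shifting the fill value; dropping the incomplete rows instead would be an equally reasonable choice for a small number of gaps.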

3. SOCIAL RELEVANCE OF THE PROJECT

The analysis of a global cuisine dataset carries significant social relevance across various dimensions:

1. Consumer Insights and Decision-Making:

• Enhanced Consumer Choices: By identifying popular cuisines, cities with high-quality dining

options, and price range distributions, the project helps consumers make informed dining

decisions. Knowledge about restaurant ratings and online delivery options also supports better

choices based on personal preferences and affordability.

• Trend Identification: Understanding the popularity of certain cuisines and restaurant chains can

guide consumers towards trending dining experiences, potentially enhancing their satisfaction

and overall dining experience.

2. Economic Impact:

• Business Strategy and Development: Insights into restaurant density, pricing, and geographic

distribution can assist restaurant owners and businesses in making strategic decisions regarding

new openings, marketing strategies, and menu offerings. This can drive economic growth by

optimizing resource allocation and identifying profitable market segments.

• Support for Local Businesses: Highlighting city-specific restaurant data and online delivery

options can help local businesses improve their services and compete more effectively,

contributing to the economic vitality of local communities.

3. Urban Planning and Development:

• Geographic Analysis: The project’s geographic analysis provides valuable information on

restaurant distribution and clustering, which can inform urban planning and development.

Planners and developers can use these insights to enhance infrastructure, such as creating food

hubs or improving access to popular dining areas.

• Community Engagement: Understanding where restaurants are concentrated can help local

governments and community organizations plan events, support local dining initiatives, and

foster community engagement around food and dining experiences.

4. Cultural Awareness and Diversity:

• Promotion of Culinary Diversity: By analyzing and visualizing global cuisine data, the project

highlights the rich diversity of culinary offerings. This can foster greater cultural awareness and

appreciation among consumers and promote the exploration of different cuisines.

• Cultural Exchange: Insights into popular cuisine combinations and ratings can encourage

cultural exchange and understanding, as consumers are exposed to diverse culinary traditions and

practices from around the world.

Overall, the social relevance of this project lies in its ability to provide valuable insights that benefit

consumers, businesses, urban planners, and communities. It supports informed decision-making,

promotes economic and cultural development, and contributes to a better understanding of dining trends

and preferences on a global scale.

4. TRAINING DESCRIPTION

Project Overview: I worked on a comprehensive project analyzing a global cuisine dataset. This dataset

included key attributes such as restaurant location, address, cost, ratings, and online delivery options. The

primary goal was to extract actionable insights and deliver a detailed analysis that could support decision-

making for various stakeholders.

Project Goals:

1. Identify Popular Cuisines: Determine the most common cuisines and their prevalence in the

dataset.

2. Analyze City-Specific Data: Assess restaurant distribution and average ratings across different

cities.

3. Examine Price Range Distribution: Visualize and analyze the distribution of restaurant price

ranges.

4. Evaluate Online Delivery Impact: Determine the prevalence of online delivery options and

compare ratings based on delivery availability.

5. Analyze Restaurant Ratings: Investigate rating distributions and identify trends in restaurant

ratings.

6. Explore Cuisine Combinations: Identify common cuisine combinations and their impact on

ratings.

7. Conduct Geographic Analysis: Map restaurant locations to identify patterns and clusters.

8. Identify Restaurant Chains: Recognize and analyze the presence and performance of restaurant

chains.

Approach and Methods:

1. Data Preparation:

o Data Cleaning: Addressed missing values, corrected inconsistencies, and handled

outliers to ensure data quality.

o Data Integration: Merged various data sources to create a comprehensive dataset for

analysis.

2. Analysis Techniques:

o Descriptive Statistics: Used Python libraries such as Pandas and NumPy to calculate

central tendencies and distribution metrics.

o Data Visualization: Employed tools like Matplotlib and Seaborn to create histograms,

bar charts, and geospatial maps to visualize data trends and distributions.

o Geospatial Analysis: Utilized geographic coordinates to plot restaurant locations and

identify geographic patterns.

3. Task Execution:

o Top Cuisines: Identified and quantified the most common cuisines and calculated their

market share.

o City Analysis: Analyzed restaurant density and average ratings by city, determining

which city had the highest rating.

o Price Range Distribution: Created visualizations to display the distribution of price

ranges and calculated the percentage of restaurants in each range.

o Online Delivery: Calculated the percentage of restaurants offering online delivery and

compared ratings between restaurants with and without delivery options.

o Restaurant Ratings: Analyzed rating distributions to find the most common rating

ranges and calculated average votes.

o Cuisine Combinations: Identified prevalent cuisine combinations and assessed their

correlation with higher ratings.

o Geographic Analysis: Mapped restaurant locations and identified clusters or patterns in

specific areas.

o Restaurant Chains: Identified chain restaurants and analyzed their ratings and

popularity.
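
The restaurant chain step above could be approached roughly as follows. This sketch assumes that a chain is any restaurant name that appears more than once, and that the dataset has `Restaurant Name`, `Aggregate rating`, and `Votes` columns; both the definition and the sample rows are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, not confirmed.
df = pd.DataFrame({
    "Restaurant Name": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "Aggregate rating": [4.0, 4.2, 3.8, 3.0, 3.4, 4.5],
    "Votes": [100, 150, 50, 20, 40, 300],
})


def chain_summary(data: pd.DataFrame, min_outlets: int = 2) -> pd.DataFrame:
    """Summarize names that occur at least `min_outlets` times."""
    grouped = data.groupby("Restaurant Name").agg(
        outlets=("Votes", "size"),              # number of rows per name
        avg_rating=("Aggregate rating", "mean"),
        total_votes=("Votes", "sum"),           # used here as a popularity proxy
    )
    chains = grouped[grouped["outlets"] >= min_outlets]
    return chains.sort_values("outlets", ascending=False)


chains = chain_summary(df)
print(chains)
```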

Accomplishments:

1. Successfully cleaned and integrated a large dataset, ensuring high-quality data for analysis.

2. Developed a series of visualizations and statistical analyses that provided clear insights into

restaurant trends, ratings, and geographic distribution.

3. Delivered actionable insights into cuisine popularity, pricing, and online delivery impact,

contributing valuable information for business strategy and consumer decision-making.

4. Created a comprehensive final report that detailed findings and supported recommendations with

data-driven evidence.

Technical and Administrative Activities:


1. Technical: Data cleaning, statistical analysis, visualization creation, geographic plotting, and

trend identification.

2. Administrative: Managed project timeline, documented analysis procedures, and presented

findings through reports and visual aids.

Figures and Tables:

• Bar chart on top three most common cuisines

• Bar chart on percentage of restaurants serving each cuisine

• Pie chart on online delivery availability

• Cluster map of restaurants in the United Kingdom

Data Flow

[Data Sources] → (Data Collection) → [Raw Data Store]

(Data Cleaning and Preparation) → [Processed Data Store]

(Data Analysis) → [Analysis Results Store]

↓ ↓ ↓ ↓ ↓

(Top Cuisines) (City Analysis) (Price Range) (Online Delivery) (Ratings Analysis)

↓ ↓ ↓ ↓ ↓

(Cuisine Combinations) (Geographic Analysis) (Restaurant Chains)

(Visualization and Reporting)

5. CODE/TASKS:
#LEVEL-1
#TASK-1: Top Cuisines
Determine the top three most common cuisines in the dataset. Calculate the percentage of restaurants
that serve each of the top cuisines.
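
A minimal sketch of one way to approach this task with pandas. The column name `Cuisines` (a comma-separated string per restaurant) and the sample rows are assumptions for illustration, not values from the internship dataset.

```python
import pandas as pd

# Hypothetical sample; each row is one restaurant.
df = pd.DataFrame({
    "Cuisines": [
        "North Indian, Chinese",
        "Chinese",
        "North Indian",
        "Fast Food, Chinese",
        "Italian",
        "Chinese, North Indian",
        "Fast Food",
        "Chinese",
    ]
})


def top_cuisines(data: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n most common cuisines and the share of restaurants serving each."""
    # One entry per (restaurant, cuisine) pair.
    exploded = data["Cuisines"].dropna().str.split(",").explode().str.strip()
    counts = exploded.value_counts().head(n)
    # Percentage of all restaurants whose cuisine list includes the cuisine.
    percent = (counts / len(data) * 100).round(2)
    return pd.DataFrame({"restaurants": counts, "percent": percent})


top3 = top_cuisines(df)
print(top3)
```

Because a restaurant can serve several cuisines, the percentages are shares of restaurants and do not need to sum to 100.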

OUTPUT:

#TASK-2: City Analysis
Identify the city with the highest number of restaurants in the dataset. Calculate the average rating for
restaurants in each city. Determine the city with the highest average rating.
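
A minimal sketch of how this task could be approached with pandas, assuming columns named `City` and `Aggregate rating`. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed.
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Delhi", "London", "London", "Paris"],
    "Aggregate rating": [3.0, 4.0, 3.5, 4.5, 4.1, 4.0],
})

# City with the highest number of restaurants.
city_counts = df["City"].value_counts()
busiest_city = city_counts.idxmax()

# Average rating per city, and the city where that average is highest.
avg_rating_by_city = df.groupby("City")["Aggregate rating"].mean()
best_rated_city = avg_rating_by_city.idxmax()

print(busiest_city, best_rated_city)
print(avg_rating_by_city)
```

On real data, a city with only one or two restaurants can top the average-rating list by chance, so filtering on a minimum restaurant count before calling `idxmax` may be worth considering.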

OUTPUT:

#TASK-3: Price Range Distribution
Create a histogram or bar chart to visualize the distribution of price ranges among the restaurants.
Calculate the percentage of restaurants in each price range category.
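
A minimal sketch of this task, assuming an integer `Price range` column with categories 1 to 4. The sample values and the helper name `plot_price_ranges` are hypothetical; matplotlib is imported inside the helper so the percentages can be computed without a plotting backend.

```python
import pandas as pd

# Hypothetical sample; the column name and category values are assumed.
df = pd.DataFrame({"Price range": [1, 1, 2, 1, 3, 2, 4, 1]})

# Number of restaurants in each price range, ordered by category.
price_counts = df["Price range"].value_counts().sort_index()

# Percentage of restaurants in each price range.
price_percent = (price_counts / price_counts.sum() * 100).round(2)
print(price_percent)


def plot_price_ranges(counts: pd.Series) -> None:
    """Draw the distribution as a bar chart (requires matplotlib)."""
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar", rot=0)
    ax.set_xlabel("Price range")
    ax.set_ylabel("Number of restaurants")
    ax.set_title("Distribution of price ranges")
    plt.tight_layout()
    plt.show()
```

Calling `plot_price_ranges(price_counts)` would then render the chart. A bar chart suits this column better than a histogram because the price range is a small set of discrete categories.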

OUTPUT:

#TASK-4: Online Delivery
Determine the percentage of restaurants that offer online delivery. Compare the average ratings of
restaurants with and without online delivery.
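
A minimal sketch of this comparison, assuming a `Has Online delivery` column holding "Yes" or "No" and an `Aggregate rating` column. Both names and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumed.
df = pd.DataFrame({
    "Has Online delivery": ["Yes", "No", "No", "Yes", "No"],
    "Aggregate rating": [4.0, 3.0, 3.5, 3.6, 2.5],
})

# Percentage of restaurants that offer online delivery.
offers_delivery = df["Has Online delivery"].eq("Yes")
delivery_percent = round(offers_delivery.mean() * 100, 2)

# Average rating for restaurants with and without online delivery.
avg_by_delivery = df.groupby("Has Online delivery")["Aggregate rating"].mean()

print(delivery_percent)
print(avg_by_delivery)
```

A difference between the two averages describes the sample only; it does not by itself show that offering delivery causes higher ratings.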

OUTPUT:

#LEVEL-2
#TASK-1: Restaurant Ratings
Analyze the distribution of aggregate ratings and determine the most common rating range.
Calculate the average number of votes received by restaurants.
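
A minimal sketch of this task using `pandas.cut` to bin ratings into one-point ranges. The column names `Aggregate rating` and `Votes`, the bin edges, and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed.
df = pd.DataFrame({
    "Aggregate rating": [0.0, 2.8, 3.1, 3.4, 3.6, 3.9, 4.2, 3.2],
    "Votes": [0, 12, 40, 55, 120, 300, 800, 33],
})

# Bin ratings into ranges. Each bin includes its upper edge, so a rating of
# exactly 3.0 falls in "2-3"; include_lowest keeps 0.0 in the first bin.
bins = [0, 1, 2, 3, 4, 5]
labels = ["0-1", "1-2", "2-3", "3-4", "4-5"]
rating_ranges = pd.cut(
    df["Aggregate rating"], bins=bins, labels=labels, include_lowest=True
)

range_counts = rating_ranges.value_counts().sort_index()
most_common_range = range_counts.idxmax()

# Average number of votes per restaurant.
average_votes = df["Votes"].mean()

print(range_counts)
print(most_common_range, average_votes)
```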

OUTPUT:

#TASK-2: Cuisine Combination
Identify the most common combinations of cuisines in the dataset. Determine if certain cuisine
combinations tend to have higher ratings.
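
A minimal sketch of one way to approach this task. It treats each restaurant's full cuisine list as one combination and sorts the cuisine names so that "Chinese, North Indian" and "North Indian, Chinese" count as the same combination. The column names, the helper `normalize_combo`, and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed.
df = pd.DataFrame({
    "Cuisines": [
        "North Indian, Chinese",
        "Chinese, North Indian",
        "Italian",
        "Italian",
        "North Indian, Chinese",
        "Cafe",
    ],
    "Aggregate rating": [3.5, 3.9, 4.2, 4.4, 3.7, 3.0],
})


def normalize_combo(value: str) -> str:
    """Sort cuisine names so that ordering differences do not split a combination."""
    parts = sorted(part.strip() for part in value.split(","))
    return ", ".join(parts)


df["combo"] = df["Cuisines"].map(normalize_combo)

# Frequency and average rating of each combination, most common first.
combo_stats = (
    df.groupby("combo")["Aggregate rating"]
    .agg(restaurants="count", avg_rating="mean")
    .sort_values("restaurants", ascending=False)
)

# Compare ratings only among combinations seen at least twice.
frequent = combo_stats[combo_stats["restaurants"] >= 2]
print(frequent.sort_values("avg_rating", ascending=False))
```

Restricting the rating comparison to combinations that occur several times avoids ranking a combination highly on the strength of a single restaurant.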

OUTPUT:

#TASK-3: Geographic Analysis
Plot the locations of restaurants on a map using longitude and latitude coordinates. Identify any patterns
or clusters of restaurants in specific areas.
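
A minimal sketch of how clusters could be located numerically. It uses coarse grid binning of the coordinates, which is a simple stand-in rather than a clustering algorithm or an interactive map. The column names `Latitude` and `Longitude`, the cell size, and the sample coordinates are assumptions.

```python
import pandas as pd

# Hypothetical sample coordinates; column names are assumed.
df = pd.DataFrame({
    "Latitude": [51.50, 51.51, 51.52, 53.48, 53.47, 55.95],
    "Longitude": [-0.12, -0.13, -0.10, -2.24, -2.25, -3.19],
})


def grid_clusters(data: pd.DataFrame, cell_size: float = 0.5) -> pd.Series:
    """Count restaurants per grid cell of `cell_size` degrees."""
    # Snap each coordinate to the lower-left corner of its grid cell.
    cells = data.assign(
        lat_cell=(data["Latitude"] // cell_size) * cell_size,
        lon_cell=(data["Longitude"] // cell_size) * cell_size,
    )
    return (
        cells.groupby(["lat_cell", "lon_cell"])
        .size()
        .sort_values(ascending=False)
        .rename("restaurants")
    )


clusters = grid_clusters(df)
print(clusters)
```

The densest cells can then be plotted as a scatter plot of longitude against latitude, or passed to a mapping library, to inspect the clusters visually.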

OUTPUT:

#TASK-4: Votes Analysis
Identify the restaurants with the highest and lowest number of votes. Analyze if there is a correlation
between the number of votes and the rating of a restaurant.
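
A minimal sketch of this task, assuming columns named `Restaurant Name`, `Votes`, and `Aggregate rating`. The sample rows are hypothetical, and the Pearson coefficient is used as one possible measure of correlation.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed.
df = pd.DataFrame({
    "Restaurant Name": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "Votes": [10, 250, 40, 900, 0],
    "Aggregate rating": [3.1, 4.0, 3.4, 4.6, 0.0],
})

# Restaurants with the highest and lowest number of votes.
most_voted = df.loc[df["Votes"].idxmax(), "Restaurant Name"]
least_voted = df.loc[df["Votes"].idxmin(), "Restaurant Name"]

# Pearson correlation between vote count and rating (the default for Series.corr).
correlation = df["Votes"].corr(df["Aggregate rating"])

print(most_voted, least_voted)
print(round(correlation, 3))
```

Vote counts are usually heavily skewed, so passing `method="spearman"` to `corr` gives a rank-based alternative that is less sensitive to a few very popular restaurants.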

OUTPUT:

6. ANALYSIS
Technical Skills:

• Data Analysis: The training provided hands-on experience with data cleaning, transformation,

and analysis using Python. I became proficient in using libraries such as Pandas, NumPy, and

Matplotlib for processing and visualizing data. This experience enhanced my technical abilities

in handling large datasets, performing statistical analysis, and creating meaningful

visualizations.

• Problem-Solving: Working on tasks such as determining top cuisines, analyzing city-specific

data, and evaluating online delivery impacts required critical thinking and problem-solving

skills. This helped me understand how to approach complex datasets, extract actionable

insights, and apply statistical methods to real-world problems.

• Geospatial Analysis: Mapping restaurant locations and identifying patterns involved learning

new tools and techniques for geospatial analysis, enhancing my ability to work with geographic

data and identify spatial trends.

Non-Technical Skills:

• Project Management: The internship required managing multiple tasks. I learned how to

prioritize tasks, manage time effectively, and keep track of project milestones, which is crucial

in any engineering role.

• Independence and Initiative: Given that the internship was conducted individually, I learned

to work independently, take initiative, and manage my own workflow without relying on a

team. This experience has prepared me for roles that require self-direction and responsibility.

• Problem-Solving and Critical Thinking: Addressing challenges such as data inconsistencies,

missing values, and analytical complexities required innovative problem-solving approaches. I

learned to think critically about the best methods to handle and analyze data.

Performance Analysis:

Strengths:

• Analytical Proficiency: Successfully analyzed complex datasets and derived actionable

insights. For example, I accurately identified the top cuisines and analyzed their market

prevalence, which demonstrated my ability to work with large datasets and apply analytical

techniques.

• Visualization Skills: Created clear and effective visualizations that communicated data trends

and insights effectively. The visualizations I developed, such as histograms and geospatial

maps, were well-received and useful for stakeholders.

• Independence: Managed the entire project lifecycle independently, from data collection to final

reporting. This autonomy showed my capability to handle projects with minimal supervision

and deliver results.

Areas for Improvement:

• Advanced Analytical Techniques: While I gained proficiency in basic analysis, I recognize

the need to deepen my understanding of more advanced statistical and machine learning

techniques. For future projects, I would focus on learning and applying these advanced methods

to enhance the depth of analysis.

• Proficiency with Data Tools and Software: I need to expand my knowledge of and experience
with data analysis tools and software beyond Python, such as SQL databases, dedicated
visualization tools like Tableau, and cloud-based data platforms.

• Collaboration and Teamwork: I need to strengthen my collaboration and teamwork skills,
particularly in a remote or distributed team setting, by participating in team-based projects,
using collaboration tools effectively, and developing strategies for effective communication
and coordination.

7. REQUIREMENTS

1. Jupyter Project. (2023). Jupyter Notebook. https://jupyter.org/

2. Anaconda, Inc. (2023). Anaconda Distribution 2023. Retrieved from
https://www.anaconda.com/products/distribution

3. Cognifyz. (2024). Global restaurant data (Excel file). Provided by the organization.

8. CONCLUSION

The summer training at Cognifyz Technologies was highly beneficial in advancing my technical and

organizational understanding of the engineering profession. It enriched my knowledge through practical

experience, improved my communication skills, and increased my confidence in handling data analysis

projects. Addressing educational gaps and expanding the scope of the training program could further

enhance the learning experience and better prepare future interns for advanced roles in the field.

Technical Advancement:

• Enhanced Analytical Skills: Advanced my technical expertise in data analysis, focusing on

data cleaning, transformation, and visualization using Python. I gained practical experience with

libraries such as Pandas, NumPy, and Matplotlib, which are essential tools for handling large

datasets and deriving insights.

• Application of Advanced Techniques: Exposure to various analytical tasks, such as

identifying top cuisines, analyzing city-specific restaurant data, and mapping geographic

patterns, allowed me to apply and refine my analytical skills. This experience deepened my

understanding of data-driven decision-making and the application of statistical methods to real-

world problems.

• Learning New Tools: The training introduced me to new tools and methodologies, such as

geospatial analysis and data visualization techniques, expanding my proficiency beyond basic

data analysis and enabling me to tackle more complex data challenges.

Organizational Insight:

• Understanding of Company Structure: I observed the organization’s focus on data science

and analytics, which highlighted the importance of individual contributions within a structured

yet flexible framework. This experience underscored the value of aligning personal work with

broader organizational goals.

• Work Independence: The individual nature of the internship emphasized the ability to work

autonomously, manage projects independently, and take initiative without constant supervision.

This experience illustrated the importance of self-motivation and self-discipline in a

professional setting.

Benefits Acquired:

• Practical Experience: Gained hands-on experience in handling and analyzing real-world data,

which enhanced my problem-solving skills and analytical capabilities.

• Increased Confidence: Gained confidence in managing and executing data analysis projects

independently, which will be beneficial for future professional roles.

• Professional Growth: Acquired a deeper understanding of the data science field and its

applications, enriching my knowledge and preparing me for advanced roles in engineering and

analytics.

Deficiencies and Suggestions for Improvement:

• Educational Gaps: Although the training provided a solid foundation, there was limited

exposure to advanced analytical techniques such as machine learning and big data analytics.

These areas were not covered in depth during the internship.

• Tool Proficiency: While the training involved basic tools and techniques, there was a lack of

exposure to other data analysis and visualization tools, such as SQL databases, Tableau, or

cloud-based platforms.

Training Program Enhancements:

• Expand Tool Coverage: Introducing a broader range of tools and platforms for data analysis

and visualization could help interns gain experience with industry-standard technologies.

• Interactive Learning: Providing opportunities for interactive and collaborative projects could

enhance the learning experience, particularly for developing teamwork and communication

skills.
