Final Internship Report
On
“Internship Report”
Submitted on 22/07/2024
By
Under Guidance of
“Alfiya Sayyed”
In
Three Years Diploma Program in Engineering & Technology of
Maharashtra State Board of Technical Education, Mumbai (Autonomous)
ISO 9001:2015
At
Anjuman-I-Islam’s Abdul Razzaq Kalsekar Polytechnic
1 Acknowledgement 2
2 Abstract 3
3 Python 4
4 NumPy 5
5 Pandas 6
6 Anaconda Navigator 7
7 Weekly Report 8
8 About Dataset and Data Insight 25
9 Conclusion 27
10 Project 28
Acknowledgement
I would like to extend my heartfelt thanks to my mentor, Ms. Alfiya Sayyed, whose
guidance, support, and encouragement have been instrumental throughout my
internship. Her insights and expertise have significantly contributed to my learning
and development in the field of Data Analytics.
This internship has been a pivotal step in my career, equipping me with practical
skills and hands-on experience in Data Analysis. I have gained a deeper
understanding of modern technologies, responsive design, and user experience
principles. The challenges and projects I encountered have enhanced my problem-
solving abilities and prepared me for future professional endeavors.
Lastly, I would like to thank my institute, AIRKP, for facilitating this internship and
supporting my academic and professional growth.
Thank you all for your continuous support and belief in my abilities.
Abstract
This abstract presents an analysis of the industrial training experience in Python during an
internship at Milestone PLM Solutions, using Python with Anaconda Navigator and leveraging
libraries such as NumPy and Pandas for data manipulation and analysis.
The internship aimed to provide practical knowledge and hands-on skills in Python within a corporate
environment. Beginning with an introduction to the internship context, the abstract emphasizes the
significance of data analytics in modern digital landscapes and its impact on business operations
and end-user experiences.
Throughout the internship, various projects and assignments served as practical datasets, enabling the
application of theoretical knowledge to real-world data analysis challenges. Python's Pandas
library facilitated the analysis of project outcomes, highlighting improvements in problem-solving
abilities and project management skills.
Moreover, the abstract explores the development of soft skills such as effective communication and
teamwork, crucial in a corporate setting like Milestone PLM Solutions. These skills were essential
for collaborating with team members, debugging complex code, and meeting project deadlines.
Reflecting on personal and professional growth, the abstract discusses the transformative impact of
the internship experience. It underscores the enhancement of self-confidence and motivation to
specialize further in specific areas of data analytics.
In conclusion, the abstract emphasizes the role of experiential learning in bridging theoretical
knowledge with practical applications, preparing interns for successful careers in the dynamic field
of data analytics. Python, Anaconda Navigator, NumPy, and Pandas emerged as essential tools
for analyzing and interpreting internship experiences, laying the foundation for future professional
endeavors.
Python
Python is a high-level, interpreted programming language known for its simplicity and readability. It
was created by Guido van Rossum and first released in 1991. Python's design philosophy emphasizes
code readability and a syntax that allows programmers to express concepts in fewer lines of code
compared to other programming languages like C++ or Java.
Key Features:
1. Easy to Learn and Use: Python's syntax is clear, making it accessible for beginners and
experienced programmers alike. Its simplicity encourages rapid development and
prototyping.
2. Interpreted and Interactive: Python is an interpreted language, meaning that code is
executed line by line, which allows for interactive testing and debugging.
3. Cross-platform: Python is available on multiple platforms (Windows, macOS, Linux) and is
compatible with major operating systems.
4. Extensive Standard Library: Python's standard library is large and comprehensive,
providing modules and packages for tasks ranging from web development and networking to
file operations and data manipulation.
Applications:
1. Web Development: Python is widely used for server-side web development frameworks like
Django and Flask, facilitating rapid development of web applications.
2. Data Science and Machine Learning: Python's libraries such as NumPy, Pandas,
Matplotlib, and scikit-learn are popular in data analysis, visualization, and machine learning.
3. Scripting and Automation: Python is a preferred language for scripting tasks, automation of
repetitive tasks, and system administration.
4. Scientific Computing: Python is used in scientific computing environments for simulations,
numerical analysis, and computational mathematics.
Ecosystem:
1. Active Community: Python has a large and active community of developers and contributors
who collaborate on improving the language and its ecosystem.
2. Third-party Libraries: Python's package index, PyPI (Python Package Index), hosts
thousands of third-party libraries and frameworks, extending its capabilities for various
domains.
3. Versatility: Python's versatility and wide adoption across industries make it a valuable skill
for developers, data scientists, engineers, and researchers.
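As a small illustration of the conciseness mentioned above, counting word frequencies takes only a few lines using the standard library (a generic sketch, not code from the internship projects):

```python
# Count word frequencies in a sentence using only the standard library.
from collections import Counter

sentence = "python is easy and python is versatile"
counts = Counter(sentence.split())

print(counts["python"])  # 2
```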
Conclusion
Python's versatility, simplicity, and robust ecosystem make it a popular choice for a wide range of
applications, from web development and data science to automation and scientific computing. Its
continued growth in popularity is driven by its ease of use, extensive libraries, and active community
support, making Python an excellent choice for both beginners and seasoned developers.
NumPy
NumPy (Numerical Python) is a fundamental library in Python for numerical computing. It provides
support for large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays efficiently.
Key Features:
• ndarray: A powerful N-dimensional array object for efficient storage and manipulation of
data.
• Mathematical Functions: Comprehensive suite of mathematical functions for operations on
arrays.
• Broadcasting: Allows operations on arrays of different shapes without explicitly creating
copies of data.
• Integration: Seamless integration with other libraries like Pandas, Matplotlib, and scikit-
learn.
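The ndarray and broadcasting features above can be sketched in a few lines (the array values here are purely illustrative):

```python
# Minimal sketch of ndarray creation and broadcasting.
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # a 2x3 ndarray
row = np.array([10, 20, 30])   # shape (3,)

# Broadcasting: 'row' is applied to each row of 'a' without copying data.
result = a + row

print(result)  # [[11 22 33], [14 25 36]]
```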
Applications:
• Scientific Computing: Simulations, numerical analysis, and computational mathematics.
• Data Analysis: Forms the array foundation for libraries such as Pandas and SciPy.
• Machine Learning: Supplies the fast array operations used by libraries like scikit-learn.
Conclusion: NumPy's efficient array operations and mathematical functions make it indispensable
for scientific computing, data analysis, and machine learning in Python.
Pandas
Pandas is a powerful library in Python for data manipulation and analysis. It provides data structures
and operations for manipulating numerical tables and time series. Pandas is built on top of NumPy
and is often used in conjunction with it for data preprocessing and cleaning tasks.
Key Features:
• Series and DataFrame: One-dimensional and two-dimensional labeled data structures.
• Missing Data Handling: Built-in methods such as fillna() and dropna() for null values.
• Grouping and Aggregation: groupby() support for split-apply-combine analysis.
• Merging and Joining: Database-style joins for combining datasets.
Applications:
• Data Analysis: Essential for data cleaning, transformation, and exploration tasks.
• Time Series Analysis: Supports handling time series data and performing operations such as
resampling and time zone handling.
• Data Visualization: Integrates with libraries like Matplotlib and Seaborn for visualization of
data stored in Pandas structures.
• Data Import/Export: Capable of reading from and writing to various file formats, including
CSV, Excel, SQL databases, and JSON.
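A minimal sketch of typical Pandas usage (the column names and values below are invented for illustration):

```python
# Build a small in-memory table and run basic exploration on it.
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Sara"],
    "score": [85, 92, 78],
})

mean_score = df["score"].mean()                          # average of the column
top = df.sort_values("score", ascending=False).iloc[0]["name"]  # highest scorer

print(mean_score, top)  # 85.0 Ravi
```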
Conclusion: Pandas simplifies data manipulation and analysis tasks in Python, making it a popular
choice for data scientists, analysts, and researchers working with structured data.
Anaconda Navigator
Anaconda Navigator is a graphical user interface (GUI) included with the Anaconda distribution of
Python. It simplifies package management and deployment of data science and machine learning
environments.
Key Features:
• Package Management: Allows users to easily install, update, and manage Python packages,
including popular libraries like NumPy, Pandas, and TensorFlow.
• Environment Management: Facilitates the creation and management of Python
environments with different package versions, ensuring compatibility for different projects.
• Integrated Development Environments (IDEs): Provides access to popular Python IDEs
such as Jupyter Notebook, JupyterLab, Spyder, and VS Code, enhancing productivity for data
analysis and development tasks.
• Navigator Interface: Offers an intuitive interface for launching applications, managing
environments, and accessing documentation and support resources.
• Cross-Platform: Available for Windows, macOS, and Linux, ensuring consistency and
usability across different operating systems.
Applications:
• Data Science: Ideal for setting up and maintaining environments for data analysis, machine
learning, and scientific computing.
• Education and Training: Used in educational settings to provide a consistent environment
for teaching Python and data science concepts.
• Research and Development: Supports rapid prototyping and experimentation with different
libraries and tools required for research projects.
• Community and Support: Backed by a supportive community and comprehensive
documentation, offering resources for troubleshooting and learning.
Conclusion: Anaconda Navigator simplifies the management of Python environments and packages,
making it an essential tool for data scientists, researchers, and developers working in Python-based
data analysis and machine learning.
Weekly Report
Week 1
1. Web Development: Frameworks like Django and Flask enable rapid web application development.
2. Data Science and Machine Learning: Libraries like pandas and NumPy support data analysis and modeling.
3. Game Development: Libraries like Pygame help in developing simple games.
Data Types in Python
1. Numeric Types: int, float, complex
2. Sequence Types: list, tuple, range
3. Text Type: str
4. Mapping Type: dict
5. Set Types: set, frozenset
6. Boolean Type: bool
7. Binary Types: bytes, bytearray, memoryview
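One literal per type family above, checked via type() (a quick illustrative sketch):

```python
# One example value for each built-in data type family listed above.
values = [
    42, 3.14, 2 + 3j,          # numeric: int, float, complex
    [1, 2], (1, 2), range(3),  # sequence: list, tuple, range
    "text",                    # text: str
    {"k": 1},                  # mapping: dict
    {1, 2}, frozenset({1}),    # set types: set, frozenset
    True,                      # boolean: bool
    b"b", bytearray(b"b"), memoryview(b"b"),  # binary types
]

print([type(v).__name__ for v in values])
```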
2. if-else Statement:

if condition:
    # code to execute if condition is true
else:
    # code to execute if condition is false

3. elif Statement:

if condition1:
    # code to execute if condition1 is true
elif condition2:
    # code to execute if condition2 is true
else:
    # code to execute if all conditions are false
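A concrete version of the if/elif/else templates above, using an invented grading rule for illustration:

```python
# Hypothetical grading rule to exercise if / elif / else.
def grade(marks):
    if marks >= 75:
        return "Distinction"
    elif marks >= 40:
        return "Pass"
    else:
        return "Fail"

print(grade(80), grade(55), grade(30))  # Distinction Pass Fail
```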
Summary
Python is a powerful, easy-to-learn programming language used in various real-world applications
such as web development, data science, automation, and more. Writing Python code can be done in
different environments like IDEs, text editors, Jupyter Notebooks, and online platforms.
Understanding variables, data types, operators, and decisional statements is fundamental to solving
real-world problems efficiently.
Week 2
Handling Multiple Conditions: Problem Solving
Tax Calculation Based on Salary
To calculate tax based on salary, multiple conditions are used to apply different tax rates for different
salary ranges.
Example:
def calculate_tax(salary):
    if salary <= 10000:
        tax = 0
    elif salary <= 20000:
        tax = salary * 0.1
    elif salary <= 50000:
        tax = salary * 0.2
    else:
        tax = salary * 0.3
    return tax
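A quick check of calculate_tax, restated here so the snippet runs on its own (the slabs are those in the example above, not real tax rules):

```python
# Same slab rules as the example above.
def calculate_tax(salary):
    if salary <= 10000:
        tax = 0
    elif salary <= 20000:
        tax = salary * 0.1
    elif salary <= 50000:
        tax = salary * 0.2
    else:
        tax = salary * 0.3
    return tax

# One salary per slab.
print(calculate_tax(8000), calculate_tax(15000),
      calculate_tax(30000), calculate_tax(60000))  # 0 1500.0 6000.0 18000.0
```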
Example:

def calculate_electricity_bill(units):
    if units <= 100:
        bill = units * 5
    elif units <= 200:
        bill = 100 * 5 + (units - 100) * 7
    else:
        bill = 100 * 5 + 100 * 7 + (units - 200) * 10
    return bill

Example usage:

units = 250
print(f"Electricity bill for {units} units is {calculate_electricity_bill(units)}")
Types of Loops in Python
While Loop
- Use: Execute a block of code repeatedly as long as a condition is true.
- Syntax:

  while condition:
      # code block

- Example:

  i = 1
  while i <= 10:
      print(i)
      i += 1
For Loop
- Use: Iterate over a sequence (like a list, tuple, dictionary, set,
or string) with a definite start and end.
- Syntax:

  for item in sequence:
      # code block
- While Loop:
- Best for indefinite iterations until a condition is met.
- More flexible but can lead to infinite loops if the condition is not managed properly.
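The two loop forms often solve the same problem; summing 1 to 5 both ways shows the difference between condition-driven and sequence-driven iteration:

```python
# While loop: condition-driven iteration.
total_while = 0
i = 1
while i <= 5:
    total_while += i
    i += 1

# For loop: sequence-driven iteration over a known range.
total_for = 0
for n in range(1, 6):
    total_for += n

print(total_while, total_for)  # 15 15
```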
Understanding Infinite Loops and Controlling Loops
An infinite loop runs indefinitely because the terminating condition is never met.
To control loops, you can use break to exit the loop and continue to skip to the next iteration.
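A small sketch of break and continue together (the numeric cutoff is arbitrary):

```python
# break exits the loop; continue skips to the next iteration.
evens = []
for n in range(1, 100):
    if n > 10:
        break          # stop the otherwise long loop after 10
    if n % 2 != 0:
        continue       # skip odd numbers
    evens.append(n)

print(evens)  # [2, 4, 6, 8, 10]
```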
Summary
- Handling Multiple Conditions: Essential for decision-making in tax calculation, electricity bill
calculation, etc.
- Types of Loops: While loop (indefinite iterations) and For loop (definite iterations).
- While Loop: Best for conditions that require checking before each iteration.
- For Loop: Ideal for iterating over collections and known ranges.
- Infinite Loops: Must be controlled to prevent programs from running indefinitely using break and
continue.
Understanding and using these loops and conditions efficiently is crucial for solving various
computational problems.
Week 3
Mutable Collections:
- List: A dynamic array capable of storing heterogeneous items.
- Dictionary: A hash map storing key-value pairs.
- Set: An unordered collection of unique elements.
Immutable Collections:
- Tuple: A fixed-size, ordered collection.
- String: A sequence of characters.
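The mutable/immutable distinction above can be demonstrated directly (the values are illustrative):

```python
# Mutable collections can change in place; immutable ones cannot.
nums = [1, 2, 3]       # list (mutable)
nums.append(4)         # works

point = (1, 2)         # tuple (immutable)
try:
    point[0] = 9       # any in-place change raises TypeError
except TypeError:
    changed = False
else:
    changed = True

print(nums, changed)  # [1, 2, 3, 4] False
```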
List
Introduction
A list in Python is an ordered, mutable collection of items. It allows for dynamic sizing and supports
heterogeneous data types.
Functions
- Append: Add an item to the end.
  my_list.append(5)
- Extend: Add multiple items.
  my_list.extend([6, 7])
- Remove: Remove the first occurrence of an item.
  my_list.remove(2)
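The three list methods above, applied in sequence to an illustrative list:

```python
# Start from a small list and apply append, extend, and remove in order.
my_list = [1, 2, 3, 4]
my_list.append(5)       # [1, 2, 3, 4, 5]
my_list.extend([6, 7])  # [1, 2, 3, 4, 5, 6, 7]
my_list.remove(2)       # removes the first occurrence of 2

print(my_list)  # [1, 3, 4, 5, 6, 7]
```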
Real-world Problem
- Task management: Lists can be used to manage to-do lists.
- Data aggregation: Combine data from different sources into a single list.
Tuple
Introduction
A tuple is an immutable, ordered collection of items. They are used to store multiple items in a single
variable.
Strings
Introduction
Strings are immutable sequences of characters used to store text data.
Functions
- Upper: Convert to uppercase.
  my_string.upper()  # 'HELLO'
Dictionary
Introduction
A dictionary is a mutable, unordered collection of key-value pairs. Keys are unique and immutable.
Functions
- Keys: Get all keys.
  my_dict.keys()  # dict_keys(['name', 'age'])
- Add items:
  inventory['oranges'] = 8
This project helps in understanding how dictionaries can be used to track and manage inventory
efficiently, utilizing their key-value structure to quickly access and update quantities.
Week 4
Data Analysis:
The process of inspecting, cleansing, transforming, and modeling data to discover useful
information, inform conclusions, and support decision-making.
Introduction to Numpy
- Creating Arrays: Arrays can be created using functions like np.array(), np.zeros(), np.ones(), and
np.arange().
- Dealing with Arrays: Operations include indexing, slicing, and reshaping.
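The creation functions named above, each producing a small array (values are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])      # from a Python list
z = np.zeros((2, 2))         # 2x2 array of 0.0
o = np.ones(3)               # three 1.0 values
r = np.arange(0, 10, 2)      # 0, 2, 4, 6, 8

# Dealing with arrays: indexing, slicing, reshaping.
print(a[0], r[1:3], r.reshape(5, 1).shape)
```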
Introduction to Pandas
- Applications: Used for data manipulation and analysis in Data Science projects.
- Data Structures: Includes Series and DataFrame.
Solving Analytical Questions on Dummy Data for Practice
- Practice solving analytical questions to improve problem-solving skills and understanding of data
handling.
- Use pandas and numpy to manipulate and analyze dummy data sets.
Week 5
Data Cleaning Operations
Removing Duplicates:
- Use .drop_duplicates() to remove duplicate rows.
- Example: df.drop_duplicates(inplace=True)
Renaming Columns:
- Use .rename() to rename columns for clarity.
String Operations:
- Apply string functions to clean textual data, such as .str.strip() to remove leading and
trailing spaces.
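The three cleaning operations above, applied to a tiny invented table:

```python
import pandas as pd

df = pd.DataFrame({
    "Emp Name": ["  Asha ", "Ravi", "Ravi"],
    "dept": ["IT", "HR", "HR"],
})

df = df.drop_duplicates()                     # drop the repeated Ravi row
df = df.rename(columns={"Emp Name": "name"})  # clearer column name
df["name"] = df["name"].str.strip()           # trim leading/trailing spaces

print(df)
```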
Employee Dataset Analysis
Loading the Dataset:
- Use pandas.read_csv() to load a CSV file.
- Example: df = pd.read_csv('employee_data.csv')
Basic Analysis:
- Descriptive statistics: df.describe()
- Value counts: df['department'].value_counts()
Video Game Sales Analysis
Basic Analysis:
- Top selling games: df.sort_values(by='global_sales', ascending=False).head(10)
- Sales by platform:
df.groupby('platform')['global_sales'].sum().sort_values(ascending=False)
Advanced Analysis:
- Year-wise trend: df.groupby('year')['global_sales'].sum().plot()
- Regional preferences: Analysis of sales in different regions (NA_sales, EU_sales, etc.)
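The value_counts, sort_values, and groupby patterns above can be sketched on invented data shaped like the video game sales table (the platform names and sales figures are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "platform": ["PS4", "PC", "PS4", "Switch"],
    "global_sales": [3.0, 1.5, 2.0, 4.5],
})

counts = df["platform"].value_counts()                         # games per platform
top = df.sort_values(by="global_sales", ascending=False).head(1)  # best seller
by_platform = (df.groupby("platform")["global_sales"]
                 .sum().sort_values(ascending=False))          # total sales per platform

print(counts["PS4"], top.iloc[0]["platform"], by_platform.index[0])
```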
Week 6
Creating Analytical Questions
Formulating Questions:
- Identify key metrics: revenue, growth, performance, etc.
Types of Questions:
- Descriptive: What happened? (e.g., "What are the total sales of each game?")
- Diagnostic: Why did it happen? (e.g., "Why did sales increase in 2020?")
- Predictive: What will happen? (e.g., "What will be the sales next year?")
- Prescriptive: What should we do? (e.g., "Which marketing strategy should be used to increase
sales?")
Summary
Data cleaning and analysis involve multiple steps to ensure data quality and derive meaningful
insights. Techniques like handling null values using fillna() and dropna(), removing duplicates, and
performing string operations are fundamental. Analyzing specific datasets like employee data and
video game sales provides a practical understanding of applying these techniques. Creating analytical
questions helps guide the analysis and extract actionable insights.
About Dataset
This file contains detailed information about data professionals, including their
salaries, designations, departments, and more. The data can be used for salary
prediction, trend analysis, and HR analytics.
The dataset was compiled from internal HR records of a hypothetical company. Each
record represents a unique data professional with various attributes collected from
their employment history.
The data spans from 2009 to 2016, capturing a snapshot as of January 7, 2016.
Data Insight
Column Descriptors
FIRST NAME: First name of the data professional (String)
SEX: Gender of the data professional (String: 'F' for Female, 'M' for Male)
DOJ (Date of Joining): The date when the data professional joined the company (Date
in MM/DD/YYYY format)
CURRENT DATE: The current date or the snapshot date of the data (Date in
MM/DD/YYYY format)
DESIGNATION: The job role or designation of the data professional (String: e.g.,
Analyst, Senior Analyst, Manager)
UNIT: Business unit or department the data professional works in (String: e.g., IT,
Finance, Marketing)
LEAVES REMAINING: Number of leaves remaining for the data professional
(Integer)
PAST EXP: Past work experience in years before joining the current company (Float)
Conclusion
The dataset provides a comprehensive overview of data professionals within a
hypothetical company, encompassing detailed information about their salaries,
designations, departments, and other pertinent attributes collected from their
employment history. Spanning from 2009 to 2016, this data offers valuable insights
into the career trajectories and compensation trends of data professionals over a
significant period.
Project
Q1: Display the first few rows to verify the new feature
Q3: Drop the null values
Conclusion: Duplicate values dropped
Q6: Convert 'DOJ' and 'CURRENT DATE' to datetime
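Q6 can be sketched as follows on a hypothetical two-row frame (the real dataset uses MM/DD/YYYY dates, as noted in the column descriptors):

```python
import pandas as pd

df = pd.DataFrame({
    "DOJ": ["07/01/2014", "01/15/2010"],
    "CURRENT DATE": ["01/07/2016", "01/07/2016"],
})

# Convert both columns from string to datetime.
df["DOJ"] = pd.to_datetime(df["DOJ"], format="%m/%d/%Y")
df["CURRENT DATE"] = pd.to_datetime(df["CURRENT DATE"], format="%m/%d/%Y")

print(df.dtypes)
```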
Q7: Replace the null value of age with its mean
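Q7 can be sketched like this (the ages are invented; the mean of 25 and 35 is 30):

```python
# Replace missing AGE values with the column mean.
import numpy as np
import pandas as pd

df = pd.DataFrame({"AGE": [25.0, np.nan, 35.0]})

df["AGE"] = df["AGE"].fillna(df["AGE"].mean())  # mean skips NaN: (25 + 35) / 2

print(df["AGE"].tolist())  # [25.0, 30.0, 35.0]
```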
Q10: Check how many males and females are in the data
Conclusion: There are 1255 females and 1215 males in the data.
Q13: Display the Senior Analyst salary
Conclusion: For ages between 30 and 35 there are units such as Marketing, Operations, Finance, Management, and IT.
Q15: How many years has each employee been with the company?
Conclusion: Each employee has been with the company for more than 1 year.
Q16: What is the average number of leaves used and remaining across all employees?
Conclusion: The average number of leaves used is 22.490 and the average number of leaves remaining is 7.509.
Q19: How many employees are there in total?
Q22: How many employees have used more than half of their allowed leaves?
Conclusion: The number of employees who have used more than half of their leaves is 2470.
Q23: What is the average rating of employees?
Conclusion: Analyst, Senior Analyst, Associate, Senior Manager, Manager, and Director are the unique designations in the company.
Conclusion: The units break down as 16.84% Finance, 17.65% IT, 16.07% Management, 16.07% Web, 16.56% Marketing, and 16.80% Operations.
Q27: Display the designations using a pie chart
Conclusion: There are 74.82% Analysts, 12.19% Senior Analysts, 6.28% Associates, 3.20% Managers, 2.43% Senior Managers, and 1.09% Directors.
Conclusion: There are 50.81% females and 49.19% males in the company.
Q29: Display the salary of units using a bar graph
Conclusion: The bar graph shows that the salary of the Finance unit is between 350000 and 400000.
Salary of the IT unit is 350000.
Salary of the Marketing unit is between 350000 and 400000.
Salary of the Operations unit is between 300000 and 350000.
Salary of the Web unit is between 300000 and 350000.
Salary of the Management unit is between 350000 and 400000.
Q30: Display the salary of designations using a bar graph
Conclusion: The bar graph shows that the salary of Analysts is between 50000 and 100000.
Salary of Senior Analysts is between 50000 and 100000.
Salary of Associates is 100000.
Salary of Senior Managers is 200000.
Salary of Managers is 150000.
Salary of Directors is between 350000 and 400000.