
A Report

On
“Internship Report”
Submitted on 22/07/2024
By

Khan Huda Fatima 2205690306



Under the Guidance of

Ms. Alfiya Sayyed
In
Three Years Diploma Program in Engineering & Technology of
Maharashtra State Board of Technical Education, Mumbai (Autonomous)

ISO 9001:2015

At
Anjuman-I-Islam’s Abdul Razzaq Kalsekar Polytechnic

Academic Year [2024 - 2025]


INDEX PAGE

SR.NO  TOPIC                           PAGE NO.
1      Acknowledgement                 2
2      Abstract                        3
3      Python                          4
4      Numpy                           5
5      Pandas                          6
6      Anaconda Navigator              7
7      Weekly Report                   8
8      About Dataset and Data Insight  25
9      Conclusion                      27
10     Project                         28

Acknowledgement

I am immensely grateful to Milestone PLM Solution PVT LTD for providing me with
the opportunity to intern as a Data Analytics trainee. My deepest appreciation goes to
Mr. Chaitanya Sathe, IT Head of Milestone PLM Solution PVT LTD, for offering me
this invaluable experience and guiding the company with such expertise and vision.

I would like to extend my heartfelt thanks to my mentor, Ms. Alfiya Sayyed, whose
guidance, support, and encouragement have been instrumental throughout my
internship. Her insights and expertise have significantly contributed to my learning
and development in the field of Data Analytics.

This internship has been a pivotal step in my career, equipping me with practical
skills and hands-on experience in Data Analysis. I have gained a deeper
understanding of modern technologies, responsive design, and user experience
principles. The challenges and projects I encountered have enhanced my problem-
solving abilities and prepared me for future professional endeavors.

Lastly, I would like to thank my institute, AIRKP, for facilitating this internship and
supporting my academic and professional growth.

Thank you all for your continuous support and belief in my abilities.

Abstract
This abstract presents an overview of the industrial training experience gained during a Data
Analytics internship at Milestone PLM Solutions, using Python with Anaconda Navigator and
leveraging libraries such as NumPy and Pandas for data manipulation and analysis.

The internship aimed to provide practical knowledge and hands-on skills in Python within a corporate
environment. Beginning with an introduction to the internship context, the abstract emphasizes the
significance of data analytics in modern digital landscapes and its impact on business operations
and decision-making.

Throughout the internship, various projects and assignments served as practical datasets, enabling the
application of theoretical knowledge to real-world data analysis challenges. Python's Pandas
library facilitated the analysis of project outcomes, highlighting improvements in problem-solving
abilities and project management skills.

Moreover, the abstract explores the development of soft skills such as effective communication and
teamwork, crucial in a corporate setting like Milestone PLM Solutions. These skills were essential
for collaborating with team members, debugging complex code, and meeting project deadlines.

Reflecting on personal and professional growth, the abstract discusses the transformative impact of
the internship experience. It underscores the enhancement of self-confidence and motivation to
specialize further in specific areas of data analytics.

In conclusion, the abstract emphasizes the role of experiential learning in bridging theoretical
knowledge with practical applications, preparing interns for successful careers in the dynamic field
of data analytics. Python, Anaconda Navigator, NumPy, and Pandas emerged as essential tools
for analyzing and interpreting internship data, laying the foundation for future professional
endeavors.

Python

Python is a high-level, interpreted programming language known for its simplicity and readability. It
was created by Guido van Rossum and first released in 1991. Python's design philosophy emphasizes
code readability and a syntax that allows programmers to express concepts in fewer lines of code
compared to other programming languages like C++ or Java.

Key Features and Characteristics

1. Easy to Learn and Use: Python's syntax is clear, making it accessible for beginners and
experienced programmers alike. Its simplicity encourages rapid development and
prototyping.
2. Interpreted and Interactive: Python is an interpreted language, meaning that code is
executed line by line, which allows for interactive testing and debugging.
3. Cross-platform: Python is available on multiple platforms (Windows, macOS, Linux) and is
compatible with major operating systems.
4. Extensive Standard Library: Python's standard library is large and comprehensive,
providing modules and packages for tasks ranging from web development and networking to
file operations and data manipulation.

Applications and Use Cases

1. Web Development: Python is widely used for server-side web development frameworks like
Django and Flask, facilitating rapid development of web applications.
2. Data Science and Machine Learning: Python's libraries such as NumPy, Pandas,
Matplotlib, and scikit-learn are popular in data analysis, visualization, and machine learning.
3. Scripting and Automation: Python is a preferred language for scripting tasks, automation of
repetitive tasks, and system administration.
4. Scientific Computing: Python is used in scientific computing environments for simulations,
numerical analysis, and computational mathematics.

Community and Ecosystem

1. Active Community: Python has a large and active community of developers and contributors
who collaborate on improving the language and its ecosystem.
2. Third-party Libraries: Python's package index, PyPI (Python Package Index), hosts
thousands of third-party libraries and frameworks, extending its capabilities for various
domains.
3. Versatility: Python's versatility and wide adoption across industries make it a valuable skill
for developers, data scientists, engineers, and researchers.

Conclusion

Python's versatility, simplicity, and robust ecosystem make it a popular choice for a wide range of
applications, from web development and data science to automation and scientific computing. Its
continued growth in popularity is driven by its ease of use, extensive libraries, and active community
support, making Python an excellent choice for both beginners and seasoned developers.

Numpy
NumPy (Numerical Python) is a fundamental library in Python for numerical computing. It provides
support for large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays efficiently.

Key Features:

• ndarray: A powerful N-dimensional array object for efficient storage and manipulation of
data.
• Mathematical Functions: Comprehensive suite of mathematical functions for operations on
arrays.
• Broadcasting: Allows operations on arrays of different shapes without explicitly creating
copies of data.
• Integration: Seamless integration with other libraries like Pandas, Matplotlib, and scikit-
learn.

Applications:

• Data Analysis: Essential for manipulating and processing large datasets.
• Scientific Computing: Used extensively in simulations, modeling, and numerical
computations.
• Machine Learning: Backbone for algorithms in data preprocessing and model training.

Community and Support:

• Active Development: Continuously maintained and improved by a dedicated community.
• Resources: Extensive documentation and tutorials available for learning and support.

Conclusion: NumPy's efficient array operations and mathematical functions make it indispensable
for scientific computing, data analysis, and machine learning in Python.
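The ndarray object, statistical functions, and broadcasting described above can be sketched in a few lines (the array values here are purely illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2x3 ndarray

col_means = a.mean(axis=0)      # statistical function: per-column mean
centered = a - col_means        # broadcasting: (2, 3) minus (3,) without copying data
```

Subtracting the 1-D `col_means` from the 2-D array works because NumPy broadcasts the smaller shape across the larger one.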

Pandas
Pandas is a powerful library in Python for data manipulation and analysis. It provides data structures
and operations for manipulating numerical tables and time series. Pandas is built on top of NumPy
and is often used in conjunction with it for data preprocessing and cleaning tasks.

Key Features:

• DataFrame: A two-dimensional, labeled data structure with columns of potentially different
types. It is similar to a spreadsheet or SQL table.
• Series: A one-dimensional labeled array capable of holding any data type (integers, strings,
floating point numbers, Python objects, etc.).
• Data Alignment: Automatically aligns data from different data structures based on label.
• Indexing and Selection: Provides intuitive ways to slice, index, and subset data.
• GroupBy: Allows for splitting data into groups based on criteria and applying functions to
each group independently.
• Merge and Join: Enables combining data from different sources based on common columns
or indices.

Applications:

• Data Analysis: Essential for data cleaning, transformation, and exploration tasks.
• Time Series Analysis: Supports handling time series data and performing operations such as
resampling and time zone handling.
• Data Visualization: Integrates with libraries like Matplotlib and Seaborn for visualization of
data stored in Pandas structures.
• Data Import/Export: Capable of reading from and writing to various file formats, including
CSV, Excel, SQL databases, and JSON.

Community and Support:

• Active Development: Continuously updated and improved by a large community of
contributors.
• Documentation: Extensive documentation and resources available, including tutorials and
examples.

Conclusion: Pandas simplifies data manipulation and analysis tasks in Python, making it a popular
choice for data scientists, analysts, and researchers working with structured data.
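A minimal sketch of the DataFrame, indexing, and GroupBy features above, using a small hypothetical table:

```python
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "dept":   ["IT", "IT", "HR"],
    "salary": [50000, 70000, 45000],
})

it_rows = df[df["dept"] == "IT"]                      # indexing and selection
mean_by_dept = df.groupby("dept")["salary"].mean()    # GroupBy: split, apply, combine
```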

Anaconda Navigator
Anaconda Navigator is a graphical user interface (GUI) included with the Anaconda distribution of
Python. It simplifies package management and deployment of data science and machine learning
environments.

Key Features:

• Package Management: Allows users to easily install, update, and manage Python packages,
including popular libraries like NumPy, Pandas, and TensorFlow.
• Environment Management: Facilitates the creation and management of Python
environments with different package versions, ensuring compatibility for different projects.
• Integrated Development Environments (IDEs): Provides access to popular Python IDEs
such as Jupyter Notebook, JupyterLab, Spyder, and VS Code, enhancing productivity for data
analysis and development tasks.
• Navigator Interface: Offers an intuitive interface for launching applications, managing
environments, and accessing documentation and support resources.
• Cross-Platform: Available for Windows, macOS, and Linux, ensuring consistency and
usability across different operating systems.

Applications:

• Data Science: Ideal for setting up and maintaining environments for data analysis, machine
learning, and scientific computing.
• Education and Training: Used in educational settings to provide a consistent environment
for teaching Python and data science concepts.
• Research and Development: Supports rapid prototyping and experimentation with different
libraries and tools required for research projects.
• Community and Support: Backed by a supportive community and comprehensive
documentation, offering resources for troubleshooting and learning.

Conclusion: Anaconda Navigator simplifies the management of Python environments and packages,
making it an essential tool for data scientists, researchers, and developers working in Python-based
data analysis and machine learning.

Weekly Report
Week 1

Understanding of Programming Languages


Programming languages are tools used to write instructions that a computer can execute. They
provide a means for programmers to communicate with computers to perform various tasks,
automate processes, analyze data, and more.

Real-World Uses of Python Programming


Python is a versatile and widely-used programming language in various fields:

1. Web Development: Frameworks like Django and Flask enable rapid web application development.
2. Data Science and Machine Learning: Libraries like pandas and NumPy.
3. Game Development: Libraries like Pygame help in developing simple games.

Why Python is better


Python is considered better in many contexts due to its:

- Ease of Learning: Simple and readable syntax.
- Versatility: Wide range of applications from web development to data science.
- Community and Support: Large community and extensive libraries.
- Productivity: Fast development cycle with less code.

Different Environments to Write Python Code


1. IDEs: Integrated Development Environments like PyCharm, VS Code, and Spyder.
2. Text Editors: Simpler options like Sublime Text, Atom, and Notepad++.
3. Jupyter Notebooks: Ideal for data analysis and visualization.
4. Online Platforms: Platforms like Google Colab, Repl.it, and JupyterHub.

Rules for Variables


1. Must start with a letter or underscore (_).
2. Cannot start with a number.
3. Can contain letters, numbers, and underscores.
4. Are case-sensitive.

Data Types in Python
1. Numeric Types: int, float, complex
2. Sequence Types: list, tuple, range
3. Text Type: str
4. Mapping Type: dict
5. Set Types: set, frozenset
6. Boolean Type: bool
7. Binary Types: bytes, bytearray, memoryview
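One quick way to see these built-in types side by side is to inspect a literal of each kind (an illustrative check, not part of the internship code):

```python
# One literal per built-in type listed above (memoryview omitted for brevity).
samples = [42, 3.14, 2 + 3j, [1, 2], (1, 2), range(3),
           "text", {"k": 1}, {1, 2}, frozenset({1}), True, b"\x00"]

names = [type(s).__name__ for s in samples]
```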

Uses of Different Operators


1. Arithmetic Operators: +, -, *, /, %, //, **
2. Comparison Operators: ==, !=, >, <, >=, <=
3. Logical Operators: and, or, not
4. Assignment Operators: =, +=, -=, *=, /=, %=
5. Bitwise Operators: &, |, ^, ~, <<, >>

Understanding Decisional Statements

Decisional statements help in making decisions based on conditions.


They include:
1. if Statement:

      if condition:
          # code to execute if condition is true

2. if-else Statement:

      if condition:
          # code to execute if condition is true
      else:
          # code to execute if condition is false

3. elif Statement:

      if condition1:
          # code to execute if condition1 is true
      elif condition2:
          # code to execute if condition2 is true
      else:
          # code to execute if all conditions are false

Summary
Python is a powerful, easy-to-learn programming language used in various real-world applications
such as web development, data science, automation, and more. Writing Python code can be done in
different environments like IDEs, text editors, Jupyter Notebooks, and online platforms.
Understanding variables, data types, operators, and decisional statements is fundamental to solving
real-world problems efficiently.

Week 2
Handling Multiple Conditions Problem Solving
Tax Calculation Based on Salary
To calculate tax based on salary, multiple conditions are used to apply different tax rates for different
salary ranges.

Example:

      def calculate_tax(salary):
          if salary <= 10000:
              tax = 0
          elif salary <= 20000:
              tax = salary * 0.1
          elif salary <= 50000:
              tax = salary * 0.2
          else:
              tax = salary * 0.3
          return tax

      # Example usage:
      salary = 35000
      print(f"Tax for salary {salary} is {calculate_tax(salary)}")

Electricity Bill Calculation


Similarly, electricity bill calculations can use multiple conditions for different consumption slabs.

Example:

      def calculate_electricity_bill(units):
          if units <= 100:
              bill = units * 5
          elif units <= 200:
              bill = 100 * 5 + (units - 100) * 7
          else:
              bill = 100 * 5 + 100 * 7 + (units - 200) * 10
          return bill

      # Example usage:
      units = 250
      print(f"Electricity bill for {units} units is {calculate_electricity_bill(units)}")

Types of Loops in Python
While Loop
- Use: Execute a block of code repeatedly as long as a condition is true.
- Syntax:

      while condition:
          # code block

Solving Problems Using While Loop

Example: Printing numbers from 1 to 10.

      i = 1
      while i <= 10:
          print(i)
          i += 1

For Loop
- Use: Iterate over a sequence (like a list, tuple, dictionary, set,
or string) with a definite start and end.
- Syntax:

      for item in sequence:
          # code block

Traversing Collections Using For Loop

Example: Traversing a dictionary.

      data = {"a": 1, "b": 2, "c": 3}
      for key, value in data.items():
          print(f"Key: {key}, Value: {value}")

Comparing For Loop and While Loop


- For Loop:
- Best for iterating over a known sequence or a range.
- More readable when the number of iterations is known.

- While Loop:
- Best for indefinite iterations until a condition is met.
- More flexible but can lead to infinite loops if the condition is not managed properly.

Understanding Infinite Loops and Controlling Loops
An infinite loop runs indefinitely because the terminating condition is never met.

Example of an infinite loop:

      while True:
          print("This will run forever!")

To control loops, you can use break to exit the loop and continue to skip to the next iteration.

Example using break and continue:

      i = 0
      while i < 10:
          i += 1
          if i == 5:
              continue  # skip the rest of the code block when i is 5
          if i == 8:
              break  # exit the loop when i is 8
          print(i)

Summary
- Handling Multiple Conditions: Essential for decision-making in tax calculation, electricity bill
calculation, etc.
- Types of Loops: While loop (indefinite iterations) and For loop (definite iterations).
- While Loop: Best for conditions that require checking before each iteration.
- For Loop: Ideal for iterating over collections and known ranges.
- Infinite Loops: Must be controlled to prevent programs from running indefinitely using break and
continue.

Understanding and using these loops and conditions efficiently is crucial for solving various
computational problems.

Week 3

Importance of Collection and Types of Collection in Python


Collections in Python are essential for storing, organizing, and managing data efficiently. They come
in various types, each suited for different use cases. Collections can be mutable or immutable,
affecting how they can be manipulated.

Mutable Collections:
- List: A dynamic array capable of storing heterogeneous items.
- Dictionary: A hash map storing key-value pairs.
- Set: An unordered collection of unique elements.

Immutable Collections:
- Tuple: A fixed-size, ordered collection.
- String: A sequence of characters.

List
Introduction
A list in Python is an ordered, mutable collection of items. It allows for dynamic sizing and supports
heterogeneous data types.

Indexing and Slicing

- Indexing: Access elements using their position, starting from 0.

      my_list = [1, 2, 3, 4]
      first_element = my_list[0]  # 1

- Slicing: Extract a subset of the list.

      sub_list = my_list[1:3]  # [2, 3]

Functions
- Append: Add an item to the end.

      my_list.append(5)

- Extend: Add multiple items.

      my_list.extend([6, 7])

- Remove: Remove the first occurrence of an item.

      my_list.remove(2)

Real-world Problem
- Task management: Lists can be used to manage to-do lists.
- Data aggregation: Combine data from different sources into a single list.

Tuple
Introduction
A tuple is an immutable, ordered collection of items. They are used to store multiple items in a single
variable.

Indexing and Slicing

- Indexing: Access elements similarly to lists.

      my_tuple = (1, 2, 3)
      first_element = my_tuple[0]  # 1

- Slicing: Extract parts of the tuple.

      sub_tuple = my_tuple[1:3]  # (2, 3)

Comparing with List


- Mutability: Tuples are immutable; lists are mutable.
- Performance: Tuples are generally faster than lists due to immutability.

Deciding When to Use


- Lists: When you need a dynamic, modifiable collection.
- Tuples: When you need a fixed collection, especially for keys in dictionaries or elements in sets.

Strings
Introduction
Strings are immutable sequences of characters used to store text data.

Indexing and Slicing

- Indexing: Access characters using positions.

      my_string = "Hello"
      first_char = my_string[0]  # 'H'

- Slicing: Extract substrings.

      sub_string = my_string[1:4]  # 'ell'

Functions
- Upper: Convert to uppercase.

      my_string.upper()  # 'HELLO'

- Find: Locate the position of a substring.

      my_string.find('e')  # 1

Importance in Data Domain


- Data Cleaning: Preprocessing text data for analysis.
- Pattern Matching: Searching for patterns using regular expressions.

Dictionary
Introduction
A dictionary is a mutable, unordered collection of key-value pairs. Keys are unique and immutable.

Accessing Values Using Keys

- Access: Retrieve a value by its key.

      my_dict = {'name': 'Alice', 'age': 25}
      age = my_dict['age']  # 25

Functions
- Keys: Get all keys.

      my_dict.keys()  # dict_keys(['name', 'age'])

- Values: Get all values.

      my_dict.values()  # dict_values(['Alice', 25])

Mini Project: Inventory Management

Using a dictionary to manage an inventory:

      inventory = {'apples': 10, 'bananas': 5}

      # Add items
      inventory['oranges'] = 8

      # Update quantities
      inventory['apples'] += 5

      # Remove items
      del inventory['bananas']

      # Display inventory
      for item, quantity in inventory.items():
          print(f"{item}: {quantity}")

This project helps in understanding how dictionaries can be used to track and manage inventory
efficiently, utilizing their key-value structure to quickly access and update quantities.

Week 4

Understanding Data Science and Data Analysis


Data Science:
A field involving the use of scientific methods, algorithms, and systems to extract insights from
structured and unstructured data.

Data Analysis:
The process of inspecting, cleansing, transforming, and modeling data to discover useful
information, inform conclusions, and support decision-making.

Life Cycle of a Data Science Project


1. Problem Definition: Identifying and understanding the problem.
2. Data Collection: Gathering relevant data from various sources.
3. Data Cleaning and Preprocessing: Removing errors, handling missing values, and preparing data
for analysis.
4. Exploratory Data Analysis (EDA): Analyzing data sets to summarize their main characteristics.
5. Modeling: Applying statistical models or machine learning algorithms.
6. Evaluation: Assessing the performance of the models.
7. Deployment: Implementing the model in a production environment.
8. Monitoring and Maintenance: Ensuring the model remains accurate over time.

Importance of Data Cleaning and Preprocessing


Data cleaning and preprocessing are crucial as they:
- Ensure data quality and accuracy.
- Improve the performance of machine learning models.
- Help in discovering insights by removing noise and inconsistencies.

Analytical and Numeric Libraries for Data Science and Analysis


- Numpy: Provides support for large multi-dimensional arrays and matrices, along with a collection
of mathematical functions.
- Pandas: Offers data structures and data analysis tools for handling structured data.

Introduction to Numpy
- Creating Arrays: Arrays can be created using functions like np.array(), np.zeros(), np.ones(), and
np.arange().
- Dealing with Arrays: Operations include indexing, slicing, and reshaping.

Properties and Statistical Functions of Numpy


- Properties: Arrays have attributes such as shape, size, and dtype.
- Statistical Functions: Includes mean, median, standard deviation, variance, etc.

Creating 2D and 3D Arrays


- 2D Arrays: Created using nested lists or functions like np.zeros((rows, cols)).
- 3D Arrays: Created using nested lists or functions like np.zeros((depth, rows, cols)).
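A short sketch of the creation functions, array properties, and statistical functions listed above:

```python
import numpy as np

a1 = np.arange(6)          # 1D array: 0..5
a2 = np.zeros((2, 3))      # 2D array of zeros, shape (rows, cols)
a3 = np.ones((2, 2, 2))    # 3D array of ones, shape (depth, rows, cols)

mean = a1.mean()           # statistical function on the 1D array
```

Properties such as `shape`, `size`, and `dtype` can be read directly off any of these arrays.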

Introduction to Pandas
- Applications: Used for data manipulation and analysis in Data Science projects.
- Data Structures: Includes Series and DataFrame.

Creating Series and DataFrame


- Series: One-dimensional labeled array capable of holding data of any type.
- DataFrame: Two-dimensional labeled data structure with columns of potentially different types.

Important Properties and Functions for Preprocessing


- Properties: DataFrame has attributes like shape, columns, index, and dtypes.
- Functions: Includes head(), tail(), describe(), info(), dropna(), fillna(), etc.
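A minimal illustration of Series/DataFrame creation and the preprocessing functions named above, using made-up rows:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])   # one-dimensional labeled array

df = pd.DataFrame({"name":  ["Ann", "Bob", "Cal"],
                   "score": [85.0, np.nan, 92.0]})   # two-dimensional labeled table

shape = df.shape                                   # property: (rows, columns)
preview = df.head(2)                               # first rows
filled = df.fillna({"score": df["score"].mean()})  # impute the missing score
```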

Solving Analytical Questions on Dummy Data for Practice
- Practice solving analytical questions to improve problem-solving skills and understanding of data
handling.
- Use pandas and numpy to manipulate and analyze dummy data sets.

Preparation to Handle Large Data


- Techniques like data sampling, chunking, and using efficient libraries.
- Leveraging pandas and numpy capabilities to work with large datasets.

Loading CSV Files for Preprocessing and Analysis


- Loading Data: Using pd.read_csv() to load CSV files into a DataFrame.
- Preprocessing: Cleaning and transforming data for analysis using pandas functions.
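A self-contained sketch of loading CSV data with pd.read_csv; in practice the argument is a file path such as "data.csv", but an in-memory buffer stands in here so the example needs no file on disk:

```python
import io
import pandas as pd

# Hypothetical CSV contents; pd.read_csv accepts a path or any file-like object.
buffer = io.StringIO("name,age\nAlice,25\nBob,30\n")
df = pd.read_csv(buffer)
```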

Week 5

Finding and Handling Null Values


Identifying Null Values:
- Use methods like .isnull() or .isna() to detect missing values.
- Example: df.isnull().sum() gives the count of null values in each column.

Handling Null Values:

- fillna(): Replace null values with a specified value or method.
  - Example: df['column'].fillna(value, inplace=True) replaces null values in 'column'
    with 'value'.
  - Common strategies use the mean, median, or mode:
    df['column'].fillna(df['column'].mean(), inplace=True)

- dropna(): Remove rows or columns with null values.
  - Example: df.dropna(axis=0, how='any', inplace=True) drops rows with any null
    values.
  - To drop columns: df.dropna(axis=1, how='any', inplace=True)

- Custom handling: nulls can also be filled by applying a user-defined function based on
  specific conditions.
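The detection-and-handling steps above can be sketched on a small hypothetical DataFrame (column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"salary": [50000.0, np.nan, 70000.0],
                   "dept":   ["IT", "HR", None]})

null_counts = df.isnull().sum()                          # nulls per column
df["salary"] = df["salary"].fillna(df["salary"].mean())  # impute salary with the mean
df = df.dropna(axis=0, how="any")                        # drop rows still holding nulls
```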

Data Cleaning Operations
Removing Duplicates:
- Use .drop_duplicates() to remove duplicate rows.

- Example: df.drop_duplicates(inplace=True)

Renaming Columns:
- Use .rename() to rename columns for clarity.

- Example: df.rename(columns={'old_name': 'new_name'}, inplace=True)

Converting Data Types:


- Use .astype() to change the data type of a column.

- Example: df['column'] = df['column'].astype('int')

String Operations:
- Apply string functions to clean textual data, such as .str.strip() to remove leading and
trailing spaces.

- Example: df['column'] = df['column'].str.strip()
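A compact sketch chaining the four cleaning operations above on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"old_name": ["  x ", "y", "y"],
                   "val":      ["1", "2", "2"]})

df = df.drop_duplicates()                          # remove the repeated row
df = df.rename(columns={"old_name": "new_name"})   # clearer column name
df["val"] = df["val"].astype("int")                # convert text to integers
df["new_name"] = df["new_name"].str.strip()        # trim surrounding spaces
```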

Employee Dataset Analysis
Loading the Dataset:
- Use pandas.read_csv() to load a CSV file.
- Example: df = pd.read_csv('employee_data.csv')

Basic Analysis:
- Descriptive statistics: df.describe()
- Value counts: df['department'].value_counts()

Handling Specific Issues:


- Dealing with missing salaries: df['salary'].fillna(df['salary'].median(), inplace=True)
- Standardizing job titles: df['job_title'] = df['job_title'].str.lower()

Video Game Sales Analysis


Loading the Dataset:
- Use pandas.read_csv() to load the dataset.
- Example: df = pd.read_csv('video_game_sales.csv')

Basic Analysis:
- Top selling games: df.sort_values(by='global_sales', ascending=False).head(10)
- Sales by platform:
df.groupby('platform')['global_sales'].sum().sort_values(ascending=False)

Advanced Analysis:
- Year-wise trend: df.groupby('year')['global_sales'].sum().plot()
- Regional preferences: Analysis of sales in different regions (NA_sales, EU_sales, etc.)
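The queries above can be reproduced on a hypothetical miniature of the sales dataset (column names follow the text; the sales figures are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "platform":     ["PS4", "PS4", "Switch"],
    "year":         [2019, 2020, 2020],
    "global_sales": [10.0, 5.0, 8.0],
})

# Sales by platform, highest first
by_platform = (df.groupby("platform")["global_sales"]
                 .sum()
                 .sort_values(ascending=False))
top_platform = by_platform.index[0]

# Year-wise trend (.plot() omitted so the sketch stays non-graphical)
yearly = df.groupby("year")["global_sales"].sum()
```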

Week 6
Creating Analytical Questions

Formulating Questions:
- Identify key metrics: revenue, growth, performance, etc.

- Examples:

- What are the factors influencing employee turnover?

- Which platforms have the highest video game sales?

Types of Questions:

- Descriptive: What happened? (e.g., "What are the total sales of each game?")

- Diagnostic: Why did it happen? (e.g., "Why did sales increase in 2020?")

- Predictive: What will happen? (e.g., "What will be the sales next year?")

- Prescriptive: What should we do? (e.g., "Which marketing strategy should be used to increase
sales?")

Summary
Data cleaning and analysis involve multiple steps to ensure data quality and derive meaningful
insights. Techniques like handling null values using fillna() and dropna(), removing duplicates, and
performing string operations are fundamental. Analyzing specific datasets like employee data and
video game sales provides a practical understanding of applying these techniques. Creating analytical
questions helps guide the analysis and extract actionable insights.

About Dataset
This file contains detailed information about data professionals, including their
salaries, designations, departments, and more. The data can be used for salary
prediction, trend analysis, and HR analytics.

The dataset was compiled from internal HR records of a hypothetical company. Each
record represents a unique data professional with various attributes collected from
their employment history.
The data spans from 2009 to 2016, capturing a snapshot as of January 7, 2016.

Data Insight
Column Descriptors
FIRST NAME: First name of the data professional (String)

LAST NAME: Last name of the data professional (String)

SEX: Gender of the data professional (String: 'F' for Female, 'M' for Male)

DOJ (Date of Joining): The date when the data professional joined the company (Date
in MM/DD/YYYY format)

CURRENT DATE: The current date or the snapshot date of the data (Date in
MM/DD/YYYY format)

DESIGNATION: The job role or designation of the data professional (String: e.g.,
Analyst, Senior Analyst, Manager)

AGE: Age of the data professional (Integer)

SALARY: Annual salary of the data professional (Float)

UNIT: Business unit or department the data professional works in (String: e.g., IT,
Finance, Marketing)

LEAVES USED: Number of leaves used by the data professional (Integer)

LEAVES REMAINING: Number of leaves remaining for the data professional
(Integer)

RATINGS: Performance ratings of the data professional (Float)

PAST EXP: Past work experience in years before joining the current company (Float)

Conclusion
The dataset provides a comprehensive overview of data professionals within a
hypothetical company, encompassing detailed information about their salaries,
designations, departments, and other pertinent attributes collected from their
employment history. Spanning from 2009 to 2016, this data offers valuable insights
into the career trajectories and compensation trends of data professionals over a
significant period.

In conclusion, the dataset serves as a crucial tool for understanding the
evolving landscape of data professionals within the company, providing a solid
foundation for data-driven decision-making in HR management and organizational
planning.

Project

Q1: Display the first few rows to verify the new feature

Conclusion: Rows Displayed Successfully

Q2: Check for missing values

Conclusion: There are 13 missing values

Q3: Drop the null values

Conclusion: Null values dropped

Q4: Drop the duplicate values

Conclusion: Duplicate values dropped

Q5: Provide salary increase recommendations based on PAST EXPERIENCE

Conclusion: Salary increase recommended based on PAST EXPERIENCE

Q6: Convert 'DOJ' and 'CURRENT DATE' to datetime

Conclusion: 'DOJ' and 'CURRENT DATE' converted to datetime
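A hedged sketch of this conversion, assuming the MM/DD/YYYY format described in the dataset section (the sample rows are invented); the derived tenure in years also relates to Q15:

```python
import pandas as pd

# Hypothetical rows mirroring the dataset's DOJ / CURRENT DATE columns.
df = pd.DataFrame({"DOJ":          ["01/15/2010", "07/01/2012"],
                   "CURRENT DATE": ["01/07/2016", "01/07/2016"]})

for col in ["DOJ", "CURRENT DATE"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Approximate tenure in years for each employee
tenure_years = (df["CURRENT DATE"] - df["DOJ"]).dt.days / 365.25
```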

Q7: Replace the null values of age with the mean

Conclusion: Null values of age replaced with the mean

Q8: Replace null values of the salary column with the mean

Conclusion: Null values of the salary column replaced with the mean

Q9: Replace the name "cherry" with "strawberry"

Conclusion: Successfully replaced "cherry" with "strawberry"

Q10: Check how many males and females are in the data

Conclusion: There are 1255 females and 1215 males present in the data

Q11: Check females' salary

Conclusion: Hence this is the females' salary

Q12: Check males' salary

Conclusion: Hence this is the males' salary

Q13: Display the Senior Analyst salary

Conclusion: Hence this is the Senior Analyst salary

Q14: Display the units for employees aged between 30 and 35

Conclusion: For ages between 30 and 35, the units include Marketing, Operations, Finance, Management, and IT

Q15: How many years has each employee been with the company?

Conclusion: Each employee has been with the company for more than 1 year

Q16: What is the average number of leaves used and remaining across all employees?

Conclusion: Average number of leaves used is 22.49 and average number of leaves remaining is 7.51

Q17: Display the average salary of females.

Conclusion: Average salary of female employees is 58998.41

Q18: Display the average salary of males.

Conclusion: Average salary of male employees is 58998.41

Q19: How many employees are there in total?

Conclusion: Total number of employees is 2470

Q20: What is the average age of employees?

Conclusion: Average age of employees is 24.73 years

Q21: What is the highest salary among employees?

Conclusion: Highest salary among employees is 388112

Q22: How many employees have used more than half of their allowed leaves?

Conclusion: Number of employees with more than half of their leaves used is 2470

Q23: What is the average rating of employees?

Conclusion: Average rating of employees is 3.48

Q24: What are the unique designations in the company?

Conclusion: Analyst, Senior Analyst, Associate, Senior Manager, Manager, and Director are the unique designations
in the company

Q25: Drop the columns

Conclusion: Columns are dropped


Q26: Display the units using a pie chart

Conclusion: Finance accounts for 16.84% of units, IT 17.65%, Management 16.07%, Web 16.07%, Marketing 16.56%, and Operations 16.80%

Q27: Display the designations using a pie chart

Conclusion: There are 74.82% Analysts, 12.19% Senior Analysts, 6.28% Associates, 3.20% Managers, 2.43% Senior
Managers, and 1.09% Directors

Q28: Display the gender distribution using a pie chart

Conclusion: There are 50.81% females and 49.19% males in the company

Q29: Display the salary of units using a bar graph

Conclusion: This bar graph shows the salary of the Finance unit is between 350000 and 400000,
the IT unit is around 350000,
the Marketing unit is between 350000 and 400000,
the Operations unit is between 300000 and 350000,
the Web unit is between 300000 and 350000,
and the Management unit is between 350000 and 400000

Q30: Display the salary of designations using a bar graph

Conclusion: This bar graph shows the salary of Analysts is between 50000 and 100000,
Senior Analysts between 50000 and 100000,
Associates around 100000,
Senior Managers around 200000,
Managers around 150000,
and Directors between 350000 and 400000

