Internship Report 2
BACHELOR OF ENGINEERING
in
COMPUTER ENGINEERING
by
Bhure Sangmeshwar Dattatray
Under Supervision of
Mr. Amol Pawar
Sumago Infotech Pvt. Ltd., Pune
(Duration: 26th December, 2024 to 06th February 2025)
CERTIFICATE
This is to certify that the “Internship Report” submitted by Bhure Sangmeshwar Dattatray is work done by the candidate and submitted during the 2024–25 academic year, in partial fulfillment of the requirements for the award of the degree of BACHELOR OF ENGINEERING in COMPUTER ENGINEERING, at Sumago Infotech Pvt. Ltd., Pune.
ACKNOWLEDGEMENT
First, I would like to thank Mr. Amol Pawar, HR Head of Sumago Infotech Pvt. Ltd., Pune, for giving me the opportunity to do an internship within the organization.
I would also like to thank all the people who worked alongside me at Sumago Infotech Pvt. Ltd., Pune; with their patience and openness they created an enjoyable working environment.
It is indeed with a great sense of pleasure and immense gratitude that I acknowledge the help of these individuals.
I am highly indebted to Director Dr. Galhe Sir and Principal Dr. D. J. Garkal for the facilities provided to accomplish this internship.
I would like to thank my Head of the Department, Dr. A. A. Khatri, for his constructive criticism throughout my internship.
I would like to thank Prof. S. Y. Mandlik for their support and advice in getting and completing the internship in the above-said organization.
I am extremely grateful to my department staff members and friends who helped me in the successful completion of this internship.
This project focuses on the development of an interactive Drug Sales Report Dashboard using
Microsoft Power BI, aimed at analyzing and visualizing pharmaceutical sales data. Traditional
methods like spreadsheets or static reports often limit analytical depth and user engagement.
This dashboard addresses those limitations by offering powerful visualizations, real-time filter-
ing, and interactive elements that help stakeholders derive meaningful business insights.
The dashboard is structured into three core pages: Sales Average Analysis, Revenue Analysis, and Country-wise (Geographic) Analysis.
It incorporates advanced features like slicers, custom buttons, and multiple visual charts
(bar, pie, line, etc.), enabling users to explore drug sales trends across different markets and
timeframes. This project exemplifies how Business Intelligence tools can transform raw sales
data into actionable insights, benefiting pharma analysts, marketers, and strategic decision-
makers.
Organisation Information:
Since its establishment, Sumago Infotech has constantly grown and expanded. With four years of extensive research and market exposure, we have earned a reputation for delivering top-quality solutions on time and within budget, resulting in long-term customer relationships. While designing and developing your product we take care of your requirements, but at the same time we suggest changes if required according to the latest technologies.
Smart Tech Software is a team of software developers working in the fields of application development, barcode solutions, web development, mobile apps, e-commerce, and educational project development.
We specialize in web design and software development. If you are looking to upgrade your website to be compatible with mobiles and tablets, or even if you don't have a website yet, just remember us and we will make your dream a success. We give the best solution for the best value of your money.
Methodologies:
We follow a structured methodology for our projects, which starts from designing the solution through to the implementation phase. A well-planned project reduces delivery time and avoids additional ad-hoc costs for our clients; hence we dedicate the majority of our time to understanding our clients' business and gathering requirements. This ground-up approach helps us deliver not only the solution to our clients but also add value to their investments.
INDEX
Acknowledgement
Index
2 INTRODUCTION
2.1 Module Description
3 SYSTEM ANALYSIS
5 TECHNOLOGY
6 Mini Project
6.1 Overview
6.2 Product
6.3 ProductView
7 SCREENSHOT
7.1 Home Page
7.2 Customer Page
7.3 Trend Page
7.4 Tooltip Page
8 CONCLUSION
BIBLIOGRAPHY
Chapter 1
OBJECTIVES
• To understand the practical application of Python and Power BI in real-world data analytics.
• To build on the skills I already possess in this area and deepen them through hands-on practice.
• To explore interactive reporting features using slicers, filters, and custom visuals.
• To gain industry exposure and develop professional skills that strengthen my resume for future roles.
Jaihind COE Kuran, Department of Computer Engineering - 2024-25
Chapter 2
INTRODUCTION
The pharmaceutical industry produces a large volume of sales data from various countries
and drug categories. A meaningful analysis of this data can offer insights into market behavior,
drug performance, and regional growth trends. However, static methods of data reporting often
make it difficult to explore such data deeply.
The Drug Sales Report Dashboard was created using Power BI, which is a powerful Business
Intelligence (BI) tool by Microsoft. This dashboard enables users to:
- Monitor drug-wise and region-wise sales performance.
- Analyze average sales across months/years.
- Visualize country-wise contributions to total revenue.
- Filter and navigate using slicers and interactive buttons.
This module is designed around the concept of trend and pattern analysis. It focuses on the average sales of each drug across time (monthly/yearly) and region. Users can compare the performance of multiple drugs using:
- Bar and line charts for visualizing average sales trends.
- Slicers for filtering data by drug type, country, or time.
- Interactive KPIs that highlight changes over time.
This module helps in understanding product popularity and identifying underperforming drugs, enabling companies to tailor their strategies effectively.
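The average-sales comparison described above can be sketched in Python with pandas. This is only an illustrative equivalent of what the Power BI visuals compute; the table, column names, and figures are hypothetical:

```python
import pandas as pd

# Hypothetical drug-sales records (illustrative only; the dashboard
# itself is built in Power BI, not Python).
sales = pd.DataFrame({
    "Drug":    ["A", "A", "B", "B", "A", "B"],
    "Country": ["India", "USA", "India", "USA", "India", "USA"],
    "Month":   ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "Sales":   [100, 150, 80, 120, 110, 130],
})

# Average sales per drug across all months and regions,
# as a bar chart or KPI card would show them.
avg_by_drug = sales.groupby("Drug")["Sales"].mean()
print(avg_by_drug)

# The slicer-style filter: average sales for one country only.
india_avg = sales[sales["Country"] == "India"].groupby("Drug")["Sales"].mean()
```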
The revenue analysis module is based on financial analytics and KPI tracking. It highlights the overall revenue generated from various drugs, countries, and time periods. Key components include:
- Dynamic cards showing Total Revenue, Highest-Earning Drug, and Top-Selling Country.
- Stacked bar and column charts representing revenue distribution across categories.
- Trend visuals for identifying sales spikes and dips.
This module supports strategic planning by providing insights into the most profitable drugs and regions.
This module incorporates geospatial analysis and regional performance metrics. It helps visualize how drug sales are distributed across different countries using:
- Map visuals (filled and bubble maps) showing revenue per country.
- Region-based comparisons using data tables and heat maps.
- Interactive slicers to filter by country, drug, or time.
It supports decision-making for global supply chain optimization and targeted marketing.
Enhancing user experience and flexibility, this module integrates interactive elements such as:
- Navigation buttons for switching between pages.
- Slicers and filters for customizing views based on product, date, region, etc.
- Tooltips and drill-through options to provide deeper data exploration.
This layer turns the dashboard into an exploratory tool rather than a static report, promoting user-driven analysis.
Following best practices of business intelligence and data storytelling, this module ensures:
- Use of consistent color schemes and an intuitive layout.
- Visual hierarchy to emphasize important metrics.
- Charts that highlight trends, outliers, and key comparisons.
The goal is to communicate complex data in a clear, meaningful, and visually appealing way that aids interpretation and action.
Chapter 3
SYSTEM ANALYSIS
Requirement Analysis
Existing System:
Traditional sales reporting methods rely heavily on Excel sheets and static charts, which limit
real-time interactivity and exploration. Users are often unable to drill down into granular data
or view customized reports.
Proposed System:
The proposed system addresses these issues through a modern, Power BI-based dash-
board that:
- Provides interactive visual reports.
- Supports real-time data exploration.
- Enhances analytical depth via intuitive UI and filters.
- Automates data updates, reducing manual reporting effort.
Chapter 4
SYSTEM CONFIGURATIONS
The software requirement specification is produced at the culmination of the analysis task. The function and performance allocated to software as part of system engineering are refined by establishing a complete information description, a detailed functional description, an indication of performance and design constraints, appropriate validation criteria, and other information pertinent to the requirements.
Software Requirements:
• Microsoft Power BI Desktop
• Python
Hardware Requirement:
• RAM: 4 GB
Chapter 5
TECHNOLOGY
PYTHON
Python is a high-level, general-purpose programming language. Its design philosophy em-
phasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected. It supports multiple program-
ming paradigms, including structured (particularly procedural), object-oriented and functional
programming. It is often described as a ”batteries included” language due to its comprehensive
standard library
Python is a multi-paradigm programming language. Object-oriented programming and
structured programming are fully supported, and many of their features support functional pro-
gramming and aspect-oriented programming (including metaprogramming and metaobjects).
Many other paradigms are supported via extensions, including design by contract and logic pro-
gramming. Python is often referred to as a ’glue language’ because it can seamlessly integrate
components written in other languages.
Python uses dynamic typing and a combination of reference counting and a cycle-detecting
garbage collector for memory management. It uses dynamic name resolution (late binding),
which binds method and variable names during program execution.
Its design offers some support for functional programming in the Lisp tradition. It has filter, map and reduce functions; list comprehensions, dictionaries, sets, and generator expressions. The standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML.
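These functional tools can be demonstrated in a few lines, using only the standard library:

```python
from functools import reduce
from itertools import accumulate

nums = [1, 2, 3, 4, 5]

evens   = list(filter(lambda n: n % 2 == 0, nums))  # keep even values
squares = list(map(lambda n: n * n, nums))          # apply a function to each item
total   = reduce(lambda a, b: a + b, nums)          # fold the list into one value
running = list(accumulate(nums))                    # itertools: running sums

# The same ideas expressed with a list comprehension and
# a generator expression, which are often more idiomatic.
squares_lc = [n * n for n in nums]
total_gen  = sum(n for n in nums)
```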
Its core philosophy is summarized in the Zen of Python (PEP 20), which includes aphorisms
such as:
• Readability counts.
However, some Python features have been criticized for violating these principles and adding unnecessary language bloat. A common response to these criticisms is that the Zen of Python is a guideline rather than a rule. The addition of some new features has been so controversial that Guido van Rossum resigned as Benevolent Dictator for Life following the vitriol over the addition of the assignment expression operator in Python 3.8.
Python is meant to be an easily readable language. Its formatting is visually unclut-
tered and often uses English keywords where other languages use punctuation. Unlike
many other languages, it does not use curly brackets to delimit blocks, and semicolons
after statements are allowed but rarely used. It has fewer syntactic exceptions and special
cases than C or Pascal.
Indentation
Python uses whitespace indentation, rather than curly brackets or keywords, to de-
limit blocks. An increase in indentation comes after certain statements; a decrease in
indentation signifies the end of the current block. Thus, the program’s visual structure
accurately represents its semantic structure. This feature is sometimes termed the off-
side rule. Some other languages use indentation this way; but in most, indentation has
no semantic meaning. The recommended indent size is four spaces.
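A small example of the off-side rule in practice: the indented lines form the block, and dedenting ends it, with no braces or keywords needed.

```python
def classify(n):
    # The indented block below 'if' belongs to it; returning to the
    # previous indentation level ends the block (the off-side rule).
    if n % 2 == 0:
        result = "even"
    else:
        result = "odd"
    return result  # four-space indents, per the recommendation above

print(classify(4))
```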
Libraries of Python
Python has gained preference in data analytics due to its simplicity, versatility, and a very powerful ecosystem of libraries. Whether you are dealing with large data sets, conducting statistical analysis, or visualizing insights, it has a very wide range of libraries to facilitate the process. From data manipulation using Pandas to the sophisticated application of machine learning through Scikit-learn, these libraries make the extraction of meaningful insights more efficient for analysts and data scientists. The sections below highlight the Python libraries most relevant to this internship.
Numpy Lib:
Numpy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays. It is the funda-
mental package for scientific computing with Python.
Besides its obvious scientific uses, Numpy can also be used as an efficient multi-dimensional
container of generic data.
An array in NumPy is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. In NumPy, the number of dimensions of the array is called the rank of the array. A tuple of integers giving the size of the array along each dimension is known as the shape of the array. The array class in NumPy is called ndarray. Elements in NumPy arrays are accessed using square brackets, and arrays can be initialized using nested Python lists.
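A minimal sketch of these concepts, showing rank, shape, element access, and vectorized arithmetic:

```python
import numpy as np

# A 2-D array (rank 2) built from nested Python lists.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)    # rank (number of dimensions): 2
print(a.shape)   # size along each dimension: (2, 3)
print(a[1, 2])   # element access with square brackets: 6
print(a * 2)     # vectorized arithmetic applied to every element
```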
Pandas Lib:
Pandas is a powerful and open-source Python library designed for data manipulation and
analysis. It was created by Wes McKinney in 2008 and is built on top of the NumPy
library . Pandas is well-suited for working with tabular data, such as spreadsheets or
SQL tables, and is an essential tool for data analysts, scientists, and engineers
1. Series: A one-dimensional labeled array capable of holding data of any type (integer,
string, float, Python objects, etc.). It is similar to a column in an Excel sheet.
2. DataFrame: A two-dimensional data structure with labeled axes (rows and columns), similar to a table in a database or an Excel sheet.
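A short sketch of both structures (the drug and sales values are made up for illustration):

```python
import pandas as pd

# A Series: one labeled column of values.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame: a 2-D table with labeled rows and columns.
df = pd.DataFrame({
    "drug":  ["A", "B", "C"],
    "sales": [100, 250, 175],
})

print(s["b"])             # label-based access: 20
print(df["sales"].max())  # column-wise operations: 250

# Boolean filtering, the workhorse of tabular analysis.
high = df.loc[df["sales"] > 150, "drug"].tolist()
print(high)
```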
Matplotlib:
Matplotlib's pyplot module provides a MATLAB-like interface for building plots. It simplifies the process of adding plot elements such as lines, images, and text to the axes of the current figure.
Steps in Pyplot:
• Create a plot: Draw the data with a plotting function such as plt.plot().
• Customize the plot: Add titles, labels, and other elements using methods like plt.title(), plt.xlabel(), and plt.ylabel().
• Display or save the figure: Use plt.show() or plt.savefig().
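A minimal pyplot sketch of these customization calls, assuming Matplotlib is installed (the Agg backend is selected so it runs without a display; the data are invented):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe on a headless machine
import matplotlib.pyplot as plt

# Plot a simple line, then customize it as described above.
plt.plot([1, 2, 3, 4], [10, 20, 15, 25], marker="o")
plt.title("Monthly Sales Trend")
plt.xlabel("Month")
plt.ylabel("Sales")

plt.savefig("trend.png")  # write the figure to disk
```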
Exploratory Data Analysis (EDA)
Exploratory Data Analysis is the process of examining a dataset to summarize its main characteristics before formal modeling. Depending on the number of columns we are analyzing, we can divide EDA into three types: univariate, bivariate and multivariate.
1. Univariate Analysis
Univariate analysis examines one variable at a time, typically using summary statistics and distribution plots such as histograms and box plots to describe its central tendency, spread, and shape.
2. Bivariate Analysis
Bivariate analysis focuses on exploring the relationship between two variables to find
connections, correlations, and dependencies. It’s an important part of exploratory data
analysis that helps understand how two variables interact. Some key techniques used in
bivariate analysis include scatter plots, which visualize the relationship between two con-
tinuous variables; correlation coefficient, which measures how strongly two variables are re-
lated, commonly using Pearson’s correlation for linear relationships; and cross-tabulation,
or contingency tables, which show the frequency distribution of two categorical variables
and help understand their relationship.
Line graphs are useful for comparing two variables over time, especially in time series
data, to identify trends or patterns. Covariance measures how two variables change to-
gether, though it’s often supplemented by the correlation coefficient for a clearer, more
standardized view of the relationship.
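Two of these bivariate techniques, the Pearson correlation coefficient and cross-tabulation, are one-liners in pandas (the figures below are invented):

```python
import pandas as pd

# Two continuous variables with a strong linear relationship.
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50],
    "sales":    [12, 24, 33, 41, 55],
})

# Pearson correlation coefficient (pandas' default method).
r = df["ad_spend"].corr(df["sales"])
print(round(r, 3))  # close to 1 for a near-linear relationship

# Cross-tabulation of two categorical variables.
cat = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "drug":   ["A", "B", "A", "A"],
})
table = pd.crosstab(cat["region"], cat["drug"])
print(table)
```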
3. Multivariate Analysis
Multivariate analysis examines the relationships between two or more variables in the dataset. It aims to understand how variables interact with one another, which is crucial for most statistical modeling techniques. It includes techniques like pair plots, which show the relationships between multiple variables at once, helping to see how they interact. Another technique is Principal Component Analysis (PCA), which reduces the complexity of large datasets by simplifying them while keeping the most important information.
Steps for Performing Exploratory Data Analysis
Performing Exploratory Data Analysis (EDA) involves a series of steps designed to help you understand the data you're working with, uncover underlying patterns, identify anomalies, test hypotheses, and ensure the data is clean and suitable for further analysis.
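The PCA idea mentioned above is usually applied via a library such as scikit-learn; the following hand-rolled sketch uses only NumPy's SVD to illustrate projecting synthetic 3-column data onto its two most informative directions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 100 samples, 3 features, two of them strongly correlated.
x = rng.normal(size=(100, 1))
data = np.hstack([x,
                  2 * x + rng.normal(scale=0.1, size=(100, 1)),
                  rng.normal(size=(100, 1))])

# Center the data, then take the SVD of the centered matrix.
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of total variance explained by each principal component.
evr = s**2 / np.sum(s**2)
print(evr)  # the first component dominates because of the correlation

# Project onto the first two components (dimensionality reduction 3 -> 2).
reduced = centered @ vt[:2].T
```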
Step 1: Understand the Problem and the Data
The first step in any data analysis project is to clearly understand the problem you're trying to solve and the data you have. This involves asking key questions such as:
1. What is the problem you are trying to solve, and what would a useful answer look like?
2. What are the variables in the data and what do they represent?
Step 2: Import and Examine the Data
After clearly understanding the problem and the data, the next step is to import the data into your analysis environment (like Python, R, or a spreadsheet tool). At this stage, it's crucial to examine the data to get an initial understanding of its structure, variable types, and potential issues.
Here's what you can do:
- Load the data into your environment carefully to avoid errors or truncations.
- Examine the size of the data (number of rows and columns) to understand its complexity.
- Check for missing values and see how they are distributed across variables, since missing data can impact the quality of your analysis.
- Identify data types for each variable (numerical, categorical, etc.), which will help in the next steps of data manipulation and analysis.
- Look for errors or inconsistencies, such as invalid values, mismatched units, or outliers, which could signal deeper issues with the data.
By completing these tasks, you'll be prepared to clean and analyze the data more effectively.
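A short pandas sketch of this initial inspection, using a small hypothetical table with deliberate gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical table with two missing values to illustrate the checks.
df = pd.DataFrame({
    "drug":    ["A", "B", "C", "D"],
    "sales":   [100.0, np.nan, 250.0, 175.0],
    "country": ["India", "USA", None, "UK"],
})

print(df.shape)        # size: (4, 3) — rows and columns
print(df.dtypes)       # data type of each variable
print(df.isna().sum()) # missing values per column
print(df.head())       # first rows, a quick sanity check
```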
Step 3: Handle Missing Data
Missing data is common in many datasets and can significantly affect the quality of your analysis. During Exploratory Data Analysis (EDA), it's important to identify and handle missing data properly to avoid biased or misleading results.
Here’s how to handle it:
1. Understand the patterns and possible reasons for missing data. Is it missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)? Knowing this helps decide how to handle the missing data.
2. Decide whether to remove missing data (listwise deletion) or impute (fill in) the missing values. Removing data can lead to biased outcomes, especially if the missing data isn't MCAR.
3. Imputing values helps preserve data but should be done carefully.
4. Use appropriate imputation methods like mean/median imputation, regression imputation, or machine learning techniques like KNN or decision trees, based on the data's characteristics.
5. Consider the impact of missing data. Even after imputing, missing data can cause uncertainty and bias, so interpret the results with caution.
6. Properly handling missing data improves the accuracy of your analysis and prevents misleading conclusions.
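A minimal sketch of the two basic options, listwise deletion versus median imputation, in pandas (data invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [100.0, np.nan, 250.0, 175.0, np.nan]})

# Option 1: listwise deletion (risks bias if the data are not MCAR).
dropped = df.dropna()

# Option 2: median imputation preserves every row.
median = df["sales"].median()           # median of the observed values
imputed = df["sales"].fillna(median)

print(imputed.tolist())
```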
Step 4: Explore Data Characteristics
After addressing missing data, the next step in EDA is to explore the characteristics of your data by examining the distribution, central tendency, and variability of your variables, as well as identifying any outliers or anomalies. This helps in selecting appropriate analysis methods and spotting potential data issues. You should calculate summary statistics like mean, median, mode, standard deviation, skewness, and kurtosis for numerical variables. These provide an overview of the data's distribution and help identify any irregular patterns or issues.
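These summary statistics are one-liners in pandas; the example below uses invented data with one extreme value to show how the mean and median diverge:

```python
import pandas as pd

# Five typical values plus one extreme one.
s = pd.Series([10, 12, 12, 14, 15, 90])

print(s.mean())     # 25.5 — pulled up by the extreme value
print(s.median())   # 13.0 — robust central tendency
print(s.mode()[0])  # 12 — most frequent value
print(s.std())      # spread of the data
print(s.skew())     # strong positive skew caused by the 90
```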
Step 5: Transform the Data
Data transformation is an essential step in EDA because it prepares your data for accurate analysis and modeling. Depending on your data's characteristics and analysis needs, you may need to transform it to ensure it's in the right format. Common transformation techniques include:
1. Scaling or normalizing numerical variables (e.g., min-max scaling or standardization).
2. Encoding categorical variables for machine learning (e.g., one-hot encoding or label encoding).
3. Applying mathematical transformations (e.g., logarithmic or square root) to correct skewness or non-linearity.
4. Creating new variables from existing ones (e.g., calculating ratios or combining variables).
5. Aggregating or grouping data based on specific variables or conditions.
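A short pandas sketch of scaling, one-hot encoding, and deriving a new variable, on invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "sales":  [100.0, 200.0, 400.0],
    "region": ["N", "S", "N"],
})

# 1. Min-max scaling of a numerical variable into [0, 1].
lo, hi = df["sales"].min(), df["sales"].max()
df["sales_scaled"] = (df["sales"] - lo) / (hi - lo)

# 2. One-hot encoding of a categorical variable.
encoded = pd.get_dummies(df, columns=["region"])

# 4. Deriving a new variable from an existing one (a share-of-total ratio).
df["sales_ratio"] = df["sales"] / df["sales"].sum()

print(df)
```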
Step 6: Visualize Data Relationships
Visualization is a powerful tool in the EDA process, helping to uncover relationships be-
tween variables and identify patterns or trends that may not be obvious from summary
statistics alone. For categorical variables, create frequency tables, bar plots, and pie charts
to understand the distribution of categories and identify imbalances or unusual patterns.
For numerical variables, generate histograms, box plots, violin plots, and density plots
to visualize distribution, shape, spread, and potential outliers. To explore relationships
between variables, use scatter plots, correlation matrices, or statistical tests like Pearson’s
correlation coefficient or Spearman's rank correlation.
Step 7: Detect and Handle Outliers
Outliers are data points that significantly differ from the rest of the data, often caused by errors in measurement or data entry. Detecting and handling outliers is important because they can skew your analysis and affect model performance. You can identify outliers using methods like the interquartile range (IQR), Z-scores, or domain-specific rules. Once identified, outliers can be removed or adjusted depending on the context. Properly managing outliers ensures your analysis is accurate and reliable.
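The IQR method can be sketched in a few lines of pandas (data invented; any point beyond 1.5 × IQR outside the quartiles is flagged):

```python
import pandas as pd

s = pd.Series([10, 12, 12, 14, 15, 90])

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = s[(s < lower) | (s > upper)]
print(outliers.tolist())  # the 90 falls far outside the fences
```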
Step 8: Communicate Your Findings
The final step in EDA is to communicate your findings clearly. This involves summarizing your analysis, pointing out key discoveries, and presenting your results in a clear and engaging way. Clearly state the goals and scope of your analysis. Provide context and background to help others understand your approach. Use visualizations to support your findings and make them easier to understand. Highlight key insights, patterns, or anomalies discovered. Mention any limitations or challenges faced during the analysis. Suggest next steps or areas that need further investigation.
[Figure: Steps for Performing Exploratory Data Analysis]
Chapter 6
Mini Project
6.1 Overview
6.2 Product
6.3 ProductView
Chapter 7
SCREENSHOT
7.1 Home Page
[Screenshot: Home Page]
7.2 Customer Page
[Screenshot: Customer Page]
7.3 Trend Page
[Screenshot: Trend Page]
7.4 Tooltip Page
[Screenshot: Tooltip Page]
Chapter 8
CONCLUSION
Bibliography
• Few, Stephen. Now You See It: Simple Visualization Techniques for Quanti-
tative Analysis. Analytics Press, 2009.
• McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas,
NumPy, and IPython. O’Reilly Media, 2017.