Project Report On Python
A Project Report On
BACHELOR OF ENGINEERING
IN
INFORMATION SCIENCE AND ENGINEERING
Submitted by
Goutham P
(1EW20IS032)
Chapter 1
Company profile
and employees, so that future human resources will be beneficial, purposeful, and profitable to the nation.
1.5 Objectives
• AAPL believes in the Skill India mission and vision; hence our utmost priority is to impart skills to the young generation and make them productive and profitable for the nation.
• We are eager to deliver solutions to the most complex industrial problems.
Organization structure: The organization has three departments: design, software, and sales and marketing.
• We deliver all types of automation projects to companies using PLCs, SCADA, and embedded systems.
• We provide robots and robotic solutions to small- and medium-scale companies.
Chapter 2
Introduction
Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision making. Every product that is manufactured is supposed to have distinguishing physical characteristics, which make it attractive and provide usefulness and value to customers; these characteristics are known as design, and the process employed in this regard is known as product design. Product design clearly defines a problem, develops a proper solution for that problem, and validates the solution with real users. Product design is the process of creating a new product to be sold by a business to its customers. It is essentially the efficient and effective generation and development of ideas through a process that leads to new products.
Data mining is a particular data analysis technique that focuses on statistical modelling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis.
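As a concrete illustration of descriptive statistics and EDA, the following is a minimal sketch using pandas; the file name and column structure are placeholders, not details taken from this report.

```python
# Minimal EDA sketch with pandas; "data.csv" is a placeholder file name.
import pandas as pd

df = pd.read_csv("data.csv")

print(df.shape)                     # number of rows and columns
print(df.dtypes)                    # data type of each column
print(df.describe())                # descriptive statistics: mean, std, quartiles
print(df.isna().sum())              # missing values per column
print(df.corr(numeric_only=True))   # pairwise correlations, a starting point for EDA
```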
Chapter 3
Tools exposed
The notebook dashboard is the component shown first when you launch the Jupyter Notebook app. The notebook dashboard is mainly used to open notebook documents and manage the running kernels. The Jupyter notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. The Jupyter notebook combines two components: a web application and notebook documents.
A web application: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations, and their rich media output.
Notebook documents: a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects.
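For instance, a single notebook cell can hold code, its computed result, and rich media output such as an inline figure. The snippet below is an illustrative sketch of such a cell, not code taken from the report.

```python
# An illustrative notebook cell: computation plus rich media output.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))    # the figure renders inline beneath the cell
plt.title("sin(x)")
plt.show()
```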
Colaboratory, or Colab for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser and is especially well suited to machine learning, data analysis, and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources, including GPUs.
Colab resources are not guaranteed and not unlimited, and the usage limits sometimes fluctuate. This is necessary for Colab to be able to provide resources free of charge. Resources in Colab are prioritized for interactive use cases. Actions associated with bulk compute, actions that negatively impact others, and actions associated with bypassing the policies are prohibited. Jupyter is the open-source project on which Colab is based. Colab allows you to use and share Jupyter notebooks with others without having to download, install, or run anything.
You can search for Colab notebooks in Google Drive. Clicking on the Colab logo at the top left of the notebook view will show all notebooks in Drive. You can also search for notebooks that you have opened recently by clicking on File and then Open notebook. Google Drive operations can time out when the number of folders or subfolders in a folder grows too large. If thousands of items are directly contained in the top-level "My Drive" folder, then mounting the drive will likely time out. Repeated attempts may eventually succeed, as failed attempts cache partial state locally before timing out.
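A common way to work with Drive files in Colab is to mount Drive into the notebook's file system using the standard google.colab helper; the listing at the end of the sketch below is only an illustration.

```python
# Mount Google Drive inside a Colab notebook; prompts for authorization
# the first time it runs.
from google.colab import drive

drive.mount('/content/drive')

# Files in "My Drive" are then visible under /content/drive/MyDrive.
# Listing a very large folder can be slow or time out, as noted above.
import os
print(os.listdir('/content/drive/MyDrive')[:10])
```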
Colab is able to provide resources free of cost in part by having dynamic usage limits that sometimes fluctuate; this means that overall usage limits, as well as idle timeout periods, maximum VM lifetime, available GPU types, and other factors, vary over time. Colab does not publish these limits, in part because they can vary quickly. This is necessary for Colab to be able to provide access to these resources free of charge. Colab works with most major browsers and is most thoroughly tested with the latest versions of Chrome, Firefox, and Safari.
Chapter 4
Task performed: Data analysis of software salaries
While analysing data sets, it is important to define the objectives so that further steps become clearer. Analysis lets us pose questions about data. To question data, it is important to have a data collection on which further operations will be carried out. After the above steps, data analysis comes into the picture. Data analysis is the process of cleaning and converting raw data so that further operations become easier to carry out and conclusions can be drawn from the results. Today, data has become the backbone of research in almost every field. Research and analysis are no longer limited to just the sciences, but have grown to be a part of businesses, from startups to established organisations, government work, and more.
Data set:
A data set is a collection of similar and related data or information, organised for better accessibility. Data sets are used for data analytics as they provide related information in a unified form. A data set can be structured or unstructured.
Data set link: https://drive.google.com/drive/folders/1ceVcJahbNTxYFCJFcKI4NH4Ldw7MJZxz
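A hedged sketch of the loading and cleaning step is shown below. The file name software_salaries.csv and the column name salary are assumptions about the data set in the linked Drive folder, not details confirmed by the report.

```python
# Sketch of loading and cleaning the salaries data set with pandas.
# File and column names ("software_salaries.csv", "salary") are assumed.
import pandas as pd

df = pd.read_csv("software_salaries.csv")

df = df.drop_duplicates()                  # remove repeated records
df = df.dropna(subset=["salary"])          # drop rows missing the salary value
df["salary"] = df["salary"].astype(float)  # ensure salary is numeric

print(df.head())                           # first rows after cleaning
print(df["salary"].describe())             # sanity check on the cleaned column
```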
Chapter 5
Results and discussions
1. Scatter plot
2. Bar plot
3. Count plot
4. Bar plot
5. Bar plot
6. Heat map
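Plots like those listed above could be produced with seaborn roughly as follows. The data frame and its column names (experience, salary, job_title) are assumptions for illustration, since the report does not specify them.

```python
# Sketch of the listed plots using seaborn; column names are assumed.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("software_salaries.csv")

sns.scatterplot(data=df, x="experience", y="salary")    # 1. scatter plot
plt.show()

sns.barplot(data=df, x="job_title", y="salary")         # 2., 4., 5. bar plots
plt.show()

sns.countplot(data=df, x="job_title")                   # 3. count plot
plt.show()

sns.heatmap(df.corr(numeric_only=True), annot=True)     # 6. heat map of correlations
plt.show()
```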
Chapter 6
Reflection notes
• Machine learning involves computations on large data sets; hence we strengthened fundamental knowledge such as computer architecture, algorithms, and data-structure complexity, while getting deeper into the Python language and exploring new commands.
• Synthesise visual perception skills along with drawing skills to visually communicate ideas; deconstruct designs to understand their motives and inspirations; learn to synthesise data and make connections between data points using the available frameworks.
• To frame an appropriate, actionable problem statement with reference to user needs and contextual alignment.
• To analyse different data sets and understand the concepts in a real-world setting, so as to implement and make use of AI/ML in our upcoming careers.
• To train different models, understand the requirements of the respective clients, and implement a model according to those requirements.
Time management helps you allocate time for the most important tasks. When we follow a schedule, we don't have to spend time and energy deciding what to do; instead, we can focus on what matters and do it well. The quality of the work will suffer if we are constantly worrying about meeting deadlines. Time management helps us prioritise tasks, so we have enough time to focus on each project, put in the effort, and produce high-quality outcomes.
Many software companies have to work against tight timelines. Proper time management allows us to allocate enough time to meet each deadline. Planning ahead also keeps us calm and lets us think freely and work more efficiently.
Confidence is the key to a positive personality. Exude confidence and a positive aura wherever you go. Personality development teaches you to be calm and composed even in stressful situations. Never overreact. Avoid finding faults in others. Learn to be a little broad-minded and flexible.
Chapter 7
Conclusion
In conclusion, this internship has been a very useful experience for me. I can safely say that my understanding of the job environment has increased greatly. However, I do think that there are some aspects of the job that I could have done better and that I need to work on. I have built more confidence in the use of software tools. The two main things I learnt from my experience in this firm are time management and being self-motivated. I have gained new knowledge and skills and met new people. The use of big data tools can improve operational efficiency. Data analysis helps companies make informed decisions, create more efficient marketing strategies, improve customer experience, and streamline operations, among many other things. Charts, maps, and other visual representations of data help present findings in an easy-to-understand way. Improving data visualisation skills often means learning visualisation software.