Internship Report Big Data Analysis
COMPANY PROFILE
1.1. Introduction
Neubotz Technologies Pvt. Ltd. is an AI-based MedTech company. At Neubotz, we
are building a world where care for a patient's mind and body wellness is data-driven,
fast, and predictive.
1.2. Scope
1. We Value our Clients:
At Neubotz Technologies Pvt Ltd, we value our clients and thus work in a flexible
environment, with a software development process that can be easily adjusted to
clients' requirements. High-quality work is a prerequisite for every task we undertake, as
we consider that "every day counts".
2. We Believe in Quality:
Excellent and consistent quality at low cost is the key to success in outsourcing
business; and we stick to the basics.
At Neubotz, our goal is to democratize access to a positive, healthy lifestyle by using
feedback-learning technology. We have a strong research and development team with a
vision to make health care more accessible and affordable.
1. Unnoticed Work
2. Uncooperative Mentor
3. Inadequate Compensation
4. Competitive Co-Interns
MCA jobs, a better lifestyle, and a brighter future are possible with good performance in the
MCA course. Just as the MCA course itself is important, so is an internship within it.
The internship is an opportunity for an engineering student to experience a real-time
environment and learn the concepts which they were taught with a practical touch.
Interns need to put a great focus on learning, impress the employer, and work towards a
full-time job. When a college approves a student to go for an internship, the intern has the
responsibility to let the college know the daily routine of the work through a daily job sheet,
and that record is called an "Internship Report".
It gives the college management a clear understanding of how an intern is utilizing the
opportunity. There are clear guidelines for submitting an Internship Report.
They may differ slightly from one college to another but are mostly the same.
1.6. Objectives
To gain training and experiential learning opportunities for the development of skills in
technical programming concepts.
To experience a professional work environment that encourages and offers space for
professional character development and the improvement of professional competence.
To develop analytical skills to evaluate software and make improvements; the ability to
classify bugs into reports and suggest solutions; effective professional interpersonal skills;
a meticulous approach to software design; a creative, professional mindset with broad
problem-solving and critical-thinking abilities; an excellent understanding of design
principles and experience in their application; a background in the quality assurance
domain; and broad working knowledge of hardware, tools, and programming languages.
An internship is beneficial in defining success, which is why you need a clear plan in the
first place: the scope of the internship, what you wish to accomplish, and how you will get it done.
An internship is a period of work experience offered by an organization for a limited period
of time. Once confined to medical graduates, the term "internship" is now used for a wide
range of placements in businesses, non-profit organizations and government agencies.
Internships for professional careers are similar in some ways. Similar to internships,
apprenticeships transition students from vocational school into the workforce. The lack of
standardization and oversight leaves the term "internship" open to broad interpretation.
They are typically undertaken by students and graduates looking to gain relevant skills and
experience in a particular field.
An internship consists of an exchange of services for experience between the intern and
the organization. Internships are used to determine whether the intern still has an interest in
that field after the real-life experience. In addition, an internship can be used to create a
professional network that can assist with letters of recommendation or lead to future
employment opportunities.
TASK PERFORMED
2.1. Collection of data using Google forms.
The internship was focused on making the student aware of the constructs, the
programming environment, and related tools in Python. Google Forms is a survey administration
app that is included in the Google Drive office suite along with Google Docs,
Google Sheets, and Google Slides. Forms features all of the collaboration and sharing
features found in Docs, Sheets, and Slides.
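Responses collected this way can be exported from the form's response spreadsheet as a CSV file and loaded straight into Python. A minimal sketch, assuming hypothetical question columns and answers in place of the real export:

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV exported from the form's response sheet.
csv_export = io.StringIO(
    "Timestamp,Do you prefer mathematical subjects?,Library preference\n"
    "2021-01-01 10:00,Yes,Online\n"
    "2021-01-01 10:05,No,Offline\n"
)
responses = pd.read_csv(csv_export)
print(responses.shape)  # → (2, 3): two responses, three questions
```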
Big Data is used to describe the massive volume of both structured and unstructured data that
is so large it is difficult to process using traditional techniques. So Big Data is just what it
sounds like — a whole lot of data.
The concept of Big Data is a relatively new one and it represents both the increasing amount
and the varied types of data that are now being collected. Proponents of Big Data often refer to
this as the "datafication" of the world. As more and more of the world's information moves
online and becomes digitized, it means that analysts can start to use it as data. Things like
social media, online books, music, videos and the increased amount of sensors have all added
to the astounding increase in the amount of data that has become available for analysis.
Everything you do online is now stored and tracked as data. Reading a book on your Kindle
generates data about what you’re reading, when you read it, how fast you read it and so on.
Similarly, listening to music generates data about what you're listening to, when, how often,
and in what order. Your smart phone is constantly uploading data about where you are, how
fast you’re moving and what apps you’re using.
What's also important to keep in mind is that Big Data isn't just about the amount of data
we're generating; it's also about all the different types of data (text, video, search logs, sensor
logs, customer transactions, etc.). When thinking about Big Data, consider these six V's:
Volume: Big Data is, well … big! With the dramatic growth of the internet, mobile
devices, social media, and Internet of Things (IoT) technology, the amount of data
generated by all these sources has grown accordingly.
Velocity: In addition to getting bigger, the generation of data and organizations' ability
to process it are accelerating.
Variety: In earlier times, most data types could be neatly captured in rows on a structured
table. In the Big Data world, data often comes in unstructured formats like social media
posts, server log data, lat-long geo-coordinates, photos, audio, video and free text.
Variability: The meaning of words in unstructured data can change based on context.
Veracity: With many different data types and data sources, data quality issues invariably
pop up in Big Data sets. Veracity deals with exploring a data set for data quality and
systematically cleansing that data to be useful for analysis.
Value: Data must be combined with rigorous processing and analysis to be useful.
Big Data technologies are very beneficial to businesses, helping them boost efficiency
and develop new data-driven services. There are a number of uses of big data; for example,
analysing a set of data containing weather reports to predict next week's weather.
Health Care
Detect Frauds
Social Media Analysis
Weather
Public sector.
of our past or dreams of our future. So that is nothing but data analysis. When an analyst
does the same thing for business purposes, it is called Data Analysis.
There are several types of data analysis techniques that exist based on business and
technology. The major types of data analysis are:
Text Analysis
Statistical Analysis
Diagnostic Analysis
Predictive Analysis
Prescriptive Analysis
TEXT ANALYSIS
Text Analysis is also referred to as data mining. It discovers patterns in large data
sets using databases or data mining tools, and turns raw data into business information.
STATISTICAL ANALYSIS
Statistical Analysis shows "What happened?" by using past data in the form of
dashboards. Statistical Analysis includes the collection, analysis, interpretation, presentation,
and modeling of data. It analyses a complete set of data or a sample of data. There are two
categories of this type of Analysis: Descriptive Analysis and Inferential Analysis.
DESCRIPTIVE ANALYSIS
Descriptive Analysis analyses complete data or a sample of summarized numerical
data. It shows the mean and deviation for continuous data, and percentage and frequency
for categorical data.
INFERENTIAL ANALYSIS
Inferential Analysis analyses a sample drawn from the complete data. In this type of
Analysis, you can reach different conclusions from the same data by selecting different samples.
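This sampling effect is easy to demonstrate: two different random samples drawn from the same made-up data give two slightly different estimates of the mean. A small sketch with invented scores:

```python
import numpy as np

# Invented "population" of 1,000 scores (illustrative only).
rng = np.random.default_rng(0)
population = rng.normal(loc=70, scale=10, size=1000)

# Two different samples from the same data → two different mean estimates.
sample_a = rng.choice(population, size=50, replace=False)
sample_b = rng.choice(population, size=50, replace=False)
print(sample_a.mean(), sample_b.mean(), population.mean())
```

Both sample means hover near the population mean, but they are not identical, which is exactly why sample choice matters in Inferential Analysis.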
DIAGNOSTIC ANALYSIS
Diagnostic Analysis shows "Why did it happen?" by finding the cause from the
insight found in Statistical Analysis. This Analysis is useful to identify behavior patterns of
data. If a new problem arrives in your business process, then you can look into this Analysis
to find similar patterns of that problem. And it may have chances to use similar prescriptions
for the new problems.
PREDICTIVE ANALYSIS
Predictive Analysis shows "what is likely to happen" by using previous data. The
simplest example: if last year I bought two dresses based on my savings, and this year my
salary doubles, then I can buy four dresses. But of course it's not that easy, because you
have to think about other circumstances, such as the chance that the price of clothes has
increased this year, or that instead of dresses you want to buy a new bike, or need to buy a
house! So this Analysis makes predictions about future outcomes based on current or past
data. Forecasting is just an estimate; its accuracy depends on how much detailed
information you have and how deeply you dig into it.
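The dress example above is essentially a straight line fitted through past data and extended forward. A sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical history: yearly savings (in thousands) vs. dresses bought.
savings = np.array([10, 20, 30, 40])
dresses = np.array([1, 2, 3, 4])

# Fit a straight line (degree-1 polynomial) to the past data ...
slope, intercept = np.polyfit(savings, dresses, 1)

# ... and predict for doubled savings. A real forecast would need the
# other circumstances the text mentions (prices, competing purchases).
predicted = slope * 80 + intercept
print(round(predicted))  # → 8
```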
PRESCRIPTIVE ANALYSIS
Prescriptive Analysis combines the insight from all previous Analysis to determine
which action to take in a current problem or decision.
Data Collection
Data Cleaning
Data Analysis
Data Interpretation
Data Visualization
First of all, you have to think about why you want to do this data analysis: you
need to find out the purpose or aim of doing the Analysis, and decide which type of
data analysis you want to do. In this phase, you have to decide what to analyze and how to
measure it; you have to understand why you are investigating and what measures you will
use to do this Analysis.
DATA COLLECTION
After requirement gathering, you will have a clear idea about what things you have to
measure and what your findings should be. Now it's time to collect your data based on
the requirements. Once you collect the data, remember that it must be processed or
organized for Analysis. As you collect data from various sources, you must keep a log
with the collection date and the source of each item of data.
DATA CLEANING
Whatever data is collected may be irrelevant to the aim of your Analysis, hence it
should be cleaned. The collected data may contain duplicate records, white spaces, or
errors; it should be made clean and error-free. This phase must be done before Analysis
because, based on the data cleaning, the output of your Analysis will be closer to your
expected outcome.
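These cleaning steps map directly onto pandas operations. A minimal sketch on a made-up sample (the column names are illustrative, not from the real survey):

```python
import pandas as pd

# Made-up sample with a duplicate record, stray white space, and a missing value.
raw = pd.DataFrame({
    "Name": [" a ", "b", "b", None],
    "Marks": [10, 20, 20, 30],
})
cleaned = (
    raw.dropna()                                      # drop rows with missing values
       .assign(Name=lambda d: d["Name"].str.strip())  # trim white space
       .drop_duplicates()                             # remove duplicate records
)
print(len(cleaned))  # → 2 clean rows survive
```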
DATA ANALYSIS
Once the data is collected, cleaned, and processed, it is ready for Analysis. As you
manipulate data, you may find you have the exact information you need, or you might need to
collect more data. During this phase, you can use data analysis tools and software which will
help you to understand, interpret, and derive conclusions based on the requirements.
DATA INTERPRETATION
After analyzing your data, it's finally time to interpret your results. You can choose
how to express or communicate your data analysis: simply in words, or perhaps with a
table or chart. Then use the results of your data analysis process to decide your best
course of action.
DATA VISUALIZATION
Data visualization is very common in your day-to-day life; visualizations often appear
in the form of charts and graphs. In other words, data is shown graphically so that it is
easier for the human brain to understand and process. Data visualization is often used to
discover unknown facts and trends. By observing relationships and comparing datasets,
you can find meaningful information. Effective visualization helps users analyze and
reason about data and evidence. Tables are generally used where users will look up a
specific measurement, while charts of various types are used to show patterns or
relationships in the data for one or more variables.
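As a small illustration, the same comparison that would need a table can be drawn as a bar chart with Matplotlib (the category names and counts below are made up):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Invented survey counts for two categories.
categories = ["left brain", "right brain"]
counts = [12, 18]

bars = plt.bar(categories, counts)
plt.title("Survey responses")
plt.ylabel("no. of people")
print([bar.get_height() for bar in bars])  # → [12, 18]
plt.savefig("responses.png")  # or plt.show() in an interactive session
```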
WORKING PROCEDURE
It is very easy to set up a Python environment for performing data analysis. The most
accessible way to start is to download the PyCharm IDE and install the necessary libraries,
including NumPy, Pandas, and Matplotlib.
There are numerous ways to learn the basics of Python. A number of online courses
offer free tutorials on Python for data science. These free courses, consisting of video
tutorials and documentation with practice exercises, are a comprehensive way to learn by
active participation, as opposed to the traditional method of reading concepts and looking
at examples.
Being a general-purpose language, Python is often used beyond data analysis and data
science. The abundant availability of libraries makes Python remarkably useful for working
with data. The significant Python libraries used for working with data are described below.
The best way to learn any programming language is to take a sample dataset and start
working with it. Practising on these sample datasets helps aspirants apply new techniques,
experiment with learned methods, and get to know their strengths and the areas that need
improvement. The StatsModels library of Python includes some preloaded datasets that
can be used. Once familiar with the workflow, users can load a dataset from the web or a
CSV file.
One of the most important skills required to extract information from abundant data is
data manipulation. On most occasions, we get crude data which is not fit for analysis.
To make the data available for analysis we need to manipulate it: Python provides tools
and applications for transforming, formatting, cleaning, and moulding it for examination.
Visuals are remarkably relevant for both exploratory data analysis and to
communicate results. Matplotlib is the regular Python library used for visualisation.
Analysing data is not just formatting and creating plots and graphs. The core aspects
of analytics are statistical modelling, machine learning algorithms, data mining techniques,
and inference. The Python programming language is an excellent tool for analysing data
because it has effective libraries such as Scikit-learn and StatsModels, which contain the
models and algorithms that are essential for analysis.
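As a taste of the statistical-modelling side, ordinary least squares, the idea at the core of the linear models in StatsModels and Scikit-learn, can be sketched with NumPy alone (the hours and marks below are invented for illustration):

```python
import numpy as np

# Invented data: study hours vs. exam marks.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
marks = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Ordinary least squares: solve for the slope and intercept of the best-fit line.
A = np.vstack([hours, np.ones_like(hours)]).T
slope, intercept = np.linalg.lstsq(A, marks, rcond=None)[0]
print(round(slope, 2), round(intercept, 2))  # → 4.1 47.7
```

Libraries like StatsModels wrap this same fit with standard errors, confidence intervals, and diagnostics.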
TASK PERFORMED
This theory is based on the fact that the brain’s two hemispheres function differently.
This first came to light in the 1960s, thanks to the research of psycho biologist and Nobel
Prize winner Roger W. Sperry.
The theory is that people are either left-brained or right-brained, meaning that one
side of their brain is dominant. If you’re mostly analytical and methodical in your thinking,
you’re said to be left-brained. If you tend to be more creative or artistic, you’re thought to be
right-brained.
The left brain is more verbal, analytical, and orderly than the right brain. It’s
sometimes called the digital brain. It’s better at things like reading, writing, and
computations.
According to Sperry’s dated research, the left brain is also connected to:
Logic
Sequencing
Linear thinking
Mathematics
Facts
Thinking in words
The right brain is more visual and intuitive. It’s sometimes referred to as the analog
brain. It has a more creative and less organized way of thinking.
Sperry’s dated research suggests the right brain is also connected to:
Imagination
Holistic thinking
Intuition
Arts
Rhythm
Nonverbal cues
Feelings visualization
Daydreaming
We know the two sides of our brain are different, but does it necessarily follow that
we have a dominant brain just as we have a dominant hand? A team of neuroscientists set
out to test this premise. After a two-year analysis, they found no proof that this theory is
correct. Magnetic resonance imaging of 1,000 people revealed that the human brain
doesn't actually favor one side over the other. The networks on one side aren't generally
stronger than the networks on the other side.
The two hemispheres are tied together by bundles of nerve fibers, creating an
information highway. Although the two sides function differently, they work together and
complement each other. You don’t use only one side of your brain at a time.
Whether you’re performing a logical or creative function, you’re receiving input from
both sides of your brain. For example, the left brain is credited with language, but the right
brain helps you understand context and tone. The left brain handles mathematical equations,
but the right brain helps out with comparisons and rough estimates.
General personality traits, individual preferences, or learning style don’t translate into
the notion that you’re left-brained or right-brained.
Still, it’s a fact that the two sides of your brain are different, and certain areas of your
brain do have specialties. The exact areas of some functions can vary a bit from person to
person.
DIGITAL LIBRARY
Digital libraries can vary immensely in size and scope, and can be maintained by
individuals or organizations. The digital content may be stored locally, or accessed remotely
via computer networks. These information retrieval systems are able to exchange information
with each other through interoperability and sustainability.
The early history of libraries is poorly documented, but several key thinkers are
connected to the emergence of this concept. Predecessors include Paul Otlet and Henri La
Fontaine's Mundaneum, an attempt begun in 1895 to gather and systematically catalogue
the world's knowledge, in the hope of bringing about world peace. The establishment of
the digital library was totally dependent on progress in the age of the internet, which
provided not only the means to compile the digital library but also access to the books by
millions of individuals on the World Wide Web.
The advantages of digital libraries as a means of easily and rapidly accessing books,
archives and images of various types are now widely recognized by commercial interests and
public bodies alike. Traditional libraries are limited by storage space; digital libraries have
the potential to store much more information, simply because digital information requires
very little physical space to contain it. As such, the cost of maintaining a digital library can be
much lower than that of a traditional library.
No physical boundary.
Multiple access
Information retrieval.
Space
Added value.
Easily accessible
Digital libraries, or at least their digital collections, unfortunately also have brought
their own problems and challenges in areas such as:
Copyright
Digital preservation
Equity of access
Interface design
Information organization
Quality of metadata
OFFLINE LIBRARY
In addition to providing materials, libraries also provide the services of librarians who
are experts at finding and organizing information and at interpreting information needs.
Libraries often provide quiet areas for studying, and they also often offer common areas to
facilitate group study and collaboration. Libraries often provide public facilities for access to
their electronic resources and the Internet.
The history of libraries began with the first efforts to organize collections of
documents. Topics of interest include accessibility of the collection, acquisition of materials,
arrangement and finding tools, the book trade, the influence of the physical properties of the
different writing materials, language distribution, role in education, rates of literacy, budgets,
staffing, libraries for specially targeted audiences, architectural merit, patterns of usage, and
the role of libraries in a nation's cultural heritage, and the role of government, church or
private sponsorship. Since the 1960s, issues of computerization and digitization have arisen.
TYPES OF LIBRARY
Academic libraries
Children's libraries
National libraries
Public lending libraries
Reference libraries
Research libraries
Digital libraries
Special libraries
TASK PERFORMED
Google Forms is a tool that allows collecting information from users via a
personalized survey or quiz. The information is then collected and automatically connected to
a spreadsheet. The spreadsheet is populated with the survey and quiz responses. The Forms
service has undergone several updates over the years. New features include, but are not
limited to, menu search, shuffle of questions for randomized order, limiting responses to once
per person, shorter URLs, custom themes, automatically generating answer suggestions when
creating forms, and an "Upload file" option for users answering questions that require them to
share content or files from their computer or Google Drive. The upload feature is only
available through G Suite. In October 2014, Google introduced add-ons for Google Forms,
which enable third-party developers to make new tools for more features in surveys.
The required data was collected through Google Forms using the following link:
https://forms.gle/m3LNt97VQrREL3bX7
TECHNOLOGY BEHIND
PYTHON
What is Python?
Python is a popular programming language. It was created by Guido van Rossum, and
released in 1991.
It is used for:
software development,
mathematics,
system scripting.
Python can connect to database systems. It can also read and modify files.
Python can be used to handle big data and perform complex mathematics.
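For example, with nothing beyond the standard library, Python can talk to a database (the table and values below are invented for illustration):

```python
import sqlite3

# In-memory SQLite database: create a table, insert rows, query them back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?)", [("a", 10), ("b", 20)])
total = conn.execute("SELECT SUM(score) FROM marks").fetchone()[0]
print(total)  # → 30
conn.close()
```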
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it
is written. This means that prototyping can be very quick.
Good to know
The most recent major version of Python is Python 3, which we shall be using in this
tutorial. However, Python 2, although not being updated with anything other than
security updates, is still quite popular.
In this tutorial Python will be written in a text editor. It is possible to write Python in
an Integrated Development Environment, such as Thonny, Pycharm, Netbeans or
Eclipse which are particularly useful when managing larger collections of Python
files.
There are four collection data types in the Python programming language:
List: A list is a collection which is ordered and changeable. In Python, lists are written
with square brackets.
Tuple: A tuple is a collection which is ordered and unchangeable. In Python, tuples are
written with round brackets.
Set: A set is a collection which is unordered and has no duplicate members. In Python,
sets are written with curly brackets.
Dictionary: A dictionary is a changeable collection of key-value pairs. In Python,
dictionaries are written with curly brackets.
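The four collection types can be compared side by side (the values are illustrative):

```python
fruits_list = ["apple", "banana", "apple"]  # ordered, changeable, allows duplicates
fruits_tuple = ("apple", "banana")          # ordered, unchangeable
fruits_set = {"apple", "banana", "apple"}   # unordered, duplicates removed
fruits_dict = {"apple": 1, "banana": 2}     # key-value pairs

print(len(fruits_list), len(fruits_tuple), len(fruits_set), len(fruits_dict))  # → 3 2 2 2
```

Note that the duplicate "apple" survives in the list but is silently dropped from the set.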
Applications of Python
As mentioned before, Python is one of the most widely used languages over the web.
Listed below are a few of its key characteristics:
Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
A broad standard library − Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more efficient.
GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than
shell scripting.
LIBRARIES USED
NumPy
NumPy is a Python package whose name stands for "Numerical Python". It is the core
library for scientific computing; it contains a powerful n-dimensional array object and
provides tools for integrating C, C++, etc. It is also useful for linear algebra, random
number capabilities, etc. A NumPy array can also be used as an efficient multi-dimensional
container for generic data.
Pandas
Pandas is a Python library for data manipulation and analysis. It offers data structures
such as the DataFrame and Series, along with operations for reading, cleaning,
transforming, and aggregating tabular data.
Matplotlib
Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy. It provides an object-oriented API for embedding
plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt,
or GTK+.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: element-wise addition of two 3x3 arrays
one = np.array(range(0, 9))
two = np.array(range(0, 9))
one.shape = (3, 3)
two.shape = (3, 3)
print(one)
print(two)
print(one + two)

# Pandas: build a small DataFrame
data = {"Name": ["a", "b", "c"], "Marks": [10, 20, 30]}
td = pd.DataFrame(data)
print(td)

# Matplotlib: histogram of sample values
x_axis = [1, 2, 3, 4, 12, 34, 29, 30]
bins = [10, 20, 30, 40]
plt.title("Sample histogram")
plt.hist(x_axis, bins)
plt.xlabel("value")
plt.ylabel("frequency")
plt.show()

# Pandas: load the collected dataset from CSV
pd.read_csv("D:/project/dataAnalysis.csv")
Data analysis was performed on the data set collected through the Google form and
implemented in Python using the above libraries, with graphs plotted from the data set.
Left-brain and right-brain theory prediction was done based on the questions asked in
the Google form.
# NOTE: the filtering lines below were garbled in the original listing; the
# column names and answer values used here are assumptions, not the exact
# survey fields.
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("D:/project/dataAnalysis.csv")
data = data.dropna()

# Left-brain indicator 1: people who prefer mathematical subjects
df_math_ppl = data[data["favourite_subject"] == "math subjects"]  # assumed column
print(df_math_ppl)
no_of_math_ppl = df_math_ppl["favourite_subject"].count()
tot_ppl = data["favourite_subject"].count()
right_brain_music = tot_ppl - no_of_math_ppl
print(right_brain_music)

# Left-brain indicator 2: people who remember names rather than faces
ser_names = data[data["recognition"] == "names"]["recognition"]  # assumed column
no_of_names_recog = ser_names.count()
right_brain_face = data["recognition"].count() - no_of_names_recog

# Bar chart: left-brain vs right-brain counts
no_left_brain_ppl = no_of_math_ppl + no_of_names_recog
no_right_brain_ppl = right_brain_music + right_brain_face
x_axis = [0, 1]
y_axis = [no_left_brain_ppl, no_right_brain_ppl]
plt.xlabel("left vs right brain ppl")
plt.bar(x_axis, y_axis)
plt.show()

# Bar chart: online vs offline library preference
ser_online_ppl = data[data["library_preference"] == "online"]["library_preference"]  # assumed column
online_preferred = ser_online_ppl.count()
offline_preferred = data["library_preference"].count() - online_preferred
x_axis = [0, 2]
y_axis = [online_preferred, offline_preferred]
plt.xlabel("online vs offline library")
plt.ylabel("no of ppl")
plt.bar(x_axis, y_axis)
plt.show()
CONCLUSION
Here we collected the data through Google Forms, mined the collected data and gave
it as input, and, after some operations on the data sets using particular Python libraries,
we visualized the output by plotting graphs based on the given input.
REFERENCES
[1] "The Amazing Story of Kentucky's Horseback Librarians (10 Photos)". Archive
Project. Retrieved 19 May 2017.
[2] "St. George Library Workshops". utoronto.ca. Dowler, Lawrence (1997). Gateways
to knowledge: the role of academic libraries in teaching, learning, and research.
ISBN
[3] "The Role of Academic Libraries in Universal Access to Print and Electronic
Resources in the Developing Countries, Chinwe V. Anunobi, Ifeyinwa B. Okoye".
Unllib.unl.edu. Retrieved 9 September 2012.
[4] Witten, Ian H.; Bainbridge, David; Nichols, David M. (2009). How to Build a Digital
Library (2nd ed.). Morgan Kaufmann. ISBN 9780080890395.
[5] Lanagan, James; Smeaton, Alan F. (September 2012). "Video digital libraries:
contributive and decentralized". International Journal on Digital Libraries. 12 (4):
159–178. doi:10.1007/s00799-012-0078-z.
[6] Wiederhold, Gio (1993). "Intelligent integration of information". ACM SIGMOD
Record. 22 (2): 434–437. doi:10.1145/170036.170118.
[7] Besser, Howard (2004). "The Past, Present, and Future of Digital Libraries". In
Schreibman, Susan; Siemens, Ray; Unsworth, John (eds.). A Companion to Digital
Humanities. Blackwell Publishing Ltd. pp. 557–575.
doi:10.1002/9780470999875.ch36. ISBN 9781405103213. Archived from the
original on 10 August 2017. Retrieved 30 April 2018.