
Internship Report Big Data Analysis

Neubotz Technologies Pvt. Ltd is an AI-based MedTech company focused on leveraging AI, Machine Learning, and IoT to enhance healthcare through data-driven solutions. The document outlines the company's profile, objectives, and the scope of internships, emphasizing the importance of practical experience for students. Additionally, it discusses the concept of Big Data, its uses, and various data analysis techniques relevant to business decision-making.



BIG DATA ANALYSIS USING PYTHON (18MCA61) 2021-22

COMPANY PROFILE
1.1. Introduction
Neubotz Technologies Pvt. Ltd is an AI-based MedTech company. At Neubotz, we are building a world where a patient's mind and body wellness is data-driven, with fast health prediction.

Neubotz is an AI-based company founded by a team of four industry experts in 2018.


We have been providing wearable technology solutions to the healthcare sector using AI, Machine Learning and IoT, and leveraging these technologies for Industrial IoT as well. The motto of our team is to develop AI systems that track human health through machine learning and IoT techniques to accurately find health problems and assist treatment using data mining and analytics.

1.2. Scope
1. We Value our Clients:
At Neubotz Technologies Pvt Ltd, we value our clients and thus work in a flexible environment for the software development process, which can be easily adjusted to clients' requirements. High-quality work is a prerequisite for every task we undertake, as we believe that "every day counts".

2. We Believe in Quality:
Excellent and consistent quality at low cost is the key to success in the outsourcing business, and we stick to the basics.

3. We Value our People:


People are the most important asset in the Information Technology industry. We believe that a happy employee equals a happy client. Thus, we recruit skilled engineers, groom them, and provide an enriching environment which expedites their overall growth and boosts their performance.

Dept. of MCA 1|Page


1.3. About the Company

Neubotz Technologies Pvt. Ltd is an AI-based MedTech company. At Neubotz, we are building a world where a patient's mind and body wellness is data-driven, with fast health prediction.

At Neubotz, our goal is to democratize access to a positive, healthy lifestyle by using feedback learning technology. We have a strong Research and Development team with a vision to make health care more accessible and affordable.

As technical experts, we also conduct technical workshops and encourage candidates (students, research scholars) in research activities on technologies like machine learning, IoT, and signal and image processing, to deliver smarter, quicker and more reliable systems that will help overcome the existing challenges of health care systems.

Our patented AI product "Nadiswara" is a pulse diagnosis machine which captures the patient's radial artery pulse and identifies the root cause of disease by accurately analyzing the pattern of the pulse. Nadiswara uses Deep Learning technology to predict the health of a person and suggests precautionary measures.

1.4. The Problems

1. Unnoticed Work

2. Uncooperative Mentor

3. Issues with Time Management / Self-Management

4. Allotment of Trivial Work

5. Inadequate Compensation

6. Hesitant To Ask Questions

7. Competitive Co-Interns

8. Overwhelmed With Work


1.5. Benefits to the company

 MCA jobs, a better lifestyle and a brighter future are possible with good performance in the MCA course. Just as the MCA course is important, so is the internship within it.
 The internship is an opportunity for a student to experience a real-time work environment and learn, with a practical touch, the concepts they were taught.
 Interns need to focus on learning, impress the employer, and work towards a full-time job. When a college approves a student's internship, the intern is responsible for letting the college know the daily routine of the work through a daily job sheet; this is called an "Internship Report".
 It gives the college management a clear understanding of how the intern is utilizing the opportunity. There are clear guidelines for submitting an Internship Report.
 They may differ slightly from one college to another but are mostly the same.

1.6. Objectives

 To get training and experiential learning opportunities for the development of skills in technical programming concepts.
 To get a professional work environment that encourages and offers space for professional character development and the improvement of professional competence.
 Analytical skills to evaluate software and make improvements; the ability to classify bugs into reports and suggest solutions.
 Effective professional communication skills and a meticulous approach to software design.
 A creative professional mindset with broad problem-solving and critical-thinking abilities.
 An excellent understanding of design principles and experience with their application.
 A background in quality assurance, with broad working knowledge of hardware, tools and programming languages.


1.7. Scope of Internship

An internship is beneficial in defining success, which is why you need to define its scope in the first place: what you wish to accomplish and how you will get it done. An internship is a period of work experience offered by an organization for a limited period of time. Once confined to medical graduates, the term is now used for a wide range of placements in businesses, non-profit organizations and government agencies. Internships for professional careers are similar in some ways to apprenticeships, which transition students from vocational school into the workforce. The lack of standardization and oversight leaves the term "internship" open to broad interpretation. Internships are typically undertaken by students and graduates looking to gain relevant skills and experience in a particular field.

An internship consists of an exchange of services for experience between the intern and
the organization. Internships are used to determine whether the intern still has an interest in
that field after the real-life experience. In addition, an internship can be used to create a
professional network that can assist with letters of recommendation or lead to future
employment opportunities.

1.8. Company Contact


• Website: http://neubotz.in
• E-mail:
[email protected]
• Address:
#1592, 1st Floor, 1st ‘A’ Cross, 3rd Main, Chandra Layout, BCC Layout, Chandra
Layout, Bengaluru, Karnataka 560040


TASK PERFORMED
2.1. Collection of data using Google forms.
The internship was focused on making the student aware of the constructs, programming environment, etc., in Python. Google Forms is a survey administration app that is included in the Google Drive office suite along with Google Docs, Google Sheets, and Google Slides. Forms features all of the collaboration and sharing features found in Docs, Sheets, and Slides.

 Week Wise Summary of the Internship

1st WEEK
15-02-2021  Monday     About the company
16-02-2021  Tuesday    Introduction to Big Data
17-02-2021  Wednesday  Uses of big data
18-02-2021  Thursday   Types of big data
19-02-2021  Friday     Data Analysis Process

2nd WEEK
22-02-2021  Monday     Operating Environment
23-02-2021  Tuesday    Working Procedure
24-02-2021  Wednesday  Task performed
25-02-2021  Thursday   Library-based data analysis
26-02-2021  Friday     Types of libraries

3rd WEEK
01-03-2021  Monday     Python introduction
02-03-2021  Tuesday    Applications of Python
03-03-2021  Wednesday  NumPy, Pandas, Matplotlib
04-03-2021  Thursday   Example Python program
05-03-2021  Friday     Data analysis Python program


4th WEEK
08-03-2021  Monday     Assessment
09-03-2021  Tuesday    Assessment
10-03-2021  Wednesday  Final brush-up


INTRODUCTION TO BIG DATA ANALYSIS

Big Data is used to describe the massive volume of both structured and unstructured data that
is so large it is difficult to process using traditional techniques. So Big Data is just what it
sounds like — a whole lot of data.

The concept of Big Data is a relatively new one, and it represents both the increasing amount and the varied types of data that are now being collected. Proponents of Big Data often refer to this as the "datafication" of the world. As more and more of the world's information moves online and becomes digitized, analysts can start to use it as data. Things like social media, online books, music, videos and the increasing number of sensors have all added to the astounding increase in the amount of data that has become available for analysis.

Everything you do online is now stored and tracked as data. Reading a book on your Kindle generates data about what you're reading, when you read it, how fast you read it, and so on. Similarly, listening to music generates data about what you're listening to, when, how often and in what order. Your smartphone is constantly uploading data about where you are, how fast you're moving and what apps you're using.

What’s also important to keep in mind is that Big Data isn’t just about the amount of data
we’re generating, it’s also about all the different types of data (text, video, search logs, sensor
logs, customer transactions, etc.). When thinking about Big Data, consider the “seven V’s:”

 Volume : Big Data is, well … big! With the dramatic growth of the internet, mobile
devices, social media, and Internet of Things (IoT) technology, the amount of data
generated by all these sources has grown accordingly.

 Velocity : In addition to getting bigger, the generation of data and organizations’ ability
to process it is accelerating.


 Variety: In earlier times, most data types could be neatly captured in rows on a structured
table. In the Big Data world, data often comes in unstructured formats like social media
posts, server log data, lat-long geo-coordinates, photos, audio, video and free text.

 Variability: The meaning of words in unstructured data can change based on context.

 Veracity: With many different data types and data sources, data quality issues invariably
pop up in Big Data sets. Veracity deals with exploring a data set for data quality and
systematically cleansing that data to be useful for analysis.

 Visualization: Once data has been analyzed, it needs to be presented in a visualization for end users to understand and act upon.

 Value: Data must be combined with rigorous processing and analysis to be useful.

USES OF BIG DATA

Big Data technologies are very beneficial to businesses in order to boost efficiency and develop new data-driven services. There are a number of uses of big data: for example, analysing a set of data containing weather reports to predict next week's weather.

Here are some uses of Big Data and where it is used:

 Health Care
 Detect Frauds
 Social Media Analysis
 Weather
 Public sector.

Data analysis is defined as a process of cleaning, transforming, and modeling data to discover useful information for business decision-making. The purpose of data analysis is to extract useful information from data and to take decisions based upon that analysis. Whenever we take any decision in our day-to-day life, we do so by thinking about what happened last time or what will happen if we choose that particular option. This is nothing but analyzing our past or future and making decisions based on it. For that, we gather memories


of our past or dreams of our future. So that is nothing but data analysis. When an analyst does the same thing for business purposes, it is called Data Analysis.

TYPES OF DATA ANALYSIS: TECHNIQUES AND METHODS

There are several types of data analysis techniques that exist based on business and
technology. The major types of data analysis are:

 Text Analysis

 Statistical Analysis

 Diagnostic Analysis

 Predictive Analysis

 Prescriptive Analysis

TEXT ANALYSIS

Text Analysis is also referred to as Data Mining. It is a method to discover patterns in large data sets using databases or data mining tools. It is used to transform raw data into business information. Business Intelligence tools on the market are used to take strategic business decisions. Overall, it offers a way to extract and examine data, derive patterns and finally interpret the data.

STATISTICAL ANALYSIS

Statistical Analysis shows "What happened?" by using past data in the form of dashboards. Statistical Analysis includes the collection, analysis, interpretation, presentation, and modeling of data. It analyses a complete set of data or a sample of data. There are two categories of this type of analysis: Descriptive Analysis and Inferential Analysis.

DESCRIPTIVE ANALYSIS

Descriptive analysis summarizes complete data or a sample of numerical data. It shows the mean and deviation for continuous data, and percentage and frequency for categorical data.
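As a minimal sketch of these descriptive measures with Pandas (the sample data and column names here are invented for illustration):

```python
import pandas as pd

# Hypothetical sample: one continuous column and one categorical column.
df = pd.DataFrame({
    "age": [23, 25, 31, 35, 46],
    "gender": ["F", "M", "F", "F", "M"],
})

# Continuous data: mean and standard deviation.
age_mean = df["age"].mean()
age_std = df["age"].std()

# Categorical data: frequency counts and percentages.
gender_freq = df["gender"].value_counts()
gender_pct = df["gender"].value_counts(normalize=True) * 100
```

For the sample above, `age_mean` comes out to 32.0 and "F" accounts for 60% of the `gender` column.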


INFERENTIAL ANALYSIS

Inferential analysis works on a sample drawn from complete data. In this type of analysis, you can find different conclusions from the same data by selecting different samples.
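The point that different samples can lead to different estimates can be illustrated with a small sketch; the "complete data" below is a made-up population, not from the report:

```python
import random

# Hypothetical "complete data": 1,000 measurements centred around 50.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(1000)]

def sample_mean(data, n, seed):
    # Draw a random sample of size n and return its mean.
    rng = random.Random(seed)
    return sum(rng.sample(data, n)) / n

# Two different samples from the same data give different estimates,
# and therefore potentially different conclusions.
m1 = sample_mean(population, 30, seed=1)
m2 = sample_mean(population, 30, seed=2)
```

Both estimates hover near the true centre, yet they differ from each other, which is exactly the sampling effect described above.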

DIAGNOSTIC ANALYSIS

Diagnostic Analysis shows "Why did it happen?" by finding the cause from the insights found in Statistical Analysis. This analysis is useful for identifying behavior patterns in data. If a new problem arises in your business process, you can look into this analysis to find similar patterns to that problem, and there may be a chance to use similar prescriptions for the new problem.

PREDICTIVE ANALYSIS

Predictive Analysis shows "what is likely to happen" by using previous data. The simplest example: if last year I bought two dresses based on my savings, and this year my salary has doubled, then I can buy four dresses. But of course it's not that easy, because you have to think about other circumstances: the prices of clothes may have increased this year, or instead of dresses you may want to buy a new bike, or you may need to buy a house! So this analysis makes predictions about future outcomes based on current or past data. Forecasting is just an estimate; its accuracy depends on how much detailed information you have and how deeply you dig into it.
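The dress example amounts to fitting a trend to past data and extrapolating it. A minimal sketch with NumPy, using invented yearly figures:

```python
import numpy as np

# Invented past data: items bought per year.
years = np.array([2018, 2019, 2020])
bought = np.array([1, 2, 3])

# Fit a straight line to the past observations...
slope, intercept = np.polyfit(years, bought, 1)
# ...and extrapolate it one year ahead.
forecast_2021 = slope * 2021 + intercept
```

Because the invented data is perfectly linear, the extrapolation lands on 4 for 2021; real forecasts carry the uncertainty the paragraph above describes.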

PRESCRIPTIVE ANALYSIS

Prescriptive Analysis combines the insight from all previous Analysis to determine
which action to take in a current problem or decision.

DATA ANALYSIS PROCESS

The Data Analysis Process is nothing but gathering information by using a proper application or tool which allows you to explore the data and find patterns in it. Based on that, you can take decisions or reach final conclusions.


Data Analysis consists of the following phases:

 Data Requirement Gathering

 Data Collection

 Data Cleaning

 Data Analysis

 Data Interpretation

 Data Visualization

DATA REQUIREMENT GATHERING

First of all, you have to think about why you want to do this data analysis. You need to find out the purpose or aim of doing the analysis and decide which type of data analysis you want to do. In this phase, you have to decide what to analyze and how to measure it; you have to understand why you are investigating and what measures you will use to do this analysis.

DATA COLLECTION

After requirement gathering, you will have a clear idea about what things you have to measure and what your findings should be. Now it's time to collect your data based on those requirements. Once you collect your data, remember that it must be processed or organized for analysis. As you collect data from various sources, you must keep a log of the collection date and source of the data.

DATA CLEANING

Now, whatever data is collected may not be useful or may be irrelevant to the aim of your analysis, hence it should be cleaned. The collected data may contain duplicate records, white spaces or errors. The data should be cleaned and made error-free. This phase must be done before analysis because, based on data cleaning, the output of your analysis will be closer to your expected outcome.
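The cleaning steps mentioned here (stripping white space, dropping duplicate records) might look like this in Pandas; the raw table is made up for illustration:

```python
import pandas as pd

# Made-up raw data containing stray white space and a duplicate record.
raw = pd.DataFrame({
    "name": ["Asha ", " Ravi", "Asha ", "Mina"],
    "score": [82, 75, 82, 90],
})

# Strip white space first, then drop exact duplicate rows.
clean = raw.copy()
clean["name"] = clean["name"].str.strip()
clean = clean.drop_duplicates().reset_index(drop=True)
```

Note that stripping must come before deduplication here: "Asha " and "Asha" only become duplicates once the white space is removed.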

DATA ANALYSIS


Once the data is collected, cleaned, and processed, it is ready for Analysis. As you
manipulate data, you may find you have the exact information you need, or you might need to
collect more data. During this phase, you can use data analysis tools and software which will
help you to understand, interpret, and derive conclusions based on the requirements.

DATA INTERPRETATION

After analyzing your data, it's finally time to interpret your results. You can choose the way to express or communicate your data analysis: simply in words, or maybe in a table or chart. Then use the results of your data analysis process to decide your best course of action.

DATA VISUALIZATION

Data visualization is very common in day-to-day life; it often appears in the form of charts and graphs. In other words, data is shown graphically so that it is easier for the human brain to understand and process. Data visualization is often used to discover unknown facts and trends. By observing relationships and comparing datasets, you can find meaningful information. Effective visualization helps users analyze and reason about data and evidence. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.


Figure 2.1: Data Analysis process


WORKING PROCEDURE

STEP 1: ESTABLISHING A PYTHON ENVIRONMENT FOR DATA ANALYSIS

It is very easy to set up a Python environment for performing data analysis. The most accessible way to start is to download the PyCharm IDE, which makes it straightforward to install the necessary libraries, including NumPy, Pandas and Matplotlib.

STEP 2: ACQUIRING THE BASICS AND FUNDAMENTALS

There are numerous ways to learn the basics of Python. A number of online courses offer free tutorials on Python for data science. These free courses, consisting of video tutorials and documentation with practice exercises, are a comprehensive way to learn by active participation, as opposed to the traditional method of reading concepts and looking at examples.

STEP 3: KNOWING ABOUT ESSENTIAL PYTHON PACKAGES FOR DATA ANALYSIS

Being a general-purpose language, Python is often used beyond data analysis and data science. The abundant availability of libraries makes Python remarkably useful for working with data. The significant Python libraries used for working with data are:

 Numpy – this library provides fundamental scientific computing.

 Matplotlib – used for plotting and visualization.

 Pandas – applied for data manipulation and analysis.

 Scikit-learn – library designed for machine learning and data mining.

 StatsModels – packed with statistical modelling, testing, and analysis.

 SciPy – a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python.

 Seaborn – mostly used for the visualisation of statistical models.

 Plotly – a web-based toolbox for constructing visualisations.



 Theano – a package that defines multi-dimensional arrays.

STEP 4: LOADING DATA TO LEARN WITH

The best way to learn any programming language is to take a sample dataset and start working with it. Practising on these sample datasets helps aspirants apply new techniques, experiment with learned methods, and get to know their strengths and the areas that need improvement. The StatsModels library of Python includes some preloaded datasets that can be used. Once familiar with these, users can load a dataset from the web or from a CSV file.
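Loading a dataset from a CSV file can be sketched with Pandas; the inline CSV below (invented values) stands in for a real downloaded file or URL:

```python
import io
import pandas as pd

# In practice read_csv would be given a file path or URL; this inline
# CSV (made-up values) stands in for a downloaded sample dataset.
csv_text = """city,temp_c,humidity
Bengaluru,27,60
Mumbai,31,74
Delhi,35,40
"""
df = pd.read_csv(io.StringIO(csv_text))
```

The same `pd.read_csv(...)` call works unchanged whether the argument is a local file path or a web URL.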

STEP 5: OPERATIONS ON DATA

One of the most important skills required to extract information from abundant data is data administration. On most occasions, we get crude data which is not suitable for analysis. To make the data available for analysis, we need to manipulate it. Python provides tools and applications for transforming, formatting, cleaning and moulding it for examination.

STEP 6: EFFECTIVE DATA VISUALISATION

Visuals are remarkably relevant both for exploratory data analysis and for communicating results. Matplotlib is the standard Python library used for visualisation.
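A minimal Matplotlib example (the monthly figures are made up); it renders off-screen so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display window needed
import matplotlib.pyplot as plt

# Made-up monthly figures for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 160]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales trend")
fig.savefig("sales_trend.png")
```

Swapping `ax.plot` for `ax.bar` or `ax.scatter` gives the other common chart types with the same structure.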

STEP 7: DATA ANALYTICS

Analysing data is not just formatting and creating plots and graphs. The core aspects of analytics are statistical modelling, machine learning algorithms, data mining techniques and inference. The Python programming language is an excellent tool for analysing data because it has effective libraries such as Scikit-learn and StatsModels, which contain the models and algorithms that are essential for analysis.
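As an illustrative sketch of the kind of model Scikit-learn provides (the study-hours data is invented, not from the report):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: hours studied vs. exam score (perfectly linear here).
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([52, 58, 64, 70, 76])

model = LinearRegression()
model.fit(X, y)
predicted = model.predict([[6]])[0]  # score expected after 6 hours
```

On this toy data the fitted slope is 6 points per hour, so the prediction for 6 hours is 82; real data would not fit this cleanly.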

TASK PERFORMED


THE LEFT BRAIN/RIGHT BRAIN THEORY

This theory is based on the fact that the brain's two hemispheres function differently. This first came to light in the 1960s, thanks to the research of psychobiologist and Nobel Prize winner Roger W. Sperry.

The theory is that people are either left-brained or right-brained, meaning that one
side of their brain is dominant. If you’re mostly analytical and methodical in your thinking,
you’re said to be left-brained. If you tend to be more creative or artistic, you’re thought to be
right-brained.

Figure 4.1: The left brain/right brain theory


The left brain is more verbal, analytical, and orderly than the right brain. It’s
sometimes called the digital brain. It’s better at things like reading, writing, and
computations.

According to Sperry’s dated research, the left brain is also connected to:

 Logic

 Sequencing

 Linear thinking

 Mathematics

 Facts

 Thinking in words

The right brain is more visual and intuitive. It’s sometimes referred to as the analog
brain. It has a more creative and less organized way of thinking.

Sperry’s dated research suggests the right brain is also connected to:

 Imagination

 Holistic thinking

 Intuition

 Arts

 Rhythm

 Nonverbal cues

 Feelings

 Visualization

 Daydreaming

We know the two sides of our brain are different, but does it necessarily follow that we have a dominant brain, just as we have a dominant hand? A team of neuroscientists set out


to test this premise. After a two-year analysis, they found no proof that this theory is correct. Magnetic resonance imaging of 1,000 people revealed that the human brain doesn't actually favor one side over the other. The networks on one side aren't generally stronger than the networks on the other side.

The two hemispheres are tied together by bundles of nerve fibers, creating an
information highway. Although the two sides function differently, they work together and
complement each other. You don’t use only one side of your brain at a time.

Whether you’re performing a logical or creative function, you’re receiving input from both sides of your brain. For example, the left brain is credited with language, but the right brain helps you understand context and tone. The left brain handles mathematical equations, but the right brain helps out with comparisons and rough estimates.

General personality traits, individual preferences, or learning style don’t translate into
the notion that you’re left-brained or right-brained.

Still, it’s a fact that the two sides of your brain are different, and certain areas of your
brain do have specialties. The exact areas of some functions can vary a bit from person to
person.

PREFERENCE OF LIBRARY BASED DATA ANALYSIS

DIGITAL LIBRARY

A digital library, digital repository, or digital collection, is an online database of digital objects that can include text, still images, audio, video, or other digital media formats.
Objects can consist of digitized content like print or photographs, as well as originally
produced digital content like word processor files or social media posts. In addition to storing
content, digital libraries provide means for organizing, searching, and retrieving the content
contained in the collection.

Digital libraries can vary immensely in size and scope, and can be maintained by
individuals or organizations. The digital content may be stored locally, or accessed remotely
via computer networks. These information retrieval systems are able to exchange information
with each other through interoperability and sustainability.


The early history of libraries is poorly documented, but several key thinkers are
connected to the emergence of this concept. Predecessors include Paul Otlet and Henri La
Fontaine's Mundaneum, an attempt begun in 1895 to gather and systematically catalogue the world's knowledge, in the hope of bringing about world peace. The establishment of the digital library was totally dependent on progress in the age of the internet, which not only provided the means to compile digital libraries but also gave millions of individuals access to books via the World Wide Web.

ADVANTAGES OF DIGITAL LIBRARIES

The advantages of digital libraries as a means of easily and rapidly accessing books,
archives and images of various types are now widely recognized by commercial interests and
public bodies alike. Traditional libraries are limited by storage space; digital libraries have
the potential to store much more information, simply because digital information requires
very little physical space to contain it. As such, the cost of maintaining a digital library can be
much lower than that of a traditional library.

 No physical boundary.

 Round the clock availability

 Multiple access

 Information retrieval.

 Preservation and conservation.

 Space

 Added value.

 Easily accessible

DRAWBACKS OF DIGITAL LIBRARIES

Digital libraries, or at least their digital collections, unfortunately also have brought
their own problems and challenges in areas such as:

 User authentication for access to collections


 Copyright

 Digital preservation

 Equity of access

 Interface design

 Interoperability between systems and software

 Information organization

 Inefficient or non-existent taxonomy practices

 Training and development

 Quality of metadata

 Exorbitant cost of building/maintaining the terabytes of storage.

 Servers and redundancies necessary for a functional digital collection.

OFFLINE LIBRARY

A library is a curated collection of sources of information and similar resources, selected by experts and made accessible to a defined community for reference or borrowing.
It provides physical or digital access to material, and may be a physical location or a virtual
space, or both. A library is organized for use and maintained by a public body, an institution,
a corporation, or a private individual. Public and institutional collections and services may be
intended for use by people who choose not to—or cannot afford to—purchase an extensive
collection themselves, who need material no individual can reasonably be expected to have,
or who require professional assistance with their research.

In addition to providing materials, libraries also provide the services of librarians who
are experts at finding and organizing information and at interpreting information needs.
Libraries often provide quiet areas for studying, and they also often offer common areas to
facilitate group study and collaboration. Libraries often provide public facilities for access to
their electronic resources and the Internet.


The history of libraries began with the first efforts to organize collections of
documents. Topics of interest include accessibility of the collection, acquisition of materials,
arrangement and finding tools, the book trade, the influence of the physical properties of the
different writing materials, language distribution, role in education, rates of literacy, budgets,
staffing, libraries for specially targeted audiences, architectural merit, patterns of usage, and
the role of libraries in a nation's cultural heritage, and the role of government, church or
private sponsorship. Since the 1960s, issues of computerization and digitization have arisen.

TYPES OF LIBRARY

Many institutions make a distinction between a circulating or lending library, where
materials are expected and intended to be loaned to patrons, institutions, or other libraries,
and a reference library where material is not lent out.

 Academic libraries
 Children's libraries
 National libraries
 Public lending libraries
 Reference libraries
 Research libraries
 Digital libraries
 Special libraries

TASK PERFORMED

Collection of data using Google Forms.

Google Forms is a survey administration app that is included in the Google Drive office
suite along with Google Docs, Google Sheets, and Google Slides. Forms features all of the
collaboration and sharing features found in Docs, Sheets, and Slides.

Google Forms is a tool that allows collecting information from users via a
personalized survey or quiz. The information is then collected and automatically connected to


a spreadsheet. The spreadsheet is populated with the survey and quiz responses. The Forms
service has undergone several updates over the years. New features include, but are not
limited to, menu search, shuffling of questions for randomized order, limiting responses to
once per person, shorter URLs, custom themes, automatically generated answer suggestions when
creating forms, and an "Upload file" option for users answering questions that require them to
share content or files from their computer or Google Drive. The upload feature is only
available through G Suite. In October 2014, Google introduced add-ons for Google Forms that
enable third-party developers to build new tools offering more features for surveys.

The required data was collected through Google Forms using the following link:
https://forms.gle/m3LNt97VQrREL3bX7

Figure 4.2: Google form template


Figure 4.3: collection of data


TECHNOLOGY BEHIND

PYTHON

What is Python?

Python is a popular programming language. It was created by Guido van Rossum, and
released in 1991.

It is used for:

 Web development (server-side)

 Software development

 Mathematics

 System scripting

What can Python do?

 Python can be used on a server to create web applications.

 Python can be used alongside software to create workflows.

 Python can connect to database systems. It can also read and modify files.

 Python can be used to handle big data and perform complex mathematics.

 Python can be used for rapid prototyping, or for production-ready software


development.

Why Python?

 Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).

 Python has a simple syntax similar to the English language.

 Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.


 Python runs on an interpreter system, meaning that code can be executed as soon as it
is written. This means that prototyping can be very quick.

 Python can be treated in a procedural way, an object-oriented way or a functional way.

Good to know

 The most recent major version of Python is Python 3, which is used in this report.
However, Python 2, although no longer updated with anything other than security
updates, is still quite popular.

 In this report Python is written in a text editor. It is also possible to write Python in
an Integrated Development Environment, such as Thonny, PyCharm, NetBeans or
Eclipse, which are particularly useful when managing larger collections of Python
files.

Built-in Data Types

 In programming, data type is an important concept.


 Variables can store data of different types, and different types can do different things.
 Python has the following data types built-in by default, in these categories:

Text Type: str

Numeric Types: int, float, complex

Sequence Types: list, tuple, range

Mapping Type: dict

Set Types: set, frozenset

Boolean Type: bool
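As a quick illustration of these categories, the built-in type() function reports the type of any value (a minimal sketch; the literal values below are invented for the example):

```python
# Each built-in data type category, checked with type()
print(type("hello"))         # str   (text type)
print(type(10))              # int   (numeric type)
print(type(10.5))            # float (numeric type)
print(type(2 + 3j))          # complex
print(type([1, 2, 3]))       # list  (sequence type)
print(type((1, 2, 3)))       # tuple (sequence type)
print(type(range(5)))        # range
print(type({"a": 1}))        # dict  (mapping type)
print(type({1, 2}))          # set
print(type(frozenset({1})))  # frozenset
print(type(True))            # bool
```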

There are four collection data types in the Python programming language:

 List is a collection which is ordered and changeable. Allows duplicate members.


 Tuple is a collection which is ordered and unchangeable. Allows duplicate members.

 Set is a collection which is unordered and unindexed. No duplicate members.

 Dictionary is a collection which is changeable and indexed by keys. No duplicate
keys (since Python 3.7, dictionaries also preserve insertion order).

List: A list is a collection which is ordered and changeable. In Python lists are written with
square brackets.

Dictionary: A dictionary is a collection which is changeable and indexed by keys; since
Python 3.7 it also preserves insertion order. In Python dictionaries are written with curly
brackets, and they have keys and values.
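A short sketch of a list and a dictionary in action (the names and values here are invented for the example):

```python
# A list: ordered and changeable; duplicates are allowed
marks = [10, 20, 30, 20]
marks.append(40)           # lists can grow
print(marks[0])            # indexing by position -> 10

# A dictionary: key/value pairs written with curly brackets
student = {"name": "a", "marks": 10}
student["marks"] = 15      # values can be changed in place
print(student["marks"])    # indexing by key -> 15
```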

Features of Python

As mentioned before, Python is one of the most widely used languages on the web. A few of
its key features are listed here:

 Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.

 Easy-to-read − Python code is more clearly defined and visible to the eyes.

 Easy-to-maintain − Python's source code is fairly easy-to-maintain.

 A broad standard library − The bulk of Python's standard library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.

 Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

 Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.

 Extendable − You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more efficient.

 Databases − Python provides interfaces to all major commercial databases.


 GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.

 Scalable − Python provides a better structure and support for large programs than
shell scripting.

LIBRARIES USED

NumPy

NumPy is a Python package whose name stands for 'Numerical Python'. It is the core
library for scientific computing: it contains a powerful n-dimensional array object and
provides tools for integrating C, C++, etc. It is also useful for linear algebra, random
number generation, and more. A NumPy array can also be used as an efficient
multi-dimensional container for generic data.
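A minimal sketch of the n-dimensional array in use (the array values are invented for the example):

```python
import numpy as np

# Create a 2x3 array and inspect its shape
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)        # (2, 3)

# Element-wise arithmetic works on whole arrays at once
print(a * 2)          # every element doubled

# Basic linear algebra: matrix-vector dot product
v = np.array([1, 0, 1])
print(a.dot(v))       # [ 4 10]
```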

Pandas

Pandas is an open-source library that allows you to perform data manipulation in
Python. The Pandas library is built on top of NumPy, meaning Pandas needs NumPy to
operate. Pandas is also an elegant solution for time-series data, and it is a useful library
for data analysis. It can be used to perform data manipulation and analysis. Pandas
provides powerful and easy-to-use data structures, as well as the means to quickly
perform operations on these structures.
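A small sketch of the Pandas data structures described above (the column names and values are invented for the example):

```python
import pandas as pd

# Build a DataFrame from a dictionary of columns
df = pd.DataFrame({"Name": ["a", "b", "c"], "Marks": [10, 20, 30]})

# Typical operations: boolean filtering and summarizing a column
passed = df[df["Marks"] >= 20]   # rows where Marks >= 20
print(passed["Name"].count())    # 2
print(df["Marks"].mean())        # 20.0
```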

Matplotlib

Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy. It provides an object-oriented API for embedding
plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt,
or GTK+.
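A minimal Matplotlib sketch, using the non-interactive Agg backend so that it also runs without a display; the data points and the output filename ("line.png") are invented for the example:

```python
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt

# Plot a simple line and save it to a PNG file
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel("x")
plt.ylabel("x squared")
plt.savefig("line.png")
```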


Example python program

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Two 3x3 NumPy arrays and their element-wise sum
one = np.array(range(0, 9))
two = np.array(range(0, 9))
one.shape = (3, 3)
two.shape = (3, 3)
print(one)
print(two)
print(one + two)

# A small Pandas DataFrame built from a dictionary
data = {"Name": ["a", "b", "c"], "Marks": [10, 20, 30]}
td = pd.DataFrame(data)

# A histogram of sample values, grouped into bins
x_axis = [1, 2, 3, 4, 12, 34, 29, 30]
bins = [10, 20, 30, 40]
plt.title("Sample histogram")
plt.hist(x_axis, bins)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

# Read the collected data set from a CSV file
pd.read_csv("D:/project/dataAnalysis.csv")

Output of example python program

Figure 4.4: output of example python program


Data analysis is performed on the data set collected through the Google Form and
implemented in Python using the above libraries, plotting graphs from the results.

Left brain and right brain theory prediction is done based on the questions asked in the
Google Form, with questions such as:

1. Which subjects do you prefer?
2. It is easier for you to remember faces rather than names.
3. Which type of Reading do you prefer?

Data analysis python program

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the survey responses exported from Google Forms
data = pd.read_csv("D:/project/Data Analysis.csv")
print(data)

# Drop incomplete responses
data = data.dropna()

Figure 4.5: data frame output 1

req_data = data["Which subjects do you prefer?"]

# Respondents who prefer math-like subjects are treated as "left brain"
df_math_ppl = data[data["Which subjects do you prefer?"] == "Math or similar subjects"]
print(df_math_ppl)
ser_math = df_math_ppl["Which subjects do you prefer?"]
no_of_math_ppl = ser_math.count()
tot_ppl = req_data.count()
right_brain_music = tot_ppl - no_of_math_ppl
print(right_brain_music)

# Respondents who answer "No" (names easier than faces) are treated as "left brain"
tot_rec = data["It is easier for you to remember faces rather than names."]
df_names_recog = data[data["It is easier for you to remember faces rather than names."] == "No"]
ser_names = df_names_recog["It is easier for you to remember faces rather than names."]
no_of_names_recog = ser_names.count()
right_brain_face = tot_rec.count() - no_of_names_recog

no_left_brain_ppl = no_of_math_ppl + no_of_names_recog
no_right_brain_ppl = right_brain_music + right_brain_face

# Bar chart: left-brain vs right-brain counts
x_axis = [0, 1]
y_axis = [no_left_brain_ppl, no_right_brain_ppl]
plt.xlabel("left vs right brain ppl")
plt.ylabel("no of ppl")
plt.bar(x_axis, y_axis)
plt.show()

# Online vs offline reading preference
total_opinion = data["Which type of Reading do you prefer?"]
df_online_preferred_ppl = data[data["Which type of Reading do you prefer?"] == "Online"]
ser_online_ppl = df_online_preferred_ppl["Which type of Reading do you prefer?"]
online_preferred = ser_online_ppl.count()
offline_preferred = total_opinion.count() - online_preferred

# Bar chart: online vs offline preference counts
x_axis = [0, 2]
y_axis = [online_preferred, offline_preferred]
plt.xlabel("online vs offline library")
plt.ylabel("no of ppl")
plt.bar(x_axis, y_axis)
plt.show()

Figure 4.6: left v/s right brain theory output


Figure 4.7: data frame output 2


Figure 4.8: online v/s offline reading theory output


CONCLUSION

Because of the increase in the amount of data in the current environment, it has become
difficult to handle and analyse large data sets. Although there are many sources of data
currently fueling the rapid growth in data volume, massive data analysis creates new
challenges at the interface between humans and computers.

Here we collected data through Google Forms, mined the collected data and gave it as
input; after performing operations on the data sets using particular Python libraries, we
visualized the output by plotting graphs based on the given input.


