ITECH 2201 Cloud Computing School of Science, Information Technology & Engineering
Answer: The process that takes care of packaging, delivering and organizing
information or data is called 'Data Science'. Packaging is the manipulation of data
to turn raw data into a new, updated package. Organizing data means managing
data storage. Delivering ensures that the data reaches the authorized recipients.
According to IBM estimation, what is the percent of the data in the world
today that has been created in the past two years?
Answer: According to IBM, 90% of the data in the world today has been
created in the past two years.
Answer: The link above shows that the UC Berkeley School of Information offers
a Master's degree in Information and Data Science, whose curriculum is divided
into advanced courses and a synthetic capstone course.
Read the following research paper from IEEE Xplore Digital Library
Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data
understanding Big Data to extract value," 2014 Zone 1 Conference of the American
Society for Engineering Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014
Answer: The authors take their research further by stating that “Big data has a lot
of potential in real world industry and research community”, and they explain
this through the concept of the 7 V’s. According to the authors, Big Data utilization
will be the centre of focus for both researchers and students in the upcoming years.
What are the 7 V’s mentioned in the paper? Briefly describe each V in
one paragraph.
• Volume:
It simply refers to the size of the data: the volume generated by sources
such as video, audio, text and social networking.
• Velocity:
• Variety:
Variety refers to the diversity of data types. The more complex the data,
the greater the chance of errors occurring.
• Veracity:
• Validity:
• Volatility:
• Value:
Last but not least, Value is the most critical of all the V's, because what
ultimately matters is the output of the process.
Explore the authors’ future work by using reference [4] in the
research paper. Summarise your understanding of how Big Data can
improve the healthcare sector in 300 words.
Answer: Nowadays the healthcare sector is widely adopting digitization in its
data analysis, and this is growing day by day. At the same time, considering the
health care industry, it is reasonable to expect that it will produce an immense
amount of claims data over the coming years. Big data could be used to improve
patient care. Likewise, many countries in the present world are implementing
EHRs (Electronic Health Records), which will make patient data optimized and
unified. Electronic Health Records will create a huge amount of data that could
be re-identified and re-analysed for valuable information.
Big Data could also help to prevent health insurance fraud by linking different
data sets, which will give insurance organizations and hospitals rich data on the
fraud involved.
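The idea of linking data sets to expose fraud can be sketched in Python. This is a minimal illustration under assumed data: the record fields (`claim_id`, `patient_id`, `date`) and the rule "a claim with no matching hospital admission is suspicious" are hypothetical, not taken from the paper or from any real insurer.

```python
def flag_suspicious_claims(claims, admissions):
    """Return claim IDs whose (patient, date) has no matching admission record.

    Hypothetical rule for illustration only: real fraud detection combines
    many more data sets and signals.
    """
    # Index the hospital data set by (patient, date) for fast lookups.
    admitted = {(a["patient_id"], a["date"]) for a in admissions}
    # Link the insurer's claims to the hospital records and keep the mismatches.
    return [c["claim_id"] for c in claims
            if (c["patient_id"], c["date"]) not in admitted]


# Hypothetical sample records.
claims = [
    {"claim_id": "C1", "patient_id": "P1", "date": "2015-03-01"},
    {"claim_id": "C2", "patient_id": "P2", "date": "2015-03-02"},
    {"claim_id": "C3", "patient_id": "P1", "date": "2015-04-10"},
]
admissions = [
    {"patient_id": "P1", "date": "2015-03-01"},
    {"patient_id": "P2", "date": "2015-03-02"},
]

print(flag_suspicious_claims(claims, admissions))  # ['C3']
```

The set of `(patient, date)` keys keeps the join linear in the number of records, which matters when both data sets are large.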
Taking everything into account, we can say that health care organizations that
want to implement big data should be careful regarding privacy and security,
and need to keep these at the top of their priority list.
Exercise 3: Big Data Platform (1 mark)
In order to build a big data platform one has to acquire, organize and analyse the
big data. Go through the following links and answer the questions that follow the
links:
• http://www.infochimps.com/infochimps-cloud/how-it-works/
• http://www.youtube.com/watch?v=TfuhuA_uaho
• http://www.youtube.com/watch?v=IC6jVRO2Hq4
• http://www.youtube.com/watch?v=2yf_jrBhz5w
Please note: You are encouraged to watch all the videos in that series from Oracle.
How can enterprises acquire big data, and how can it be used?
Answer: Big Data can also be an immense challenge when addressing the issues in an
organization. The real issues include, for example, data flows, storage, analytics,
and interactive media interfaces. Big data is a collection of information gathered
from all around an organization, such as posts on social networking sites, online
transaction records of items purchased/sold, digital images and online video
entries. The reason for an enterprise to acquire big data is to develop a plan that
meets the requirements for coordinating system infrastructures, data storage and
evolving business models.
When integrated into an enterprise, big data has two building components:
• Hadoop
• NoSQL
Although the data is much larger and more complicated to analyse, it can be
easily maintained and handled by processing the data in pieces.
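Processing data in pieces can be illustrated with a small Python sketch: the values are consumed in fixed-size chunks, so only one chunk needs to be held in memory at a time while a running total is kept. The chunk size and the sum/count aggregate are arbitrary choices for illustration; platforms such as Hadoop apply the same idea across many machines rather than in one loop.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def chunked_sum_and_count(values, chunk_size):
    """Aggregate a (possibly huge) stream one chunk at a time."""
    total, count = 0, 0
    for batch in chunks(values, chunk_size):
        total += sum(batch)   # partial result for this chunk
        count += len(batch)
    return total, count


print(chunked_sum_and_count(range(1, 1001), chunk_size=100))  # (500500, 1000)
```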
What are the analyses that can be done using big data?
Answer: The analyses that can be done by using big data can be considered as
follows:
Part B
(4 Marks)
Part B answers should be based on well-cited articles/videos; name the references
used in your answer. For more information, read the guidelines given in
Assignment 1.
Google is a master at creating data products. Below are a few examples from
Google. Describe the products below and explain how large-scale data is used
effectively in these products.
• Google’s PageRank
Answer: It lets us write text with correct spelling and grammar. It is
mainly useful when writing e-mails.
Answer: It gives an idea of flu trends across the globe and predicts the
related activity in different countries.
• Google’s Trends
Answer: An RDBMS is not very successful at storing big data, because RDBMSs
are not yet up to the demands of big data. The amount of data that must be
handled is soaring in terms of volume. An RDBMS has a rigid schema, whereas
big data demands support for a wide range of different data types.
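The rigid-schema point can be made concrete with Python's built-in `sqlite3` module. In this sketch a relational table rejects a record carrying a field its schema does not define, while a plain Python list of dicts (a stand-in for a schema-less document store, not any particular NoSQL product) accepts records with differing fields. Table and field names are hypothetical.

```python
import sqlite3

# Relational table with a fixed (rigid) schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO posts (id, body) VALUES (?, ?)", (1, "hello"))


def insert_post_with_location(post_id, body, location):
    """Try to store a field the schema does not define; return True on success."""
    try:
        conn.execute(
            "INSERT INTO posts (id, body, location) VALUES (?, ?, ?)",
            (post_id, body, location),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the unknown column until the schema is altered.
        return False


# Schema-less stand-in for a document store: each record may carry different fields.
documents = [
    {"id": 1, "body": "hello"},
    {"id": 2, "body": "hi", "location": "Ballarat", "tags": ["cloud", "big data"]},
]

print(insert_post_with_location(2, "hi", "Ballarat"))  # False
```

Adding the new column would require an `ALTER TABLE` on the relational side, which is the kind of schema change that becomes costly when data types keep varying.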
Answer: Currently over 150 NoSQL databases are used by many organizations.
Below are the five NoSQL database categories, each with subcategories that
organizations adopt depending on their needs.
• Map step
• Shuffle step
• Reduce step
• Summarization
• Filtering
• Data Organization
• Join
• Metapatterns
• Durability
• Availability
• Security
• Integrity and
• High Performance.
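The Map, Shuffle and Reduce steps listed above can be sketched as a single-process word count in Python. This is only an illustration of the data flow; a real MapReduce framework such as Hadoop distributes each step across many nodes.

```python
from collections import defaultdict


def map_step(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_step(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(groups):
    """Reduce: combine the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["big data big value", "data platform"]
mapped = [pair for line in lines for pair in map_step(line)]
counts = reduce_step(shuffle_step(mapped))
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'platform': 1}
```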
Name 3 industries that should use Big Data – justify your claim in 250 words
for each industry using proper references.
• GE (General Electric)
• AYASDI
• IBM
General Electric: GE is known for its superiority in making electric
machines and appliances. It is a big data company with a vision to expand its
industrial internet to a large extent. In a joint venture with Accenture, GE has
created software to back up data to the cloud for railways and airlines.
From your lecture, and based on the video link below:
https://www.youtube.com/watch?v=_sXkTSiAe-A
Write a paragraph about memory virtualization.
Answer: In computing, memory virtualization provides memory that is
not physical memory. Memory virtualization is done to save
physical memory and its costs. In a virtualized computing
environment, administrators can use virtual memory management to allocate
additional memory to a virtual machine that has run short of
resources. An example is VMware software, which allows customers
to run various operating systems without having physical memory
for each one. Virtual memory allows applications on
different servers to share data without replication.
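The core idea that virtual memory "is not physical memory" is that each address a program or VM uses is translated to a physical location through a mapping. The sketch below is a deliberately simplified single-level page table; the 4 KiB page size and the frame numbers are assumptions for illustration, and real hypervisors such as VMware add further layers (for example, guest-to-host translation).

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageTable:
    """Hypothetical, simplified single-level page table for one virtual machine."""

    def __init__(self):
        self.mapping = {}  # virtual page number -> physical frame number

    def map(self, vpn, pfn):
        self.mapping[vpn] = pfn

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.mapping:
            # An unmapped page would trigger a page fault in a real system.
            raise KeyError(f"page fault: virtual page {vpn} is not mapped")
        return self.mapping[vpn] * PAGE_SIZE + offset


pt = PageTable()
pt.map(0, 7)  # virtual page 0 lives in physical frame 7
pt.map(1, 3)  # virtual page 1 lives in physical frame 3
print(pt.translate(4100))  # 12292  (page 1, offset 4 -> 3 * 4096 + 4)
```

Because contiguous virtual pages can sit in scattered physical frames, the hypervisor can hand memory to whichever VM needs it without the VM noticing.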
What is RAID 0?
Answer: RAID stands for redundant array of inexpensive disks; RAID
allows several physical disks to be presented as one logical disk. RAID 0
provides no fault tolerance or duplicate data. In RAID 0, the
failure of one drive will cause the whole array to fail, resulting
in total data loss. RAID 0 stripes data across the disks, so it is
used to increase disk performance.
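Striping can be sketched as a mapping from a logical block number to a disk and a block position on that disk. This minimal model assumes equal-sized disks and a configurable strip size (`blocks_per_strip`, a name chosen here for illustration); it also shows why losing one disk damages the whole array: that disk holds every Nth strip of the data.

```python
def raid0_location(logical_block, num_disks, blocks_per_strip=1):
    """Map a logical block to (disk index, block index on that disk) under RAID 0."""
    strip = logical_block // blocks_per_strip   # which strip the block falls in
    disk = strip % num_disks                    # strips rotate across the disks
    row = strip // num_disks                    # full stripes that precede it
    offset = logical_block % blocks_per_strip   # position inside the strip
    return disk, row * blocks_per_strip + offset


def blocks_on_disk(total_blocks, num_disks, disk, blocks_per_strip=1):
    """Logical blocks stored on one disk; all of them are lost if that disk fails."""
    return [b for b in range(total_blocks)
            if raid0_location(b, num_disks, blocks_per_strip)[0] == disk]


def raid0_usable_capacity_gb(num_disks, disk_capacity_gb):
    """RAID 0 keeps no redundant copy, so usable capacity is the sum of all disks."""
    return num_disks * disk_capacity_gb


print([raid0_location(b, 4) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
print(blocks_on_disk(8, 4, disk=1))  # [1, 5]
```

Consecutive blocks land on different disks, so large reads and writes proceed on all disks in parallel, which is where the performance gain comes from.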
What is ISS?
Answer: ISS stands for intelligent storage system; it is a
feature-rich RAID array that provides highly optimized I/O
processing capabilities. ISS provides a large amount of cache and
multiple I/O paths that improve performance, and it supports flash
drives, virtual provisioning and automated storage tiering.
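To see why a large cache improves I/O, here is a minimal read cache using a least-recently-used (LRU) eviction policy. LRU is a common general-purpose policy chosen here for illustration; it is not a claim about how any particular intelligent storage system manages its cache. The `backend` callable is a hypothetical stand-in for reading from the disks.

```python
from collections import OrderedDict


class LRUReadCache:
    """Minimal LRU read cache in front of a slow backend (e.g. disks)."""

    def __init__(self, capacity, backend):
        self.capacity = capacity
        self.backend = backend          # callable: block number -> data
        self.entries = OrderedDict()    # block -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.entries:
            self.entries.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.entries[block]
        self.misses += 1
        data = self.backend(block)            # slow path: go to the disks
        self.entries[block] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data


cache = LRUReadCache(2, backend=lambda b: f"data-{b}")
for b in [1, 2, 1, 3, 2]:
    cache.read(b)
print(cache.hits, cache.misses)  # 1 4
```

Every hit is served from cache memory instead of a mechanical disk, so a larger cache (higher hit rate) translates directly into more effective IOPS.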
Storage Area Network (SAN) and Network Attached Storage (NAS) are
widely used concepts in the data storage arena. The following YouTube
videos give detailed descriptions of these concepts:
− http://www.youtube.com/watch?v=csdJFazj3h0
− http://www.youtube.com/watch?v=vdf6CvGQZrk
− http://www.youtube.com/watch?v=MKZU8zOMiqE
Based on the watched videos answer the following questions:
What are two common NAS file sharing protocols? How are they
different from each other?
Answer: Two common NAS implementations are computer-based (PC-based)
NAS and embedded-system-based NAS.
However, they differ from each other in many aspects. For example,
PC-based NAS has the highest power consumption, but also the highest
performance. On the other hand, embedded-system NAS needs a MIPS-based
processor to run the NAS server, and its power consumption is reasonable.
Part B
(3 Marks)
Exercise 3: Storage Design (1 Mark)
Design Storage Solution for New Application
Scenario
An organization is deploying a new business application in its
environment. The new application requires 1 TB of storage space for
business and application data. During peak workload, the application is
expected to generate 4900 IOPS (I/O operations per second) with an average
I/O block size of 4 KB.
The vendor's available disk drive option is a 15,000 rpm drive with 100
GB capacity. Other specifications of the drives are:
You are required to calculate the number of disk drives that
can meet both the capacity and performance requirements of the application.
Hint: To calculate the IOPS from the average seek time, data transfer
rate, disk rpm and data block size, refer to slide 15 of the week 7
lecture slides. Once you have the IOPS, refer to slide 16 of week 7 to
calculate the required number of disks.
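The calculation can be sketched with the usual textbook model: disk service time Ts = average seek time + rotational latency (half a revolution, 0.5 / (rpm / 60)) + transfer time (block size / transfer rate), maximum IOPS per disk = 1 / Ts, and the disk count is the larger of the capacity-driven and performance-driven counts. The drive specification list above did not survive, so the seek time (5 ms) and transfer rate (40 MB/s) below are ASSUMED placeholders, and the 70% utilization derating is a common rule of thumb rather than a value confirmed from the week 7 slides; substitute the real figures.

```python
import math


def service_time_ms(avg_seek_ms, rpm, block_kb, transfer_mb_per_s):
    """Disk service time = seek time + rotational latency + transfer time (ms)."""
    rotational_latency_ms = 0.5 * 60_000 / rpm   # half a revolution on average
    transfer_ms = block_kb / transfer_mb_per_s   # kB / (MB/s) equals ms with decimal units
    return avg_seek_ms + rotational_latency_ms + transfer_ms


def disks_required(capacity_gb, peak_iops, disk_gb, avg_seek_ms, rpm,
                   block_kb, transfer_mb_per_s, utilization=0.7):
    """Return (total disks, disks for capacity, disks for performance)."""
    ts_ms = service_time_ms(avg_seek_ms, rpm, block_kb, transfer_mb_per_s)
    # IOPS = 1 / Ts, derated to the assumed target utilization.
    usable_iops_per_disk = (1000 / ts_ms) * utilization
    by_capacity = math.ceil(capacity_gb / disk_gb)
    by_performance = math.ceil(peak_iops / usable_iops_per_disk)
    return max(by_capacity, by_performance), by_capacity, by_performance


# Scenario values from the exercise; seek time and transfer rate are ASSUMED.
print(disks_required(capacity_gb=1000, peak_iops=4900, disk_gb=100,
                     avg_seek_ms=5, rpm=15_000, block_kb=4, transfer_mb_per_s=40))
# (50, 10, 50)
```

With these assumptions Ts = 5 + 2 + 0.1 = 7.1 ms, so one disk delivers about 141 IOPS (about 99 at 70% utilization). Capacity alone needs only 10 disks, but performance needs 50, so performance drives the design.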
− http://www.youtube.com/watch?v=hSFyf-rmjA8
− http://www.youtube.com/watch?v=iCfJCzfNLrw
If there is a break in the connection, FCoE will need to replace the
whole connection with a new one.
You have read and answered questions about SAN in Part A. Based on your
understanding and some research effort, answer the following
questions:
What is a Virtual SAN?
Answer: Virtual SAN stands for Virtual Storage Area Network; it is a
virtual fabric made up of a collection of ports that establish
connections between Fibre Channel switches. A VSAN can be resized
port by port, whereas a physical fabric is resized switch by switch.
In addition, each VSAN can be configured separately and
independently.
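Port-by-port VSAN membership can be modelled as a mapping from switch ports to VSAN IDs. This sketch assumes, as is usual for VSANs, that devices can only communicate when their ports belong to the same VSAN. The class name and the port names (`fc1/1` and so on) are hypothetical and do not represent any vendor's configuration API.

```python
class FabricSwitch:
    """Hypothetical model: each switch port is assigned to exactly one VSAN ID."""

    def __init__(self):
        self.port_vsan = {}  # port name -> VSAN ID

    def assign(self, port, vsan_id):
        """Move a single port into a VSAN (resizing happens port by port)."""
        self.port_vsan[port] = vsan_id

    def can_communicate(self, port_a, port_b):
        """Assumed isolation rule: traffic stays within one VSAN."""
        vsan_a = self.port_vsan.get(port_a)
        return vsan_a is not None and vsan_a == self.port_vsan.get(port_b)


switch = FabricSwitch()
switch.assign("fc1/1", 10)
switch.assign("fc1/2", 10)
switch.assign("fc1/3", 20)
print(switch.can_communicate("fc1/1", "fc1/2"))  # True
print(switch.can_communicate("fc1/1", "fc1/3"))  # False
```

Reassigning `fc1/3` to VSAN 10 would join it to the first virtual fabric without touching any other port, which is the port-by-port flexibility described above.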
http://www.youtube.com/watch?v=1SkUt7q8Dm8
Part A
(3 Marks)
Read the article in the below link and answer the questions that follow:
http://www.computer.org/csdl/mags/it/2010/02/mit2010020004.html
Through green use, the energy consumption of data centres, computers and
information systems can be reduced, and they can be used in an environmentally
sound way. Green disposal can be achieved by reusing old computers and
effectively recycling waste computers and other electronic equipment such as
keyboards, mice, CPUs and external hard disks.
With green design, computers, servers, cooling equipment and data centres
should be designed in an energy-efficient way with environmentally sound
components. Similarly, with green manufacturing, electronic components,
computers and other related subsystems should be made with low impact on
the environment.
Green use: using the energy in data centres, information systems and
computer systems in an environmentally friendly manner.
Green disposal: reusing unwanted computers and responsibly recycling old
computers and other electronic devices.
Green design: designing systems in an energy-efficient manner with
environmentally sound components such as computers, servers and hard disks.
Green manufacturing: manufacturing and developing electronic parts, computers
and other associated subsystems with less effect on the environment.
Exercise 3: Environmentally Sound Practices (1 Mark)
List 5 universities that offer a Green Computing course. You should give the university,
the course name and a brief description of the course.
Green computing can be defined as the study and practice of designing, manufacturing,
using and disposing of desktops, servers and associated subsystems, such as printers,
storage devices, networking accessories, speakers and earphones, effectively and
efficiently with no harm to the environment.
Ref: https://www.google.com.au/?gfe_rd=cr&ei=KbL0Var3BKfu8wfMmoHwCw#q=green+computing+definition
List and briefly describe (2 lines) the APIs provided by the above major
vendors.
Amazon EC2: its API Reference gives descriptions, syntax and usage examples for each
of the actions and data types for Amazon EC2.
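As an illustration of calling one EC2 API action, the sketch below uses the AWS SDK for Python (boto3), whose EC2 client exposes `describe_instances`. Only the parsing helper runs without AWS access; `fetch_instances` needs boto3 installed and valid AWS credentials, so boto3 is imported inside it. The region, instance ID and sample response are hypothetical, and the sample includes only a few of the fields a real DescribeInstances response returns.

```python
def summarize_instances(response):
    """Flatten a DescribeInstances-style response into (id, type, state) tuples."""
    rows = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append((instance["InstanceId"],
                         instance["InstanceType"],
                         instance["State"]["Name"]))
    return rows


def fetch_instances(region_name):
    """Call the EC2 DescribeInstances action (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3
    ec2 = boto3.client("ec2", region_name=region_name)
    return summarize_instances(ec2.describe_instances())


# Hypothetical, trimmed-down response shape for offline testing.
sample = {"Reservations": [{"Instances": [{
    "InstanceId": "i-0123456789abcdef0",
    "InstanceType": "t2.micro",
    "State": {"Name": "running"},
}]}]}
print(summarize_instances(sample))
# [('i-0123456789abcdef0', 't2.micro', 'running')]
```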
https://en.wikipedia.org/wiki/Microsoft_Speech_API
https://developers.google.com/maps/?hl=en
Part B
(3 Marks)
Most of the power consumption in data centres comes from computation processing, disk
storage, networking and cooling systems. Nowadays, new technologies and methods have
been proposed to reduce energy costs in data centres. From the above paper, summarize
(in 300 words) the recent work done in these fields.
According to the article, new advances and strategies have been proposed to reduce energy
costs in data centres, and there is current work on making cloud computing greener.
Microprocessors play a key part in task scheduling because they make computing processes
much simpler. They allow data centres to handle a large number of tasks continuously, which
makes it easier to schedule tasks efficiently in green cloud computing.
Using virtual machines is one of the effective ways to reduce power consumption and to
allocate resources. In terms of network topologies, the network design inside a data centre
is quite different from that of the Internet and P2P networks, and it should be designed
according to the needs of the data centre. Similarly, disk storage promises the advantages
of online storage with lower power use, although this is still a subject of green cloud
computing research. To make cloud computing greener, scheduling algorithms are used in
data centres. Some of the algorithms used in data centres are as follows.
First come first served: the FCFS scheduling algorithm executes queued requests
automatically and processes them in the order of their arrival. The first come first served
algorithm has some positives and negatives.
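First come first served scheduling can be sketched in a few lines of Python: jobs run to completion in arrival order, and each job's waiting time is its start time minus its arrival time. The job names, arrival times and burst (run) times are hypothetical sample values; ties in arrival time keep their input order because Python's sort is stable.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: int  # time the request enters the queue
    burst: int    # time the request needs to run


def fcfs(jobs):
    """Return (name, start, finish, waiting time) for each job in FCFS order."""
    time = 0
    schedule = []
    for job in sorted(jobs, key=lambda j: j.arrival):
        start = max(time, job.arrival)  # wait for the previous job or the arrival
        finish = start + job.burst      # run to completion, no preemption
        schedule.append((job.name, start, finish, start - job.arrival))
        time = finish
    return schedule


jobs = [Job("A", 0, 5), Job("B", 1, 3), Job("C", 2, 1)]
print(fcfs(jobs))  # [('A', 0, 5, 0), ('B', 5, 8, 4), ('C', 8, 9, 6)]
```

The output shows the main weakness of FCFS: the short job C waits 6 time units behind the longer jobs that arrived before it.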
Positives:
It is the simplest and most basic algorithm that uses the idea of time slices.
In this algorithm, time is divided into different slices, and each node is given a
specific time interval in which it performs its operations.
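The time-slice scheduling described above (commonly known as round robin) can be sketched as follows. Each job runs for at most one quantum, then goes to the back of the queue if it still has work left. For simplicity the sketch assumes all jobs arrive at time 0; the job names, burst times and quantum are hypothetical.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Return each job's finish time under round robin with the given time slice.

    `bursts` maps job name -> required run time; all jobs are assumed to
    arrive at time 0.
    """
    queue = deque(bursts.items())
    time = 0
    finish = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)   # use at most one time slice
        time += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))  # unfinished: back of the queue
        else:
            finish[name] = time
    return finish


print(round_robin({"A": 5, "B": 3, "C": 1}, quantum=2))  # {'C': 5, 'B': 8, 'A': 9}
```

With the same jobs as a first come first served run (A=5, B=3, C=1), the short job C now finishes at time 5 instead of 9, at the cost of more switching between jobs.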
Positives:
It works more efficiently for short jobs, and it is rarely used for
time-sharing systems.
Negatives:
https://code.google.com/p/sainsburys-nectar-api/
https://pypi.python.org/pypi/python-novaclient
The API used in the link above is python-novaclient 2.29.0. python-novaclient
is licensed under the Apache License, like the rest of OpenStack. Installing it
provides a shell command called nova, which you can use to interact with any
OpenStack cloud. To use it, we supply the OpenStack username and password
(--os-username and --os-password), then define the authentication URL with
--os-auth-url and the version of the API with --os-compute-api-version, and then
specify the region name (OS_REGION_NAME). The complete documentation on the
shell can be found by running nova help.
OpenStack is an open source collaborative software project that meets many of
the cloud needs.
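The settings described above can also be supplied as environment variables before running the `nova` command. The Python sketch below checks that the variables named in the paragraph are set and then invokes `nova` through `subprocess`. The list of variables is an assumption based only on this paragraph; a real cloud may need others (for example a project or tenant name), and the command itself only works where python-novaclient is installed and the credentials are valid.

```python
import os
import subprocess

# Settings named in the paragraph above. ASSUMPTION: a real cloud may need
# more (e.g. a project/tenant name) or fewer (region is sometimes optional).
NOVA_SETTINGS = [
    "OS_USERNAME",
    "OS_PASSWORD",
    "OS_AUTH_URL",
    "OS_COMPUTE_API_VERSION",
    "OS_REGION_NAME",
]


def missing_nova_settings(env):
    """Return the settings from NOVA_SETTINGS that are unset or empty in env."""
    return [name for name in NOVA_SETTINGS if not env.get(name)]


def run_nova(args, extra_env=None):
    """Run `nova <args>` with the current environment plus any overrides."""
    env = {**os.environ, **(extra_env or {})}
    missing = missing_nova_settings(env)
    if missing:
        raise RuntimeError("set these variables first: " + ", ".join(missing))
    # Requires python-novaclient, so that the `nova` command is on the PATH.
    return subprocess.run(["nova", *args], env=env, capture_output=True,
                          text=True, check=True)


# Example (needs a real OpenStack account):
# print(run_nova(["list"]).stdout)
```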
The links below give extensive information about OpenStack:
https://support.rc.nectar.org.au/docs/openstack
http://docs.openstack.org/ap/quick-start/content/
Write a report (2 pages) about OpenStack features and functionalities.
OpenStack has different projects, each with its own code name, and resources
such as:
Project home
OpenStack docs
API Reference: it covers the basics of authenticating and using the Compute and
Image APIs.
The API references describe the application programming interfaces, which
offer a way to use the capabilities of a service through predefined functions.
The references include Block Storage API v2, Compute API v2.1 (CURRENT),
Compute API v2 (SUPPORTED), Compute API v2 extensions (SUPPORTED) and
Identity API v3 (CURRENT).
API Quick Start: a guide to using the OpenStack APIs.
OpenStack clients: official and in-development command-line clients.
OpenStack blog: a collection of thoughts from the developers and the other key
players of the OpenStack projects.
Research Cloud user discussion: an open forum where anyone can share
experiences.
OpenStack documentation: for instance, the documentation for Kilo.
Try out OpenStack: freely available hardware resources for testing
OpenStack applications.
To get started quickly with OpenStack API requests, we can use several
methods. The first is cURL, a command-line tool that lets us send HTTP
requests and receive responses.
OpenStack command-line clients: each OpenStack project provides a command-line
client that lets us access its APIs through easy-to-use commands.
REST clients: Mozilla and Google both offer browser-based graphical
interfaces for REST.
The Mozilla REST client supports all HTTP methods in RFC 2616 (HTTP/1.1)
and RFC 2518 (WebDAV). We can build a custom HTTP request (custom method
with resource URL and HTTP request body) to test requests directly
against a server.
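A custom HTTP request of the kind described above (method, resource URL and body) can also be built with Python's standard `urllib.request` module. In this sketch the endpoint URL and token are hypothetical placeholders, and the token is attached in the `X-Auth-Token` header that OpenStack APIs use for authenticated calls, as this section notes. The request is only constructed here; passing it to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request


def build_api_request(url, token, method="GET", body=None):
    """Build (but do not send) an OpenStack-style API request."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("X-Auth-Token", token)  # authentication token header
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Hypothetical endpoint and token, for illustration only.
req = build_api_request("https://compute.example.com/v2.1/servers",
                        token="example-token")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```

Note that `urllib` normalizes stored header names (for example to `X-auth-token`); HTTP header names are case-insensitive, so the server sees the same header.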
OpenStack Python Software Development Kit (SDK): the SDK can be used to write
Python automation scripts that create and manage resources in an OpenStack
cloud. The SDK implements Python bindings to the OpenStack APIs, which lets
us perform automation tasks in Python by calling Python objects rather than
making REST calls directly. All of the OpenStack command-line tools are
implemented using the Python SDK.
For the authentication and API request workflow, each request parameter is
listed with its type and description:
Parameter             Type          Description
password (required)   xsd:string    The password for the user.
If the request succeeds, the server returns an authentication token. When
sending API requests, we include the token in the X-Auth-Token header, and
we keep sending API requests with that token until the job is completed o