Machine Learning & Grid Technology
1. Consider the entropy function relative to a Boolean classification, as the proportion P+ of positive examples varies between 0.0 and 1.0.
Give the graphical representation of the entropy as a function of P+. Also give your interpretation of the entropy values 0.0, 0.5 and 0.8.
Interpretation
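Entropy can be read as the expected number of bits needed to encode the class of a randomly drawn example.
If all examples belong to one class (100% +'s or 100% -'s), entropy = 0.0: the collection is pure, so there is no uncertainty about the class
of an example.
If the collection had 50% +'s and 50% -'s (i.e., entropy = 1.0, the maximum), the collection is maximally impure: knowing the class
proportions does not help predict the class of an example any better than a random guess.
If the collection had 70% +'s and 30% -'s (i.e., entropy ≈ 0.88), there is less uncertainty: always guessing + would be right 70% of the time.
Intermediate values such as 0.5 and 0.8 indicate partial impurity; the closer the entropy is to 1.0, the more evenly mixed the collection is.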
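The curve asked for in question 1 follows the standard formula Entropy(S) = -P+ log2(P+) - P- log2(P-), with P- = 1 - P+ and 0·log2(0) taken as 0. The following Python sketch (the function name entropy is our own) evaluates it at a few values of P+; plotting these points reproduces the symmetric arch that is 0.0 at both ends and peaks at 1.0 when P+ = 0.5.

```python
import math

def entropy(p_pos: float) -> float:
    """Entropy (in bits) of a Boolean collection with proportion p_pos of positives."""
    p_neg = 1.0 - p_pos
    h = 0.0
    for p in (p_pos, p_neg):
        if p > 0:  # by convention 0 * log2(0) = 0, so zero proportions are skipped
            h -= p * math.log2(p)
    return h

# Tabulate points on the curve: 0.0 at both ends, peak 1.0 at P+ = 0.5
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"P+ = {p:.1f}  Entropy = {entropy(p):.3f}")
```

The 70%/30% case comes out at about 0.881, matching the value quoted above; the curve is symmetric, so 30%/70% gives the same entropy.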
2. Suggest a suitable data type for each of the following attributes. Justify your answer. Attributes= {Humidity, exam mark, income,
acidity, blood pressure}
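– exam mark: numerical, since marks lie on a scale where differences are meaningful (e.g., 80 is 10 marks more than 70).
– income: numerical, since it is a monetary amount that can take any value in a range.
– blood pressure: ordinal, assuming it is recorded as ordered levels (e.g., low < normal < high) rather than as a raw measurement.
– humidity: ordinal, assuming it is recorded as ordered levels (e.g., normal < high).
– acidity: ordinal, assuming it is recorded as ordered levels (e.g., low < medium < high) rather than as a raw pH value.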
3. Neural networks are a technique used for classification in machine learning. Give a clear definition and a graphical representation
of a neural network. Suggest a possible problem as a candidate to be solved by a neural network. Give your justification.
Neural Network
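In information technology, a neural network is a system of programs and data structures that approximates the
operation of the human brain. A neural network usually involves a large number of processors operating in parallel,
each with its own small sphere of knowledge and access to data in its local memory.
Graphically, it is drawn as layers of nodes (neurons): an input layer, one or more hidden layers and an output layer, with
weighted connections between the nodes of adjacent layers. Each node computes a weighted sum of its inputs and passes it
through an activation function; training adjusts the weights so that the outputs match the target classes.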
There are many different problems that can be solved with a neural network. However, neural networks are commonly used to
address particular types of problems. The following four types of problem are frequently solved with neural networks:
Classification
Prediction
Pattern recognition
Optimization
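A common graphical representation shows an input layer, a hidden layer and an output layer, with a weighted link between each pair of nodes in adjacent layers. The Python sketch below codes such a 2-2-1 network with a step activation. The weights are hand-picked for illustration (in practice they are learned, typically with a differentiable activation such as the sigmoid and an algorithm such as backpropagation) and make the network compute XOR, a classification that a single neuron cannot represent because the two classes are not linearly separable.

```python
def step(z: float) -> int:
    """Threshold activation: fire (1) if the weighted input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, passed through the activation."""
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)

def xor_network(x1: int, x2: int) -> int:
    # Hidden layer: two neurons with hand-picked (hypothetical) weights
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)   # fires if x1 OR x2
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)  # fires if x1 AND x2
    # Output layer: fires if OR is true but AND is not, i.e. XOR
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```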
4. Demonstrate the process of developing a decision tree for classification (using Weka package). Outline the process from coding
the dataset to generating the final decision tree.
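Create a CSV file:
– The first line contains the column headings (attribute names).
– The remaining lines contain the column values, one example per line, with the class attribute in the last column (Weka uses the last attribute as the class by default).
Open Weka and select Explorer.
In the Preprocess tab, press Open file... and select the CSV file.
Go to the Classify tab.
Press Choose under Classifier, then select trees > J48.
Press the Start button; the resulting tree is printed in the Classifier output pane.
To see the tree graphically, right-click the entry in the Result list and select Visualize tree.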
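The coding step can also be scripted. This is a minimal sketch using Python's standard csv module; the attribute names and the three rows are hypothetical examples, not a real dataset. It prints text in the layout described above, which can be saved as a .csv file and opened in the Explorer.

```python
import csv
import io

def to_weka_csv(header, rows):
    """Return CSV text: one header line of attribute names, then one line per example.

    The class attribute should be the last column, because the Weka Explorer
    treats the last attribute as the class by default.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical weather examples; the last column (Play) is the class.
header = ["Outlook", "Humidity", "Wind", "Play"]
rows = [
    ["Sunny", "High", "Weak", "No"],
    ["Overcast", "High", "Weak", "Yes"],
    ["Rain", "Normal", "Weak", "Yes"],
]
text = to_weka_csv(header, rows)
print(text)
```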
5. Convert the following tree to a logical expression using the disjunction and conjunction operation.
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
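Each parenthesised term is one path from the root to a positive leaf (a conjunction of the tests along that path), and the paths are joined by disjunction. As a hedged sketch, the same expression can be written as a Python predicate; the function name play_tennis is hypothetical, and it returns True exactly when the expression holds.

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    """Disjunction (or) of the three root-to-positive-leaf paths; each path is a conjunction (and)."""
    return (
        (outlook == "Sunny" and humidity == "Normal")
        or outlook == "Overcast"
        or (outlook == "Rain" and wind == "Weak")
    )

print(play_tennis("Sunny", "Normal", "Strong"))
print(play_tennis("Rain", "High", "Strong"))
```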
-------------------------------------------------------------------------------------------------
Grid Technology
2. What are the strengths and weaknesses of Web Services? Discuss briefly, and give at least two items each. Explain how
Service Description works in Web Services architecture.
Strengths:
– Web Services are platform and language independent since they use standard XML for data transfer.
– Web Services are firewall-friendly, since they use HTTP, which most firewalls already allow, to transmit messages (data).
Weaknesses:
– Overhead: transmitting all data as XML is less efficient than binary formats, but it makes Web Services more portable than other
technologies: what is lost in performance is gained in portability.
– Lack of versatility: basic Web Services do not offer services such as persistence or notification. WSRF (Web
Services Resource Framework) helps make Web Services more versatile, for example by adding support for stateful resources.
Once the required Web Service has been located (through service discovery), the service "describes itself", stating which operations it
supports and how to invoke them. This description is written in WSDL (Web Services Description Language).
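To illustrate what "describing itself" means, the Python sketch below reads a trimmed, hypothetical WSDL 1.1 fragment (the service and operation names are invented) with the standard xml.etree.ElementTree module and lists the operations it advertises, which is what a client needs to know before invoking the service.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Trimmed, hypothetical WSDL 1.1 fragment: only the abstract portType is shown;
# a real document also defines types, messages, a binding and a service endpoint.
WSDL_TEXT = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             name="StockQuote">
  <portType name="StockQuotePortType">
    <operation name="GetLastTradePrice">
      <input message="tns:GetLastTradePriceInput"/>
      <output message="tns:GetLastTradePriceOutput"/>
    </operation>
  </portType>
</definitions>"""

def list_operations(wsdl_text: str) -> list[str]:
    """Return the operation names a WSDL 1.1 document advertises in its portTypes."""
    root = ET.fromstring(wsdl_text)
    return [op.get("name") for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS)]

print(list_operations(WSDL_TEXT))
```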
3. Give one advantage and one disadvantage of Grid computing over a multiprocessor supercomputer. What could improve
the speedup factor of a multiprocessor computer?
Advantages:
– In Grid computing, each node can be commodity computer hardware; combined, the nodes can provide computing resources similar to those of a
multiprocessor supercomputer, but at lower cost.
Disadvantages:
– The processors and local storage areas are not linked by high-speed connections, so a grid is well suited only to applications in which
processors do not need to communicate intermediate results between them.
The speedup factor of a multiprocessor computer can be improved by reducing the serial (non-parallelizable) part of the computation and the
communication overhead between processors.
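The effect of the serial part can be quantified with Amdahl's law, S(n) = 1 / (f + (1 - f)/n), where f is the fraction of the single-processor run time that must run serially and n is the number of processors. The Python sketch below also adds a constant communication term c to the denominator; that term is our own simplifying assumption, not part of Amdahl's law.

```python
def speedup(serial_fraction: float, n_procs: int, comm_overhead: float = 0.0) -> float:
    """Speedup T(1)/T(n) under Amdahl's law, plus an optional communication term.

    serial_fraction: fraction f of the single-processor run time that cannot be parallelized.
    comm_overhead:   extra communication time as a fraction of T(1)
                     (a simplifying assumption; real overhead usually grows with n).
    """
    t_parallel = serial_fraction + (1.0 - serial_fraction) / n_procs + comm_overhead
    return 1.0 / t_parallel

for f, c in ((0.10, 0.05), (0.10, 0.0), (0.02, 0.0)):
    print(f"f={f:.2f} overhead={c:.2f}  speedup on 16 processors = {speedup(f, 16, c):.2f}")
```

With these numbers, cutting f from 0.10 to 0.02 raises the 16-processor speedup from 6.4 to about 12.3, while the assumed overhead of 0.05 lowers it from 6.4 to about 4.8.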
4. Describe how TLS/SSL server authentication and TLS/SSL client authentication work.
TLS/SSL server authentication:
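TLS/SSL-enabled client software can use standard techniques of public-key cryptography to check that a server's certificate and public ID
are valid and have been issued by a certificate authority (CA) listed in the client's list of trusted CAs. During the handshake, the server
must also prove that it holds the private key matching the public key in its certificate.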
This confirmation might be important if the user, for example, is sending a credit card number over the network and wants to check the
receiving server's identity.
TLS/SSL client authentication:
TLS/SSL client authentication allows a server to confirm a user's identity.
Using the same techniques as those used for server authentication, TLS/SSL-enabled server software can check that a client's certificate
and public ID are valid and have been issued by a certificate authority (CA) listed in the server's list of trusted CAs. The client proves
that it holds the matching private key by signing data from the handshake. Client authentication takes place only when the server
requests a client certificate.
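Both modes can be seen in Python's standard ssl module. In this sketch (the helper names and file-path parameters are hypothetical), ssl.create_default_context() gives a client context that performs server authentication by default, with CERT_REQUIRED and hostname checking; client authentication happens only if the server context loads the trusted client CAs and sets verify_mode to CERT_REQUIRED.

```python
import ssl
from typing import Optional

def make_client_context(cafile: Optional[str] = None,
                        certfile: Optional[str] = None,
                        keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client side. Server authentication is on by default: the server's certificate
    must chain to a trusted CA (cafile, or the system store if None) and match the hostname."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile:
        # Client authentication: send our certificate if the server requests one.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx

def make_server_context(certfile: str, keyfile: str,
                        client_cafile: Optional[str] = None) -> ssl.SSLContext:
    """Server side. Presents the server certificate; optionally requires client certificates."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.load_cert_chain(certfile, keyfile)
    if client_cafile:
        # Client authentication: request a certificate and reject clients whose
        # certificate does not chain to a CA in client_cafile.
        ctx.load_verify_locations(cafile=client_cafile)
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

A client would then wrap a connected socket with ctx.wrap_socket(sock, server_hostname="example.com"); if the server's certificate does not verify, the handshake fails with ssl.SSLCertVerificationError.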
5. Compare distributed computing, grid computing, parallel computing and conventional supercomputers. Describe briefly.
“Distributed” or “grid” computing in general is a special type of parallel computing that relies on complete computers (with onboard
CPUs, storage, power supplies, network interfaces, etc.) connected to a network (private, public or the Internet) by a conventional
network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors
connected by a local high-speed computer bus.
The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which, when
combined, can produce a computing resource similar to a multiprocessor supercomputer, but at a lower cost. This is due to the
economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small
number of custom supercomputers. The primary performance disadvantage is that the various processors and local storage
areas do not have high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel
computations can take place independently, without the need to communicate intermediate results between processors. The
high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between
nodes relative to the capacity of the public Internet.
There are also some differences in programming and deployment. It can be costly and difficult to write programs that can run in
the environment of a supercomputer, which may have a custom operating system, or require the program to address
concurrency issues. If a problem can be adequately parallelized, a “thin” layer of “grid” infrastructure can allow conventional,
standalone programs, given a different part of the same problem, to run on multiple machines. This makes it possible to write
and debug on a single conventional machine, and eliminates complications due to multiple instances of the same program
running in the same shared memory and storage space at the same time.
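The "thin grid layer" idea can be sketched in a few lines of Python: one standalone function, several non-overlapping chunks of the same problem, and a final combination of the partial results, with no intermediate communication between workers. Here threads from concurrent.futures stand in for grid nodes purely for illustration; CPython threads will not speed up this CPU-bound work, and a real grid would send each chunk to a separate machine. The helper names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo: int, hi: int) -> int:
    """Standalone task: count primes in [lo, hi) by trial division. Needs no shared state."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

def split(lo: int, hi: int, parts: int):
    """Divide [lo, hi) into at most `parts` contiguous, non-overlapping chunks (assumes hi > lo)."""
    step = (hi - lo + parts - 1) // parts
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]

chunks = split(0, 20_000, 4)
# Each worker stands in for one grid node; chunks are processed independently
# and only the final per-chunk counts are combined.
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    partial_counts = list(pool.map(lambda c: count_primes(*c), chunks))
total = sum(partial_counts)
print(chunks, partial_counts, total)
```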