
ASSIGNMENT # 1

Performance timeline of Flynn taxonomy

Different computer architectures have been built to exploit the inherent parallelism found in many applications. In general, a parallel computer architecture consists of one or more interconnected processor elements that operate concurrently, solving a single overall problem. The various architectures can be conveniently described using the stream concept. A stream is simply a sequence of objects or actions. There are both instruction streams and data streams, and there are four simple combinations that describe the most familiar parallel architectures:

1. SISD – single instruction, single data stream; the traditional uniprocessor.
2. SIMD – single instruction, multiple data stream, which includes array processors and vector processors.
3. MISD – multiple instruction, single data stream, typically systolic arrays (largely theoretical and not practical).
4. MIMD – multiple instruction, multiple data stream, which includes traditional multiprocessors as well as the newer networks of workstations.

A short code sketch contrasting the SISD and SIMD programming styles follows this list, and the table after it summarizes representative processors from the SISD, SIMD, and MIMD categories.
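To make the SISD/SIMD distinction concrete, here is a small illustrative Python/NumPy sketch (not part of the original assignment): the explicit loop processes one data element per step in the SISD style, while the single NumPy expression applies one operation to a whole data stream, which is the style that SIMD/vector machines exploit.

import numpy as np

a = np.arange(100_000, dtype=np.float32)
b = np.arange(100_000, dtype=np.float32)

# SISD style: a single instruction stream touching one data element at a time.
c_scalar = np.empty_like(a)
for i in range(len(a)):
    c_scalar[i] = a[i] + b[i]

# SIMD/vector style: one operation expressed over the whole data stream;
# NumPy dispatches it to vectorized (and, on many CPUs, SIMD) code.
c_vector = a + b

assert np.allclose(c_scalar, c_vector)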

(An entry of X means the column does not apply to that processor's category.)

Type | Processor | Year of introduction | Memory/register (SIMD) | Memory distribution (MIMD) | # of function units (SISD) / # of processors (SIMD, MIMD) | Scheduling (SISD) / programming paradigm (MIMD) | # of transistors (SISD) / vector length (SIMD)
SISD | Intel 8086 | 1978 | X | X | 1 | Dynamic | 29K
SIMD | CDC Cyber 205 | 1981 | Memory | X | 1 | X | 65535
SISD | Intel 80286 | 1982 | X | X | 1 | Dynamic | 134K
SIMD | Cray 2 | 1985 | Register | X | 5 | X | 64
SISD | Intel 80486 | 1989 | X | X | 2 | Dynamic | 1.2M
MIMD | Intel i860 | 1990 | X | Central | 4–28 | Shared memory | X
SIMD | Cray Y-MP/C90 | 1991 | Register | X | 16 | X | 64
SISD | HP PA-RISC 7000 | 1991 | X | X | 1 | Dynamic | 580K
SISD | MIPS R4000 | 1992 | X | X | 2 | Dynamic | 1.1M
MIMD | MIPS R3000 | 1992 | X | Distributed | 4–64 | Shared memory | X
SISD | MIPS R8000 | 1994 | X | X | 6 | Dynamic | 3.4M
MIMD | Convex C4/XA | 1994 | X | Global | 1–4 | Shared memory | X
SIMD | Cray T90 | 1995 | Register | X | 1–32 | X | 128
SISD | Intel Pentium Pro | 1995 | X | X | 5 | Dynamic | 5.5M
MIMD | SuperSPARC | 1995 | X | Distributed | 16–2048 | Message passing | X
MIMD | PA-RISC 7200 | 1995 | X | Global | 8–128 | Shared memory | X
SISD | AMD K5 | 1996 | X | X | 6 | Dynamic | 4.3M
SISD | Intel Pentium II | 1997 | X | X | 5 | Dynamic | 7.5M
SISD | AMD K6 | 1997 | X | X | 7 | Dynamic | 8.8M
SIMD | NEC SX-5 | 1998 | Register | X | 1–512 | X | 256
SISD | AMD K7 | 1999 | X | X | 9 | Dynamic | 22M
SISD | Intel Pentium III | 1999 | X | X | 5 | Dynamic | 28M
MIMD | DEC 21164 | 2000 | X | Distributed | 40–2176 | Shared memory | X
SISD | AMD Athlon 64 FX51 | 2003 | X | X | 9 | Dynamic | 105M
SISD | Intel Pentium 4 Prescott | 2003 | X | X | 5 | Dynamic | 125M
ASSIGNMENT # 2

ALGORITHM:
K-means clustering

DESCRIPTION:
The K-means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different (far apart) as possible. It assigns data points to a cluster such that the sum of the squared distances between the data points and the cluster's centroid (the arithmetic mean of all the data points that belong to that cluster) is at a minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.

APPLICATION:
K-means clustering is used in a variety of real-life examples and business cases, such as:
• Academic performance: based on their scores, students are categorized into grades like A, B, or C.
• Diagnostic systems: the medical profession uses K-means to build smarter medical decision support systems, especially for the treatment of liver ailments.
• Search engines: clustering forms a backbone of search engines. When a search is performed, the results need to be grouped, and search engines very often use clustering to do this.
• Wireless sensor networks: the clustering algorithm plays the role of finding cluster heads, each of which collects all the data in its respective cluster.
FORMULAS:

The objective minimized by K-means is the within-cluster sum of squared distances,

(1)  J = Σ (n = 1..N) Σ (k = 1..K) r_nk · || x_n − μ_k ||²,

where r_nk = 1 if data point x_n is assigned to cluster k and 0 otherwise. The centroid of each cluster is the arithmetic mean of the points assigned to it,

(2)  μ_k = ( Σ (n = 1..N) r_nk · x_n ) / ( Σ (n = 1..N) r_nk ).

Different distance functions may be used, such as the Euclidean distance, squared Euclidean distance, Manhattan distance, and cosine distance.
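For reference, a small illustrative sketch (not part of the original code) of how the distance functions named above can be written with NumPy:

import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def squared_euclidean(x, y):
    return np.sum((x - y) ** 2)

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def cosine_distance(x, y):
    # 1 minus the cosine similarity of the two vectors.
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))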

SEQUENTIAL CODE:
(Python)

import numpy as np
from numpy.linalg import norm

class Kmeans:
    """Plain K-means clustering."""

    def __init__(self, n_clusters, max_iter=100, random_state=123):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.random_state = random_state

    def initialize_centroids(self, X):
        # Pick n_clusters points of X at random as the initial centroids.
        rng = np.random.RandomState(self.random_state)
        random_idx = rng.permutation(X.shape[0])
        centroids = X[random_idx[:self.n_clusters]]
        return centroids

    def compute_centroids(self, X, labels):
        # New centroid of each cluster = mean of the points assigned to it.
        centroids = np.zeros((self.n_clusters, X.shape[1]))
        for k in range(self.n_clusters):
            centroids[k, :] = np.mean(X[labels == k, :], axis=0)
        return centroids

    def compute_distance(self, X, centroids):
        # Squared Euclidean distance from every point to every centroid.
        distance = np.zeros((X.shape[0], self.n_clusters))
        for k in range(self.n_clusters):
            row_norm = norm(X - centroids[k, :], axis=1)
            distance[:, k] = np.square(row_norm)
        return distance

    def find_closest_cluster(self, distance):
        # Assign each point to its nearest centroid.
        return np.argmin(distance, axis=1)

    def compute_sse(self, X, labels, centroids):
        # Sum of squared errors, i.e. the K-means objective J.
        distance = np.zeros(X.shape[0])
        for k in range(self.n_clusters):
            distance[labels == k] = norm(X[labels == k] - centroids[k], axis=1)
        return np.sum(np.square(distance))

    def fit(self, X):
        self.centroids = self.initialize_centroids(X)
        for i in range(self.max_iter):
            old_centroids = self.centroids
            distance = self.compute_distance(X, old_centroids)
            self.labels = self.find_closest_cluster(distance)
            self.centroids = self.compute_centroids(X, self.labels)
            # Stop as soon as the centroids no longer move.
            if np.all(old_centroids == self.centroids):
                break
        self.error = self.compute_sse(X, self.labels, self.centroids)

    def predict(self, X):
        # Label new data using the fitted centroids.
        distance = self.compute_distance(X, self.centroids)
        return self.find_closest_cluster(distance)
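A minimal usage sketch of the class above, assuming synthetic two-dimensional data generated with NumPy (the blob centers and sizes are illustrative, not part of the original assignment):

import numpy as np

# Two well-separated Gaussian blobs as toy input.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) + [0, 0],
               rng.randn(100, 2) + [5, 5]])

km = Kmeans(n_clusters=2, max_iter=100, random_state=123)
km.fit(X)
print("SSE:", km.error)
print("First ten labels:", km.labels[:10])
print("Centroids:\n", km.centroids)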
ALGORITHM:

1. Randomly select |S|/K members of the set S to form each of the K subsets.
2. While the error E is not stable:
3. Compute a mean Xi, 1 ≤ i ≤ K, for each of the K subsets.
4. Compute the distance d(i, j), 1 ≤ i ≤ K, 1 ≤ j ≤ N, of each vector, such that d(i, j) = || Xi - vj ||.
5. Assign each vector to the new K subsets according to its closest distance to Xi, 1 ≤ i ≤ K.

The serial K-means algorithm has time complexity O(Rs·K·N), where K is the number of desired clusters, N is the number of data vectors, and Rs is the number of iterations.

PRAM MODEL:
MPI IMPLEMENTATION:
(Algorithm)

1: MPI_Init // start the procedure
2: Read N objects from the file
3: Partition the N data objects evenly among all processes, and assume that each process has N' data objects
4: Each process executes steps 5-11
5: Randomly select K points as the initial cluster centroids, denoted μk (1 ≤ k ≤ K)
6: Calculate J in (1), denoted J'
7: Assign each object n (1 ≤ n ≤ N) to the closest cluster
8: Calculate the new centroid μk of each cluster as in (2)
9: Recalculate J in (1)
10: Repeat steps 6-9 until J' - J < threshold
11: Generate the cluster id for each data object
12: Generate new cluster centroids according to the clustering results of all processes at the end of each iteration
13: Generate a final centroid set Centroid by the function Merge and output the clustering results: K centroids
14: MPI_Finalize // finish the procedure

CODE:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <assert.h>

// Creates an array of random floats. Each number has a value from 0 - 1
float* create_rand_nums(const int num_elements) {
float *rand_nums = (float *)malloc(sizeof(float) * num_elements);
assert(rand_nums != NULL);
for (int i = 0; i < num_elements; i++) {
rand_nums[i] = (rand() / (float)RAND_MAX);
}
return rand_nums;
}

// Distance**2 between d-vectors pointed to by v1, v2.
float distance2(const float *v1, const float *v2, const int d) {
float dist = 0.0;
for (int i=0; i<d; i++) {
float diff = v1[i] - v2[i];
dist += diff * diff;
}
return dist;
}

// Assign a site to the correct cluster by computing its distances to
// each cluster centroid.
int assign_site(const float* site, float* centroids,
const int k, const int d) {
int best_cluster = 0;
float best_dist = distance2(site, centroids, d);
float* centroid = centroids + d;
for (int c = 1; c < k; c++, centroid += d) {
float dist = distance2(site, centroid, d);
if (dist < best_dist) {
best_cluster = c;
best_dist = dist;
}
}
return best_cluster;
}

// Add a site (vector) into a sum of sites (vector).
void add_site(const float * site, float * sum, const int d) {
for (int i=0; i<d; i++) {
sum[i] += site[i];
}
}

// Print the centroids one per line.
void print_centroids(float * centroids, const int k, const int d) {
float *p = centroids;
printf("Centroids:\n");
for (int i = 0; i<k; i++) {
for (int j = 0; j<d; j++, p++) {
printf("%f ", *p);
}
printf("\n");
}
}

int main(int argc, char** argv) {


if (argc != 4) {
fprintf(stderr,
"Usage: kmeans num_sites_per_proc num_means num_dimensions\n");
exit(1);
}

// Get stuff from command line:
// number of sites per processor.
// number of processors comes from mpirun command line. -n
int sites_per_proc = atoi(argv[1]);
int k = atoi(argv[2]); // number of clusters.
int d = atoi(argv[3]); // dimension of data.
// Seed the random number generator to get different results each time
// srand(time(NULL));
// No, we'd like the same results.
srand(31359);

// Initialize MPI and find process rank and number of processes.
MPI_Init(NULL, NULL);
int rank, nprocs;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

// Data structures in all processes.
// The sites assigned to this process.
float* sites;
assert(sites = malloc(sites_per_proc * d * sizeof(float)));
// The sum of sites assigned to each cluster by this process.
// k vectors of d elements.
float* sums;
assert(sums = malloc(k * d * sizeof(float)));
// The number of sites assigned to each cluster by this process. k integers.
int* counts;
assert(counts = malloc(k * sizeof(int)));
// The current centroids against which sites are being compared.
// These are shipped to the process by the root process.
float* centroids;
assert(centroids = malloc(k * d * sizeof(float)));
// The cluster assignments for each site.
int* labels;
assert(labels = malloc(sites_per_proc * sizeof(int)));

// Data structures maintained only in root process.
// All the sites for all the processes.
// site_per_proc * nprocs vectors of d floats.
float* all_sites = NULL;
// Sum of sites assigned to each cluster by all processes.
float* grand_sums = NULL;
// Number of sites assigned to each cluster by all processes.
int* grand_counts = NULL;
// Result of program: a cluster label for each site.
int* all_labels;
if (rank == 0) {
all_sites = create_rand_nums(d * sites_per_proc * nprocs);
// Take the first k sites as the initial cluster centroids.
for (int i = 0; i < k * d; i++) {
centroids[i] = all_sites[i];
}
print_centroids(centroids, k, d);
assert(grand_sums = malloc(k * d * sizeof(float)));
assert(grand_counts = malloc(k * sizeof(int)));
assert(all_labels = malloc(nprocs * sites_per_proc * sizeof(int)));
}

// Root sends each process its share of sites.
MPI_Scatter(all_sites,d*sites_per_proc, MPI_FLOAT, sites,
d*sites_per_proc, MPI_FLOAT, 0, MPI_COMM_WORLD);

float norm = 1.0; // Will tell us if centroids have moved.

while (norm > 0.00001) { // While they've moved...

// Broadcast the current cluster centroids to all processes.
MPI_Bcast(centroids, k*d, MPI_FLOAT,0, MPI_COMM_WORLD);

// Each process reinitializes its cluster accumulators.
for (int i = 0; i < k*d; i++) sums[i] = 0.0;
for (int i = 0; i < k; i++) counts[i] = 0;

// Find the closest centroid to each site and assign to cluster.
float* site = sites;
for (int i = 0; i < sites_per_proc; i++, site += d) {
int cluster = assign_site(site, centroids, k, d);
// Record the assignment of the site to the cluster.
counts[cluster]++;
add_site(site, &sums[cluster*d], d);
}

// Gather and sum at root all cluster sums for individual processes.
MPI_Reduce(sums, grand_sums, k * d, MPI_FLOAT, MPI_SUM, 0,
MPI_COMM_WORLD);
MPI_Reduce(counts, grand_counts, k, MPI_INT, MPI_SUM, 0,
MPI_COMM_WORLD);

if (rank == 0) {
// Root process computes new centroids by dividing sums per cluster
// by count per cluster.
for (int i = 0; i<k; i++) {
for (int j = 0; j<d; j++) {
int dij = d*i + j;
grand_sums[dij] /= grand_counts[i];
}
}
// Have the centroids changed much?
norm = distance2(grand_sums, centroids, d*k);
printf("norm: %f\n",norm);
// Copy new centroids from grand_sums into centroids.
for (int i=0; i<k*d; i++) {
centroids[i] = grand_sums[i];
}
print_centroids(centroids,k,d);
}
// Broadcast the norm. All processes will use this in the loop test.
MPI_Bcast(&norm, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
}

// Now centroids are fixed, so compute a final label for each site.
float* site = sites;
for (int i = 0; i < sites_per_proc; i++, site += d) {
labels[i] = assign_site(site, centroids, k, d);
}

// Gather all labels into root process.
MPI_Gather(labels, sites_per_proc, MPI_INT,
all_labels, sites_per_proc, MPI_INT, 0, MPI_COMM_WORLD);

// Root can print out all sites and labels.
if ((rank == 0) && 1) {
float* site = all_sites;
for (int i = 0;
i < nprocs * sites_per_proc;
i++, site += d) {
for (int j = 0; j < d; j++) printf("%f ", site[j]);
printf("%4d\n", all_labels[i]);
}
}

MPI_Finalize();

return 0;
}
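Assuming an MPI toolchain such as MPICH or Open MPI is installed, the program above can be compiled and run along these lines (the source file name and process count are illustrative):

mpicc -O2 kmeans_mpi.c -o kmeans_mpi
mpirun -n 4 ./kmeans_mpi 1000 5 2

Here 4 MPI processes each receive 1000 sites, with K = 5 clusters and d = 2 dimensions, matching the usage string printed by the program.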

ASSIGNMENT # 3
Transparency:
A distributed system should be perceived as a single entity by its users or application programmers, rather than as a collection of cooperating autonomous systems. Users should be unaware of where services are located, and the transfer of an activity from a local machine to a remote one should also be transparent.
The following are the different transparencies encountered in distributed systems.
1. Access Transparency:
Clients should be unaware of the distribution of the files. The files could be present on a totally different set of servers that are physically far apart, and a single set of operations should be provided to access both remote and local files. Applications written for local files should be able to execute even on remote files. Examples illustrating this property are the file system in the Network File System (NFS), SQL queries, and navigation of the web.
2. Location Transparency:
Clients should see a uniform file name space. Files or groups of files may be relocated without changing their pathnames. A location-transparent name contains no information about the named object's physical location. This property is important to support the movement of resources and the availability of services. Location and access transparency together are sometimes referred to as network transparency. Examples are the file system in NFS and the pages of the web.
3. Concurrency Transparency:
Users and applications should be able to access shared data or objects without interference from each other. This requires very complex mechanisms in a distributed system, since there exists true concurrency rather than the simulated concurrency of a central system. Shared objects may be accessed simultaneously, and concurrency control and its implementation are hard tasks. Examples are NFS and Automated Teller Machine (ATM) networks.
4. Replication Transparency:
This kind of transparency is mainly relevant for distributed file systems, which replicate data at two or more sites for greater reliability. The client generally should not be aware that a replicated copy of the data exists, and clients should expect operations to return only one set of values. Examples are distributed DBMSs and the mirroring of web pages.
5. Failure Transparency:
Enables the concealment of faults, allowing user and application programs to
complete their tasks despite the failure of hardware or software components.
Fault tolerance is provided by the mechanisms that relate to access transparency.
Distributed systems are more prone to failures, as any component may fail, which may lead to degraded service or the total absence of that service. Because the intricacies are hidden, distinguishing between a failed and a slow-running process is difficult. An example is a distributed database management system.
6. Migration Transparency:
This transparency hides the movement of information or processes within a system, so that it does not affect the operations of the users and the applications that are running. This mechanism allows load balancing of any particular client that might be overloaded. Systems that implement this transparency are NFS and web pages.
7. Performance Transparency:
Allows the system to be reconfigured to improve the performance as the load
varies.
8. Scaling Transparency:
A system should be able to grow without affecting application algorithms.
Graceful growth and evolution is an important requirement for most enterprises.
A system should also be capable of scaling down to small environments where
required, and be space- and/or time-efficient as required. The best distributed-system example implementing this transparency is the World Wide Web.
9. Revision transparency:
This refers to vertical growth of the system: software revisions are not visible to the users.
10. Parallelism transparency:
Parallel activities take place without users knowing how, when, and where they occur.

Clock Synchronization:

The following compares common clock synchronization algorithms by type of algorithm, approach, scalability, reason for implementation, fault tolerance, and limitations.

Cristian's Algorithm
- Type: Centralized
- Approach: Passive time server, based on the external clock synchronization approach
- Scalability: Poor
- Reason for implementation: to minimize propagation time (in milliseconds)
- Fault tolerant: No
- Limitations: 1. The single time server might fail. 2. A faulty time server may reply with an incorrect time.

Berkeley Algorithm
- Type: Centralized
- Approach: Active time server, based on the internal clock synchronization approach
- Scalability: Poor
- Reason for implementation: to minimize the maximum difference between any two clocks (in milliseconds)
- Fault tolerant: No
- Limitations: 1. The server becomes a bottleneck.

Global Averaging Algorithm
- Type: Distributed
- Approach: No time server; based on internal clock values
- Scalability: Poor
- Reason for implementation: to resolve the single point of failure and to minimize skew (in milliseconds)
- Fault tolerant: No
- Limitations: 1. The network must support a broadcast facility. 2. Congestion may occur due to the large amount of message passing.

Network Time Protocol (NTP)
- Type: Distributed
- Approach: An external clock is used as the reference time server, based on multiple time servers arranged in levels
- Scalability: Good
- Reason for implementation: to minimize propagation time (in milliseconds) and to provide faster access to the correct time value
- Fault tolerant: Yes
- Limitations: 1. Supported in UNIX systems only.

Precision Time Protocol (PTP)
- Type: Centralized
- Approach: Master-slave approach in which the master is controlled by a GPS receiver
- Scalability: Good
- Reason for implementation: more accuracy than NTP by using a GPS receiver, with timing on the order of microseconds
- Fault tolerant: Yes
- Limitations: 1. The network must support multicasting. 2. Intended for relatively localized systems.
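As a concrete illustration of the centralized, passive-server style summarized above, here is a minimal Python sketch of Cristian's algorithm; request_server_time() is a hypothetical stand-in for a real network request to the time server, and the client compensates for network delay by adding half of the measured round-trip time.

import time

def request_server_time():
    # Hypothetical placeholder: a real client would send a request over the
    # network and parse the time value returned by the time server.
    return time.time()

def cristian_sync():
    """Estimate the time server's current time, compensating for network delay."""
    t0 = time.monotonic()                 # client clock when the request is sent
    server_time = request_server_time()   # time reported by the server
    t1 = time.monotonic()                 # client clock when the reply arrives
    round_trip = t1 - t0
    # Cristian's estimate: assume the reply took half of the round-trip time.
    return server_time + round_trip / 2.0

if __name__ == "__main__":
    print("Estimated server time:", cristian_sync())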

Two-phase commit protocol:

This protocol has two phases, which is why it is called the two-phase commit protocol.
• The first phase is called "PREPARE".
• The second phase is called "COMMIT".
Two roles are involved: the coordinator and the participants (or workers).
• Coordinator: the node that initiates the transaction.
• Participant: a node that takes part in the transaction.
Phase 1: The coordinator asks each participant whether it has successfully completed its responsibilities for the transaction and is ready to commit. Each participant responds "yes/OK" or "no/abort".
• Each participant writes its data records to a log. If it fails to do so, it responds with a failure message; if it succeeds, it sends an OK message.
Phase 2: This phase starts when the coordinator has received a successful response from every participant in phase 1.
• The coordinator sends a commit request to all participants.
• Each participant writes the commit as part of its log record.
• Each participant sends a message confirming that its commit has completed successfully.
• If a server fails, the coordinator sends instructions to all servers to roll back the transaction.
A minimal sketch of this message flow appears after the ACID summary below.
This mechanism helps ensure the ACID transaction guarantees:
Atomicity: either the entire transaction is reflected in the final state of the system or none of it is.
Consistency: the transaction either completes successfully or is rolled back.
Isolation: concurrent transactions do not interfere with each other.
Durability: after successful completion of a transaction, all changes made by the transaction persist even in the case of a system failure.
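A minimal, illustrative Python sketch of the prepare/commit message flow described above, with in-process objects standing in for networked nodes; the Participant class and its prepare/commit/rollback methods are hypothetical names chosen for this sketch, not part of the original assignment.

class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.log = []

    def prepare(self):
        # Phase 1: write records to the log and vote "OK" or "abort".
        if self.will_succeed:
            self.log.append("prepared")
            return "OK"
        return "abort"

    def commit(self):
        # Phase 2: make the changes durable.
        self.log.append("committed")

    def rollback(self):
        self.log.append("rolled back")


def two_phase_commit(participants):
    # Phase 1 (PREPARE): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(v == "OK" for v in votes):
        # Phase 2 (COMMIT): everyone voted OK, so command commit.
        for p in participants:
            p.commit()
        return "committed"
    # Any "abort" vote forces a global rollback.
    for p in participants:
        p.rollback()
    return "aborted"


if __name__ == "__main__":
    print(two_phase_commit([Participant("A"), Participant("B")]))         # committed
    print(two_phase_commit([Participant("A"), Participant("B", False)]))  # aborted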

Three-phase commit protocol:


The 3PC protocol defines the following states:
• initial (3PC processing is starting),
• waiting (the participant is available to commit; it has received the canCommit message from the coordinator),
• pre-commit (the participant is ready to commit; it has received the preCommit message from the coordinator),
• committed (the participant has committed; it was commanded by the coordinator to commit),
• aborted (the participant has aborted; it was commanded by the coordinator to abort).

It also has the following phases:
• Phase 0: The start of the processing is the same as for 2PC. The business logic can send a message to the JMS queue and/or insert a record into the database table. Each participant starts its local (resource-local) transaction, which is enlisted in the global transaction managed by the coordinator.
• Phase 1: The three-phase commit (3PC) starts when the application logic says to commit the work. The coordinator commands the participants to switch to the waiting state by sending the canCommit message, and changes its own state from initial to waiting. Here we can expect the participants to lock their resources (as happens for the prepare command in 2PC).
• Phase 2: Once the participants have acknowledged that they are available to commit, the coordinator sends the preCommit message to all of them and moves to the pre-commit state.
• Phase 3: The coordinator collects the acknowledgements of the preCommit message and commands the participants to commit. The participants commit and release their resources.
This state structure shows why the new phase was added. The final transition from the waiting state is to abort, in contrast to the final transition from the pre-commit state, which defaults to commit. There is no state that has transitions to both final states, commit and abort, at the same time. The structure of the protocol ensures that neither the participants nor the coordinator are ever more than one state transition away from each other. That permits the participants to decide, when the coordinator is not available, the final state of the transaction. When any participant is in the pre-commit state, the transaction is about to commit: participants can be sure that the coordinator had decided to commit. When no participant is in the pre-commit state, the transaction is about to abort: participants know it is possible that the coordinator decided to abort.
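A small, illustrative Python sketch of the participant state machine described above, restricted to the normal flow plus the waiting-state abort. The doCommit message name is an assumption (only canCommit and preCommit are named in this section), and the table is a simplification for clarity, not a full 3PC implementation.

# Allowed participant state transitions in the simplified 3PC described above.
# Keys are current states; values map a received message to the next state.
TRANSITIONS = {
    "initial":    {"canCommit": "waiting"},
    "waiting":    {"preCommit": "pre-commit", "abort": "aborted"},
    "pre-commit": {"doCommit": "committed"},
}

def next_state(state, message):
    """Return the participant's next state, or raise on an illegal transition."""
    try:
        return TRANSITIONS[state][message]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {message}")

if __name__ == "__main__":
    s = "initial"
    for msg in ["canCommit", "preCommit", "doCommit"]:
        s = next_state(s, msg)
        print(msg, "->", s)   # ends in "committed"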

Dealing with failures:

The failure of a participant is detected by a timeout at the coordinator while it waits for the participant's response.
• If a timeout occurs in the waiting state, we know that some participants are in the initial and/or waiting state. The coordinator commands abort.
• If a timeout occurs in the pre-commit state, we know that some participants are in the waiting and/or pre-commit state. The coordinator commands abort.
• If some participant does not receive the abort message, we are still fine: upon recovery, that participant decides based on the state of the other participants.

The failure of the coordinator is detected by a timeout while a participant waits to be commanded to act. The behavior of the participant is the same in the waiting and pre-commit states. If a timeout occurs, a new coordinator needs to be found; it verifies what the state of the participants is. The action of the new coordinator corresponds to the failure step.

• The new coordinator is marked as being in the waiting state when the participants are in the waiting or aborted states. The new coordinator then commands abort.
• The new coordinator is marked as being in the pre-commit state when the other participants are in the waiting, pre-commit, or committed states. The new coordinator then commands commit.

The coordinator cannot forget about the transaction until all participants have acknowledged that they processed the required action. We can depict this as a new forgotten state reachable from the committed and aborted states. Only when all acknowledgements have been received can the information about the transaction's existence be removed from the coordinator's log store. A participant cannot, upon recovery, decide to commit a transaction even if it is in the pre-commit state. That is because the coordinator could have decided to abort the whole transaction while that participant was the only one to have changed to the pre-commit state (the other participants still being in the waiting state), before it was able to send its acknowledgement back to the coordinator. The coordinator then commands all participants to abort. Thus, in this case, the participant must ask the other participants to find out the state of the transaction.
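A brief illustrative sketch of the new coordinator's recovery decision described in the two bullet points above, using the participant state names from this section; recover_decision is a hypothetical helper name chosen for this sketch.

def recover_decision(participant_states):
    """Decide the transaction outcome when a new coordinator takes over.

    participant_states: iterable of strings from
    {"initial", "waiting", "pre-commit", "committed", "aborted"}.
    """
    states = set(participant_states)
    # If any participant reached pre-commit (or already committed), the old
    # coordinator must have decided to commit, so command commit.
    if states & {"pre-commit", "committed"}:
        return "commit"
    # Otherwise every participant is at most in the waiting state,
    # so it is safe to command abort.
    return "abort"

if __name__ == "__main__":
    print(recover_decision(["waiting", "pre-commit"]))  # commit
    print(recover_decision(["waiting", "waiting"]))     # abort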
