
Information Technology

FIT3143 Parallel Computing


Semester 2 2024

Topic 6A:
Clusters and Cluster Performance
Dr Carlo Kopp, MACM, SMIEEE, AFAIAA
Faculty of Information Technology
© 2011 - 2024 Monash University
Why Study Clusters and Their Performance?
 Clusters are the most common “building block” used to form parallel (MIMD)
systems, and therefore must be understood;
 Most Grids are formed by aggregating clusters using middleware; Clouds are
usually clusters running Cloud hypervisors;
 Foundation knowledge: Because the performance of clusters determines many
critical aspects of parallel system performance, understanding clusters is
essential;
 Foundation knowledge: The limitations of a cluster can limit what a parallel
application can or cannot do;
 Practical skills: When coding parallel applications you will have to run the code
on a cluster, or a cluster in a grid/cloud;
 Practical skills: You may have to benchmark an application on a cluster, so
understanding cluster performance matters

2
Clusters vs. Grids vs. Clouds
Clusters
 Clusters emerged during the 1990s as an alternative to traditional
“supercomputers”, which were usually optimised for vector processing, and
architected to exploit Instruction Level Parallelism;
 A “cluster” is a term applied to a group of general purpose processors,
connected by a high speed “fabric” of links, that run software to execute
(usually large) parallel processing jobs;
 Parallelism in clusters occurs at the level of the process, unlike traditional
supercomputers;
 In principle, the limits to the number of cores in a cluster are determined by the
performance limits of the “fabric” interconnecting the machines forming the
cluster;
 Most contemporary supercomputers are built as clusters.

4
Clusters vs. Grids vs. Clouds (I)
 The term “Cloud” is now very widely employed to describe cloud computing
environments, but also any large distributed computing system, even if it is not
running genuine “Cloud” middleware and a runtime environment!
 Clusters are typically confined to one computer room, and run middleware and
programming environments optimised for parallel jobs, especially
“supercomputing” tasks;
 A “Grid” is usually formed by aggregating multiple clusters over a Wide Area
Network (WAN), to increase aggregate performance, using “Grid Middleware”;
 A “Cloud” like a “Grid” may aggregate vast numbers of cores over multiple sites,
but “Cloud middleware” is usually built to support a disparate mix of different
users, and provide “elastic” allocation of computing resources.

5
Clusters vs. Grids vs. Clouds (II)
 A cluster or multiple clusters may be running cluster middleware, or Grid
middleware, or Cloud middleware;
 In some instances, such middleware may be run concurrently, such as a
cluster that is used for local jobs but also participates in a Grid;
 An ongoing problem with distributed computing is the imprecision of
language and labels used to describe systems, especially in industry;
 The best indication of what category a distributed system falls under is the
type of middleware and programming environment being run;
 In current usage, the term “cluster” is often only used to describe the
hardware, reflecting the fact that middleware may support various models;

6
Clusters - Integration
 The simplest way to form a cluster is to interconnect a large number of racked
general purpose processors, using a high speed network;
 The principal challenge is in providing a way of managing jobs, distributing the
computing load across the cores, and providing seamless IPC between
processes in jobs;
 These tasks are typically performed by “middleware”, and in many instances,
“runtime environment” software for managing jobs, down to individual
processes;
 In constructing clusters, which are the basic building block in Grids and clouds,
the performance of the interconnecting fabric is critically important;
 Two parameters of interest are the latency and bandwidth of communication
between processes running on cores.

7
Hardware and Fabrics
TIK Experimental Cluster “Scylla” – ETH Zurich, Switzerland
Ethernet Switches
22 x Athlon
Commodity PCs
Debian Linux

http://www.tik.ee.ethz.ch/~ddosvax/cluster/

9
Juno Linux Cluster - Lawrence Livermore National Laboratory, USA
https://computing.llnl.gov/tutorials/linux_clusters/
Infiniband Fabric
1,152 x Quad Core
Opteron Nodes

10
Clusters - Fabrics
 Clusters need interconnects to carry traffic between the machines or “nodes” forming the
cluster
 The general term for such an interconnect is “fabric”
 Fabrics are typically defined by the type of interconnect used and the type of topology used
 While variants of Ethernet remain the most common interconnect used in clusters,
especially small/cheap clusters, higher performing clusters use faster interconnects (below)
 Topologies vary widely and reflect operator priorities – tradeoffs between performance and
reliability (redundancy)
 Considerations are path length, latency, fault tolerance, scalability and cost (Chkirbene et
al, 2018)
 Path length and latency impact performance, fault tolerance impacts reliability, and cost
reflects complexity

11
Comparing Topologies (Chkirbene et al, 2018)
 Degree of nodes: the number of ports per server in the data centre. “Flat”
topologies have degree = 2, while “Recursive” topologies have degree > 2
 Scalability: number of nodes in the topology, good scalability allows incremental
increases to large numbers
 Diameter: defined as the maximum of the shortest distances between all pairs
of nodes
 Fault tolerance: continues to operate even in the presence of component
failures
 Average Path Length: the APL shows the efficiency of the routing algorithms
employed in packet transmission
 Bandwidth: data transfer rate or throughput from one node to another. One-to-
One, One-to-All, One-to-Several and All-to-All bandwidths are often used (a
sketch of the diameter and APL calculations follows below)
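
A minimal C sketch (not from the slides) of how the diameter and APL metrics can be derived from a topology's adjacency structure via breadth-first search; the 4-node ring adjacency matrix is a made-up example:

```c
#include <stdio.h>

#define NODES 4  /* nodes in the hypothetical example topology */

/* Made-up 4-node ring topology: 0-1-2-3-0 */
static const int adj[NODES][NODES] = {
    {0, 1, 0, 1},
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {1, 0, 1, 0},
};

int main(void) {
    int dist[NODES][NODES];
    /* BFS from each source gives shortest hop counts to all nodes */
    for (int s = 0; s < NODES; s++) {
        int queue[NODES], head = 0, tail = 0;
        for (int i = 0; i < NODES; i++) dist[s][i] = -1;
        dist[s][s] = 0;
        queue[tail++] = s;
        while (head < tail) {
            int u = queue[head++];
            for (int v = 0; v < NODES; v++)
                if (adj[u][v] && dist[s][v] < 0) {
                    dist[s][v] = dist[s][u] + 1;
                    queue[tail++] = v;
                }
        }
    }
    /* Diameter: maximum of the shortest distances over all pairs;
       APL: their average */
    int diameter = 0, pairs = 0;
    double apl = 0.0;
    for (int s = 0; s < NODES; s++)
        for (int t = 0; t < NODES; t++)
            if (s != t) {
                if (dist[s][t] > diameter) diameter = dist[s][t];
                apl += dist[s][t];
                pairs++;
            }
    printf("diameter = %d, APL = %.2f\n", diameter, apl / pairs);
    return 0;
}
```

For the ring above this prints diameter = 2 and APL ≈ 1.33.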

12
Comparing Topologies (Chkirbene et al, 2018)

13
Cluster Fabric Topology Example – Fat Tree Model

Note that core layer routers will be higher performing than aggregation and edge layer switches

14
Cluster Fabric Topology Example – HyperFlatNet

15
Cluster Fabric Topology Example – VacoNet

16
Cluster Fabric Topology Performance – Average Path Length

Lower APL is better due to lower cumulative queuing delay across routers/switches (Chkirbene et al, 2018)

17
Cluster Design and Implementation
Cluster Design
 The earliest clusters were formed by stacking PCs or Unix workstations on
benches, and later in 19 inch racks, interconnecting them with 10
Mbit/s 802.3 Ethernet (e.g. the Monash PPME);
 Interconnect performance was soon found to be important for computing
tasks performing a lot of IPC, or demanding a lot of bandwidth;
 The general trend since then has been to use commodity Ethernet and
Ethernet switches for small / cheap / low performance clusters, and much
more expensive and elaborate interconnects for large / expensive / high
performance clusters;
 Clusters are usually used as building blocks in larger data centres – the
hardware configuration reflecting the performance demands of the
application;

19
Cluster Building Blocks
 Computational Nodes:
– Individual computers, using multiple core CPUs;
– Multiple CPU, multiple core systems;
– Server “blades” sharing backplanes in racks;
 Fabrics:
– Low performance commodity Ethernet switches and routers;
– High performance 2.5/5/10G Ethernet switches and routers;
– High performance interconnect Infiniband, ePCIe switches and routers;
 Middleware:
– Load balancing/sharing middleware for clusters;
– Parametric computing or other cluster / grid middleware;
– Cloud middleware to run virtual machines

20
“Economy” Cluster Interconnects
 Initially 10 Mbit/s Ethernet, and later 100 Mbit/s “Fast Ethernet”;
 Currently “Gigabit Ethernet” (GbE) mostly used, with commodity “Gigabit
Ethernet” switches;
 Gigabit Ethernet over TCP/IP doesn’t substantially reduce latency in comparison
with “Fast Ethernet”; this is due to protocol processing delays in hosts which
remain CPU bound;
 Network interconnects are queuing systems and behave accordingly;
 Emerging “economy” interconnects are based on:
A. 10GBASE-T, 5GBASE-T and 2.5GBASE-T Ethernet adaptors and switches
intended for data centres and end user premises
B. External PCIe (Express) switches, similar in performance to Thunderbolt, also
based on PCIe – BSD Socket API via PCIe device driver or TCP/IP over PCIe;

21
10GBASE-T, 5GBASE-T and 2.5GBASE-T Ethernet
 10 Gigabit Ethernet first defined in 2002 for optical fibre links
 Later revisions (IEEE 802.3-2018) support optical fibre, copper
twisted pair and twin-axial cable, backplanes and printed circuit
boards
 Used primarily as a fabric in data centres to interconnect servers /
hosts in clusters and clouds
 More expensive than 1000BASE-T but cheaper than Infiniband
 5GBASE-T and 2.5GBASE-T (802.3bz) use a reduced data rate
10GBASE-T waveform over cheap Cat-5e/6 twisted pair Ethernet
cables

22
10GBASE-T Ethernet Hardware
Intel X540-T2, dual port 10 Gigabit Ethernet NIC, PCIe 2.1 x8 card. Two 8P8C (RJ45) connectors, 10GBASE-T standard
(Nosachev – Wikimedia)

Cat 6 Ethernet 8P8C (RJ45) Twisted Pair Cable


(Raysonho - Wikimedia)

23
External PCIe (PCI Express) Hardware
Cronologic Ndigo External PCIe Expander

External 24-port PCIe switch


[https://www.serialcables.com/product-category/pcie-gen3-switch/]

External PCIe Cables [http://www.onestopsystems.com]

24
External PCIe (PCI Express) Software

Dolphin Software eXpressWare suite – TCP/IP over ePCIe / Proprietary SuperSockets


[https://www.dolphinics.com/products/dolphin_pci_express_software.html]

25
“Performance” Cluster Interconnects
 cLAN – a VIA (Virtual Interface Architecture) implementation with very low
latencies of 0.5 µs and high bandwidth – now obsolete;
 QsNet from Quadrics, similar to Myrinet, latency around 5 µs – now
obsolete;
 ANSI/VITA 26-1998 Myrinet – simple low-latency switches, 640 Mbps
to 10 Gbps – now obsolete;
 InfiniBand – high performance switched fabric typically providing 56 – 100 Gbps;
 InfiniBand and Gigabit Ethernet variants are now dominant in large
commercial and scientific supercomputing clusters;

26
InfiniBand Fabric

Mellanox - http://www.mellanox.com/pdf/whitepapers/IB_Intro_WP_190.pdf

27
InfiniBand Hardware

Mellanox InfiniBand 1U Switch, 12 QSFP+ ports

Mellanox ConnectX®-4 Single/Dual-Port Adapter – 100 Gb/s, PCIe 3.0 x16

28
Cluster Performance
Benchmarking the Cluster
 How do we know how fast the cluster runs?
 Measurement of performance is an important factor when selecting a
system design and architecture
 Numerous methods exist for benchmarking a cluster – none are ideal
 It is important that the technique used provides a compute load
representative of the intended application;
 Workloads can be homogeneous – e.g. parallel identical jobs – clusters,
grids;
 Workloads can be inhomogeneous/heterogeneous – parallel but very
different jobs – clouds;

30
Network Latency and Bandwidth
 Latency is easy to determine by measurement of elapsed time using common
clocks;
 Bandwidth (throughput, capacity) is easy to determine by benchmarking;
 Results must be used with caution, as time variant network conditions dictate
latency and usable bandwidth at any time;
 Latency and bandwidth only characterise the network and not the
computational aspects of the system;
 Measurements usually done on lightly loaded systems to determine best
achievable performance;
 Many tools are available for latency and bandwidth measurements
 Some lab tasks will provide an opportunity to perform such measurements; a
first-order latency/bandwidth model is sketched below
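
A common first-order model (an assumption here, not stated on the slide) treats the transfer time of an n-byte message as T(n) = L + n/B, for latency L and bandwidth B. Two timed transfers of different sizes are then enough to solve for both parameters. A minimal C sketch with made-up measurements:

```c
#include <stdio.h>

int main(void) {
    /* Two measured (size, time) points - illustrative numbers only */
    double n1 = 1e3, t1 = 60e-6;   /* 1 KB message took 60 us */
    double n2 = 1e6, t2 = 8.6e-3;  /* 1 MB message took 8.6 ms */

    /* Solve T(n) = L + n/B for B and L */
    double B = (n2 - n1) / (t2 - t1);  /* bandwidth, bytes per second */
    double L = t1 - n1 / B;            /* zero-byte latency, seconds */

    printf("B = %.1f MB/s, L = %.1f us\n", B / 1e6, L * 1e6);
    return 0;
}
```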

31
Computational Performance
 How much computational work a cluster can perform determines its
economic viability and usefulness
 Performance is a measure of computational effort done per time
under some set of conditions
 Computational effort is the quantity or number of computational
operations e.g. machine instructions performed
 Performance metrics are usually highly sensitive to the
conditions of measurement – e.g. type of workload, application
program, operating system, and configuration of the hardware
platform

32
Computational Benchmarking
 The performance of a computer is usually measured by benchmarking
 Benchmarking can be done by running a specific application and measuring
how long it takes to run and what compute resources are consumed
 Benchmarking can also be done by running dedicated benchmark
applications or benchmark suites
 In all benchmarking we seek “like comparisons”, as in “comparing apples to
apples, rather than apples to oranges”
 The intent is always to perform measurements of performance that permit
honest side by side comparisons between systems
 Vendors have a long running history of manipulating benchmarks to favour
their products – caveat emptor applies!

33
Performance Metrics – FLOPS/MOPS/MIPS
 Maximum aggregate performance of the system can be measured in terms of
Maximum aggregate floating-point operations:
P = N*C*F*R
 Where:
 P performance in FLOPS, MFLOPS, GFLOPS,
 N number of nodes,
 C number of CPUs (CPU cores),
 F floating point operations per clock period,
 R clock rate.
 The other measure is for integer operations – using MOPS/MIPS, where P is
expressed in integer Mega- “Operations” or “Instructions” (a worked example
of the formula follows below)
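
A worked example of P = N*C*F*R in C; the node, core, and clock figures are hypothetical (loosely inspired by the Juno configuration shown earlier, with an assumed clock rate), and the percent-of-peak figure anticipates a later slide:

```c
#include <stdio.h>

int main(void) {
    /* Peak performance P = N * C * F * R (slide formula).
       All values below are hypothetical. */
    double N = 1152;   /* number of nodes */
    double C = 4;      /* CPU cores per node */
    double F = 4;      /* floating point operations per clock period */
    double R = 2.3e9;  /* clock rate, Hz */
    double P = N * C * F * R;
    printf("Peak = %.1f TFLOPS\n", P / 1e12);

    /* "Percentage of peak" (see later slide): measured vs. theoretical */
    double measured = 28.0e12;  /* hypothetical sustained FLOPS */
    printf("%.1f%% of peak\n", 100.0 * measured / P);
    return 0;
}
```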

34
Application Performance
 Number of operations performed while executing the application,
divided by the total runtime, M/GFLOPS or M/GOPS.
 Computed using a program similar to the actual program the user
intends to run on the production system.
 More meaningful than theoretical peak performance.
 Need to correctly estimate the number of floating point (or integer)
operations in the code.
 The algorithm must be optimised.
 Problems may arise if the code was tuned for a particular platform;
specifically where code includes features written around machine
specific performance accelerators.

35
Application Runtime Comparison
 The total “wall-clock” run time for an existing application and
dataset.
 It frees us from counting the operations in the code – simply
compare time to compute
 Also it removes the need to develop benchmarking code which
may differ from the intended application.
 Performance tuning could distort results
 The same application must be run on all systems being
benchmarked against one another, under identical conditions; a minimal
timing harness is sketched below
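
A minimal wall-clock timing harness in C, assuming a POSIX system (clock_gettime is POSIX; the run_application stub stands in for the real application):

```c
#include <stdio.h>
#include <time.h>

/* Placeholder for the actual application under test */
static void run_application(void) { /* ... */ }

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);  /* monotonic wall clock */
    run_application();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("wall-clock runtime: %.3f s\n", secs);
    return 0;
}
```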

36
Scalability – How Well Does It Scale?
 One measure of Scalability is computed thus:
S = T(1) / T(N)
 Where T(1) is the wall clock time for a program to run on a single
processor
 T(N) is the runtime over N processors
 A scalability figure close to N (i.e. S ≅ N) means the program scales
well
 The scalability metric helps estimate the optimal number of processors
for an application
 Amdahl’s Law and other models can be used to estimate scalability (see
the sketch below)
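
A small sketch of the scalability calculation, with an Amdahl's Law estimate for comparison; the runtimes and serial fraction f are assumed inputs:

```c
#include <stdio.h>

/* Measured scalability: S = T(1) / T(N) (slide formula) */
double scalability(double t1, double tn) { return t1 / tn; }

/* Amdahl's Law estimate: S(N) = 1 / (f + (1 - f)/N),
   where f is the serial (non-parallelisable) fraction */
double amdahl(double f, int n) { return 1.0 / (f + (1.0 - f) / n); }

int main(void) {
    double t1 = 100.0, t16 = 8.0;  /* hypothetical runtimes, seconds */
    printf("measured S over 16 processors = %.1f\n", scalability(t1, t16));
    printf("Amdahl estimate (f = 0.05)    = %.1f\n", amdahl(0.05, 16));
    return 0;
}
```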

37
Efficiency
 It is calculated thus:

E = P(N)/N where P is performance

 Values close to unity or 100% are ideally sought

 This metric suffers from the same problems as the Scalability measure (see
the sketch below)
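
Continuing the scalability sketch, and assuming that P(N) here denotes the measured speedup S(N) = T(1)/T(N):

```c
/* Efficiency sketch: E = S(N) / N, values near 1.0 (100%) are ideal.
   E.g. efficiency(100.0, 8.0, 16) returns 0.78. */
double efficiency(double t1, double tn, int n) {
    return (t1 / tn) / (double)n;
}
```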

38
Percentage of Peak
 Application performance statistics are gathered in terms of
the percentage of the theoretical peak performance
 A real application is run and results compared to a theoretical
estimate
 Such statistics highlight the extent to which an application is making
use of the computational power of the system
 This depends on the type of application and the statistical mix of
executed instructions

39
System Utilisation
 System level effects include:
A. Competition between tasks executing on the system
B. I/O contention
C. Memory swapping
D. Job Scheduler inefficiencies
E. Job start-up delays
 A system can be assessed on its long-term throughput through these
statistics
 Statistics are easily collected using sar, vmstat, netstat, iostat
or other tools.

40
MPI Ping-Pong Test
 A widely used measure in clusters
 Tests the aggregated bandwidth and latency of the interprocessor
communication network
 The test is typically written in C against the MPI API, and assumes the MPI
libraries have been installed – a minimal sketch follows below
 Reading: H. Kamal, B. Penoff and A. Wagner, “SCTP versus TCP
for MPI,” SC '05: Proceedings of the 2005 ACM/IEEE Conference
on Supercomputing, Seattle, WA, USA, 2005, pp. 30-30. doi:
10.1109/SC.2005.63
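
A minimal MPI ping-pong sketch in C (a common pattern, assuming an MPI installation; the message size and repetition count are arbitrary choices):

```c
#include <mpi.h>
#include <stdio.h>

#define MSG_BYTES 1024
#define REPS      1000

int main(int argc, char **argv) {
    int rank;
    char buf[MSG_BYTES];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);  /* start the ranks together */
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {          /* rank 0: send, then await the echo */
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {   /* rank 1: echo everything back */
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / REPS;  /* mean round-trip time */
        printf("latency estimate:   %.2f us\n", rtt / 2 * 1e6);
        printf("bandwidth estimate: %.1f MB/s\n",
               2.0 * MSG_BYTES / rtt / 1e6);
    }
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and run with mpirun -np 2, this reports an estimated one-way latency and effective bandwidth for 1 KB messages.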

41
MPI Ping-Pong Test (Kamal et al 2005)

Abstract: SCTP (Stream Control Transmission Protocol) is a recently standardized transport
level protocol with several features that better
support the communication requirements of
parallel applications; these features are not
present in traditional TCP (Transmission
Control Protocol). These features make SCTP
a good candidate as a transport level protocol
for MPI (Message Passing Interface). MPI is a
message passing middleware that is widely
used to parallelize scientific and compute
intensive applications. TCP is often used as
the transport protocol for MPI in both local
area and wide area networks. Prior to this
work, SCTP has not been used for MPI. We
compared and evaluated the benefits of using
SCTP instead of TCP as the underlying
transport protocol for MPI. …..

42
LINPACK Benchmark
 High Performance LINPACK (HPL) Benchmark is widely used
within clusters.
 These benchmarks execute the LINPACK codes available on
Netlib.
 These benchmarks can overestimate performance.
 LINPACK is a library of functions written to solve linear
equations and linear least-squares problems;
 LINPACK was written in FORTRAN, and is widely used in many
applications; a sketch of the HPL rate calculation follows below.
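
HPL reports its GFLOPS rate using the standard operation count for solving a dense n×n system via LU factorisation, (2/3)n³ + 2n² floating point operations. A minimal sketch, with made-up problem size and runtime:

```c
#include <stdio.h>

int main(void) {
    double n = 50000.0;  /* HPL problem size (hypothetical) */
    double t = 3600.0;   /* measured runtime in seconds (hypothetical) */

    /* Standard HPL operation count: (2/3) n^3 + 2 n^2 */
    double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;

    printf("HPL rate: %.1f GFLOPS\n", flops / t / 1e9);
    return 0;
}
```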

43
Measuring and Monitoring Performance
Measuring and Monitoring Performance
 There are many available tools for real time monitoring and / or
logging of machine performance;
 Unix and unix-like systems have provided the text based vmstat
(BSD) and sar (SVR4) utilities since the 1980s;
 The user must analyse and interpret text based logs of data, which can
be challenging;
 More recent ncurses based top and htop provide realtime text based
monitoring;
 The more recent netdata tool provides browser based realtime graphical
representation;

45
Output from vmstat tool

46
Output from top tool

47
Output from htop tool

48
Output from netdata tool

49
Output from netdata tool

50
Observations
 Different applications require different runtime environments, operating
systems, libraries, parallel or distributed computing middleware, and hardware
interconnection topologies;
 There are no panacea solutions – Grids, Clouds, conventional Clusters, HPC
“supercomputers” all perform best for applications which fit their unique
characteristics;
 Coarse assessments of performance may be unrealistically optimistic – or
pessimistic – for many applications;
 Benchmarks can be highly accurate, but only for applications which are very
close in behaviour to the benchmark;
 The best performance benchmark is the intended application itself,
using a representative dataset or parameters.

51
Summary
Summary
 Clusters vs. Grids vs. Clouds
 Hardware and Fabrics
 Cluster Design and Implementation
 Cluster Performance
 Measuring and Monitoring Performance

53
Reading Materials
References / Reading
 http://users.monash.edu/~ckopp/SYSTEMS/Vector-CPU-0600.htm
 http://users.monash.edu/~ckopp/SYSTEMS/Cluster-Practical-1299.htm
 http://www.drdobbs.com/parallel/managing-cluster-computers/184404165
 http://users.monash.edu/~ckopp/SYSTEMS/SCSI-SAN-0799.htm
 http://gridbus.org/papers/encyclopedia.pdf
 http://users.monash.edu/~ckopp/SYSTEMS/Gigabit-IP-LAN-1097.htm
 http://www.linuxvirtualserver.org/
 http://users.monash.edu/~ckopp/SYSTEMS/Infiniband-Intro-0901.htm
 Blaise Barney, Linux Clusters Overview, Lawrence Livermore National Laboratory:
https://computing.llnl.gov/tutorials/linux_clusters/
 Eric Hazen, Linux Cluster for Computational Physics Applications, Boston University:
http://joule.bu.edu/~hazen/LinuxCluster/
 Chkirbene, Z., Hamila, R., Foufou, S. (2018). A Survey on Data Center Network Topologies. In:
Boudriga, N., et al (eds) Ubiquitous Networking. UNet 2018. Lecture Notes in Computer Science, vol
11277. Springer, Cham. https://doi.org/10.1007/978-3-030-02849-7_13

55
