
Emerging technologies

Group assignments
Section 4
Group members (ID)
1. Noel Yakob .................... 1100/15
2. Nebil Kemal ................... 1072/15
3. Tsion Alemayehu ........... 1328/15
4. Soliyana Abraham .......... 1263/15
5. Rgbe Aklog ..................... 1143/15
6. Tadiwos Dagne ............... 1290/15
7. Yafet Eshetu ....................
8. Nahom Yegizu ................. 0991/15
9. Nahom Berihun ............... 0992/15
10. Samrawit G/tsadik ........ 1195/15
Assignment 1

1. Virtual reality, augmented reality, and mixed reality

Their examples and differences
Virtual reality (VR)
Virtual reality (VR) completely immerses users in a synthetic digital environment. A VR headset is required so that you can explore the virtual environment without being distracted by the outside world. In VR you can, for instance, watch movies, play games, and attend meetings.
E.g., Meta Quest 2
Augmented reality (AR)
With augmented reality (AR), virtual items are superimposed over the physical environment. To see the real world with some digital modifications, you can use a smartphone, a tablet, or a pair of glasses with a camera and a screen. For instance, you can use Snapchat's augmented reality filters, play Pokémon Go, or use Google Maps to find directions.
Mixed reality (MR)
Mixed reality (MR) anchors virtual items to the real environment rather than merely overlaying them, so the interaction between physical and virtual items is seamless. You need a device that can track your location, orientation, and surroundings while projecting holograms into your field of view. For instance, you can build and modify 3D models, collaborate with others remotely, or master new skills with Microsoft HoloLens.
Assignment 2
1. Big data: structured data versus unstructured data
Big data
Big data refers to data sets that are too large or complex to be dealt with by traditional data-processing application software.
A "large dataset" here means a dataset too large to reasonably process or store on a single computer.
Common examples of the data elements involved include individual prices, ages, weights, addresses, names, etc.
Structured data: is most often categorized as quantitative data, and it is the type of data most of us are used to working with.
It adheres to a pre-defined data model and is therefore straightforward to analyze.
It is highly organized and easily understood by machine language.
The programming language for managing structured data is called Structured Query Language (SQL); a short sketch follows the examples below.
Examples are:
Excel files or SQL databases
Names, dates, addresses, credit card numbers, stock information.
Structured data is also used in multiple consumer-oriented businesses, such as:
• E-commerce: review data, pricing data
• Healthcare: hospital administration, pharmacy and patient data, and medical history of patients
• Banking: financial transaction details such as the name of the beneficiary, account details, sender or receiver information, and bank details
• Travel industry: passenger data, flight information, and travel transactions
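A minimal sketch of working with structured data through SQL, using Python's built-in sqlite3 module; the table name and columns are invented purely for illustration:

import sqlite3

# Structured data fits a pre-defined model: fixed columns with fixed types.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, balance REAL)")
conn.execute("INSERT INTO customers VALUES ('Abebe', 'Addis Ababa', 1500.0)")
conn.execute("INSERT INTO customers VALUES ('Sara', 'Adama', 320.5)")

# Because the structure is fixed, SQL can query and aggregate it directly.
for row in conn.execute("SELECT city, SUM(balance) FROM customers GROUP BY city"):
    print(row)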
Unstructured data: is often categorized as qualitative data and cannot be processed and analyzed using conventional data tools and methods.
It is any event or alert sent and received by any user within an organization with no proper file formatting or direct business co-dependency.
It cannot be organized in relational databases.
Non-relational or NoSQL databases are best for managing unstructured data. Another way of managing unstructured data is to have it flow into a data lake or pool, allowing it to stay in its raw, unstructured format.
Examples are:
• Rich media: social media, entertainment, surveillance, satellite information, geospatial data, weather forecasting, podcasts
• Documents: invoices, records, web history, emails, productivity applications
• Internet of Things: sensor data, ticker data
• Analytics: machine learning, artificial intelligence (AI)
Benefits of structured and unstructured data
Structured data:
• It can be easily fed into machine learning models as input data.
• It does not require any AI or ML expertise to work with it.
• It is of higher quality, consistency, and usability than unstructured data.
• It is maintained in a stable, centralized repository that improves the flow of business processes and decision-making.
Unstructured data:
• It is easily available.
• It can be stored on shared or hybrid cloud servers with minimal expenditure on database management.
2. Advancements in computer processing speed and new chip architectures

The rapid growth of technology across numerous industries has been accelerated by improvements in computer processing speed and new, affordable chip designs. Sectors such as artificial intelligence, data analysis, and gaming have benefited from these developments. Let's explore these subjects in more detail:

Improvements in semiconductor technology and the shrinking of electronic components have led to a significant rise in computer processing speed over the past few decades. This development was tracked by Moore's Law, which states that the number of transistors on integrated circuits doubles roughly every two years.
Computer programs became increasingly sophisticated and complicated as processing speed rose. New technologies could be created as a result, including real-time data processing, richer machine learning algorithms, and quicker image recognition. Faster processing rates also improved the overall user experience, allowing seamless multitasking, quick response times, and realistic images in games and simulations.

New inexpensive architectures: In addition to enabling faster processing rates, these new architectures have completely altered the availability and cost of sophisticated computing resources. As a result, more individuals and businesses now have access to high-performance computing.
The employment of Graphics Processing Units (GPUs) in general-purpose computing is one illustration of a low-cost architecture. Beyond rendering graphics, for which they were originally intended, GPUs are increasingly used for a variety of computationally demanding activities such as machine learning and scientific simulations. Because of their parallel architecture and ability to carry out numerous operations at once, GPUs are more effective than traditional Central Processing Units (CPUs) for some types of computations. The use of application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) in specialized computing applications is another new development. These architectures can be tailored to improve performance for certain tasks, leading to increased effectiveness and efficiency.

Overall, faster computer processing speeds and the development of new low-cost designs have opened the door to more powerful and reasonably priced computing resources. These advancements have sparked innovation across numerous industries, creating new opportunities and possibilities for both people and businesses.

3. Cloud computing and APIs (on-demand services, usually delivered over the internet on a pay-per-use basis), e.g. IBM, Amazon, Microsoft, Google.

Cloud computing is a technology that enables users to access and use computing resources, such as servers, storage, and applications, over the internet on a pay-per-use basis. It provides on-demand services, allowing users to scale their resources as needed and pay only for what they use.

API stands for Application Programming Interface. It is a set of rules and protocols that allows different software applications to communicate and interact with each other. APIs define the methods and data formats that can be used to request and exchange information between different systems or components. They enable developers to access certain features or functionalities of a software application or service without having to understand the underlying code or implementation details. APIs are commonly used in web development, mobile app development, and the integration of different software systems.
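As a hedged illustration of how an API call looks in practice, the sketch below requests JSON data over HTTP using the third-party requests library; the URL, parameters, and response fields are hypothetical placeholders, not a real cloud service:

import requests

# Hypothetical REST endpoint; a real provider publishes its own URLs and authentication scheme.
url = "https://api.example.com/v1/weather"
params = {"city": "Addis Ababa"}

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()          # fail loudly on HTTP errors
data = response.json()               # the API defines the data format (JSON here)
print(data.get("temperature"))       # the caller never sees the server's internal implementation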
Some of the major cloud service providers in the market are
IBM, Amazon, Microsoft, and
Google.

IBM:- offers a comprehensive range of cloud services, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). It provides solutions for businesses of all sizes, from startups to enterprise-level organizations.
Amazon Web Services (AWS):- is a leading cloud platform that
offers a wide range of
services, such as computing power, storage, databases, machine
learning, and analytics. It is known for its scalability, reliability,
and flexibility.
Microsoft Azure:- is another popular cloud computing
platform that provides services for building, deploying, and
managing applications and services through Microsoft-managed
data centers. It offers a wide range of services, including virtual
machines, storage, databases, AI, and analytics.
Google Cloud Platform (GCP):- is Google's cloud computing
service that offers a variety of services, including computing
power, storage, machine learning, and data analytics. It provides
a scalable and reliable infrastructure to support businesses'
needs.
4. The emergence of data science

The story of how data scientists became sexy is mostly the story
of the coupling of the mature discipline of statistics with a very
young one–computer science. The term “Data Science” has
emerged only recently to specifically designate a new profession
that is expected to make sense of the vast stores of big data. But
making sense of data has a long history and has been discussed
by scientists, statisticians, librarians, computer scientists and
others for years. The following timeline traces the evolution of
the term “Data Science” and its use, attempts to define it, and
related terms.
1962 John W. Tukey writes in “The Future of Data Analysis”: “For
a long time I thought I was a statistician, interested in inferences
from the particular to the general. But as I have watched
mathematical statistics evolve, I have had cause to wonder and
doubt… I
have come to feel that my central interest is in data analysis…
Data analysis, and the parts of statistics which adhere to it,
must…take on the characteristics of science rather than those of
mathematics… data analysis is intrinsically an empirical science…
How vital and how important… is the rise of the stored-program
electronic computer? In many instances the answer may surprise
many by being ‘important but not vital,’ although in others there
is no doubt but what the computer has been ‘vital.’” In 1947,
Tukey coined the term “bit” which Claude Shannon used in his
1948 paper “A Mathematical Theory of Communication.” In
1977, Tukey published Exploratory Data Analysis, arguing that
more emphasis needed to be placed on using data to suggest
hypotheses to test and that Exploratory Data Analysis and
Confirmatory Data Analysis “can—and should—proceed side by side.”
1974 Peter Naur publishes Concise Survey of Computer
Methods in Sweden and the United States. The book is a survey
of contemporary data processing methods that are used in a
wide range of applications. It is organized around the concept of
data as defined in the IFIP Guide to Concepts and Terms in Data
Processing: “[Data is] a representation of facts or ideas in a
formalized manner capable of being communicated or
manipulated by some process.“ The Preface to the book tells the
reader that a course plan was presented at the IFIP Congress in
1968, titled “Datalogy, the science of data and of data processes
and its place in education,“ and that in the text of the book, ”the
term ‘data science’ has been used freely.” Naur offers the
following definition of data science: “The science of dealing with
data, once they have been established, while the relation of the
data to what they represent is delegated to other fields and
sciences.”
1977 The International Association for
Statistical Computing (IASC) is established as a Section of the ISI.
“It is the mission of the IASC to link traditional statistical
methodology, modern computer technology, and the knowledge
of domain experts in order to convert data into information and
knowledge.”

Assignment 3
1. List and discuss the characteristics of big data
The characteristics of big data
Big data is a collection of data from different sources or places. It is described by the following characteristics:
1. Volume: the size of the big data that is contained or that exists. It is the foundation of big data.
2. Value: value is one of the most important perspectives for a business, and the value of big data originates from operations, stronger customer relationships, etc.
3. Variety: the different types of data, ranging from structured and unstructured data to raw data.
4. Velocity: the speed at which data is collected, absorbed, and managed.
5. Veracity: the truth or accuracy of the data and information, which reflects the executives' level of confidence in it.
6. Variability: the changing nature or meaning of data.

2. Describe the big data life cycle. Which step do you think is most useful, and why?

The four steps of the big data life cycle

Simply put, from the perspective of the life cycle of big data, there are four main aspects:
1. Big data collection
2. Big data preprocessing
3. Big data storage
4. Big data analysis
Together, these four constitute the core technology of the big data life cycle.
● Big data collection
Big data collection is the gathering of massive structured and unstructured data from various sources.
Database collection: Sqoop and ETL tools are popular, and traditional relational databases such as MySQL and Oracle still serve as data storage methods for many enterprises. The open-source tools Kettle and Talend also include big data integration features, which can realise data synchronisation and integration between HDFS, HBase, and mainstream NoSQL databases.
Network data collection: a data collection method that uses web crawlers or public website APIs to obtain unstructured or semi-structured data from web pages and unify it into local data (see the sketch after this list).
File collection: includes real-time file collection and processing technologies such as Flume, ELK-based log collection, incremental collection, etc.
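A small sketch of network data collection with a crawler-style fetch, assuming the requests library is installed; the URL is a placeholder, the extraction is deliberately crude, and a real crawler would respect robots.txt and parse the HTML properly:

import re
import requests

# Placeholder page to collect from; swap in real target URLs.
url = "https://example.com/news"
html = requests.get(url, timeout=10).text

# Crude extraction of link targets from the semi-structured HTML,
# unified into a local file for later processing.
links = re.findall(r'href="([^"]+)"', html)
with open("collected_links.txt", "w") as f:
    f.write("\n".join(links))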
● Big data preprocessing
Big data preprocessing refers to a series of operations such as cleaning, filling, smoothing, merging, normalisation, and consistency checking performed on the collected raw data before data analysis, in order to improve data quality and lay the foundation for later analysis work. Data preprocessing mainly includes four parts:
• Data cleaning
• Data integration
• Data conversion
• Data reduction
Data cleaning refers to the use of cleaning tools such as ETL to deal with missing data (missing attributes of interest), noisy data (errors in the data, or data that deviates from expected values), and inconsistent data.
Data integration refers to the consolidation and storage of data from different data sources in a unified database. It focuses on solving three problems: schema matching, data redundancy, and data value conflict detection and resolution.
Data conversion refers to the process of handling inconsistencies in the extracted data. It also includes data cleaning, that is, cleaning abnormal data according to business rules to ensure the accuracy of subsequent analysis results.
Data reduction refers to minimising the amount of data to obtain a smaller data set while preserving the original character of the data as much as possible, including data cube aggregation, dimensionality reduction, data compression, numerosity reduction, concept hierarchies, etc.
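A minimal preprocessing sketch with pandas (assumed installed), touching three of the steps above: cleaning missing and noisy values, a simple conversion, and a crude reduction. The column names and numbers are invented for illustration:

import pandas as pd

raw = pd.DataFrame({
    "age":   [25, None, 31, 250],      # a missing value and an implausible (noisy) one
    "city":  ["Addis Ababa", "Adama", "Addis Ababa", "Hawassa"],
    "spend": [100.0, 80.0, 120.0, 95.0],
})

# Data cleaning: fill missing ages, drop values outside a business rule.
clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean = clean[clean["age"] < 120]

# Data conversion: normalise spend to a 0-1 range.
clean["spend_norm"] = clean["spend"] / clean["spend"].max()

# Data reduction: aggregate to one row per city (a tiny cube-style aggregation).
reduced = clean.groupby("city", as_index=False)["spend"].sum()
print(reduced)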
● Big data storage
Big data storage refers to the process of storing the collected data in the form of a database, typically following one of three routes:
New database clusters based on the MPP architecture: these use a shared-nothing architecture combined with the efficient distributed computing model of MPP, together with column storage, coarse-grained indexing, and other big data processing technologies, and focus on data storage methods developed for industry big data. With characteristics such as low cost, high performance, and high scalability, they are widely used in enterprise analysis applications. Compared with traditional databases, the PB-level data analysis capabilities of MPP products have significant advantages, and the MPP database has naturally become the best choice for a new generation of enterprise data warehouses.
Technology expansion and packaging based on Hadoop: Hadoop-based expansion and encapsulation targets data and scenarios that are difficult to handle with traditional relational databases (such as the storage and computation of unstructured data). It leverages Hadoop's open-source advantages and related features (it is good at handling unstructured and semi-structured data) to derive big data technology for complex ETL processes and complex data mining and computation models. As the technology advances, its application scenarios will gradually expand. The most typical current application scenario is supporting internet-scale big data storage and analysis by expanding and encapsulating Hadoop, involving dozens of NoSQL technologies.
Big data all-in-one machines: a combination of software and hardware designed for the analysis and processing of big data. Such a machine consists of a set of integrated servers, storage devices, operating systems, database management systems, and pre-installed, optimised software for data query, processing, and analysis. It has good stability and vertical scalability.

● Big data analysis and mining
This is the process of extracting, refining, and analysing chaotic data through visual analysis, data mining algorithms, predictive analysis, semantic engines, data quality management, and so on.
Visual analysis: visual analysis refers to an analysis method that clearly and effectively conveys and communicates information with the aid of graphical means. It is mainly used in massive-data association analysis, that is, using a visual data analysis platform to perform association analysis on dispersed, heterogeneous data and produce a complete analysis chart. It is simple, clear, intuitive, and easy to accept.
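A tiny sketch of visual analysis with matplotlib (assumed installed); the values plotted are invented sample numbers, used only to show how a chart communicates a trend:

import matplotlib.pyplot as plt

# Invented monthly order counts for two regions.
months = ["Jan", "Feb", "Mar", "Apr"]
region_a = [120, 150, 170, 160]
region_b = [80, 95, 140, 180]

plt.plot(months, region_a, marker="o", label="Region A")
plt.plot(months, region_b, marker="o", label="Region B")
plt.title("Orders per month")        # a clear chart conveys the trend at a glance
plt.legend()
plt.show()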
Data mining algorithm: data mining algorithms are data analysis methods that test and calculate data by creating data mining models; they are the theoretical core of big data analysis. There are many data mining algorithms, and different algorithms suit different data characteristics depending on data types and formats. Generally speaking, though, the process of creating a model is similar: first analyse the data provided by the user, then search for specific types of patterns and trends, use the analysis results to define the best parameters for the mining model, and apply these parameters to the entire data set to extract feasible patterns and detailed statistics.
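To make that model-building process concrete, here is a hedged sketch using scikit-learn's k-means clustering, one of many possible mining algorithms; the data points are made up:

from sklearn.cluster import KMeans

# Invented 2-D records, e.g. (monthly spend, visits); the algorithm searches for groups.
data = [[100, 2], [110, 3], [95, 2], [400, 12], [420, 14], [390, 11]]

# Create the mining model with chosen parameters, then fit it to the whole data set.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(model.labels_)           # the discovered pattern: which cluster each record belongs to
print(model.cluster_centers_)  # summary statistics for each cluster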
Data quality management: refers to a series of quality management activities that identify, measure, monitor, and give early warning of the data quality problems that may arise in each stage of the data life cycle (planning, acquisition, storage, sharing, maintenance, application, retirement, etc.), in order to improve data quality.
Predictive analysis: predictive analysis is one of the most important application areas of big data analysis. It combines a variety of advanced analysis functions (statistical analysis, predictive modelling, data mining, text analysis, entity analysis, optimisation, real-time scoring, machine learning, etc.) to predict uncertain events. It helps users analyse trends, patterns, and relationships in structured and unstructured data, and use these indicators to predict future events and provide a basis for taking action.
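A minimal predictive-modelling sketch with scikit-learn's linear regression; the numbers are invented purely for illustration:

from sklearn.linear_model import LinearRegression

# Invented history: advertising spend -> sales.
spend = [[10], [20], [30], [40]]
sales = [55, 95, 150, 190]

model = LinearRegression().fit(spend, sales)

# Predict an uncertain future event: expected sales at a new spend level.
print(model.predict([[50]]))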
Semantic engine: a semantic engine refers to the operation of adding semantics to existing data to improve users' internet search experience.
3. List and describe each technology or tool used in the big data life cycle.
A big data tool is software that extracts information from various complex data types and sets, and then processes it to provide meaningful insights. Traditional databases cannot process such huge volumes of data, so businesses use big data tools that can manage big data easily.

There are a variety of big data processing technologies available, including Apache Hadoop, Apache Spark, and MongoDB. Each of these technologies has its own strengths and weaknesses, but all of them can be used to gain insights from large data sets.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
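A hedged PySpark sketch (assuming the pyspark package is installed and a local Spark runtime is available) showing a word count over a small in-memory list:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()

lines = spark.sparkContext.parallelize(["big data tools", "big data analysis"])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())   # e.g. [('big', 2), ('data', 2), ('tools', 1), ('analysis', 1)]

spark.stop()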

MongoDB is built on a scale-out architecture that has become popular with developers of all kinds for building scalable applications with evolving data schemas. As a document database, MongoDB makes it easy for developers to store structured or unstructured data. It uses a JSON-like format to store documents.
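A short sketch of storing JSON-like documents with the pymongo driver; it assumes pymongo is installed and a MongoDB server is running locally, and the database, collection, and field names are invented:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumes a local MongoDB instance
collection = client["demo_db"]["events"]

# Documents need no fixed schema; fields can vary from one document to the next.
collection.insert_one({"user": "nahom", "action": "login", "tags": ["web", "mobile"]})
collection.insert_one({"user": "tsion", "action": "purchase", "amount": 49.9})

for doc in collection.find({"user": "nahom"}):
    print(doc)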
4. Discuss the three methods of computing over a large dataset.

1) MapReduce is a processing technique and a programming model for distributed computing, originally based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce task then takes the output from a map as an input and combines those data tuples into a smaller set of tuples. As the name MapReduce implies, the reduce task is always performed after the map job. The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes.
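The classic illustration is word counting. Below is a hedged, single-machine Python sketch of the two phases; a real MapReduce framework such as Hadoop runs the same phases distributed across many nodes:

from collections import defaultdict

documents = ["big data is big", "data is everywhere"]

# Map phase: break each input into (key, value) tuples.
mapped = []
for doc in documents:
    for word in doc.split():
        mapped.append((word, 1))

# Reduce phase: combine the tuples for each key into a smaller set.
counts = defaultdict(int)
for word, one in mapped:
    counts[word] += one

print(dict(counts))   # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}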
2) Parallel processing uses two or more processors or CPUs simultaneously to handle different components of a single activity. Systems can cut a program's execution time by dividing a task's many parts among several processors. Multi-core processors, frequently found in modern computers, and any system with more than one CPU are capable of parallel processing. Most computers have two to four cores, while others can have up to twelve. Complex operations and computations are frequently completed with parallel processing.
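A small sketch of parallel processing with Python's standard multiprocessing module; the workload (summing squares over chunks) is invented just to show one task being divided among several worker processes:

from multiprocessing import Pool

def sum_of_squares(chunk):
    # One component of the overall activity, handled by one worker process.
    return sum(n * n for n in chunk)

if __name__ == "__main__":
    chunks = [range(0, 25_000), range(25_000, 50_000),
              range(50_000, 75_000), range(75_000, 100_000)]
    with Pool(processes=4) as pool:          # one worker per part of the task
        partials = pool.map(sum_of_squares, chunks)
    print(sum(partials))                     # combined result of the parallel parts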

3) Distributed computing refers to a system where processing and data storage are distributed across multiple devices or systems, rather than being handled by a single central device. In a distributed system, each device or system has its own processing capabilities and may also store and manage its own data. These devices or systems work together to perform tasks and share resources, with no single device serving as the central hub.
Components
There are several key components of a distributed computing system:
Devices or systems: the devices or systems in a distributed system have their own processing capabilities and may also store and manage their own data.
Network: the network connects the devices or systems in the distributed system, allowing them to communicate and exchange data.
Resource management: distributed systems often have some type of resource management system in place to allocate and manage shared resources such as computing power, storage, and networking.
