Emerging Assignment 230906 111230
Group assignments
Section 4
Group members and IDs
1. Noel Yakob, 1100/15
2. Nebil Kemal, 1072/15
3. Tsion Alemayehu, 1328/15
4. Soliyana Abraham, 1263/15
5. Rgbe Aklog, 1143/15
6. Tadiwos Dagne, 1290/15
7. Yafet Eshetu
8. Nahom Yegizu, 0991/15
9. Nahom Berihun, 0992/15
10. Samrawit G/Tsadik, 1195/15
Assignments 1
The story of how data scientists became sexy is mostly the story
of the coupling of the mature discipline of statistics with a very
young one–computer science. The term “Data Science” has
emerged only recently to specifically designate a new profession
that is expected to make sense of the vast stores of big data. But
making sense of data has a long history and has been discussed
by scientists, statisticians, librarians, computer scientists and
others for years. The following timeline traces the evolution of
the term “Data Science” and its use, attempts to define it, and
related terms.
1962 John W. Tukey writes in “The Future of Data Analysis”: “For
a long time I thought I was a statistician, interested in inferences
from the particular to the general. But as I have watched
mathematical statistics evolve, I have had cause to wonder and
doubt… I have come to feel that my central interest is in data analysis…
Data analysis, and the parts of statistics which adhere to it,
must…take on the characteristics of science rather than those of
mathematics… data analysis is intrinsically an empirical science…
How vital and how important… is the rise of the stored-program
electronic computer? In many instances the answer may surprise
many by being ‘important but not vital,’ although in others there
is no doubt but what the computer has been ‘vital.’” In 1947,
Tukey coined the term “bit,” which Claude Shannon used in his
1948 paper “A Mathematical Theory of Communication.” In
1977, Tukey published Exploratory Data Analysis, arguing that
more emphasis needed to be placed on using data to suggest
hypotheses to test and that Exploratory Data Analysis and
Confirmatory Data Analysis “can—and should—proceed side by
side.”
1974 Peter Naur publishes Concise Survey of Computer
Methods in Sweden and the United States. The book is a survey
of contemporary data processing methods that are used in a
wide range of applications. It is organized around the concept of
data as defined in the IFIP Guide to Concepts and Terms in Data
Processing: “[Data is] a representation of facts or ideas in a
formalized manner capable of being communicated or
manipulated by some process.” The Preface to the book tells the
reader that a course plan was presented at the IFIP Congress in
1968, titled “Datalogy, the science of data and of data processes
and its place in education,” and that in the text of the book, “the
term ‘data science’ has been used freely.” Naur offers the
following definition of data science: “The science of dealing with
data, once they have been established, while the relation of the
data to what they represent is delegated to other fields and
sciences.”
1977 The International Association for Statistical Computing
(IASC) is established as a Section of the ISI.
“It is the mission of the IASC to link traditional statistical
methodology, modern computer technology, and the knowledge
of domain experts in order to convert data into information and
knowledge.”
Assignments 3
1. List and discuss the characteristics of big data.
The characteristics of big data
Big data is a collection of data gathered from many different
sources. It is described by six characteristics:
1. Volume: the size of the data that is collected and stored.
Volume is the foundation of big data.
2. Value: value is one of the most important aspects from a
business perspective. The value of big data comes from more
effective operations, stronger customer relationships, and
similar benefits.
3. Variety: the different types of data involved, ranging from
structured to unstructured and raw data.
4. Velocity: the speed at which data is collected, absorbed, and
managed.
5. Veracity: the truthfulness or accuracy of the data and
information, which determines how much confidence executives
can place in it.
6. Variability: the way the nature or meaning of the data
changes over time.
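To make some of these characteristics more concrete, the short Python sketch below computes rough measures of volume, velocity, variety, and veracity for a tiny batch of records. The records, the describe function, and the way each measure is calculated are our own assumptions for illustration only; they are not standard definitions, and real big data systems measure these things at a far larger scale.

```python
from datetime import datetime

# Hypothetical toy batch: each record is (arrival time, payload).
records = [
    (datetime(2023, 9, 6, 11, 0, 0), {"user": "a", "amount": 10.5}),  # structured
    (datetime(2023, 9, 6, 11, 0, 1), "free-text customer review"),    # unstructured
    (datetime(2023, 9, 6, 11, 0, 2), {"user": "b", "amount": None}),  # missing value
    (datetime(2023, 9, 6, 11, 0, 4), b"\x89PNG"),                     # raw bytes
]


def describe(batch):
    """Return rough volume, velocity, variety, and veracity measures (illustrative only)."""
    payloads = [payload for _, payload in batch]
    times = [stamp for stamp, _ in batch]
    span = (max(times) - min(times)).total_seconds()

    # Treat dict payloads as "structured"; a structured record is
    # "complete" when none of its fields is None.
    structured = [p for p in payloads if isinstance(p, dict)]
    complete = [p for p in structured if all(v is not None for v in p.values())]

    return {
        # Volume: number of records in the batch.
        "volume": len(payloads),
        # Velocity: records arriving per second over the batch time span.
        "velocity": len(payloads) / span if span else float("inf"),
        # Variety: the distinct kinds of payload seen.
        "variety": sorted({type(p).__name__ for p in payloads}),
        # Veracity: share of structured records with no missing fields.
        "veracity": len(complete) / len(structured) if structured else None,
    }


summary = describe(records)
print(summary)
```

Value and variability are left out of the sketch because they depend on the business context and on how the data changes over a longer period, which a single small batch cannot show.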
2. Describe the big data life cycle. Which step do you think is
most useful, and why?