Introduction to Big Data
Chapter 1 & 2 (Week 1)
Course overview & introduction
Asst. Prof. Minseok Seo
[email protected]
Course Overview
Contents
1. Course Overview
Brief introduction of professor & course
Objectives & aims of the course
Assignments & Quiz
Evaluation
What is Data?
In a broad sense, Big Data represents both sample size and dimensionality.
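As a minimal sketch of the two quantities named above, the Python snippet below reads the sample size (number of rows, n) and the dimensionality (number of columns, p) off a small table. The values and the helper name `shape` are made up for illustration and are not part of the course material.

```python
# Toy data set: each row is one sample, each column is one variable (feature).
# The numbers are illustrative only.
data = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
]


def shape(rows):
    """Return (sample size n, dimensionality p) of a rectangular table."""
    n = len(rows)
    p = len(rows[0]) if rows else 0
    return n, p


n, p = shape(data)
print(f"sample size n = {n}, dimensionality p = {p}, cells = {n * p}")
```

A data set can be "big" because n is large, because p is large, or both.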
[Figure: only the label "Value*" is recoverable]
Furht B., Villanustre F. (2016) Introduction to Big Data. In: Big Data Technologies and Applications. Springer, Cham
It is time to prepare an academic course that trains data analysts in numbers
commensurate with demand.
Introduction to Big Data: Concept of Big Data | Computational approaches for Big Data | Basic skills in Data Science | Statistical approaches for Big Data | R programming | Visualization for Big Data
Week 15 (12.09 - 12.15): Trends in various academic & industrial fields for the application of Big Data
The methods learned in the theory class will be practiced in the computer lab on Thursdays.
There are two representative programming languages for Big Data analysis: R and
Python.
No prior knowledge of the R language is required, because I plan to provide
example code for students to practice with.
https://cran.r-project.org/
Quiz
There will be two short in-class quizzes to check students' learning
progress, one before and one after the midterm.
Homework
There will be four assignments.
Each will be a report on the theory and practice of data analysis learned in
class.
Evaluation weights: 10% | 30% | 20% | 10% | 30%
No Textbook
If you have any questions about the course, please email me and I will reply as
soon as I see your message.
I am available Mon: 12:00 - 17:00 | Wed: 10:00 - 13:00 | Thu: 10:00 - 13:00.
transferred about 197 PB of data through its network each day (2018)
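To give a feel for the 197 PB per day figure, the short Python calculation below converts it to an average rate per second. It assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and a perfectly even load over the day; both are simplifying assumptions, not statements from the source.

```python
# Assumption: decimal (SI) units, as commonly used for network traffic.
PB = 10**15  # bytes in one petabyte
TB = 10**12  # bytes in one terabyte
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(bytes_per_day):
    """Average bytes per second for a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


rate = per_second(197 * PB)
print(f"{rate / TB:.2f} TB per second on average")
```

Under these assumptions the average works out to a little over 2 TB every second.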
Computer Scientist
In short, if data processing on your computer is giving you so much trouble that
you are close to a meltdown, the cause is Big Data.
Statistician
In short, if data analysis on your computer is giving you so much trouble that
you are close to a meltdown, the cause is Big Data.
Software: Prescreening techniques | Data visualization | Feature selection
Hardware: Parallel processing | Cloud computing | Distributed processing
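Two of the approaches listed above can be sketched together in a few lines of Python: a prescreening step that drops near-constant features, with the per-feature work handed to a pool of workers. The variance rule, the threshold `min_variance`, the function name `prescreen`, and the toy values are all assumptions chosen for illustration; they are one possible prescreening rule, not the method taught in the course.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import pvariance

# Toy table: rows are samples, columns are features (values are made up).
rows = [
    [1.0, 0.5, 10.0],
    [2.0, 0.5, 12.0],
    [3.0, 0.5, 9.0],
    [4.0, 0.5, 15.0],
]


def column(rows, j):
    """Extract feature j as a list of values."""
    return [row[j] for row in rows]


def prescreen(rows, min_variance=0.01, workers=2):
    """Return indices of features whose population variance is at least
    min_variance. Each feature's variance is computed as a separate task."""
    n_features = len(rows[0])
    columns = [column(rows, j) for j in range(n_features)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        variances = list(pool.map(pvariance, columns))
    return [j for j, v in enumerate(variances) if v >= min_variance]


kept = prescreen(rows)
print("features kept:", kept)
```

The second feature is constant, so it carries no information and is screened out before any further analysis. Threads are used here only to show the split-into-tasks pattern; for CPU-bound work in CPython, a process pool or a distributed framework would be needed to obtain a real speed-up.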
There are two representative programming languages for Big Data analysis: R and
Python.
https://cran.r-project.org/
Since 1997: an international “R-core” team of about 15 people with access to a common
CVS archive.
Impossible
(1) R is not a database, but it connects to DBMSs
(2) R has no GUI, but it connects to Java and Tcl/Tk
(3) R is fundamentally very slow, but it allows you to call your own C/C++ code
(4) R has no spreadsheet view of data, but it connects to Excel/MS Office
(5) R has no professional or commercial support
But all R users in the world are developers (the power of collective intelligence).
If you make a meaningful package, you can publish it at any time, almost immediately.
Therefore, the latest algorithms can be applied in R sooner than in any other programming language.