D
USING PYTHON
SECTION-A
Introduction to Big Data and Data Science: Meaning of Big Data and Data
Science, challenges of Big Data, relationship between Big Data and Data
Science, benefits and uses of Data Science and Big Data.
Facets of data: Structured versus unstructured data, natural language,
machine-generated data, graph-based data, audio, image and video data.
Data Science Process: Goal setting, retrieving data, data preparation, data
cleansing, data integration and transformation, exploratory data analysis, data
visualization, model building and performance evaluation, presentation.
Data Set and its features: Meaning of the terms observations and
variables, discrete and continuous variables, quantitative and qualitative
variables, dependent and independent variables, variables classified by scale:
nominal, ordinal, interval and ratio variables.
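
As an illustration of the four measurement scales listed above, the following minimal sketch shows a summary that is commonly considered meaningful for each scale. All variable names and sample values are hypothetical and chosen only for this example.

```python
from statistics import mean, median, mode

# Hypothetical sample data, one variable per measurement scale.
blood_group = ["A", "O", "B", "O", "AB"]                   # nominal: labels only
satisfaction = ["low", "high", "medium", "high", "low"]    # ordinal: ordered labels
temperature_c = [20.0, 25.0, 30.0]                         # interval: differences are meaningful
weight_kg = [50.0, 100.0]                                  # ratio: true zero, ratios are meaningful

# Nominal: only counting is meaningful, so report the most frequent label.
most_common_group = mode(blood_group)

# Ordinal: map labels to ranks (an assumed ordering), then take the median rank.
rank_of = {"low": 0, "medium": 1, "high": 2}
label_of = {rank: label for label, rank in rank_of.items()}
median_satisfaction = label_of[median(rank_of[s] for s in satisfaction)]

# Interval: the mean is meaningful, but ratios are not (0 °C is not "no temperature").
mean_temperature = mean(temperature_c)

# Ratio: a statement such as "twice as heavy" is meaningful.
weight_ratio = weight_kg[1] / weight_kg[0]

print(most_common_group, median_satisfaction, mean_temperature, weight_ratio)
```

The ordering low < medium < high is an assumption of the example; in practice the ordering of an ordinal variable comes from its definition or metadata.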
Data Preparation: Need for data preparation, data cleansing, methods of data
cleansing – data entry errors, sanity checks, outlier detection, treatment of
missing values, discrepancies in data, use of metadata, codes and rules. Data
integration, types of data integration. Data transformation strategies –
normalization, data discretization and discretization methods.
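
Three of the data preparation topics above can be sketched in plain Python: mean imputation as one possible treatment of missing values, min-max scaling as one form of normalization, and equal-width binning as one discretization method. The function names and the sample data are hypothetical, and other strategies (median imputation, z-score normalization, equal-frequency binning) are equally valid choices.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a 0-based index into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical sample: ages with one missing observation.
ages = [20, None, 30, 40, 60]

cleaned = impute_mean(ages)          # missing value filled with 37.5
scaled = min_max_normalize(cleaned)  # values rescaled to [0, 1]
bins = equal_width_bins(cleaned, 4)  # four bins of width 10 over [20, 60]

print(cleaned)
print(scaled)
print(bins)
```

Imputation is applied before normalization here because `min` and `max` cannot compare `None` with numbers; the order of the steps is a design choice of this sketch.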