Data Scientist
To be added by NSDA
NIELIT Chandigarh,
Plot No. C-134, Phase VIII, Industrial Area, Sector 72, Mohali. 160071.
QUALIFICATION FILE SUMMARY
Formal structure of the qualification
Title of unit or other component          Mandatory/    Estimated size      Level
(include any identification code used)    Optional      (learning hours)
Configure Deployment Platform             Mandatory     15
Please attach any document giving further detail about the structure of the qualification – e.g.
a Curriculum or Qualification Pack. Detailed Curriculum attached at Annexure III.
SECTION 1
ASSESSMENT
Name of assessment body:
Examination Cell,
National Institute of Electronics and Information Technology
6-CGO Complex, Electronics Niketan,
Lodhi Road, New Delhi. 110003.
Presently, only candidates undergoing training shall be assessed. Later on, candidates having
experience and knowledge shall be assessed. The information will be provided on finalization
of such procedure.
Describe the overall assessment strategy and specific arrangements which have been put
in place to ensure that assessment is always valid, consistent and fair and show that
these are in line with the requirements of the NSQF:
The emphasis is on practical demonstration of skills and knowledge based on the performance
criteria. Each outcome is assessed and marked separately, and the student is required to pass
every outcome individually. The following assessment methodologies are used:
A. Written Assessment (Multiple Choice Questions)
B. Practical Assessment
C. Viva Voce Assessment
Please attach any documents giving further information about assessment and/or RPL.
ASSESSMENT EVIDENCE
Complete the following grid for each grouping of NOS, assessment unit or other
component as listed in the entry on the structure of the qualification on page 1.
Title of Unit/Component:
(Detailed Curriculum attached As Annexure-III)
Assessable outcome / Assessment criteria for the outcome              Total  Written  Practical  Viva voce
                                                                      marks
1. Configure Deployment Platform                                         75
   Preparation of platform for analyzing big data                                  5          5          5
   Acquiring skills to interact with prepared platform                            10         10         10
   Acquiring skills to secure files by managing users, groups                     10         10         10
   and their privileges
   Total                                                                          25         25         25
2. Analyze and Define Business Requirements                             100
   Selection of database based on requirements                                     5          5          0
   Acquiring the skills on designing database                                      5          5          0
   Acquiring the advanced skills on database designing                             5          5          5
   Acquiring the skills to manipulate data                                         5          5          5
   Acquiring the advanced skills to manipulate data                               10         10          5
   Acquiring the skills to manage large scale data warehouse                      10         10          5
   and eliciting hidden information
   Total                                                                          40         40         20
3. Design and Develop Presentation Layer                                150
   Acquiring fundamental software developing skills                               10         10          5
   Acquiring skills on handling unusual situations at runtime                     10         10          5
   Acquiring skills on development software with latest practices                 10         10          5
   Acquiring skills on architecture of front-end application                      10         10          5
   Acquiring skills on developing front-end application                           10         10          5
   Attaining skills on integrating application with backend database              10         10          5
   Total                                                                          60         60         30
4. Analyze Big Data in Cluster Environment                              150
   Acquiring skills on platform preparation for managing big data                  5          5          5
   Acquiring skills on platform preparation for managing big data                 10         10          5
   in grid
   Acquiring skills on interacting with big data file system                      10         10         10
   Acquiring skills on analyzing data using conventional                          10         10         10
   style of programming
   Acquiring advanced skills on analyzing data using conventional                 20         20         10
   style of programming
   Total                                                                          55         55         40
5. Analyze Data using Big Data Analytic Tools                           200
   Use of data warehouse facility for analyzing big data                           5          5          5
   Use of programming language to analyze big data stored                         10         10         10
   at data warehouse
   Use of column database for analyzing big data                                  10         10          5
   Use of programming language to analyze data stored in                          10         10          5
   column database
   Use of high level tool to analyze big data                                     10         10          5
   Use of programming language to analyze big data stored in                      10         10          5
   high level tool
   Use of Big Data Analytic tool to analyze                                       10         10         10
   semi-structured/unstructured data
   Use of programming language to analyze                                         10         10          5
   semi-structured/unstructured data
   Total                                                                          75         75         50
6. Manage Real World Data Analytic Application                          325
   Identify big data requirements                                                                       25
   Document big data requirements                                                                       25
   Design big data application                                                                         100
   Develop big data application                                                                         50
   Test big data application                                                                            50
   Steps to implement the developed application                                                         50
   Demonstrate big data application                                                                     25
   Total                                                                                               325
7. Enhancing Communication Skill                                         50
   Acquiring Communication Skill                                                                        10
   Managing career, staff and professional relationships                                                20
   Preparing for interview                                                                              20
   Total                                                                                                50
Grand Total                                                            1050      255        255        540
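The totals in the grid can be cross-checked with a short script. The following is only a minimal sketch: the per-unit figures are transcribed from the grid, the names `UNIT_MARKS`, `column_totals` and `unit_totals` are illustrative, and the marks for Units 6 and 7 (which appear in the grid as a single column) are assumed to count under viva voce, since that is the only placement consistent with the Grand Total row.

```python
# Marks per unit as (written, practical, viva voce), transcribed from the grid.
# Assumption: Units 6 and 7 list a single marks column; it is counted under
# viva voce here because only that placement matches the Grand Total row.
UNIT_MARKS = {
    "Configure Deployment Platform": (25, 25, 25),
    "Analyze and Define Business Requirements": (40, 40, 20),
    "Design and Develop Presentation Layer": (60, 60, 30),
    "Analyze Big Data in Cluster Environment": (55, 55, 40),
    "Analyze Data using Big Data Analytic Tools": (75, 75, 50),
    "Manage Real World Data Analytic Application": (0, 0, 325),
    "Enhancing Communication Skill": (0, 0, 50),
}


def column_totals(unit_marks):
    """Return (written, practical, viva voce) summed over all units."""
    written = sum(w for w, _, _ in unit_marks.values())
    practical = sum(p for _, p, _ in unit_marks.values())
    viva = sum(v for _, _, v in unit_marks.values())
    return written, practical, viva


def unit_totals(unit_marks):
    """Return the total marks of each unit across the three assessment modes."""
    return {unit: sum(marks) for unit, marks in unit_marks.items()}


if __name__ == "__main__":
    written, practical, viva = column_totals(UNIT_MARKS)
    # Prints: 255 255 540 1050
    print(written, practical, viva, written + practical + viva)
```

Running the script reproduces the Grand Total row (255 written, 255 practical, 540 viva voce, 1050 overall), and `unit_totals` reproduces the per-unit totals of 75, 100, 150, 150, 200, 325 and 50.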
Means of assessment 1
Proctored online assessments (LAN and Web based), carried out using a variety of question
formats applicable for linear / adaptive methodologies; performance criteria being assessed
via situation judgement tests, simulations, code writing, psychometrics and multiple choice
questions etc.
SECTION 2
EVIDENCE OF NEED
Big Data is a term used to describe the process of collecting, organizing, and
analyzing large sets of data to discover hidden patterns, unknown correlations, and
other useful information. Big Data helps you understand the information contained
within the data and helps identify that which is most important for future business
decisions. Big Data has become an essential factor for the success of businesses in
various verticals. Many countries, such as China, Brazil, and France, are
also focusing on adopting this new technology.
The major driving force of this technology is the rise in unstructured data from
several sources and the constant need of enterprises to optimize large data
workloads to enhance overall system efficiency. However, a few restraints hold
back the advancement of this technology, such as the lack of skilled personnel and
the lack of security measures and solutions. To address these issues, large vendors
in the ecosystem have established advanced training and learning centers and have
launched various certification courses.
(Source: http://www.micromarketmonitor.com/market-report/big-data-reports-4483038255.html?gclid=CLnf6JvS3coCFVAXaAodtboF6w)
The global big data market is expected to grow to $46.34 billion by 2018 at an
estimated CAGR of 25.5%, during the forecast period. Rapid increase in data
generation from different industry verticals is one of the major factors driving
this market.
The demand for analytics skills is rising steadily, but there is a huge deficit on the
supply side. This is happening globally and is not restricted to any particular
geography. Although Big Data Analytics is a ‘hot’ job, a large number of jobs
across the globe remain unfilled due to a shortage of the required skills. A
McKinsey Global Institute study states that the US will face a shortage of about
190,000 data scientists and 1.5 million managers and analysts who can understand
and make decisions using Big Data by 2018.
Currently, India has the highest concentration of analytics professionals globally. In
spite of this, the scarcity of data analytics talent is particularly acute, and demand
for talent is expected to remain high as more global organizations outsource their
work.
What steps were taken to ensure that the qualification(s) does/do not duplicate
already existing or planned qualifications in the NSQF?
As the understanding and adoption models of QPs evolve in the industry and
across its sub-sectors, we foresee consolidation of qualification packs as a natural
progression. As per the information available in the public domain, no equivalent
qualification currently exists.
What arrangements are in place to monitor and review the qualification(s)?
What data will be used and at what point will the qualification(s) be revised or
updated?
The Qualification is to be monitored and reviewed every two years.
The following data will be used:
1. Results of assessments
2. Employer feedback, sought post-placement
3. Student feedback
4. Workshops and seminars for reviewing the qualification
5. Industry requirements
6. Consultation/tie-up with industries or experts for review of the curriculum
Please attach any documents giving further information about any of the topics above.
NIL
SECTION 3
SUMMARY EVIDENCE OF LEVEL
Level of qualification: 6
Summary of Direct Evidence:
Justify the NSQF level allocated to the QP by building upon the five descriptors of NSQF. Explain the reasons for allocating the level to the QP.
Generic NOS is/are linked to the overall authority attached to the job role.
Data Scientist
Process required: Data Scientists carry out the job of identifying the requirements of business
analyses that help in making business decisions. They acquire a wide range of theoretical and
practical skills to provide analytic solutions to business problems. Their job is to prepare an
abstract model based on the requirements in order to propose a solution.

Professional knowledge: After acquiring professional knowledge of Big Data tools and techniques,
the Data Scientist will be competent to identify technical requirements in terms of hardware,
software and other IT-related devices. They can prepare a detailed analytic design of the
proposed solution for Big Data Analytics.

Professional skill: They are proficient in developing solutions based on the detailed design and
the practical knowledge gained during the course. They plan tests, prepare test cases, generate
test data and perform testing on the test data.

Core skill: After acquiring both managerial and technical skills, Data Scientists at this level
are able to interact with the different stakeholders involved, such as vendors, clients and users.

Responsibility: They are able to lead a team as well as work in a team. They are able to make
the independent decisions involved in providing a solution.

Level: 6
7 5 6 9 6
SECTION 4
EVIDENCE OF RECOGNITION OR PROGRESSION
What steps have been taken in the design of this or other qualifications to ensure that
there is a clear path to other qualifications in this sector?
This qualification comprises both technical and analytic skills and can be linked to any
qualification higher than this one, existing or to come.
Cloud providers have now started to provide services for Big Data analytics. Big Data as a
Service (BDaaS), like the traditional cloud services IaaS, PaaS and SaaS, is the next paradigm
shift in analytics. Amazon Web Services’ Elastic MapReduce (EMR) is the most prominent core
BDaaS available. NIELIT has recently signed an MoU with Amazon for imparting training on
AWS (Amazon Web Services).
Similarly, Altiscale, named one of the top 5 Big Data cloud providers, provides a solution for
BDaaS.
Please attach any documents giving further information about any of the topics above.
Sources:
https://aws.amazon.com/elasticmapreduce/
https://www.altiscale.com/
SECTION 5
EVIDENCE OF INTERNATIONAL COMPARABILITY
List any comparisons which have been established.
Big Data University is an IBM initiative to spread big data literacy. The university offers a
course on ‘Data Scientist Fundament’.
(Source: http://bigdatauniversity.com/)
Annexure I
Industry Validation
The global big data market is expected to grow to $46.34 billion by 2018 at an
estimated CAGR of 25.5%, during the forecast period. Rapid increase in data
generation from different industry verticals is one of the major factors driving
this market.
The demand for analytics skills is rising steadily, but there is a huge deficit
on the supply side. This is happening globally and is not restricted to any particular
geography. Although Big Data Analytics is a ‘hot’ job, a large number of jobs
across the globe remain unfilled due to a shortage of the required skills. A
McKinsey Global Institute study states that the US will face a shortage of about
190,000 data scientists and 1.5 million managers and analysts who can understand and
make decisions using Big Data by 2018.
(Source: http://www.edureka.co/blog/10-reasons-why-big-data-analytics-is-the-best-career-move)
IBM, Cisco and Oracle together advertised 26,488 open positions that required
big data expertise in 2015. DELL has 25.1% of all available big data positions that
WANTED Analytics tracks.
(Source: http://www.forbes.com/sites/louiscolumbus/2015/11/16/where-big-data-jobs-will-be-in-2016/#4f8a1261f7f1)
According to the ‘Peer Research – Big Data Analytics’ survey, Big Data Analytics
is one of the top priorities of the participating organizations, as they believe that it
improves their performance.
Approximately 45% of those surveyed believe that Big Data analytics will enable
much more precise business insights, and 38% are looking to use analytics to recognize
sales and market opportunities. More than 60% of the respondents are depending on
Big Data Analytics to boost the organization’s social media marketing abilities. The
QuinStreet research, based on its own survey, also backs the view that analytics is the
need of the hour: 77% of the respondents consider Big Data Analytics a top priority.
(Source: http://www.datasciencecentral.com/profiles/blogs/10-reasons-why-big-data-analytics-is-the-best-career-move)
Annexure II
Placement Record
The following is a summary of the batches conducted at NIELIT Chandigarh on Big Data.
As per the information available, 3 students have been placed and 3 are undergoing further
study. The placed candidates are working at Outline Software Solution, Chandigarh (2) and
High Tech ILS Cloud Solution Pvt. Ltd, Chandigarh (1).
Annexure III
Detailed Curriculum
Name of Unit of Qualification : Configure Deployment Platform
Duration                      : 15 Hours
Topics                        : Ubuntu
Name of Unit of Qualification : Analyze and Define Business Requirements
Duration                      : 30 Hours
Acquiring the skills to manage large      Knowledge Discovery in Databases,        6
scale data warehouse and eliciting        Data Mining, Data warehouse.
hidden information                        Migrating data from source to data
                                          warehouse, cleaning, aggregation
                                          operations.
Name of Unit of Qualification : Design and Develop Presentation Layer
Duration                      : 60 Hours
Topics                        : Java Programming
Name of Unit of Qualification : Analyze Big Data in Cluster Environment
Duration                      : 30 Hours
Topics                        : Hadoop and MapReduce Programming
Name of Unit of Qualification : Analyze Data using Big Data Analytic Tools
Duration                      : 60 Hours
Topics                        : Big Data Analysis using HBase, Hive and Pig
Name of Unit of Qualification : Manage Real World Data Analytic Application
Duration                      : 60 Hours
Topics                        : Project
Name of Unit of Qualification : Enhancing Communication Skill
Duration                      : 15 Hours