DAL Oral Question Bank

Uploaded by

jackiejamessjj
Copyright
© All Rights Reserved

Q. What is Data Analysis:

Data analysis is the process of examining, transforming, and arranging raw data in a specific way to
generate useful information from it.

Q. What are types of Data Analytics:

There are four major types of data analytics:

• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Descriptive analytics
• Descriptive analytics is the conventional form of data analysis.
• It seeks to provide a depiction or “summary view” of facts and figures in
an understandable format, answering the question “What happened?”
Diagnostic analytics
• Diagnostic analytics is a form of advanced analytics which examines data
or content to answer the question “Why did it happen?”
Predictive analytics
• Predictive analytics helps to forecast future trends based on historical and current data.
Prescriptive analytics
• A set of techniques that indicate the best course of action.
• It tells what decision to make to optimize the outcome.

Q. Which are Python Packages used for Data Science:

A package is a collection of Python modules. Commonly used packages for data science are:

• NumPy
• SciPy
• pandas
• statsmodels
• Matplotlib
• seaborn
• Plotly
• Bokeh
• scikit-learn
• Keras

Q. Explain in short about Basic libraries in Python:

NumPy – Numerical Python: NumPy is a Python library used for working with arrays. It also
has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

Pandas – Data frame Python: pandas is a software library written for the Python programming
language for data manipulation and analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time series.

Matplotlib – Visualization: Matplotlib is a comprehensive library for creating static,
animated, and interactive visualizations in Python. Matplotlib makes easy things easy and
hard things possible. It can create publication-quality plots and interactive figures that
can zoom, pan, and update.

Sklearn – Machine Learning: Scikit-learn is a free machine learning library for Python. It
features various algorithms such as support vector machines, random forests, and k-nearest neighbours.

Seaborn – Statistical Visualization: Seaborn is a library for making statistical graphics in Python.

• It builds on top of Matplotlib and integrates closely with pandas data structures.

• Seaborn helps you explore and understand your data. Its plotting functions operate on
dataframes and arrays containing whole datasets and internally perform the necessary
semantic mapping and statistical aggregation to produce informative plots. Its
dataset-oriented, declarative API lets you focus on what the different elements of your plots
mean, rather than on the details of how to draw them.

Q. Explain Different plots:

• A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or
scatter diagram) is a type of plot or mathematical diagram that uses Cartesian coordinates
to display the values of, typically, two variables for a set of data.

• A histogram is a graphical representation that organizes a group of data points into
user-specified ranges. Similar in appearance to a bar graph, the histogram condenses
a data series into an easily interpreted visual by taking many data points and grouping
them into logical ranges or bins.

• A box plot is a visual representation of groups of numerical data through their
quartiles. It is also used to detect outliers in a data set. It captures the summary
of the data efficiently with a simple box and whiskers and allows us to compare easily
across groups. A box plot summarizes sample data using the 25th, 50th and 75th
percentiles, which are also known as the lower quartile, median and upper quartile.
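
As a rough illustration of what a box plot summarizes, the quartiles can be computed directly with NumPy. The data values below are made up, and the 1.5 × IQR rule for flagging outliers is a common convention that the text above does not spell out, so treat it as an assumption.

```python
import numpy as np

# Hypothetical sample data with one unusually large value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# 25th, 50th and 75th percentiles: lower quartile, median, upper quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Interquartile range and the conventional 1.5 * IQR fences (assumed rule)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences would be drawn as outliers on a box plot
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "Median:", median, "Q3:", q3)
print("Outliers:", outliers)
```

The same array could be passed to a plotting function such as Matplotlib's boxplot to draw the box and whiskers.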

Q. What is the statement in Python to print:

# Importing NumPy
import numpy as np

# Print 3x3 matrix with all zeros
print(np.zeros((3,3)))

# Print 2x2 matrix with all ones
print(np.ones((2,2)))

# Print identity matrix of 3x3
print(np.eye(3))

# Example array used below (assumed; the original does not define c)
c = np.array([[1, 2, 3], [4, 5, 6]])

# Printing shape of array
print("Shape of array: ", c.shape)

# Printing size (total number of elements) of array
print("Size of array: ", c.size)

# Printing type of elements in array
print("Array stores elements of type: ", c.dtype)

# Importing pandas
import pandas as pd

# Reading data from CSV file
df = pd.read_csv('data.csv')

# Exporting data to CSV with pandas
df.to_csv('export.csv')
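
The statements above can be tried without any files on disk. The sketch below is only an illustration: the array `c` and the CSV text are invented, and an in-memory buffer stands in for 'data.csv'.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical 2x3 array to inspect
c = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape of array: ", c.shape)  # (rows, columns)
print("Size of array: ", c.size)    # total number of elements
print("Array stores elements of type: ", c.dtype)

# Made-up CSV content read from memory instead of a file
csv_text = "name,score\nA,90\nB,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# With no path given, to_csv returns the CSV text as a string;
# index=False leaves out the row index column
exported = df.to_csv(index=False)
print(exported)
```

Passing a file name such as 'export.csv' to `to_csv` writes the same text to disk instead of returning it.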


Q. What is Pandas DataFrame:

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns. A pandas DataFrame can be created from various inputs such as lists,
dictionaries, Series, NumPy ndarrays, or another DataFrame.
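
A minimal sketch of creating a DataFrame from each kind of input mentioned above. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list of rows (column names supplied separately)
from_lists = pd.DataFrame([[1, "a"], [2, "b"]], columns=["id", "label"])

# From a dictionary mapping column names to lists of values
from_dict = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})

# From a dictionary of Series
from_series = pd.DataFrame({"id": pd.Series([1, 2]), "label": pd.Series(["a", "b"])})

# From a NumPy ndarray
from_ndarray = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["x", "y"])

# From another DataFrame
from_dataframe = pd.DataFrame(from_dict)

print(from_dict)
print(from_ndarray)
```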

Q. What is Data Wrangling:

Data wrangling is the process of cleaning, structuring and enriching raw data into a desired
format for better decision making in less time.

Data wrangling in Python deals with the following functionalities:

1. Data exploration: In this process, the data is studied, analyzed and understood by
visualizing representations of the data.
2. Dealing with missing values: Most large datasets contain missing values (NaN). They
need to be taken care of, either by replacing them with the mean, the mode (the most
frequent value) of the column, or simply by dropping the rows that have a NaN value.
3. Reshaping data: In this process, data is manipulated according to the requirements,
where new data can be added or pre-existing data can be modified.
4. Filtering data: Sometimes datasets contain unwanted rows or columns which
need to be removed or filtered out.
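
The four steps above can be sketched with pandas as follows. The table, its column names and the choice of fill values are all invented for illustration; a real dataset would call for its own decisions.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with missing values and an unwanted column
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [20.0, np.nan, 30.0, 40.0],
    "city": ["Pune", "Pune", None, "Mumbai"],
    "unused": [0, 0, 0, 0],
})

# 1. Data exploration: summary statistics and missing-value counts
print(df.describe())
print(df.isna().sum())

# 2. Dealing with missing values: fill with mean / most frequent value,
#    or alternatively drop every row that contains a missing value
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])
dropped = df.dropna()

# 3. Reshaping data: add a new column derived from an existing one
filled["age_group"] = np.where(filled["age"] >= 30, "30+", "under 30")

# 4. Filtering data: remove an unwanted column and keep only some rows
result = filled.drop(columns=["unused"])
result = result[result["city"] == "Pune"]
print(result)
```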

Q. What is Population?

A population is a pool or collection of elements or individuals from which we draw a statistical
sample for a study. It is the entire group about which we want to draw a conclusion. The
number of elements or individuals in a population is called the population size.

Q. What is Sample?

A sample is a subset of the population. It is the specific group from which you collect data. The
number of elements or individuals in a sample is called the sample size. The process of
selecting a sample is called sampling.
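
A small sketch of the relationship between a population and a sample, using only the Python standard library. The population here is simulated (hypothetical heights in cm), so the numbers carry no real-world meaning.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights
population = [random.gauss(170, 10) for _ in range(10_000)]
population_size = len(population)

# Sampling: select 100 individuals from the population without replacement
sample = random.sample(population, 100)
sample_size = len(sample)

# The sample mean is used to estimate the population mean
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print("Population size:", population_size, "mean:", round(population_mean, 2))
print("Sample size:", sample_size, "mean:", round(sample_mean, 2))
```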

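Q. What are different Sampling techniques, explain any one:

Simple Random Sampling (SRS): every member of the population has an equal chance of being selected.
Stratified Sampling: the population is divided into subgroups (strata) that share a characteristic, and a random sample is drawn from each stratum.
Cluster Sampling: the population is divided into clusters, some clusters are selected at random, and the members of the selected clusters form the sample.
Systematic Sampling: every k-th member is selected from an ordered list, starting from a random position.
Convenience Sampling: the members that are easiest to reach are selected; it is a non-probability method.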
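
The sampling techniques listed above can be sketched with the standard library. This is an illustration under assumptions: the population is just the IDs 1 to 100, the two strata and the clusters of ten are arbitrary choices, and the sample sizes are made up.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical member IDs 1..100

# Simple random sampling: every member has an equal chance of selection
srs = random.sample(population, 10)

# Systematic sampling: every k-th member after a random start
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw a random sample from each stratum
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

# Cluster sampling: pick whole clusters at random and keep all their members
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [x for chosen in random.sample(clusters, 2) for x in chosen]

# Convenience sampling: take whoever is easiest to reach (here, the first ten)
convenience = population[:10]

print(srs, systematic, stratified, cluster_sample, convenience, sep="\n")
```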

Q. What is Hypothesis:

It is a statement about a population which we want to verify on the basis of the information
contained in a sample. E.g. “Messi is the best captain.”

Q. Different Terminologies in Hypothesis Testing:

1. Population: A pool or collection of elements or individuals from which
we draw a statistical sample for a study.
2. Sample: A subset of the population. It is the specific group from which you
collect data.
3. Parameter:
A summary description of a fixed characteristic of the target population, e.g. mean,
variance, standard deviation.
4. Sampling Distribution:
The distribution of a statistic obtained from a large number of samples drawn from a
specific population.
5. Standard Error:
The standard deviation of the sampling distribution of a statistic. It measures how much a
statistic (such as the sample mean) varies from sample to sample, whereas the standard
deviation measures the spread of the individual data values. For the sample mean, SE = σ/√n.

6. Null Hypothesis H0:
The hypothesis that there is no effect, i.e. that the event will not happen. It is assumed to
be true before we collect data. If the null hypothesis is not rejected, no changes will be made.
Ex: If the hypothesis is that “the consumption of a particular medicine reduces the chances of
cardiac arrest”, the null hypothesis will be “the consumption of the medicine doesn’t reduce the
chances of cardiac arrest.”

7. Alternate Hypothesis H1:
The hypothesis that the event will happen. This is what we want to prove to be true with our
collected data. The rejection of the null hypothesis leads to the acceptance of the alternative
hypothesis.
Ex: If a researcher is assuming that the bearing capacity of a bridge is more than 10 tons,
then the hypotheses under this study will be
Null hypothesis H0: μ = 10 tons
Alternative hypothesis H1: μ > 10 tons

8. Simple Hypothesis:
A hypothesis that completely specifies the distribution of the population.

9. Composite Hypothesis:
A hypothesis that does not completely specify the distribution of the population.

10. Type-1 Error:
Occurs when the sample results lead to the rejection of the null hypothesis when it
is in fact true. It is equivalent to a false positive: a true null hypothesis is rejected.

11. Type-2 Error:
Occurs when the sample results lead to the null hypothesis not being rejected when it is in
fact false. It is equivalent to a false negative: a false null hypothesis is not rejected.

12. Level of Significance (α):
The probability of making a Type-1 error. Alpha is the maximum probability of a Type-1 error
that we are willing to accept.
For a 95% confidence level, the value of alpha is 0.05. This means that there is a 5%
probability that we will reject a true null hypothesis.
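
To tie these terms together, here is a sketch of a one-sided test for the bridge example (H0: μ = 10 tons against μ > 10 tons). The measurements and the assumption that the population standard deviation is known (0.5 tons) are invented for illustration, and a Z test is used only because a known σ is assumed.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical bearing-capacity measurements in tons
measurements = [10.4, 10.1, 10.6, 10.3, 10.5, 10.2, 10.7, 10.4, 10.3, 10.5]

mu_0 = 10.0    # value of the mean under the null hypothesis
sigma = 0.5    # population standard deviation (assumed known)
alpha = 0.05   # level of significance

n = len(measurements)
standard_error = sigma / sqrt(n)                  # SE of the sample mean
z = (mean(measurements) - mu_0) / standard_error  # test statistic

# One-sided p-value: probability of a z this large or larger if H0 is true
p_value = 1 - NormalDist().cdf(z)

# Reject H0 when the p-value is below the level of significance
reject_null = p_value < alpha

print("z =", round(z, 3), "p-value =", round(p_value, 4), "reject H0:", reject_null)
```

If H0 were actually true, rejecting it here would be a Type-1 error, and alpha bounds the probability of that happening.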

Q. Which are the different Hypothesis Tests?

1. Z Test
2. T Test
3. ANOVA (analysis of variance)
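
A brief sketch of a T test and an ANOVA using SciPy's `scipy.stats` module. The three groups of numbers are made up; the T test compares the means of two groups, while ANOVA compares the means of two or more groups.

```python
from scipy import stats

# Hypothetical measurements from three groups
group_a = [23.1, 21.8, 24.5, 22.9, 23.7]
group_b = [25.2, 26.1, 24.8, 25.9, 26.4]
group_c = [22.0, 21.5, 23.1, 22.4, 21.9]

# Independent two-sample T test (equal variances assumed by default)
t_result = stats.ttest_ind(group_a, group_b)

# One-way ANOVA across all three groups
anova_result = stats.f_oneway(group_a, group_b, group_c)

# With only two groups, the ANOVA F statistic equals the squared t statistic
two_group_anova = stats.f_oneway(group_a, group_b)

print("t =", t_result.statistic, "p =", t_result.pvalue)
print("F =", anova_result.statistic, "p =", anova_result.pvalue)
print("F (two groups) =", two_group_anova.statistic)
```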


Q. What is Machine Learning:

Machine learning is a branch of artificial intelligence (AI) and computer science
which focuses on the use of data and algorithms to imitate the way that humans
learn, gradually improving its accuracy.

Classification of Machine Learning

Machine learning implementations are classified into three major categories,
depending on the nature of the learning “signal” or “response” available to a
learning system:
1. Supervised learning: The algorithm learns from example data and
associated target responses, which can consist of numeric values or string
labels such as classes or tags, in order to later predict the correct
response when given new examples.
2. Unsupervised learning: The algorithm learns from plain
examples without any associated response, leaving the algorithm to
determine the data patterns on its own.
3. Reinforcement learning: The algorithm is presented with examples
that lack labels, as in unsupervised learning. However, each example can be
accompanied by positive or negative feedback according to
the solution the algorithm proposes.

Categorizing on the basis of required Output

1. Classification: Inputs are divided into two or more classes, and
the learner must produce a model that assigns unseen inputs to one or
more (multi-label classification) of these classes. This is typically tackled
in a supervised way. Spam filtering is an example of classification, where
the inputs are email (or other) messages and the classes are “spam” and
“not spam”.
2. Regression: Also a supervised problem, in which the
outputs are continuous rather than discrete.
3. Clustering: A set of inputs is to be divided into groups. Unlike in
classification, the groups are not known beforehand, making this typically
an unsupervised task.
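
The three output types can be sketched with scikit-learn. The toy data, the single made-up feature and the particular estimators (k-nearest neighbours, linear regression, k-means) are illustrative choices, not the only way to tackle each task.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Classification (supervised): labelled examples, discrete output
X_cls = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_cls = np.array(["not spam", "not spam", "not spam", "spam", "spam", "spam"])
classifier = KNeighborsClassifier(n_neighbors=3).fit(X_cls, y_cls)
predicted_class = classifier.predict([[9.0]])[0]

# Regression (supervised): labelled examples, continuous output (y = 2x + 1)
X_reg = np.array([[1.0], [2.0], [3.0], [4.0]])
y_reg = np.array([3.0, 5.0, 7.0, 9.0])
regressor = LinearRegression().fit(X_reg, y_reg)
predicted_value = regressor.predict([[5.0]])[0]

# Clustering (unsupervised): no labels, groups are discovered from the data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_cls)

print("Predicted class:", predicted_class)
print("Predicted value:", predicted_value)
print("Cluster labels:", cluster_labels)
```

The cluster numbers themselves are arbitrary; only which points share a label is meaningful.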
