Python - Final 1
1. Which of the following statements correctly loads the dataset into pandas as a
DataFrame named df?
A. df = pd.read_csv('salaries-by-college-type.csv')
B. df = pd.load_csv('salaries-by-college-type.csv')
C. df = pd.read_csvfile('salaries-by-college-type.csv')
D. df = pd.load_csvfile('salaries-by-college-type.csv')
Ans : A
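A minimal sketch of option A. It assumes a tiny in-memory CSV with hypothetical rows (and only three of the dataset's columns) in place of the real salaries-by-college-type.csv file, so it runs without the file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for salaries-by-college-type.csv.
csv_text = """School Name,School Type,Mid-Career Median Salary
Example Tech,Engineering,"$126,000.00"
Example State,State,"$72,700.00"
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 3): rows, columns
print(list(df.columns))  # the column names (Problem 2 uses df.columns)
```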
2. To get a list of the column names from the DataFrame obtained in Problem 1, which of the
following statements should be executed?
A. df.index (row labels)
B. df.indexes # Invalid
C. df.column # Invalid
D. df.columns
Ans: D
3. Which of the following statements should be used to find how many different school types
the dataset has?
A. df['School Type'].unique()
B. df['School Type'].nunique()
C. df['School Type'].sum()
D. df['School Type'].total()
Ans: B
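A short sketch contrasting unique() and nunique(), using hypothetical School Type values:

```python
import pandas as pd

# Hypothetical School Type values, for illustration only.
df = pd.DataFrame({"School Type": ["State", "Ivy League", "State", "Engineering"]})

print(df["School Type"].unique())   # the distinct values themselves, in order of appearance
print(df["School Type"].nunique())  # how many distinct values there are: 3
```

nunique() ignores missing values by default (dropna=True).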
4. Which of the following statements should be used to find out the actual data type of each
item in the Mid-Career Median Salary column?
A. type(df['Mid-Career Median Salary'])
B. type(df['Mid-Career Median Salary'].dtype)
C. type(df['Mid-Career Median Salary'].dtypes)
D. type(df['Mid-Career Median Salary'][0])
Ans: D (type() of one element returns <class 'str'>; B and C return the type of the column's dtype, not of the values stored in it, and A returns the Series class)
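A sketch of the difference between the column's container, its dtype, and the type of one item. The two salary strings are hypothetical, written in an assumed "$xx,xxx.00" format:

```python
import pandas as pd

# Hypothetical salary strings.
df = pd.DataFrame({"Mid-Career Median Salary": ["$126,000.00", "$72,700.00"]})
col = df["Mid-Career Median Salary"]

print(type(col))     # the container: pandas.core.series.Series
print(col.dtype)     # pandas' storage dtype for the column (object in pandas 1.x/2.x)
print(type(col[0]))  # the Python type of one item: <class 'str'>
```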
5. From Problem 4, we now know that the type of the data points in the Mid-Career Median
Salary column is string. Which of the following statements should be used to remove the $
sign and convert the data points from string to floating point numbers?
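A. df['Mid-Career Median Salary'].replace('[^.0-9]', '').astype(float)
B. df['Mid-Career Median Salary'].str.replace('[^.0-9]', '').astype(float)
C. df['Mid-Career Median Salary'].sub('[^.0-9]', '').astype(float) # not valid: Series.sub is arithmetic subtraction
D. df['Mid-Career Median Salary'].str.sub('[^.0-9]', '').astype(float) # not valid: there is no .str.sub method
Ans: B (in pandas 2.0 and later, also pass regex=True, because str.replace treats the pattern as a literal string by default)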
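A runnable version of option B on two hypothetical salary strings. It passes regex=True explicitly so it behaves the same in pandas 1.x and 2.x:

```python
import pandas as pd

# Hypothetical salary strings.
df = pd.DataFrame({"Mid-Career Median Salary": ["$126,000.00", "$72,700.00"]})

# Delete every character that is not a digit or a period ("$" and ","),
# then convert the cleaned strings to floats.
df["Mid-Career Median Salary"] = (
    df["Mid-Career Median Salary"]
    .str.replace(r"[^.0-9]", "", regex=True)
    .astype(float)
)

print(df["Mid-Career Median Salary"].tolist())  # [126000.0, 72700.0]
```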
6. In order to obtain the information shown below about the dataset, which of the following
statements should be executed?
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 269 entries, 0 to 268
Data columns (total 8 columns):
Column                               Non-Null Count  Dtype
School Name                          269 non-null    object
School Type                          269 non-null    object
Starting Median Salary               269 non-null    object
Mid-Career Median Salary             269 non-null    object
Mid-Career 10th Percentile Salary    231 non-null    object
Mid-Career 25th Percentile Salary    269 non-null    object
Mid-Career 75th Percentile Salary    269 non-null    object
Mid-Career 90th Percentile Salary    231 non-null    object
dtypes: object(8)
memory usage: 16.9+ KB
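A. df.dtype # not valid: a DataFrame has no .dtype attribute
B. df.dtypes # returns only the Dtype of each column
C. df.info # no parentheses, so the method is not called; it displays the bound method, whose repr includes the whole DataFrame
D. df.info() # prints the summary shown above
Ans: D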
7. Based on the screenshot shown in Problem 6, which of the following statements is
incorrect?
A. The dataset has 269 rows and 8 columns.
B. The information contained in the Dtype column is most likely not accurate.
C. The Mid-Career 90th Percentile Salary column has 30 missing values.
D. Two columns have missing values.
Ans: C (the column has 269 - 231 = 38 missing values, not 30)
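The missing-value count in the answer is total entries minus non-null entries. A small sketch of the same arithmetic on a hypothetical 5-entry column:

```python
import numpy as np
import pandas as pd

# Hypothetical column of 5 entries, 2 of them missing.
s = pd.Series(["$50,000.00", np.nan, "$60,000.00", np.nan, "$70,000.00"])

missing = len(s) - s.count()  # total entries minus non-null entries
print(missing)                # 2
print(s.isna().sum())         # 2, counted directly
```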
8. Which of the following statements should be used to find the average Mid-Career Median
Salary for each college type, respectively?
A. df['School Type']['Mid-Career Median Salary'].mean()
B. df['Mid-Career Median Salary'].mean()
C. df.groupby['School Type']['Mid-Career Median Salary'].mean()
D. df.groupby('School Type')['Mid-Career Median Salary'].mean()
Ans: D
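A sketch of option D on hypothetical rows. It assumes the salary column has already been converted to floats as in Problem 5; on the raw strings, mean() would fail:

```python
import pandas as pd

# Hypothetical, already-numeric salaries.
df = pd.DataFrame({
    "School Type": ["State", "State", "Ivy League"],
    "Mid-Career Median Salary": [70000.0, 80000.0, 120000.0],
})

# groupby() splits the rows by School Type; mean() is then computed per group.
avg = df.groupby("School Type")["Mid-Career Median Salary"].mean()
print(avg)  # one mean per school type: Ivy League 120000.0, State 75000.0
```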
9. Which of the following statements should be used to find out which university has the
highest Mid-Career Median Salary?
A. df.sort_values(by = 'Mid-Career Median Salary', ascending = False)
B. df.sort_values(by = 'Mid-Career Median Salary', ascending = False).iloc[0]
C. df.sort_values(by = 'Mid-Career Median Salary', ascending = True)
D. df.sort_values(by = 'Mid-Career Median Salary', ascending = True).iloc[0]
Ans: B
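A sketch of option B on hypothetical schools, assuming the salary column is already numeric (sorting the raw "$..." strings would compare them alphabetically). idxmax() is shown as an alternative that avoids sorting the whole frame:

```python
import pandas as pd

# Hypothetical schools and numeric salaries.
df = pd.DataFrame({
    "School Name": ["A College", "B Institute", "C University"],
    "Mid-Career Median Salary": [90000.0, 126000.0, 75000.0],
})

# Sort descending so the highest salary comes first, then take that row by position.
top = df.sort_values(by="Mid-Career Median Salary", ascending=False).iloc[0]
print(top["School Name"])  # B Institute

# Alternative: idxmax() returns the index label of the largest value.
print(df.loc[df["Mid-Career Median Salary"].idxmax(), "School Name"])  # B Institute
```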
10. Which of the following statements should be used to find out the total number of
universities whose Mid-Career Median Salary is above $100,000?
A. df[df['Mid-Career Median Salary'] > 100000]['School Name'].count()
B. df[df['Mid-Career Median Salary'] > 100000]['School Name'].sum()
C. df[df['Mid-Career Median Salary'] > 100000]['School Name'].size()
D. df[df['Mid-Career Median Salary'] > 100000].sum()
Ans: A
count() - counts non-null values only
size - an attribute (not a method) giving the number of elements, nulls included; calling size() raises a TypeError
sum() - adds up the values in each column (string values are concatenated)
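A sketch of count() versus size after filtering. It uses hypothetical data with one deliberately missing school name so the two results differ:

```python
import pandas as pd

# Hypothetical data; the second school name is missing on purpose.
df = pd.DataFrame({
    "School Name": ["A College", None, "C University"],
    "Mid-Career Median Salary": [120000.0, 110000.0, 90000.0],
})

high = df[df["Mid-Career Median Salary"] > 100000]["School Name"]
print(high.count())  # 1: non-null names only
print(high.size)     # 2: every element, nulls included (attribute, no parentheses)
```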
11. Which of the following statements should be used to find out the total number of
university names that contain the word 'State'?
A. df['School Name'].str.contains('State').count()
B. df['School Name'].str.contains('State').sum()
C. df['School Name'].contains('State').count()
D. df['School Name'].contains('State').sum()
Ans: B (str.contains returns a boolean Series; sum() counts the True values, while count() counts every non-null row; C and D fail because a Series has no .contains method)
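A sketch showing why sum() and count() give different answers on the boolean Series returned by str.contains. The school names are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical school names.
names = pd.Series(["Alpha State University", "Beta College", "Gamma State College"])

has_state = names.str.contains("State")  # boolean Series: True, False, True

print(has_state.sum())    # 2: True counts as 1, so sum() counts the matches
print(has_state.count())  # 3: counts non-null booleans, i.e. every row
```

str.contains is case-sensitive and interprets the pattern as a regular expression by default.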
12.
A. df
B. df.loc[([2013, 2016], 1),:]
C. df.loc[((2013, 2016), 1),:]
D. df.loc[2013, 2016, 1]
Ans : B
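The question text is missing, so this is only a sketch of what option B does. It assumes a hypothetical DataFrame with a two-level row MultiIndex; the level names year and visit are made up:

```python
import pandas as pd

# Hypothetical frame with a two-level row index (year, visit).
index = pd.MultiIndex.from_product([[2013, 2014, 2016], [1, 2]], names=["year", "visit"])
df = pd.DataFrame({"value": range(6)}, index=index)

# In the row key, the list [2013, 2016] selects several labels on level 0
# and the scalar 1 fixes level 1; the ":" keeps all columns.
subset = df.loc[([2013, 2016], 1), :]
print(subset)  # rows (2013, 1) and (2016, 1)
```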
13.
Ans: A
14.
A. df.dropna(thresh = 1)
B. df.dropna(thresh = 2)
C. df.dropna(thresh = 3)
Ans: C
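The question text is missing; as a reminder of what thresh does, here is a sketch on a hypothetical 3 x 3 frame. thresh=n keeps a row only if it has at least n non-missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 values, row 1 has 2, row 2 has 1.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, 9.0],
})

print(df.dropna(thresh=3).index.tolist())  # [0]: only the complete row survives
print(df.dropna(thresh=2).index.tolist())  # [0, 1]
print(df.dropna(thresh=1).index.tolist())  # [0, 1, 2]
```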
15. For the DataFrame df shown on the left below, which of the following statements should
be used to obtain the result shown on the right?
A. df.fillna(method = 'ffill')
B. df.fillna(method = 'bfill')
C. df.fillna(0)
D. df.fillna({1:0.3, 3: 0})
Ans: D
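The frames from the question are not available, so this sketch uses a hypothetical frame with integer column labels 0 to 3. Passing a dict to fillna maps column label to fill value and leaves other columns untouched. (Options A and B forward-fill and back-fill; pandas 2.1 deprecated method= in favour of df.ffill() and df.bfill().)

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer column labels 0..3.
df = pd.DataFrame([[1.0, np.nan, 2.0, np.nan],
                   [np.nan, np.nan, 3.0, 4.0]])

# Column 1 gets 0.3, column 3 gets 0; column 0 keeps its NaN.
filled = df.fillna({1: 0.3, 3: 0})
print(filled)
```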
16.
A. df.unstack()
B. df.stack()
C. df.reindex()
D. df.set_index()
Ans: B
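The question text is missing; as a reminder of what option B does, a sketch on a hypothetical 2 x 2 frame. stack() moves the column labels into the innermost level of the row index, turning the DataFrame into a Series with a two-level index; unstack() reverses it:

```python
import pandas as pd

# Hypothetical frame.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

stacked = df.stack()  # index: (r1, x), (r1, y), (r2, x), (r2, y)
print(stacked)

print(stacked.unstack())  # back to the original 2 x 2 layout
```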
17.
A. df.set_index(['date', 'item'])
B. df.reindex(['date', 'item'])
C. df.reset_index(['date', 'item'])
Ans : A
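A sketch of option A on hypothetical date/item/qty data. set_index moves the listed columns into a MultiIndex (reset_index would move them back into columns):

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "item": ["apple", "pear", "apple"],
    "qty": [3, 5, 2],
})

indexed = df.set_index(["date", "item"])
print(list(indexed.index.names))                   # ['date', 'item']
print(indexed.loc[("2024-01-01", "pear"), "qty"])  # 5
```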
19.
pd.concat([df5, df6], join = 'inner')
# Concatenates the two DataFrames, keeping only the columns they share (an inner join on the column labels).
pd.concat([df5, df6])
# Keeps all the columns from both DataFrames (the union of their column labels); this outer join is the default (join = 'outer').
df5.append(df6)
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead.
Ans: A
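A sketch of the two concat calls. df5 and df6 follow the names in the question, but their contents are hypothetical, chosen so that they share only columns B and C:

```python
import pandas as pd

# Hypothetical contents; the frames share columns B and C.
df5 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df6 = pd.DataFrame({"B": [7, 8], "C": [9, 10], "D": [11, 12]})

outer = pd.concat([df5, df6])                # all columns; missing cells become NaN
inner = pd.concat([df5, df6], join="inner")  # only the shared columns

print(list(outer.columns))  # ['A', 'B', 'C', 'D']
print(list(inner.columns))  # ['B', 'C']
```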
20.
pd.merge(df1, df3).drop('name', axis=0)
# With no on= argument, pd.merge performs an inner join on all columns whose names appear in both DataFrames.
# drop('name', axis=0) then looks for a row whose index label is 'name'; dropping the 'name' column needs axis=1 (or columns='name').
Ans: C
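A sketch of the merge-then-drop sequence. df1 and df3 follow the names in the question, but their contents are hypothetical; the only shared column is name:

```python
import pandas as pd

# Hypothetical contents; "name" is the only shared column.
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "dept": ["HR", "IT"]})
df3 = pd.DataFrame({"name": ["Ann", "Bob"], "salary": [50000, 60000]})

merged = pd.merge(df1, df3)  # inner join on the shared column "name"

result = merged.drop("name", axis=1)  # removes the column
print(list(result.columns))           # ['dept', 'salary']

try:
    merged.drop("name", axis=0)  # looks for a row labelled "name"
except KeyError:
    print("no row labelled 'name'")
```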
21.
Ans: B (the instructor identified this example as option B in class)
22.
Ans: B
23.
In a confusion matrix for a classification problem, the rows correspond to the true classes
and the columns to the predicted classes (the convention scikit-learn uses; some texts
transpose it). The entry in row i, column j is the number of samples whose true class is i
and whose predicted class is j, so the diagonal counts the correct predictions.
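A hand-written sketch of that layout. The helper confusion_matrix and the cat/dog labels are hypothetical; the function only mirrors the row/column convention of scikit-learn's sklearn.metrics.confusion_matrix, it is not that function:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: rows = true class, columns = predicted class."""
    pos = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1  # one count in cell (true, predicted)
    return matrix


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
cm = confusion_matrix(y_true, y_pred, labels=["cat", "dog"])
print(cm)  # [[1, 1], [1, 2]]: diagonal = 1 + 2 correct predictions
```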
24.
Ans: B
25.
Ans: A
26.
Ans: D
27.
A. data['age'].describe()
B. data['age'].count()
C. data['age'].size()
D. data['age'].value_counts()
Ans: D
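The question text is missing; this sketch only shows what option D returns compared with count() and size, using hypothetical ages:

```python
import pandas as pd

# Hypothetical ages.
data = pd.DataFrame({"age": [25, 18, 25, 35, 25, 18]})

counts = data["age"].value_counts()  # one entry per distinct age, most frequent first
print(counts)

print(data["age"].count(), data["age"].size)  # 6 6: totals only, not per-age counts
```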
28.
A. data['rating'].plot.box()
B. data['rating'].value_counts().plot.box()
C. data['rating'].value_counts().plot.bar()
D. data['rating'].plot.bar()
Ans: C
29.
A. data.pivot_table(index = 'title', columns = 'gender', aggfunc = 'mean')
B. data.pivot_table('rating', index = 'title', columns = 'gender', aggfunc = 'mean')
C. data.pivot_table('rating', index = 'gender', columns = 'title', aggfunc = 'mean')
D. data.pivot_table(index = 'gender', columns = 'title', aggfunc = 'mean')
Ans: B
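A sketch of option B on hypothetical ratings: rating is the value column, each title becomes a row, each gender a column, and each cell holds the mean rating:

```python
import pandas as pd

# Hypothetical ratings.
data = pd.DataFrame({
    "title": ["Movie X", "Movie X", "Movie X", "Movie Y"],
    "gender": ["F", "M", "M", "F"],
    "rating": [4, 3, 5, 2],
})

mean_ratings = data.pivot_table("rating", index="title", columns="gender", aggfunc="mean")
print(mean_ratings)  # Movie Y has no "M" ratings, so that cell is NaN
```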
30.
Ans: A