Data Analyst Interview Guide

This document serves as a comprehensive handbook for aspiring data analysts, detailing job responsibilities, essential skills, and interview preparation strategies. It outlines the data analyst's role in data collection, modeling, analysis, and visualization, while emphasizing the importance of statistical knowledge and practical experience. Additionally, it provides a list of recommended books, video resources, and common interview questions to aid candidates in their preparation.

Uploaded by

Sylvia WEI

Offerrealize

Data analyst
Essential Interview Guide

A Highly Practical Handbook

Job Search Information

Comprehensive Materials

Interview Support

Contents
Role Overview
Concepts Tested
Recommended Books
Video Resources
Interview Questions

Role Overview

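A Data Analyst (DA) organizes, analyzes, and studies large volumes of data to extract valuable information. Although both Business Analysts (BA) and DAs work with data, they use it for different purposes: a BA focuses on giving the business strategic advice, while a DA provides data support and reference points for company decisions.

The core skill requirements are largely the same in every industry: first, an understanding of the business; second, proficiency with data analysis tools; third, strong data analysis ability. The differences lie mainly in the additional skills required, which vary by industry.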
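
A DA's Day-to-Day Work

A DA's job is essentially to work with data all the time. The data workflow includes:

· Data collection: gathering data from a variety of sources, including public online sources, internal data, and third-party data sources.
· Data modeling and processing: using tools such as Python and SQL to design the methods and pipelines for collecting, storing, and processing data.
· Data cleaning: tidying datasets and removing duplicate records to prepare for later analysis.
· Data analysis: carrying out broad, multidimensional analysis, including exploratory, descriptive, diagnostic, and predictive analysis, to reveal the insights and trends behind the data.
· Data visualization: using tools such as Tableau to build dashboards or reports that present data as charts and visuals, supporting the BA's business judgments and decisions.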
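
As a small illustration of the data cleaning step, the sketch below removes exact duplicate records from a list while keeping the first occurrence of each. The `orders` records and the `drop_duplicates` helper are hypothetical names invented for this example; in day-to-day work a library such as pandas would usually handle this.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical order records; the second row repeats the first.
orders = [
    {"order_id": 1, "customer": "A", "amount": 20.0},
    {"order_id": 1, "customer": "A", "amount": 20.0},
    {"order_id": 2, "customer": "B", "amount": 35.5},
]
clean_orders = drop_duplicates(orders)
print(len(orders), "->", len(clean_orders))
```

Only rows that match on every field are treated as duplicates here; near-duplicates (for example, the same customer with different spelling) need rules specific to the dataset.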
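
Concepts Tested
Skills Tested in Interviews

Concepts Tested in Interviews

The Data Analyst role places requirements on candidates in several areas.

In terms of educational background, candidates need foundational coursework in mathematics, statistics, and probability theory, and in particular a solid grasp of statistical basics such as p-values, t-tests, and significance. Proficiency in SQL is also essential.

Next, candidates with project experience should prepare answers structured with the STAR method (Situation, Task, Action, Result), emphasizing the project goal, their role, the difficulties they met, and how they resolved them. For applicants without project experience, a portfolio that demonstrates their abilities is a plus.

In terms of data analysis ability, familiarity with common analysis methods, A/B testing, and data anomaly analysis is essential. Candidates should also understand analytical methods and models, including metric decomposition and user personas.

Mindset is another focus of the interview, including personal growth, understanding of and fit with the company culture, and the attitude and values one brings to work. Presenting these abilities and views clearly, logically, and neatly is essential to interview success.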
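
Recommended Books
Interview-Related Books

1. A General Introduction to Data Analytics

Authors: João Moreira, Andre Carvalho, Tomás Horvath

Why we recommend it:

This book is an important guide to data analytics, written in accessible language that does not require readers to have deep knowledge of statistics or programming. The authors focus on explaining basic data analytics techniques and provide many exercises and examples. The book explains the motivation for data analytics in detail and introduces data visualization, summarization methods, and techniques for finding groups and patterns in datasets. It also covers predictive tasks such as classification and regression. Finally, it covers popular data analytics applications such as web mining, information retrieval, social network analysis, text processing, and recommender systems.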
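
2. SQL Cookbook: Query Solutions and Techniques for All SQL Users

Authors: Anthony Molinaro, Robert de Graaf

Why we recommend it:

This book aims to give database developers practical solutions and techniques for SQL queries. It covers not only SQL basics but also approaches to complex queries and problems, making it a good fit for developers who want to apply SQL in real work.

Its focus is on providing a large number of SQL query examples and tips that help readers learn to handle many kinds of data queries and operations. It covers several database management systems (such as MySQL, Oracle, and SQL Server) and offers practical advice for developers at different levels. The content is meant to help developers understand and use SQL more deeply so they can handle the challenges of real work more effectively.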
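
3. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Author: Wes McKinney

Why we recommend it:

This book is a practical introduction to Python for beginners, especially suited to those who want to use Python to manipulate, process, clean, and crunch datasets. This edition of the hands-on guide has been updated for Python 3.6 and includes many practical case studies showing how to solve a wide range of data analysis problems effectively. By working through it, you will learn the latest versions of pandas, NumPy, IPython, and Jupyter.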

Video Resources
https://www.youtube.com/watch?v=Y6175TGFuMI

Interview Questions
1. How Would You Define a Good Data Model?

A good data model exhibits the following:

· Predictability: The data model should work in ways that

are predictable so that its performance outcomes are always

dependable.

· Scalability: The data model’s performance shouldn’t

become hampered when it is fed increasingly large datasets.

· Adaptability: It should be easy for the data model to

respond to changing business scenarios and goals.

· Results-oriented: The organization that you work for or its

clients should be able to derive profitable insights using the

model.

2. What Is Collaborative Filtering?

Collaborative filtering is a recommendation technique that uses behavioral data from groups of users to make recommendations. It is based on the assumption that users who behaved alike in the past, for example by rating a certain movie 5 stars, will continue to behave alike in the future. The system uses this knowledge to recommend to each user the items that similar users liked.
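
To make the idea concrete, here is a minimal user-based sketch, assuming a tiny hand-made ratings table. The user names, movie ratings, and the `recommend` helper are all hypothetical, and cosine similarity is just one common choice of similarity measure, not the only one.

```python
import math

# Hypothetical 1-5 star ratings; a missing key means "not rated yet".
ratings = {
    "ana":   {"Alien": 5, "Heat": 4, "Up": 1},
    "ben":   {"Alien": 5, "Heat": 5, "Up": 1, "Jaws": 5},
    "chloe": {"Alien": 1, "Heat": 2, "Up": 5, "Coco": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Suggest the unseen items rated by the most similar other user."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    unseen = [item for item in ratings[nearest] if item not in ratings[user]]
    return nearest, unseen


nearest, suggestions = recommend("ana", ratings)
print(nearest, suggestions)
```

In this toy data, ana's ratings line up closely with ben's, so the item ben rated that ana has not seen ("Jaws") becomes the suggestion.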

3. What Is Data Wrangling?

Data wrangling is the process of taking raw data and cleaning and enriching it so that it can be analyzed easily to reveal trends and patterns. This process makes all downstream uses of the data far more efficient.

4. What Is Time Series Analysis?

Time Series Analysis is a data analysis approach that analyzes a

data set over certain intervals of time. It can be especially

valuable in areas where tracking data over time can unearth

valuable insights. For example, a time series analysis of

COVID-19 can help us see trends in the way the disease has

spread.

5. What Is the Difference Between Time Series Analysis and

Time Series Forecasting?

Time series analysis studies data points collected over a period of time, looking for insights that can be drawn from them. Time series forecasting, on the other hand, uses the data studied over that period to make predictions about future values.
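
The distinction can be sketched in a few lines, assuming a made-up monthly sales series. The moving average stands in for the analysis step, and a deliberately naive "mean of the last few points" rule stands in for forecasting; real forecasting models are considerably more involved.

```python
def moving_average(values, window):
    """Analysis step: smooth the series to expose its trend."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def forecast_next(values, window):
    """Forecasting step: predict the next point as the mean of the last `window` points."""
    return sum(values[-window:]) / window


# Hypothetical monthly sales figures.
sales = [10, 12, 14, 13, 15, 17]
trend = moving_average(sales, window=3)
prediction = forecast_next(sales, window=3)
print(trend, prediction)
```

The `trend` list describes what has already happened, while `prediction` is a statement about a month that has not been observed yet.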

6. What Is Clustering? List the Main Properties of

Clustering Algorithms.

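Clustering is the technique of identifying groups or categories within a dataset and placing data values into those groups, thus creating clusters.

Clustering algorithms have the following properties:

· Iterative: cluster assignments are refined over repeated passes.

· Hard or soft: each data point belongs to exactly one cluster (hard) or to several clusters with different degrees of membership (soft).

· Disjunctive: a data point may be assigned to more than one cluster.

· Flat or hierarchical: clusters form either a single partition or a nested tree.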

7. What Is Univariate, Bivariate, and Multivariate Analysis?

Univariate analysis involves only one variable. It is the simplest form of analysis, such as looking at a trend, and it cannot tell you anything about causes or relationships. An example is the growth in the population of a specific city over the last 50 years.

Bivariate analysis involves two variables, so you can examine the relationship between them and look for possible causes. An example is a gender-wise analysis of population growth in a specific city.

Multivariate analysis involves three or more variables. Here you analyze patterns in multidimensional data by considering several variables at a time. An example is the breakdown of population growth in a specific city by gender, income, employment type, and so on.

8. What Is a Pivot Table?

A pivot table is a data analysis tool that groups values from a larger dataset and presents the grouped values in tabular form for easier analysis. Its purpose is to make figures and trends easier to find by applying an aggregation function, such as a sum or an average, to the values that have been grouped together.
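
The grouping-plus-aggregation idea can be sketched in plain Python, assuming a few invented sales records. The `pivot` helper and its parameter names are hypothetical; spreadsheet tools and pandas' `pivot_table` offer the same operation with more options.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 150},
    {"region": "West", "product": "A", "revenue": 200},
    {"region": "East", "product": "A", "revenue": 50},
]


def pivot(rows, index, columns, values):
    """Group rows by (index, columns) and sum `values` within each group."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    return {key: dict(inner) for key, inner in table.items()}


revenue_by_region = pivot(sales, index="region", columns="product", values="revenue")
print(revenue_by_region)
```

Each outer key is a row of the pivot table (a region), each inner key is a column (a product), and the aggregation function here is a sum.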

9. What Is Logistic Regression?

Logistic regression is a form of predictive analysis used when the dependent variable is dichotomous, meaning it takes one of two values (for example, yes or no). It describes the relationship between that dependent variable and one or more independent variables by estimating the probability of each outcome.
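
As a rough sketch of how such a model turns an input into a yes/no prediction, the example below applies the logistic (sigmoid) function to a linear score. The coefficients and the "monthly visits" scenario are hypothetical and chosen by hand; in practice the coefficients are estimated from data.

```python
import math


def predict_probability(x, intercept, slope):
    """Logistic model: map a linear score to a probability between 0 and 1."""
    score = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients; in practice they are estimated from data.
INTERCEPT, SLOPE = -4.0, 0.1

for monthly_visits in (10, 40, 80):
    p = predict_probability(monthly_visits, INTERCEPT, SLOPE)
    label = "buys" if p >= 0.5 else "does not buy"
    print(monthly_visits, round(p, 3), label)
```

The 0.5 cut-off used to turn the probability into a label is a common default, not a fixed rule; the right threshold depends on the cost of each kind of error.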

10. What Is Linear Regression?

Linear regression is a statistical method used to find out how

two variables are related to each other. One of the variables is

the dependent variable and the other one is the explanatory

variable. The process used to establish this relationship

involves fitting a linear equation to the dataset.
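
Fitting a linear equation can be sketched with the ordinary least squares formulas for a single explanatory variable. The `fit_line` helper and the spend/sales figures are hypothetical, and the data is constructed to lie exactly on a line so the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: advertising spend (x) and sales (y), both in thousands.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x, so the fit is easy to check
intercept, slope = fit_line(spend, sales)
print(intercept, slope)
```

The slope is the estimated change in the dependent variable for a one-unit change in the explanatory variable, which is usually the number an interviewer wants you to interpret.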

11. What Is the Role of Linear Regression in Statistical Data

Analysis?

Linear regression is a powerful technique within statistical data analysis. It helps you establish relationships between different variables, which is very handy in evaluating business outcomes.

Consider an example where a credit card company wants to know which factors lead to customers defaulting on payments. Regression analysis can help the company zero in on the characteristics of defaulters, and thus help the company improve the profile of its clients. Note that default itself is a yes/no outcome, which is the dichotomous case that logistic regression (question 9) is designed for; linear regression fits best when the outcome is continuous, such as the amount a customer owes.

12. What Do You Mean by Hierarchical Clustering?

Hierarchical clustering is a data analysis method that, in its common bottom-up (agglomerative) form, first treats every data point as its own cluster. It then uses the following iterative method to create larger clusters:

· Identify the two clusters that are closest to each other.

· Merge those two clusters into one.

These steps repeat until all points belong to a single cluster or until the desired number of clusters is reached.
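
The merge loop described above can be sketched on one-dimensional data. This is a minimal illustration that assumes single linkage (cluster distance is the distance between the closest pair of members) and a hypothetical list of customer ages; other linkage rules and multi-dimensional distances follow the same pattern.

```python
def agglomerate(points, target_clusters):
    """Single-linkage agglomerative clustering on 1-D values."""
    clusters = [[p] for p in points]  # every point starts as its own cluster

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > target_clusters:
        # Find the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge them and repeat.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical customer ages.
ages = [21, 23, 24, 40, 42, 65]
print(agglomerate(ages, target_clusters=3))
```

Stopping at a target number of clusters is one possible choice; running the loop until a single cluster remains and recording each merge yields the full hierarchy.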

13. How Do You Tackle Missing Data in a Dataset?

There are two main ways to deal with missing data in data analysis.

Imputation is a technique for making an informed guess about what a missing data point could be. It is used when the amount of missing data is low and there appears to be natural variation within the available data.

The other option is to remove the affected records. This is usually done if the data is missing at random and there is no way to draw reasonable conclusions about what the missing values might be.
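
Both options can be shown side by side on a small example. The `ages` list is hypothetical, and mean imputation is only one simple way to fill gaps; medians, group averages, or model-based estimates are also common.

```python
from statistics import mean

# Hypothetical survey responses; None marks a missing age.
ages = [34, None, 29, 41, None, 36]

observed = [a for a in ages if a is not None]

# Option 1: imputation, filling each gap with the mean of the observed values.
fill_value = mean(observed)
imputed = [a if a is not None else fill_value for a in ages]

# Option 2: removal, dropping the incomplete records.
dropped = observed

print(imputed)
print(dropped)
```

Imputation keeps the sample size at six but invents two values, while removal keeps only real observations at the cost of a smaller sample.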

14. What Are the Different Data Validation Methods in

Data Analytics?

Several methods are used to validate the data in a dataset. These include:

· Field-level validation: Data is checked and corrected as it is entered into the appropriate fields in a dataset.

· Form-level validation: The data entered by a user is validated in real time, and any erroneous data is flagged so that the user can correct it.

· Data saving validation: The data in a database is validated whenever it is saved.

· Search criteria validation: This technique is used when the results of a user’s query need to be highly relevant. The search criteria are validated so that the most relevant results of a query can be returned.

15. What Is an N-Gram?

An n-gram is a contiguous sequence of n items, usually words or units of speech such as syllables or phonemes. An n-gram model is a probabilistic model that takes these contiguous sequences as input and uses them to predict the next item in a sequence.
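
A bigram model (n = 2) is the smallest useful case and can be sketched by counting which word follows which. The toy corpus and helper names below are hypothetical, and the prediction simply picks the most frequent follower rather than computing full probabilities.

```python
from collections import Counter, defaultdict


def bigram_model(tokens):
    """Count, for each word, which words follow it."""
    followers = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


# Hypothetical toy corpus.
corpus = "the cat sat on the mat and the cat slept".split()
model = bigram_model(corpus)
print(predict_next(model, "the"))
```

In this corpus "the" is followed by "cat" twice and "mat" once, so "cat" is the predicted next word.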

16. What Is the Difference Between Variance, Covariance,

and Correlation?

Variance measures how far each value in a dataset is from the mean. The higher the variance, the more spread out the dataset. It measures magnitude.

Covariance measures how two random variables in a dataset change together. If the covariance of two variables is positive, they move in the same direction; if it is negative, they move in opposite directions. It measures direction.

Correlation is the degree to which two random variables in a dataset change together. It measures both magnitude and direction: covariance tells you whether the two variables move together, and the correlation coefficient tells you how strongly they do.
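
The three measures can be computed directly from their definitions. The sketch below uses the population versions (dividing by n) and a hypothetical dataset of study hours and exam scores; sample versions divide by n - 1 instead.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: average squared distance from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    """Population covariance: the sign shows whether x and y move together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to the range [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
print(variance(hours), covariance(hours, score), correlation(hours, score))
```

The covariance here is positive, which only says the two variables rise together; the correlation, which is close to 1, says the relationship is also strong.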

17. What Is a Normal Distribution?

A normal distribution, also called a Gaussian distribution, is a bell-shaped distribution that is symmetric about the mean. This means that half the data lies on one side of the mean and half on the other. Normal distributions occur in many natural situations, such as the heights of a population, which is why they have gained prominence in the world of data analysis.
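
The symmetry can be checked numerically with `NormalDist` from Python's standard `statistics` module (available from Python 3.8). The mean of 170 cm and standard deviation of 10 cm are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical adult heights in cm: mean 170, standard deviation 10.
heights = NormalDist(mu=170, sigma=10)

below_mean = heights.cdf(170)                        # share of values below the mean
within_one_sd = heights.cdf(180) - heights.cdf(160)  # share within one standard deviation
print(round(below_mean, 3), round(within_one_sd, 3))
```

Half of the distribution falls below the mean, and roughly 68% falls within one standard deviation of it, a rule of thumb that often comes up in interviews.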

18. Do Analysts Need Version Control?

Yes, data analysts should use version control when working

with any dataset. This ensures that you retain original datasets

and can revert to a previous version even if a new operation

corrupts the data in some way. Tools like Pachyderm and Dolt

can be used for creating versions of datasets.

19. Can a Data Analyst Highlight Cells Containing Negative

Values in an Excel Sheet?

Yes, it is possible to highlight cells with negative values in Excel. Here’s how to do that:

1. Select the cells, go to the Home tab in the Excel ribbon, and click on Conditional Formatting.

2. Within the Highlight Cells Rules option, click on Less Than.

3. In the dialog box that opens, enter the value below which you want to highlight cells; to catch negative values, enter 0. You can choose the highlight color in the dropdown menu.

4. Hit OK.

All values below the one you entered are now highlighted in the Excel sheet.

20. How Do You Differentiate Between a Data Lake and a

Data Warehouse?

A data lake is a repository that holds a large volume of raw data in its original format, much of it unstructured. A data warehouse is a data storage structure that contains data that has been cleaned and processed into a form from which valuable insights can be generated easily.

21. How Do You Differentiate Between Overfitting and

Underfitting?

Underfitting and overfitting are both modeling errors.

Overfitting occurs when a model begins to describe the noise

or errors in a dataset instead of the important relationships

between data points. Underfitting occurs when a model isn’t

able to find any trends in a given dataset at all because an

inappropriate model has been applied to it.

22. How Many X Are in Y Place?


This question takes many forms, but its premise is quite simple. It asks you to work through a mathematical problem, usually estimating the number of an item in a certain place or how much of something could potentially be sold somewhere. Here are some real examples from Glassdoor:

· “How many piano tuners are in the city of Chicago?” (Quicken Loans)

· “How many windows are there in New York City, by your estimation?” (Petco)

· “How many gas stations are there in the United States?” (Progressive)

The idea is to put you in a situation where you can’t possibly know the answer off the top of your head, and to see you work through it anyway. Basically, you want to pull together the data you do have, or can at least approximate, and work your way to a solution. Let’s take the number of windows in New York City as an example for the sample answer below.

Note: Figures in this answer do not necessarily realistically

reflect facts; they are approximations (there are actually 8.6

million people in NYC, according to 2017 data, for example).

Sample answer: I believe there are about 10 million people in New York, give or take a couple million. Assuming each of them lives in a residential building with three rooms or more, and there is one window per room, that makes approximately 30 million windows. I’m making a few assumptions that are probably inaccurate, for instance that everyone lives alone and that the average residence has just three rooms with one window per room. Obviously, there will be a lot of variation in reality. But I think, in terms of residences, 30 million windows could be close.

Then you’d have to add windows for businesses, subway cars, and personal vehicles. If the average subway car seats 1,000 people, with 1 window per 2 seats, that’s 500 windows per car. A little more math: I’d guess there are at least enough subway cars to seat the whole population of New York, so 10 million divided by 1,000 comes out to 10,000 cars. That gives another 5 million windows for subway cars. If half of all people own a vehicle with six windows, that’s 5 million vehicles and 30 million more windows. I’d guess there are at least 100,000 businesses with windows in NYC. Let’s just say for the sake of argument that they average 10 windows each. That’s another million, and I’m sure the real figure is much higher.

Overall, we’re at 66 million windows (30,000,000 x 2 + 5,000,000 + 1,000,000). All of this pretty much hinges on how close I am to the actual population of New York City. Also, there are other places to find windows, such as buses or boats. But that’s a start.
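
Writing the estimate out as arithmetic makes each assumption explicit and easy to challenge. Every input below comes from the sample answer's rough guesses, not from real statistics, and the variable names are chosen only for this illustration.

```python
# All inputs are the sample answer's rough assumptions, not real statistics.
population = 10_000_000

residential = population * 3 * 1     # 3 rooms per person, 1 window per room
subway_cars = population // 1_000    # cars needed at 1,000 seats each
subway = subway_cars * (1_000 // 2)  # 1 window per 2 seats
vehicles = (population // 2) * 6     # half own a vehicle, 6 windows each
businesses = 100_000 * 10            # 100,000 businesses, 10 windows each

total = residential + subway + vehicles + businesses
print(f"{total:,}")
```

Because every term scales with `population`, changing that single assumption shifts almost the whole estimate, which is the sensitivity the sample answer points out at the end.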

23. You Have 10 Bags of Marbles With 10 Marbles in Each

Bag. All but One Bag Has Marbles Which Weigh 10g Each.

The Exception’s Marbles Weigh 11g Each. How Would You

Determine Which Bag Has 11g Marbles Using a Scale Only

Once? (Google)

This question would be really difficult to figure out on the spot.

Fortunately, it’s a puzzle with answers all over the place

online.

The identifying factor for each of these bags of marbles is

weight; fortunately, we have only one different bag.

Unfortunately, we only have one chance to weigh, so we

couldn’t just weigh each bag individually.



Instead, we can solve the problem by putting a different number of marbles from each bag into a new bag, weighing it, and reverse engineering the identity of the heavier bag.

Let’s take 1 marble from the first bag, 2 from the second bag, 3 from the third bag, and so on. This way each bag we’ve drawn from is uniquely identifiable by the number of marbles taken from it.

The total number of marbles in the new bag can now be calculated using the series sum formula n(n+1)/2. With n = 10, we get 55. Now we multiply that by the weight of each marble, which is 10g. That means the total weight of the marbles should be 550g, in a perfect world.

But we’re not in a perfect world. One of these bags is different. Let’s say, for argument’s sake, that the third bag is the one with the heavier 11g marbles. The weights drawn from each bag would look like this: 10, 20, 33, 40, 50, 60, 70, 80, 90, 100. Weighed together, they add up to 553g. Clearly, one of these bags has botched things up. To find out which one, we subtract 550 from 553, getting 3. In other words, the third bag is the odd one out. The formula, then, looks like this: W - w(n(n+1)/2), where W = total weight and w = weight of each marble (except the odd ones).

Note that we’ve labeled the bags 1-10 based on the number of marbles taken from each. The difference won’t necessarily equal this number, however. If the odd marbles were more than 1g heavier or lighter, we’d have to do more math. Say, for example, the odd marbles weighed 12g instead; the difference would have been 6. This still points to the third bag because we know that the odd marbles are 2g heavier than the other marbles: if we divide 6 by 2, we get 3.
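
The reasoning can be checked with a short simulation. The function names `weigh` and `find_odd_bag` are hypothetical; the code assumes, as the puzzle does, that the weight of the odd marbles is known in advance.

```python
def find_odd_bag(total_weight, bags=10, normal=10, odd=11):
    """Identify the odd bag from one weighing of 1, 2, ..., `bags` marbles."""
    marbles = bags * (bags + 1) // 2  # series sum n(n+1)/2
    expected = marbles * normal       # weight if every marble were normal
    return (total_weight - expected) // (odd - normal)


def weigh(odd_bag, bags=10, normal=10, odd=11):
    """Simulate the weighing: take k marbles from bag k."""
    return sum(k * (odd if k == odd_bag else normal) for k in range(1, bags + 1))


print(find_odd_bag(weigh(3)))                  # third bag holds 11g marbles
print(find_odd_bag(weigh(3, odd=12), odd=12))  # same bag, 12g marbles
```

Both calls identify bag 3: the first from a 3g surplus with a 1g difference per marble, the second from a 6g surplus with a 2g difference per marble.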

24. Introduce Yourself.

This question is your opportunity to give the recruiter your

elevator pitch. It’s an open-ended question, but you don’t

want to ramble on about your background and achievements.

Start by giving the recruiter your name and your academic

background. Then talk about what got you interested in the

field. Finish off with any certifications or interesting projects

that you’ve worked on to show your proficiency in the field.

Make each of those parts of the answer brief, between one

and two sentences.



25. What Do You Know About Data Analytics?

The purpose of this question is to gain an insight into your

understanding of the field in a broad sense. Talk about data

analytics in terms of its purpose in a business context and

what it can help organizations achieve. Don’t wade too deep

into the weeds; stick to explaining the importance of being

able to process and interpret data the right way and how you

approach those things.

26. Why Did You Opt for a Data Analytics Career?

This is your chance to slip into storytelling mode a little bit.

Recruiters like when you can talk passionately about the field

you’re working in and have personal reasons for why you

want to work in it. Describe how you got interested in data

analytics and the reasons for wanting to work in the field.

As much as possible, stay away from generic reasons for being interested in data analytics. Go into your own journey: how you heard about the field, the resources you used to study different aspects of it, and the work that you have done.

27. What Is the Most Challenging Project You Encountered

on Your Learning Journey?



Recruiters ask this question to understand your

problem-solving approach and ability to take the initiative on

projects.

Answer by throwing back to a specific project that you worked

on, starting with the goal of the project and its business

context. Then talk about what problems emerged that made it

challenging. Most importantly, talk about how you solved

those problems, including details about both your own

contributions as well as how you rallied your team around

you.

28. Situational Question Based on the Resume

There are some questions that will emerge in response to

specific pieces of information in your resume.

One common one concerns gaps in a resume. If you have gaps in your resume, give recruiters an honest answer about what caused them. You don’t want to go into too much detail. Simply explain what caused the gap and how you picked up where you left off in your data analytics journey.

29. How Do You Prepare for a Data Analyst Interview?



The first thing that you need to do to prepare is to understand

what the company you’re applying to is trying to achieve with

its data analysis efforts. Recruiters are quickly impressed when

you show an understanding of the organizational context

you’ll be working in.

After that, focus on your skills in regard to three things: data

analysis math and statistics, data analysis approaches, and

data analysis tools. Finally, attempt practice questions like the

ones we’ve covered here.

30. How Should You Answer “Why Should We Hire You as

a Data Analyst?” During an Interview?

This question is your opportunity to show that you can

contribute to the company in meaningful ways and fit in with

the ethos of the organization.

Answer this question by first talking about what you

understand about the organization’s business goals. For

example, you might say something like, “Your company is

currently looking to use data analysis to inform which new

customer categories it targets with its marketing efforts.”

Then go into details about how your skills can contribute to

the operation.

Just the fact that you’ve done your research in this manner is

sure to impress recruiters. It is evident that you’re able to

gather information and deduce what the company’s goals are

based on what you find.

From there on in, you need to convince recruiters that you have the skills to fulfill your responsibilities within the organization. Any previous projects that are similar to what you will be working on are worth mentioning here. Talk about each project in terms of its goals and how you contributed to it within your team.

It helps to talk about the process that you use to translate

business goals into requirements for a data analysis project.

How do you determine what data points are important? How

will you source that data? How will you store the data and

what kind of operations do you think are important to conduct

on them? Going over these details is an important step to

establish that you can add value to a company as a data

analyst.

Cultural fit has also become an important consideration for hiring managers. Look out for the soft skills mentioned in the job description and connect them to your own strengths. For example, if the company says it’s looking for good collaborators, you can include details on how you make teamwork part of your process and bring various stakeholders on board. Most importantly, convey a passion for your field of work and for the company that you’re looking to work in.
