CS PS
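I am someone who loves learning and puts interest first. My way of learning is a process of
continually searching for and discovering points of interest. My strong interest in artificial
intelligence grew out of Professor Andrew Chi-Chih Yao's Quantum Computing course, where I first
encountered quantum computing and quantum information. Some of the ideas stayed with me. For
example, a single qubit can often express a great deal of information; if information could be
transmitted in quantum form and then converted into bit operations on a single machine, the cost
of transmission would drop sharply. If ideas like this could be realized, they would bring another
revolution to the computer field. I also came to see something that quantum and AI have in common:
a quantum state (a qubit) is an uncertain state that settles to a definite value with some
probability when we measure it, yet it exists there even when unmeasured. This reminded me of the
AI prediction project I did in the 2008 Queue Programming Contest; the quantum measurement process
struck me as the same as the prediction process I had built. In the contest we would first choose
some features, none of which had a precision or accuracy of 1, and some of which were as low as
0.1; in other words, each was only probabilistically effective. We then used algorithms to make
the precision and accuracy of the final result fairly high and extracted the effective part.
Afterwards I looked into the state of research on combining quantum computing and quantum
information with AI. Although this topic has not yet seen a substantive breakthrough in
applications, it left me with a strong interest in AI research.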
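As a student in the Yao Class, I have had more opportunities than other undergraduates to study
the most fundamental and the newest theories in computer science, as well as its frontier research
directions. Beyond its well-designed curriculum, the Yao Class's flexible program let me attend
many research conferences and seminars, help professors review papers, do research in professors'
labs, and take part in challenging and enjoyable competitions, all of which broadened my view of
computer science and let me put my skills to use. For example, in Professor Papakonstantinou's
Advanced Algorithms course I learned a range of algorithms and the research around them, and the
lecture notes I wrote on finite metric embeddings, optimization problems, and online and
approximation algorithms were highly praised by the professor. In addition, in the Queue
Programming Contest I took part in designing the AI program, which gave me initial training in
implementing AI.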
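The Name Disambiguation project aimed to tell apart people in the CS field who share the same
name, so that they are not confused. It took me a full semester. While keeping up with coursework,
I had to finish lab tasks on time so as not to hold up the team, so I cut back on rest and used
weekends and class-free hours to get the work done. I gained a great deal from this project.
First, I was responsible for name merging (merging the different names that one person ends up
with because of abbreviations, reversed surname and given name, and similar causes), covering the
concepts, methods, and final Java implementation for the possible merges suggested by citations.
Second, the workload was heavy and the data volume large, while I was not yet familiar with Java
or databases and also had to balance coursework against research, so I kept asking the senior
students in the project group for help and imitated the existing code. Later on, each run of the
program took one to two hours, so I had to check carefully whether I had made any small mistakes.
Finally, I learned a lot from the project, including Java programming and several data mining
algorithms. More importantly, I learned how research is carried out: improving performance takes
continual error analysis, and things that look trivial can turn out to be complicated.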
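This summer I left the Knowledge Engineering Lab for an internship at Baidu, which exposed me to
real-world data and practical problems and gave me the chance to apply data mining and
mathematical modeling methods to them. As the world's largest Chinese search engine, Baidu
receives enormous amounts of data at every moment. To address the various kinds of cheating in
Baidu search, I carried out studies in three areas: first, cheating by new IDs; second, the Baidu
Billboard Ranking; third, the quality of external links. Overall, the work consisted of analyzing
the data on IDs newly created on Baidu each day and then proposing filtering strategies based on
the characteristics of likely cheating IDs. For example, in the first study, on cheating by new
IDs, I proposed two strategies. The first made an extreme choice, filtering out all new IDs, and I
built a mathematical model to analyze its impact. Based on that analysis, I proposed a second,
more reasonable strategy: load a whitelist and filter out every ID that is not particularly good.
This helps Baidu block some cheating IDs, so that search results and search rankings reflect user
behavior more accurately and truthfully.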
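My interest now lies in artificial intelligence. I especially hope to do deeper research in data
mining and knowledge engineering, and I am also very interested in other areas of AI, such as
genetic programming, genetic algorithms, and artificial neural networks. Through your university's
master's program, I hope to learn more about AI research, receive more systematic training, and
broaden my thinking and horizons. More importantly, I am well aware that your university admits
only the best students in the world. I have been among China's best students since childhood, so I
very much hope for the chance to work with the smartest and most creative people in the world.
After earning the master's degree, I hope to continue my research in your university's PhD
program.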
Keen on learning and faithful to my interests, I have built my approach to learning on
persistently exploring and discovering what most intrigues me. My fascination with artificial
intelligence arose from the Quantum Computing course taught by Professor Andrew Chi-Chih Yao,
where I first set foot in the realm of quantum information. I was captivated by the idea that
information might be conveyed via qubits, each capable of representing a great deal of
information, and then converted into bitwise operations on a single machine. Since this would
greatly reduce the cost of transmission, a successful implementation might usher in an outright
revolution in computing. From this course, I also came to see a connection between quantum
computing and AI: a qubit is in an uncertain state that settles to a definite value, with some
probability, only upon measurement, which reminded me of the AI forecasting project I took part in
for the Queue Programming Contest 2008. During the competition, our team usually picked features
whose precision and accuracy were far below 1, some as low as 0.1, and then used algorithms to
combine them so that the final results had much higher precision and accuracy, from which we drew
the effective portion. Although I later learned from the literature that research combining
quantum computing and quantum information with AI has yet to make a breakthrough in applications,
this connection sparked my keen interest in AI research.
Since joining the Knowledge Engineering Group led by Professor Jie Tang, I have worked on three
major projects, which have given me a deep understanding of knowledge engineering, intelligent
search, and intelligent text processing.
The Name Disambiguation project aimed to distinguish scholars with identical names in the CS field
and thus prevent them from being confused. I was responsible for merging cases where abbreviation
or reversal of surname and given name leads to different names for one person. Besides working out
the concepts and methods for potential merges suggested by citations, I also wrote the final Java
implementation. The project involved a large volume of data, and at the outset I was a novice in
both databases and Java. Overwhelming as it was, I hit my stride fairly quickly with help from
senior members of the team: starting by imitating the existing code, I soon became able to write
my own. Because the program could take hours to run, I had to be meticulous in avoiding mistakes.
The project took an entire semester, during which I was constantly busy and committed almost all
of my free time to the research, but it proved very rewarding: I learned how to prioritize and to
balance research against coursework, I became proficient in Java, and I gained a good command of a
set of data mining algorithms. What benefited me most, however, is that I learned how research
progresses, how performance can be improved through continual error analysis, and how complicated
things can turn out to be, however trivial they look at first glance.
In the Social Graph project, we set out to find ways, based on a case study of the website
www.arnetminer.org, to make users stay longer on the site. We studied various factors that may
affect user search behavior, dug into features that tend to prolong visits, and finally worked out
mathematical models. The two prominent factors we have discovered so far are strong social bonds
and the celebrity effect. To be more specific, once a user clicks a person’s profile, that
person’s social network is identified. If it contains an acquaintance of the user or a celebrity,
the user tends to stay longer and continue searching.
This summer I temporarily left the Knowledge Engineering Lab for an internship at Baidu, Inc.,
which offered me the opportunity to tackle real problems using the data mining and mathematical
modeling methods I had studied. As the world’s largest Chinese-language search engine, Baidu
receives and processes large-scale data at every moment, and steps need to be taken to curb the
growing problem of search engine cheating. To this end, I conducted a study covering three
aspects: (1) cheating with new IDs; (2) the Baidu Billboard Ranking; (3) the quality of external
links. Based on the study and an analysis of data on newly created IDs, I proposed several
filtering strategies, starting with the extreme choice of filtering out all new IDs. After
building mathematical models to examine the impact of this indiscriminate strategy, I refined it
into a more sensible one: loading a whitelist and filtering out IDs that are not particularly
good. This helps Baidu block cheating IDs and thus ensures that search results and rankings truly
reflect user behavior.
I have now returned to the Knowledge Engineering Lab and embarked on another riveting project on
Twitter behavior: which of your followees’ followees will you follow? That is, if A follows B, and
B follows C, D, and E on Twitter, which of C, D, and E is A most likely to follow? As a loyal
Twitter fan, I am particularly fascinated by this topic. My current work focuses on possible
contributing features such as time zone difference, mutual friends, social balance, homophily, and
implicit structure.
With my enthusiasm for artificial intelligence and my varied research experience, I have prepared
myself for a more challenging graduate program. I have a particularly strong desire to delve
deeper into data mining and knowledge engineering, and to try my hand at genetic programming,
genetic algorithms, neural networks, and other areas of AI. With its eminent faculty and an
environment rich in academic activity, the M.S. program in Computer Science at your university
will bring out the best in me. Upon completing the M.S. program, I wish to follow it with a PhD
and pursue a career in academia. Having grown up among China’s most excellent students, I dream of
working with the world’s most creative research group.