An Efficient Method For High Quality and Cohesive Topical Phrase Mining
Abstract
Existing System
Proposed System
We propose CQMine, a novel topical phrase mining method that achieves better
performance than state-of-the-art methods in terms of phrase quality and topical
cohesion. To mine topical phrases effectively and efficiently while improving
phrase quality and topical cohesion, the Cohesive and Quality Topical Phrase
Mining (CQMine) framework automatically clusters documents with a more sensible
topic model and improves phrase quality by adopting more accurate and rigorous
mining approaches.
We propose effective and efficient quality phrase mining approaches. By
eliminating order sensitivity and avoiding inappropriate segmentation, our
approaches can guarantee the quality of the extracted phrases. Moreover, we
design effective algorithms to accelerate processing. We also propose a novel
topic model that addresses the topic assignment problem associated with idiomatic
phrases, improving the cohesion of topical phrases.
Because some phrases are valid only in certain domains, we propose an iterative
framework that finds domain terminology more accurately. Experimental evaluation
and a case study demonstrate that our method offers high interpretability and
efficiency compared with state-of-the-art methods.
Future Work
Modules
News Publisher
The news publisher provides news articles on a daily basis, including breaking
news and live news. The news data are stored in a database, and the services are
offered to end users. The news recommendation system publishes news articles
based on categories. The news publisher randomly searches news topics to check
whether the displayed articles are related to their category. Users register with
the news portal to view news articles; after reading an article, they can comment
on it and share it with others.
We examined the effectiveness of our quality phrase mining stage by measuring
phrase quality in two ways: (1) the Wiki-phrases benchmark and (2) expert
evaluation. Wiki-phrases: Wiki-phrases is a collection of popular entity mentions
obtained by crawling intra-Wiki citations within Wiki content. The benchmark
covers commonly used phrases well, which avoids the variance caused by different
human raters. In this evaluation, we regarded the Wiki phrases as ground-truth
phrases; that is, each phrase is judged by whether or not it belongs to the Wiki
phrases. To compute precision, only phrases that belong to the Wiki phrases are
considered positive. For recall, we first merged all the phrases returned by all
methods, including ours, and then took the intersection of the Wiki phrases and
the merged phrases as the evaluation set.
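The precision and recall protocol above can be sketched in Java as follows. This is a minimal illustration under stated assumptions: phrases are compared as exact strings, each method's output is a set, and the class and method names (`WikiPhraseEval`, `precision`, `evaluationSet`, `recall`) and the sample phrases are hypothetical, not part of CQMine. It requires Java 9 or later for `Set.of` and `List.of`.

```java
import java.util.*;

/** Hypothetical sketch of the Wiki-phrases precision/recall protocol. */
public class WikiPhraseEval {

    /** Precision: fraction of one method's returned phrases that are Wiki phrases. */
    static double precision(Set<String> returned, Set<String> wikiPhrases) {
        if (returned.isEmpty()) return 0.0;
        long hits = returned.stream().filter(wikiPhrases::contains).count();
        return (double) hits / returned.size();
    }

    /** Evaluation set: Wiki phrases intersected with the union of all methods' outputs. */
    static Set<String> evaluationSet(Collection<Set<String>> allMethodOutputs, Set<String> wikiPhrases) {
        Set<String> merged = new HashSet<>();
        for (Set<String> out : allMethodOutputs) merged.addAll(out); // merge every method's phrases
        merged.retainAll(wikiPhrases);                              // keep only ground-truth phrases
        return merged;
    }

    /** Recall: fraction of the shared evaluation set recovered by one method. */
    static double recall(Set<String> returned, Set<String> evalSet) {
        if (evalSet.isEmpty()) return 0.0;
        long hits = evalSet.stream().filter(returned::contains).count();
        return (double) hits / evalSet.size();
    }

    public static void main(String[] args) {
        Set<String> wiki = Set.of("support vector machine", "topic model", "data mining", "neural network");
        Set<String> ours = Set.of("support vector machine", "topic model", "mining method");
        Set<String> baseline = Set.of("data mining", "topic model", "vector machine");
        Set<String> eval = evaluationSet(List.of(ours, baseline), wiki);
        System.out.printf(Locale.ROOT, "eval set size = %d%n", eval.size());   // prints: eval set size = 3
        System.out.printf(Locale.ROOT, "ours: precision = %.3f, recall = %.3f%n",
                precision(ours, wiki), recall(ours, eval));                    // prints: 0.667 and 0.667
    }
}
```

Because the evaluation set is built from the pooled outputs of all methods, recall is relative to what the compared methods collectively found, not to the whole Wiki-phrases collection.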
In the CQMine framework, the quality phrase mining stage contains three steps.
First, a PhraseTrie is built to count the frequencies of all possible phrases.
Next, a complete phrase mining algorithm is applied to mine complete phrases,
guided by a statistics-based measure that enforces the phraseness criterion.
During phrase mining, the mined phrases are stored in the PhraseTrie to avoid
recomputing duplicate phrases. Finally, to satisfy the appropriateness
requirement, CQMine checks each document for overlapping phrases and, if any are
found, partitions them into non-overlapping phrases using an effective and
efficient overlapping-phrase segmentation algorithm. After quality phrase mining,
each document is transformed from a multiset of words (bag-of-words) into a
multiset of phrases (bag-of-phrases), which serves as the input to topic
modeling.
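The counting step and the bag-of-words to bag-of-phrases transformation can be illustrated with the Java sketch below. It assumes a word-level trie in which each node stores the frequency of the phrase ending there; the class name `PhraseTrieSketch`, the `maxLen` cap, the `minSupport` threshold, and the greedy longest-match segmentation are hypothetical stand-ins chosen for brevity. They are not CQMine's complete phrase mining or overlapping-phrase segmentation algorithms, which use a statistics-based phraseness measure.

```java
import java.util.*;

/** Hypothetical sketch: trie-based phrase counting plus a greedy bag-of-phrases segmentation. */
public class PhraseTrieSketch {

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        int count; // frequency of the word sequence ending at this node
    }

    private final Node root = new Node();
    private final int maxLen; // assumed cap on phrase length

    PhraseTrieSketch(int maxLen) { this.maxLen = maxLen; }

    /** Count every contiguous word sequence of length 1..maxLen in the document. */
    void addDocument(String[] words) {
        for (int start = 0; start < words.length; start++) {
            Node node = root;
            for (int end = start; end < words.length && end - start < maxLen; end++) {
                node = node.children.computeIfAbsent(words[end], w -> new Node());
                node.count++;
            }
        }
    }

    /** Frequency of a phrase, or 0 if it was never counted. */
    int frequency(String... phrase) {
        Node node = root;
        for (String w : phrase) {
            node = node.children.get(w);
            if (node == null) return 0;
        }
        return node.count;
    }

    /**
     * Greedy longest-match segmentation into non-overlapping phrases: at each
     * position take the longest multi-word span with count >= minSupport
     * (a single word is always accepted), then continue after that span.
     */
    Map<String, Integer> bagOfPhrases(String[] words, int minSupport) {
        Map<String, Integer> bag = new LinkedHashMap<>();
        int i = 0;
        while (i < words.length) {
            int bestLen = 1;
            Node node = root;
            for (int len = 1; len <= maxLen && i + len <= words.length; len++) {
                node = node.children.get(words[i + len - 1]);
                if (node == null) break;
                if (len > 1 && node.count >= minSupport) bestLen = len;
            }
            String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + bestLen));
            bag.merge(phrase, 1, Integer::sum); // multiset: phrase -> occurrences
            i += bestLen;
        }
        return bag;
    }

    public static void main(String[] args) {
        String[][] corpus = {
            "topic model for phrase mining".split(" "),
            "phrase mining with a topic model".split(" "),
            "a topic model of news".split(" ")
        };
        PhraseTrieSketch trie = new PhraseTrieSketch(3);
        for (String[] doc : corpus) trie.addDocument(doc);
        System.out.println(trie.frequency("topic", "model"));  // prints 3
        System.out.println(trie.bagOfPhrases(corpus[0], 2));    // prints {topic model=1, for=1, phrase mining=1}
    }
}
```

Sharing one trie for counting and lookup mirrors the text's point that stored phrases avoid recomputation; a greedy rule is used here only because it is short, and it can pick a worse split than an optimal segmentation would.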
Significant progress has been made in topical phrase mining, and existing methods
can be broadly classified into three types:
Algorithm
The completeness of the extracted phrases depends heavily on the merge order. To
obtain complete phrases, we would need to enumerate every possible merge order.
A straightforward algorithm for finding the complete phrases in a document d is
therefore to enumerate all subsequences of the document first and then verify
whether each one is a complete phrase. The algorithm QBA (q-Chunk Based Approach)
first generates the chunk boundaries. It then computes the local solution of each
chunk using DPBA, with a variable denoting the left boundary of the current chunk.
For each boundary, QBA checks whether it satisfies the merge condition.
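The straightforward enumerate-then-verify algorithm can be sketched in Java as below. It assumes that "subsequences" means contiguous word spans and takes the completeness check as a pluggable `Predicate`; the class name `BruteForceCompletePhrases` and the "occurs at least twice" check in `main` are hypothetical stand-ins, not CQMine's definition of a complete phrase.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical sketch of the brute-force baseline: enumerate every span, then verify it. */
public class BruteForceCompletePhrases {

    /** Enumerate all O(n^2) contiguous spans of d and keep those that pass isComplete. */
    static List<String> completePhrases(String[] d, Predicate<List<String>> isComplete) {
        List<String> words = Arrays.asList(d);
        Set<String> found = new LinkedHashSet<>(); // removes repeated spans, keeps discovery order
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j <= words.size(); j++) {
                List<String> span = words.subList(i, j); // words[i..j-1]
                if (isComplete.test(span)) found.add(String.join(" ", span));
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String[] d = "the support vector machine is a support vector machine".split(" ");
        List<String> words = Arrays.asList(d);
        // Hypothetical stand-in check: the span occurs at least twice in d.
        Predicate<List<String>> occursTwice =
            span -> Collections.indexOfSubList(words, span) != Collections.lastIndexOfSubList(words, span);
        System.out.println(completePhrases(d, occursTwice));
        // prints [support, support vector, support vector machine, vector, vector machine, machine]
    }
}
```

A document of n words has n(n+1)/2 spans and each one must be verified, so this baseline grows quadratically before the cost of the check is even counted; splitting the document into chunks, as QBA does, is one way to restrict that work.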
SYSTEM REQUIREMENTS
Hardware Requirements:
➢ RAM - 4 GB (min)
➢ Hard Disk - 40 GB
➢ Monitor - SVGA
Software Requirements:
➢ Operating System - Windows XP or later
➢ Coding Language - Java/J2EE (JSP, Servlet)
➢ Front End - J2EE
➢ Back End - MySQL