Design and Implementation of Domestic News Collection System
This system gathers news from selected websites and presents it to users in a
succinct, easy-to-understand format. Users can search by specific keywords to find news
that interests them, which allows for personalisation.
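As an illustration of the keyword search described above, here is a minimal sketch. The `NewsItem` record and the `search_news` function are hypothetical names introduced for this example; the source does not specify a data layout or a matching rule, so case-insensitive substring matching on all keywords is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    # Hypothetical record layout; the source does not define one.
    title: str
    body: str
    url: str


def search_news(items, keywords):
    """Return items whose title or body contains every keyword (case-insensitive).

    Blank keywords are ignored; with no usable keywords, all items are returned.
    """
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    matches = []
    for item in items:
        text = f"{item.title} {item.body}".lower()
        if all(k in text for k in wanted):
            matches.append(item)
    return matches
```

Requiring every keyword to match keeps results narrow; replacing `all` with `any` would broaden the search instead.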
EXISTING SYSTEM
Several types of supervised classifiers are implemented in the existing model of the
system.
Drawbacks :-
• 2. Filter the dataset according to the requirements and create a new dataset whose
attributes match the analysis to be done
• 5. Train the model with the training data, then analyze the testing dataset with the
classification algorithm
• GUI Programming: Python supports GUI applications that can be created and ported
to many system calls, libraries, and window systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
• Scalable: Python provides a better structure and support for large programs than shell
scripting.
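The dataset filtering and train/test steps listed above (items 2 and 5) can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not name the classifier or the dataset columns, so a small multinomial Naive Bayes is used as a stand-in, and `filter_rows`, `split`, `NaiveBayes`, `accuracy`, and the `text`/`label` column names are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def filter_rows(rows, required):
    """Keep only the required attributes; drop rows where any is missing or empty."""
    kept = []
    for row in rows:
        if all(row.get(col) for col in required):
            kept.append({col: row[col] for col in required})
    return kept


def split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the stored label."""
    if not rows:
        return 0.0
    hits = sum(model.predict(r["text"]) == r["label"] for r in rows)
    return hits / len(rows)
```

A typical call sequence would be `filter_rows`, then `split`, then `NaiveBayes().fit(...)` on the training part and `accuracy(...)` on the testing part.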
SYSTEM ARCHITECTURE
UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard was created by, and
is managed by, the Object Management Group. The goal is for UML to become a common language
for creating models of object-oriented computer software. In its current form UML comprises
two major components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML. The Unified Modeling Language is a
standard language for specifying, visualizing, constructing, and documenting the artifacts of a
software system, as well as for business modeling and other non-software systems. The UML
represents a collection of best engineering practices that have proven successful in the modeling
of large and complex systems.
USE CASE DIAGRAM
[Figure: use case and class diagrams. Surviving labels: "import module", "Result", "Algorithm", "result()". Surviving text fragments: "... application, and for detailed modelling ..." and "Class diagrams can also be used for data ..."]
ACTIVITY DIAGRAM
Functional testing provides systematic demonstrations that the functions tested are available as specified by the
business and technical requirements, system documentation, and user manuals.
Functions: Identified functions must be exercised.
Output: Identified classes of software outputs must be exercised.
Systems/Procedures: The system should work properly.
Integration Testing :
Software integration testing is the incremental integration testing of two or more integrated software
components on a single platform to produce failures caused by interface defects.
Test Case for Excel Sheet Verification:
In the machine learning stage the dataset is stored in Excel sheet format, so any test case must first
check the Excel file. Classification then works on the respective columns
of the dataset.
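A check of this kind could be sketched as below. To keep the example self-contained it assumes the sheet has been exported to CSV text and read with the standard `csv` module; reading the Excel file directly would need a third-party library. The column names in `REQUIRED_COLUMNS` and the function `verify_sheet` are hypothetical, since the source does not list the dataset columns.

```python
import csv
import io

# Hypothetical column names; the source does not list the real ones.
REQUIRED_COLUMNS = ("title", "content", "category")


def verify_sheet(csv_text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in the sheet; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required if c not in header]
    if problems:
        return problems  # without the columns, row checks are meaningless
    for row in reader:
        line_no = reader.line_num  # physical line number in the CSV text
        for col in required:
            if not (row[col] or "").strip():
                problems.append(f"row {line_no}: empty {col}")
    return problems
```

Running this check before classification means a malformed sheet fails early with a readable message rather than partway through training.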
FUTURE ENHANCEMENT
Future enhancements for a Python-based domestic news collection system can involve
incorporating advanced technologies, improving user experience, and keeping the system
relevant and efficient. Some possible enhancements for this project are:
Utilize machine learning and NLP techniques to improve article recommendation.
Implement sentiment analysis to gauge public opinion on news topics. Develop algorithms for
content summarization and topic categorization.
Enhance user profiles and recommendation engines. Implement user behavior tracking to provide
personalized news feeds based on reading history and preferences.
CONCLUSION
Thus we have tried to implement the paper by Haixia Lv, “Design and Implementation of Domestic News
Collection System Based on Python”. A web crawler is an important way to obtain data from the
Internet. The paper designs a configurable news collection system based on a web crawler, which can crawl
news from target news websites. It can crawl a variety of multi-source data, and the crawler is highly customizable.
In addition, it can process the crawled news content as needed. The system not
only reduces the workload of news editors, but also updates the news content in the database in real time, improving
the efficiency of news gathering and publishing.
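To make the idea of a configurable crawler concrete, the following sketch extracts headline links from a page according to a per-site configuration. It is an assumption-laden illustration, not the paper's implementation: the configuration keys (`base_url`, `container_tag`, `container_class`), the class `HeadlineParser`, and the function `extract_headlines` are hypothetical, and the HTML is passed in as a string so no network access is involved. It also assumes the container tag is not `a`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a> tags nested inside a configured container."""

    def __init__(self, base_url, container_tag, container_class):
        super().__init__()
        self.base_url = base_url
        self.container_tag = container_tag
        self.container_class = container_class
        self.depth = 0      # nesting depth inside a matching container
        self.href = None    # href of the <a> currently being read
        self.text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == self.container_tag and (self.depth or self.container_class in classes):
            self.depth += 1  # also count nested same-name tags so end tags balance
        elif tag == "a" and self.depth and attrs.get("href"):
            self.href, self.text = attrs["href"], []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            title = " ".join("".join(self.text).split())
            if title:
                self.results.append((title, urljoin(self.base_url, self.href)))
            self.href = None
        elif tag == self.container_tag and self.depth:
            self.depth -= 1


def extract_headlines(html, site_config):
    """Apply one site's extraction rules to a page and return (title, url) pairs."""
    parser = HeadlineParser(
        site_config["base_url"],
        site_config["container_tag"],
        site_config["container_class"],
    )
    parser.feed(html)
    return parser.results
```

Supporting another news site then means adding another configuration entry rather than writing new parsing code.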
A domestic news collection project based on Python can be a valuable tool for keeping track of current events,
conducting research, or building data-driven applications. By identifying relevant sources, collecting and storing
data, cleaning and preprocessing it, and then analyzing and visualizing the information, you can stay informed
and gain insights into various aspects of domestic news.
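The cleaning and storing steps mentioned above might be sketched as follows, using the standard `sqlite3` module. The table layout, the function names `clean_text`, `open_store`, and `save_article`, and the choice to de-duplicate by URL are assumptions made for illustration; the regular-expression tag stripping is deliberately crude.

```python
import re
import sqlite3

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw):
    """Strip HTML tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", raw).split())


def open_store(path=":memory:"):
    """Open (or create) the article store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def save_article(conn, url, title, body):
    """Insert one cleaned article; return True if new, False if the URL already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO news (url, title, body) VALUES (?, ?, ?)",
        (url, clean_text(title), clean_text(body)),
    )
    conn.commit()
    return cur.rowcount == 1
```

Using the URL as the primary key lets repeated crawls of the same page be ignored without a separate lookup.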