
Design and Implementation of Domestic News Collection System

This document proposes a domestic news collection system based on Python. The system seeks to gather news from certain websites and present it to users in a succinct and easy-to-understand format. Users can use specific keywords to find news that interests them, allowing for personalization. The proposed system uses CNN techniques, which provide higher accuracy and less time consumption compared to existing systems using supervised classifiers with smaller datasets and longer computation times. The feasibility study, system architecture, UML diagrams, modules, and requirements of the proposed news collection system are also outlined.

Uploaded by

Reddy Reddy

DESIGN AND IMPLEMENTATION OF DOMESTIC NEWS COLLECTION SYSTEM BASED ON PYTHON


ABSTRACT
With its speed and vast reach, network media has become a new window through which
individuals understand the outside world in an era of rapid Internet development. News is
how people learn what is going on in the world, but thousands of news stories are published
on the Internet every day, and only some of them are relevant to any given reader. Getting
the news items we need from websites in a timely and accurate manner has therefore become
a real necessity in people's lives.

This system seeks to gather news from certain websites and present it to users in a
succinct and easy-to-understand format. Users can use specific keywords to find news
that they are interested in, which allows for personalisation.
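
The keyword search described above can be sketched as a simple case-insensitive filter. This is only an illustration under stated assumptions: the `NewsItem` record, the `filter_by_keywords` function and the sample headlines are hypothetical and do not come from the original system.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    # Hypothetical record shape; the source does not define one.
    title: str
    summary: str


def filter_by_keywords(items, keywords):
    """Return items whose title or summary contains any keyword (case-insensitive)."""
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    matches = []
    for item in items:
        text = f"{item.title} {item.summary}".lower()
        if any(k in text for k in wanted):
            matches.append(item)
    return matches


# Made-up sample data for illustration only.
SAMPLE_NEWS = [
    NewsItem("Monsoon rains reach the coast", "Heavy rainfall is expected this week."),
    NewsItem("New metro line opens", "The city transport network adds 12 stations."),
    NewsItem("Cricket final set for Sunday", "Tickets sold out within hours."),
]

if __name__ == "__main__":
    for item in filter_by_keywords(SAMPLE_NEWS, ["metro", "cricket"]):
        print(item.title)
```

The match is a plain substring test, so the keyword "rain" also matches "rainfall". A real system would probably tokenize the text first.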
EXISTING SYSTEM
The existing system implements several types of supervised classifiers.

Drawbacks :-

Small dataset.

Longer computation time.


PROPOSED SYSTEM
The proposed system uses CNN (convolutional neural network) techniques in its model,
which yields the best accuracy score.
Advantages :-
Higher accuracy.
Lower time consumption.
FEASIBILITY STUDY
1. Take the dataset.

2. Filter the dataset according to the requirements and create a new dataset whose
attributes match the analysis to be done.

3. Perform pre-processing on the dataset.

4. Split the data into training and testing sets.

5. Train the model on the training data, then evaluate the classification algorithm
on the testing data.

6. Finally, obtain the results as accuracy metrics.
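
The six steps above can be sketched end to end in a few lines of Python. To keep the sketch self-contained, it swaps in a small multinomial Naive Bayes classifier written with the standard library instead of the CNN the slides propose. The sample headlines, labels and function names are all made up for illustration.

```python
import math
import random
from collections import Counter, defaultdict

# Made-up labelled headlines standing in for the real dataset (assumption).
RAW_DATA = [
    ("Team wins the cricket final", "sports"),
    ("Striker scores twice in league match", "sports"),
    ("Coach praises team after match win", "sports"),
    ("Captain leads team to cricket title", "sports"),
    ("Parliament passes new budget bill", "politics"),
    ("Minister announces election budget", "politics"),
    ("Opposition questions minister in parliament", "politics"),
    ("Election commission sets poll date", "politics"),
    ("", "sports"),  # empty row, removed by the filtering step
]


def preprocess(text):
    """Step 3: lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def train(samples):
    """Step 5 (training): count words per class for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(model, tokens):
    """Step 5 (testing): pick the class with the highest log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def run_pipeline(raw, test_fraction=0.25, seed=0):
    filtered = [(t, y) for t, y in raw if t.strip()]       # steps 1-2
    samples = [(preprocess(t), y) for t, y in filtered]    # step 3
    random.Random(seed).shuffle(samples)                   # step 4
    n_test = max(1, int(len(samples) * test_fraction))
    test_set, train_set = samples[:n_test], samples[n_test:]
    model = train(train_set)                               # step 5
    correct = sum(predict(model, toks) == y for toks, y in test_set)
    return correct / len(test_set)                         # step 6


if __name__ == "__main__":
    print(f"accuracy: {run_pipeline(RAW_DATA):.2f}")
```

With a dataset this small the accuracy figure is meaningless. The point is only the order of the steps: filter, pre-process, split, train, evaluate.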


REQUIREMENTS
Hardware requirements
• RAM -- 4GB
• OS -- Windows 7, 8 and 10 (32 and 64 bit)
• Hard Disk -- 20GB
• Keyboard -- Standard Windows keyboard
• Mouse -- Two or three button mouse
• Monitor -- SVGA
Software requirements
• Python
• Anaconda
• Jupyter Notebook
MODULES
• Extendable: You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.

• Databases: Python provides interfaces to all major commercial databases.

• GUI Programming: Python supports GUI applications that can be created and ported
across many libraries and windowing systems, such as Windows MFC, Macintosh, and the
X Window System on Unix.

• Scalable: Python provides a better structure and support for large programs than shell
scripting.
SYSTEM ARCHITECTURE
UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard is managed by the
Object Management Group (OMG), which adopted it in 1997. The goal is for UML to become a
common language for creating models of object-oriented computer software. In its current form,
UML comprises two major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML. UML is a standard language
for specifying, visualizing, constructing, and documenting the artifacts of software systems, as
well as for business modeling and other non-software systems. It represents a collection of best
engineering practices that have proven successful in the modeling of large and complex systems.
USE CASE DIAGRAM
• UML is a standard language for specifying, visualizing, constructing, and documenting
the artifacts of software systems.

• UML is maintained by the Object Management Group (OMG); the UML 1.0 specification
draft was proposed to the OMG in January 1997.

• The OMG is continuously working to make UML a true industry standard.

• UML stands for Unified Modelling Language.

Figure (use case diagram): actor: User; use cases: import module, input image selection,
pre-process, segmentation, feature extraction, database, database training, result.
CLASS DIAGRAM
The class diagram is the main building block of object-oriented modelling. It is used for
general conceptual modelling of the structure of the application, and for detailed modelling
that translates the models into programming code. Class diagrams can also be used for data
modelling. The classes in a class diagram represent both the main elements and interactions
in the application, and the classes to be programmed.

Figure (class diagram): classes: user, INPUT, Pre-process, Feature extraction, Data base,
Algorithm, Result; operations shown: +import(), +process(), +feature(), +build(), +data().
SEQUENCE DIAGRAM

Sequence diagrams represent the objects participating in the interaction horizontally and
time vertically. A use case is a kind of behavioural classifier that represents a declaration
of an offered behaviour. Each use case specifies some behaviour, possibly including variants,
that the subject can perform in collaboration with one or more actors. Use cases define the
offered behaviour of the subject without reference to its internal structure.

Figure (sequence diagram): lifelines: user, Pre-step, Feature Extraction, Data base, algorithm,
result; messages: 1: import(), 2: load(), 3: preprocessing(), 4: operations(), 5: build(),
6: result().
ACTIVITY DIAGRAM

Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modelling
Language, activity diagrams can be used to describe the business and operational
step-by-step workflows of components in a system. An activity diagram shows the overall
flow of control.

Figure (activity diagram): Input, Pre-process, Feature extraction, database, Algorithm with
result.
TESTING
Software testing is an investigation conducted to provide stakeholders with information about the
quality of the product or service under test. It also provides an objective, independent view of
the software that allows the business to appreciate and understand the risks of software
implementation. Test techniques include, but are not limited to, the process of executing a
program or application with the intent of finding software bugs.
TEST CASES
Test case 1 (packages testing):
Input: downloading packages in interactive mode
Output: importing packages in script mode

Test case 2 (IDLE testing):
Input: user execution in IDLE
Output: IP camera in command prompt

Test case 3 (data process):
Input: load data
Output: load data and display data in output code
TESTING METHODS
Functional Testing:

Functional testing provides systematic demonstrations that the functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functions: Identified functions must be exercised.
Output: Identified classes of software outputs must be exercised.
Systems/Procedures: The system should work properly.

Integration Testing:

Software integration testing is the incremental integration testing of two or more integrated software
components on a single platform to produce failures caused by interface defects.
Test Case for Excel Sheet Verification:
In this machine learning project the dataset is stored in Excel sheet format, so any test case has to
start by checking the Excel file. Classification then works on the respective columns of the
dataset.
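
A sheet verification of this kind can be sketched as a check that every row carries the columns the classifier needs. The sketch below assumes the sheet has been exported to CSV so that only the standard library is required; a real .xlsx file would need a third-party reader. The column names `headline` and `category` are hypothetical, since the slides do not list the dataset's columns.

```python
import csv
import io

# Hypothetical column names; the source does not list the dataset's columns.
REQUIRED_COLUMNS = ("headline", "category")


def verify_dataset(rows, required=REQUIRED_COLUMNS):
    """Return a list of problems found in an iterable of row dictionaries."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for col in required:
            value = row.get(col)
            if value is None:
                problems.append(f"line {line_no}: missing column '{col}'")
            elif not value.strip():
                problems.append(f"line {line_no}: empty value in '{col}'")
    return problems


def load_rows(text):
    """Parse CSV text (for example a sheet exported from Excel) into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sheet content for illustration only.
SAMPLE_SHEET = """headline,category
Team wins final,sports
Budget passed,
"""

if __name__ == "__main__":
    for problem in verify_dataset(load_rows(SAMPLE_SHEET)):
        print(problem)
```

Running the check before training catches empty or missing label cells early, before they turn into harder-to-trace errors in the classification step.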
FUTURE ENHANCEMENT
Future enhancements for a domestic news collection system based on Python can involve
incorporating advanced technologies, improving the user experience, and ensuring the system
remains relevant and efficient. Possible enhancements for this project include:

• Utilize machine learning and NLP techniques to improve article recommendation.
• Implement sentiment analysis to gauge public opinion on news topics.
• Develop algorithms for content summarization and topic categorization.
• Enhance user profiles and recommendation engines.
• Implement user behavior tracking to provide personalized news feeds based on reading
history and preferences.
CONCLUSION
Thus we have tried to implement the paper “Design and Implementation of Domestic News Collection
System Based on Python” by Haixia Lv. Web crawling is an important way to obtain data from the
Internet. The paper designs a configurable news collection system based on a web crawler, which can
crawl news from target news websites. It can crawl a variety of multi-source data, and the crawler is
highly customizable. In addition, it can process the crawled news content as needed. The system not
only reduces the workload of news editors, but also updates the news content in the database in real
time, improving the efficiency of news gathering and publishing.
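
The idea of a configurable crawler can be illustrated with a small headline extractor whose target tag and CSS class come from a per-site configuration. This is a sketch under assumptions, not the implementation from the paper: `SITE_CONFIG`, `HeadlineParser` and the sample page are hypothetical, and the download step is left out so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical per-site configuration: which tag and class mark a headline.
SITE_CONFIG = {"tag": "h2", "css_class": "headline"}


class HeadlineParser(HTMLParser):
    """Collect the text of every element matching the configured tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.headlines = []
        self._buffer = None  # list of text chunks while inside a matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._buffer = None


def extract_headlines(html, config=SITE_CONFIG):
    parser = HeadlineParser(config["tag"], config["css_class"])
    parser.feed(html)
    parser.close()
    return parser.headlines


# Made-up page standing in for a downloaded news site.
SAMPLE_PAGE = """
<html><body>
  <h2 class="headline">Monsoon rains reach the coast</h2>
  <h2 class="ad">Sponsored content</h2>
  <h2 class="headline top">New <b>metro</b> line opens</h2>
</body></html>
"""

if __name__ == "__main__":
    for title in extract_headlines(SAMPLE_PAGE):
        print(title)
```

Supporting another site then only means supplying a different configuration dictionary, which is one plausible reading of "configurable" in the paragraph above.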

A domestic news collection project based on Python can be a valuable tool for keeping track of current events,
conducting research, or building data-driven applications. By identifying relevant sources, collecting and storing
data, cleaning and preprocessing it, and then analyzing and visualizing the information, you can stay informed
and gain insights into various aspects of domestic news.
