Design and Implementation of Domestic News Collection System

1. INTRODUCTION
1.1 Overview of the project

News is an important way to convey information. Among the tens of thousands of news items
generated every day, obtaining the relevant ones is an important objective, and how to get
news conveniently and efficiently has become an important research direction. Full-featured
news-gathering platforms are becoming increasingly popular and have good development
prospects [1]. This paper designs and develops a convenient automatic news-gathering system.
The system uses a crawler to collect domestic news, saves it after deduplication, and finally
provides news services for retrieval and viewing. It can help users find similar news, extract
hot news that they are interested in, and improve the efficiency of reading news [2].

Dept. of MCA Page | PAGE \* MERGEFORMAT 31



1.2 Project Description

With the rapid development of the Internet, network media has become a new window through
which people understand the outside world, thanks to its speed and wide reach. News is a
channel through which people learn about what is happening around them, but thousands of
news items are produced on the Internet every day, and only some of them are relevant to a
given reader. Obtaining the needed news content from websites efficiently and accurately is
therefore a real need in people's lives. This system aims to collect news from specific
websites and return it to users on concise and clear pages. Users can search for specific
keywords to select the news they are interested in, which gives each user a personalized view.
The system crawls and processes domestic financial news content, which makes the information
convenient to work with. To avoid duplicate information, the system also implements a
self-defined deduplication rule. The system is written in Python in conjunction with the
Scrapy and Django frameworks, which simplifies the system code to a certain extent. The
practical value of this system lies in timely, efficient and convenient access to the domestic
financial news that people care about, need and are interested in.
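
The report does not spell out the self-defined deduplication rule, so the following is only a
minimal sketch of one plausible rule: two items count as duplicates when their titles and
bodies are equal after case folding and whitespace normalization. The names `fingerprint` and
`NewsStore` are hypothetical, and the in-memory dictionary merely stands in for the database.

```python
import hashlib
import re


def fingerprint(title: str, body: str) -> str:
    """Return a stable hash of the normalized title and body (assumed rule)."""
    def normalize(text: str) -> str:
        # Collapse runs of whitespace and ignore case differences.
        return re.sub(r"\s+", " ", text).strip().casefold()

    payload = normalize(title) + "\n" + normalize(body)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class NewsStore:
    """In-memory stand-in for the database; keeps one copy per fingerprint."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, title: str, body: str) -> bool:
        """Store the item and return True, or return False for a duplicate."""
        key = fingerprint(title, body)
        if key in self._items:
            return False
        self._items[key] = {"title": title, "body": body}
        return True

    def __len__(self) -> int:
        return len(self._items)


store = NewsStore()
first = store.save("Markets rally", "Stocks rose on Monday.")
second = store.save("  markets   RALLY ", "Stocks rose on Monday.")  # same after normalizing
third = store.save("Rates unchanged", "The central bank held rates.")
```

An exact-match fingerprint like this only catches copies that differ in case or spacing; a
real rule might also compare URLs or use a similarity measure.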


2. LITERATURE REVIEW

2.1. Existing System


Tens of thousands of news items are generated every day. News is a channel through which
people learn about what is happening around them, but with thousands of items produced on
the Internet every day, obtaining the needed news content from websites efficiently and
accurately is a real need in people's lives.
2.1.1. Disadvantages

• Low efficiency.
• A large amount of code is required.
• Deduplication is not supported.

2.2 Proposed System

The proposed work designs and develops a convenient automatic news-gathering system. The
Python-based domestic financial news collection system needs to implement crawling,
formatting, storing, displaying and operating on data (viewing or deleting a news item) from
various websites. Users can search for specific keywords to select the news they are
interested in, which gives each user a personalized view. Deduplication avoids repeated
visits to web pages.
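
As an illustration of how deduplication can avoid repeated visits, here is a minimal sketch
of a crawl queue that remembers every URL it has already accepted. The names `canonical_url`
and `Frontier` and the example URLs are hypothetical; Scrapy ships its own duplicate request
filter, so this only shows the idea, not the project's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lower-case scheme and host, drop the fragment, keep the query string.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class Frontier:
    """Queue of URLs to crawl that never accepts the same page twice."""

    def __init__(self) -> None:
        self._seen = set()
        self._queue = []

    def add(self, url: str) -> bool:
        key = canonical_url(url)
        if key in self._seen:
            return False  # already scheduled or visited
        self._seen.add(key)
        self._queue.append(key)
        return True

    def pending(self) -> list:
        return list(self._queue)


frontier = Frontier()
added = [
    frontier.add("https://News.example.com/finance/"),
    frontier.add("https://news.example.com/finance#top"),  # same page, different spelling
    frontier.add("https://news.example.com/finance?page=2"),
]
```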

The goals that are achieved by the software are:

• Instant access.
• Improved productivity.
• Optimum utilization of resources.
• Efficient management of records.
• Simplification of the operations.
2.2.1. Advantages

• High efficiency.
• Simplifies code writing and improves the speed and efficiency of the crawlers.
• Deduplication is supported, which avoids repeated visits to web pages.


3. SYSTEM DESIGN

Analysis is a logical process. The objective of this phase is to determine exactly what must be
done to solve the problem. Tools such as class diagrams, sequence diagrams, data flow
diagrams and a data dictionary are used in developing a logical model of the system.

3.1 Software development lifecycle

A software life cycle model (also termed process model) is a pictorial and diagrammatic
representation of the software life cycle. A life cycle model represents all the methods
required to make a software product transit through its life cycle stages. It also captures the
structure in which these methods are to be undertaken.

A life cycle model maps the various activities performed on a software product from
its inception to retirement.

Fig. 3.1: SDLC Model


Different software development life cycle models are specified and designed to be followed
during the software development phase. These models are also called "Software Development
Process Models." Each process model follows a series of phases unique to its type to ensure
success in software development.

• Waterfall Model
• RAD Model
• Spiral Model
• Incremental Model
• Iterative Model

Among all these models, the spiral model is one of the best.

Spiral Model
This SDLC model helps the team adopt elements of one or more process models, such as
waterfall and incremental. The spiral technique is a combination of rapid prototyping and
concurrency in design and development activities. Each cycle in the spiral begins with the
identification of the objectives for that cycle.

Fig. 3.2: Spiral Model


3.2 Feasibility Study

All projects are feasible when provided with unlimited resources and infinite time.
Unfortunately, the development of a computer-based system or product is more likely to be
plagued by a scarcity of resources and difficult delivery dates. It is both necessary and
prudent to evaluate the feasibility of a project at the earliest possible time. Months or
years of effort, thousands or millions of dollars, and untold professional embarrassment can
be averted if an ill-conceived system is recognized early in the definition phase.
Feasibility and risk analysis are related in many ways. If project risk is great, the
feasibility of producing quality software is reduced. During product engineering, however,
we concentrate our attention on the following primary areas of interest.
3.2.1 Technical Feasibility
This application is going to be used in an Internet environment, the World Wide Web (WWW).
So, it is necessary to use a technology that is capable of providing networking facilities
to the application. This application is also able to work in a distributed environment.

The GUI is developed using HTML to capture information from the customer. HTML is a markup
language used to display content in the browser; the pages are delivered over HTTP, which
runs on top of the TCP/IP protocol suite.

3.2.2 Economical Feasibility

The economic issues that usually arise during the economic feasibility stage are whether the
system will be used if it is developed and implemented, and whether the financial benefits
equal or exceed the costs. The cost of developing the project includes the cost of conducting
a full system investigation and the cost of hardware and software for the class of
application being considered. The benefits take the form of reduced costs or fewer costly
errors.
3.2.3 Operational Feasibility

In our application, the front end is developed as a GUI, so it is very easy for the customer
to enter the necessary information. However, the customer must have some knowledge of using
web applications before using our application.


3.3 Requirements Analysis


A requirement is a relatively short and concise piece of information, expressed as a fact. It
can be written as a sentence or can be expressed using some kind of diagram.

3.3.1. Functional Requirement

Functional requirements describe what the system should do. The functional
requirements can be further categorized as follows:

1. What inputs should the system accept?
2. What outputs should the system produce?
3. What data must the system store?
4. What computations are to be done?

The input design is the link between the information system and the user. It comprises
developing specifications and procedures for data preparation, and the steps necessary to
put transaction data into a usable form for processing. This can be achieved by instructing
the computer to read data from a written or printed document, or by having people key the
data directly into the system.

Input design considers the following:

1. What data should be given as input?
2. How should the data be arranged or coded?
3.3.2. Non-Functional Requirements
Non-functional requirements are the constraints that must be adhered to during development.
They limit what resources can be used and set bounds on aspects of the software’s quality.

3.3.3. User Interfaces

The user interface is a GUI developed using Python.

3.3.4. Software Interfaces

The main processing is done in Python and a console application.

3.3.5. Manpower Requirements

Five members can complete the project in 2 to 4 months if they work full time on it.


3.4 Modules
3.4.1. Service Provider

In this module, the service provider logs in with a valid user name and password. After a
successful login, the service provider can perform operations such as View Flight Delay Data
Set Details, Search & Predict Flight Delay Data Sets, Calculate and View All Flight Delay
Prediction, View All Flights with No Delay, View All Remote Users, View Actual Flight Delay
Results by Line Chart, View Actual Flight Delay Results, and View Flight Delay Prediction
Results.

3.4.2. User

In this module, any number of users may be present. A user must register before performing
any operations. After successful registration, the user logs in with an authorized user name
and password. After a successful login, the user can perform operations such as Post Flight
Delay Data Sets, Search & Predict Flight Delay Data Sets, and View Your Profile.


3.5 System Designing


UML Introduction
The Unified Modeling Language allows the software engineer to express an analysis model
using a modeling notation that is governed by a set of syntactic, semantic and pragmatic
rules.
A UML system is represented using five different views that describe the system from
distinctly different perspectives.
UML is specifically constructed through two different domains:
• UML analysis modeling, which focuses on the user model and structural model views of the
system.
• UML design modeling, which focuses on the behavioral model, implementation model and
environmental model views.
System design aspects
Once the analysis stage is completed, the next stage is to determine in broad outline how
the problem might be solved. During system design, we begin to move from the logical to the
physical level.
System design involves architectural and detailed design of the system. Architectural
design involves identifying software components, decomposing them into processing
modules and conceptual data structures, and specifying the interconnections among
components.
Two kinds of approaches are available:
• Top-down approach
• Bottom-up approach
Design of the code
Since information systems projects are designed with space, time and cost saving in mind,
coding methods are used in which conditions, words or ideas are expressed by a code, in
order to control errors and speed up the entire process. The purpose of the code is to
facilitate the identification and retrieval of information. A code is an ordered collection
of symbols designed to provide unique identification of an entity or an attribute.
Design of input
Design of input involves the following decisions:
• Input data
• Input medium
• The way data should be arranged or coded
• Validation needed to detect errors, and the steps to follow when an error occurs

The input controls provide ways to ensure that only authorized users access the system,
guarantee valid transactions, validate the data for accuracy and determine whether any
necessary data has been omitted. The primary input medium chosen is the display. Screens
have been developed for data input using HTML.
Design of output
Design of output involves the following decisions:
• Information to present
• Output medium
• Output layout
The output of this system is presented in an easily understandable, user-friendly manner.
The layout of the output was decided through discussions with the different users.
Design of control
The system should offer the means of detecting and handling errors.
Input controls provide ways to ensure that:
• Only valid transactions are accepted
• The accuracy of data is validated
• All mandatory data have been captured
All entries to the system are validated, and tables are updated only for valid entries. If
incorrect entries are entered into the system by chance, means are provided to edit and
correct them.
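
The input controls described above can be sketched as a small validation step that runs
before an entry is written to a table. This is only an illustration: the field names in
`REQUIRED_FIELDS`, the `validate_entry` helper and the URL check are assumptions, not part
of the original design.

```python
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("title", "url", "published")  # assumed mandatory fields


@dataclass
class ValidationResult:
    ok: bool
    errors: list = field(default_factory=list)


def validate_entry(entry: dict) -> ValidationResult:
    """Check that mandatory fields are present and that the URL looks plausible."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not entry.get(name, "").strip():
            errors.append(f"missing field: {name}")
    url = entry.get("url", "")
    if url and not url.startswith(("http://", "https://")):
        errors.append("url must start with http:// or https://")
    return ValidationResult(ok=not errors, errors=errors)


good = validate_entry(
    {"title": "Budget passed", "url": "https://example.com/a", "published": "2024-01-05"}
)
bad = validate_entry({"title": " ", "url": "ftp://example.com/a"})
```

Only entries for which `ok` is true would be allowed to update a table; the collected error
messages give the user what is needed to correct a rejected entry.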

3.6 UML diagram

As the strategic value of software increases for many companies, the industry looks for
techniques to automate the production of software and to improve quality and reduce cost and
time-to-market. These techniques include component technology, visual programming,
patterns and frameworks.

Businesses also seek techniques to manage the complexity of systems as they increase
in scope and scale. In particular, they recognize the need to solve recurring architectural
problems, such as physical distribution, concurrency, replication, security, load balancing and
fault tolerance.


The Unified Modeling Language (UML) was designed to respond to these needs.
Simply put, systems design refers to the process of defining the architecture, components,
modules, interfaces, and data for a system to satisfy specified requirements, which can be
done easily through UML diagrams.
This project uses the following four basic UML diagrams:

• Class Diagram
• Use Case Diagram
• Sequence Diagram
• Activity Diagram

3.6.1 Class diagram

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, and the relationships between the classes.
This is one of the most important diagrams in development. The diagram breaks each class
into three layers: its name, its attributes and its operations.
The relationships are drawn between the classes. Developers use the class diagram to
develop the classes. Analysts use it to show the details of the system.


Fig. 3.3: UML Diagram

Fig.1. Figure Showing Classes and Attributes


3.6.2 Use case diagram


In software engineering, a use case diagram in the Unified Modelling Language (UML) is a
type of behavioural diagram defined by and created from a Use-case analysis. Its purpose is
to present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases.
The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted. Use cases are
used during requirements elicitation and analysis to represent the functionality of the system.
Use cases focus on the behavior of the system from the external point of view.

Fig. 3.4: Use Case Diagram for Administrator


3.6.3 Sequence diagram


A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called Event-trace diagrams,
event scenarios, and timing diagrams.

Fig. 3.5: Sequence Diagram


3.6.4 Activity diagram

Activity diagrams are a loosely defined diagram technique for showing workflows of
stepwise activities and actions, with support for choice, iteration and concurrency. In the
Unified Modeling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity diagram shows
the overall flow of control.

Fig. 3.6: Activity Diagram


3.7 Software and Hardware Requirements

3.7.1 Software requirements

Coding Language : Python

Front end : HTML

Back end : MySQL

Web Server : WampServer

Operating system : Windows 7

3.7.2 Hardware requirements

Processor : Pentium Dual Core / Core 2 Duo / Core i-series with minimum 1.2 GHz speed

RAM : 2 GB

Hard disk : 120 GB


4. IMPLEMENTATION

4.1. Python

Below are some facts about Python. Python is currently one of the most widely used
multi-purpose, high-level programming languages. Python allows programming in
object-oriented and procedural paradigms. Python programs are generally smaller than
equivalent programs in other programming languages like Java. Programmers have to type
relatively less, and the indentation requirement of the language keeps programs readable.
Python is used by almost all tech-giant companies, such as Google, Amazon, Facebook,
Instagram, Dropbox and Uber. The biggest strength of Python is its huge collection of
standard and third-party libraries, which can be used for the following:

• Machine Learning

• GUI Applications (like Kivy, Tkinter, PyQt etc. )

• Web frameworks like Django (used by YouTube, Instagram, Dropbox)

• Image processing (like OpenCV, Pillow)

• Web scraping (like Scrapy, BeautifulSoup, Selenium)

• Test frameworks

• Multimedia

Advantages of Python :-

Let’s see how Python dominates over other languages.

1. Extensive Libraries

Python ships with an extensive library that contains code for various purposes such as
regular expressions, documentation generation, unit testing, web browsers, threading,
databases, CGI, email, image manipulation, and more. So, we don’t have to write the complete
code for these manually.


2. Extensible

Python can be extended with other languages. You can write some of your code in languages
like C++ or C. This comes in handy, especially in projects.

3. Embeddable

Complementary to extensibility, Python is embeddable as well. You can put your Python code
in the source code of a different language, like C++. This lets us add scripting
capabilities to our code in the other language.

4. Improved Productivity

The language’s simplicity and extensive libraries render programmers more productive than
languages like Java and C++ do. You also need to write less code to get more done.

5. IoT Opportunities

Since Python forms the basis of new platforms like Raspberry Pi, its future looks bright for
the Internet of Things. This is a way to connect the language with the real world.

6. Simple and Easy

When working with Java, you may have to create a class to print ‘Hello World’. But in
Python, just a print statement will do. It is also quite easy to learn, understand, and
code. This is why when people pick up Python, they have a hard time adjusting to other,
more verbose languages like Java.

7. Readable

Because it is not such a verbose language, reading Python is much like reading English.
This is the reason why it is so easy to learn, understand, and code. It also does not need
curly braces to define blocks, and indentation is mandatory. This further aids the
readability of the code.


8. Object-Oriented

This language supports both the procedural and object-oriented programming paradigms.
While functions help us with code reusability, classes and objects let us model the real
world. A class allows the encapsulation of data and functions into one.
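
As a small illustration of both paradigms, the sketch below bundles the data of a news item
with a method that acts on it, and then reuses that method from a plain function. The class
`NewsItem`, the function `search` and the sample items are hypothetical and are not taken
from the project code.

```python
class NewsItem:
    """Bundles the data of one news item with the functions that act on it."""

    def __init__(self, title: str, body: str) -> None:
        self.title = title
        self.body = body

    def matches(self, keyword: str) -> bool:
        """Return True if the keyword occurs in the title or body, ignoring case."""
        needle = keyword.casefold()
        return needle in self.title.casefold() or needle in self.body.casefold()


def search(items, keyword):
    """Procedural helper that reuses the method on every object."""
    return [item.title for item in items if item.matches(keyword)]


items = [
    NewsItem("Bank cuts rates", "The central bank lowered its key rate."),
    NewsItem("Harvest report", "Wheat output rose this season."),
]
hits = search(items, "BANK")
```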

9. Free and Open-Source

Python is freely available. Not only can you download Python for free, but you can also
download its source code, make changes to it, and even distribute it. It ships with an
extensive collection of libraries to help you with your tasks.

10. Portable

When you code your project in a language like C++, you may need to make some changes to it
if you want to run it on another platform. But it isn’t the same with Python. Here, you
need to code only once, and you can run it anywhere. This is called Write Once Run Anywhere
(WORA). However, you need to be careful not to include any system-dependent features.

11. Interpreted

Lastly, Python is an interpreted language. Since statements are executed one by one,
debugging is easier than in compiled languages.
4.2. HISTORY OF PYTHON
What do the alphabet and the programming language Python have in common? Right,
both start with ABC. If we are talking about ABC in the Python context, it's clear that the
programming language ABC is meant. ABC is a general-purpose programming language and
programming environment that was developed in Amsterdam, the Netherlands, at the
CWI (Centrum Wiskunde & Informatica). The greatest achievement of ABC was to influence
the design of Python. Python was conceptualized in the late 1980s. At that time, Guido van
Rossum was working at the CWI on a project called Amoeba, a distributed operating system. In
an interview with Bill Venners, Guido van Rossum said: "In the early 1980s, I worked as an
implementer on a team building a language called ABC at Centrum voor Wiskunde en
Informatica (CWI). I don't know how well people know ABC's influence on Python. I try to
mention ABC's influence because I'm indebted to everything I learned during that project and
to the people who worked on it." Later on in the same interview, Guido van Rossum
continued: "I remembered all my experience and some of my frustration with ABC. I decided
to try to design a simple scripting language that possessed some of ABC's better properties,
but without its problems. So I started typing. I created a simple virtual machine, a simple
parser, and a simple runtime. I made my own version of the various ABC parts that I liked. I
created a basic syntax, used indentation for statement grouping instead of curly braces or
begin-end blocks, and developed a small number of powerful data types: a hash table (or
dictionary, as we call it), a list, strings, and numbers."

What is Machine Learning :-

Before we take a look at the details of various machine learning methods, let's start by
looking at what machine learning is, and what it isn't. Machine learning is often categorized
as a subfield of artificial intelligence, but I find that categorization can often be misleading at
first brush. The study of machine learning certainly arose from research in this context, but in
the data science application of machine learning methods, it's more helpful to think of
machine learning as a means of building models of data. Fundamentally, machine learning
involves building mathematical models to help understand data. "Learning" enters the fray
when we give these models tunable parameters that can be adapted to observed data; in this
way the program can be considered to be "learning" from the data. Once these models have
been fit to previously seen data, they can be used to predict and understand aspects of newly
observed data. I'll leave to the reader the more philosophical digression regarding the extent
to which this type of mathematical, model-based "learning" is similar to the "learning"
exhibited by the human brain. Understanding the problem setting in machine learning is
essential to using these tools effectively, and so we will start with some broad categorizations
of the types of approaches we'll discuss here.
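
To make the idea of tunable parameters concrete, here is a minimal sketch that fits a
straight line to a few observed points and then uses the fitted parameters on a new input.
The function `fit_line` and the data are made up for illustration; ordinary least squares is
just one simple example of a model with adjustable parameters.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Observed data (made up for illustration): y is exactly 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)

# Use the fitted parameters to predict a value for an unseen input.
prediction = slope * 4.0 + intercept
```

The slope and intercept are the tunable parameters: they are "learned" from the observed
points and then applied to data the model has not seen.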

Categories of Machine Learning :-

At the most fundamental level, machine learning can be categorized into two main
types: supervised learning and unsupervised learning. Supervised learning involves somehow
modeling the relationship between measured features of data and some label associated with
the data; once this model is determined, it can be used to apply labels to new, unknown data.
This is further subdivided into classification tasks and regression tasks: in classification, the
labels are discrete categories, while in regression, the labels are continuous quantities. We
will see examples of both types of supervised learning in the following section. Unsupervised
learning involves modeling the features of a dataset without reference to any label, and is
often described as "letting the dataset speak for itself." These models include tasks such as
clustering and dimensionality reduction. Clustering algorithms identify distinct groups of
data, while dimensionality reduction algorithms search for more succinct representations of
the data. We will see examples of both types of unsupervised learning in the following
section.
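
The difference between the two categories can be sketched on the same toy data. The
supervised part is a nearest-centroid classifier trained on labelled points; the
unsupervised part is k-means with two clusters on the unlabelled points. All function names,
labels and numbers are hypothetical, and the clustering code assumes the data contain at
least two distinct values.

```python
def nearest_centroid_fit(points, labels):
    """Supervised: compute one mean per label from labelled 1-D points."""
    sums, counts = {}, {}
    for x, label in zip(points, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_centroid_predict(centroids, x):
    """Assign x the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(points, iterations=10):
    """Unsupervised: split unlabelled 1-D points into two clusters (k-means, k=2)."""
    low, high = min(points), max(points)  # initial cluster centres
    for _ in range(iterations):
        near_low = [x for x in points if abs(x - low) <= abs(x - high)]
        near_high = [x for x in points if abs(x - low) > abs(x - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return sorted(near_low), sorted(near_high)


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]

# Supervised: labels are given, and the model learns to label new data.
centroids = nearest_centroid_fit(
    points, ["short", "short", "short", "long", "long", "long"]
)
label = nearest_centroid_predict(centroids, 4.5)

# Unsupervised: no labels are given, and the algorithm finds the groups itself.
cluster_a, cluster_b = two_means(points)
```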

Need for Machine Learning

Human beings, at this moment, are the most intelligent and advanced species on earth
because they can think, evaluate and solve complex problems. On the other side, AI is still in
its initial stage and haven’t surpassed human intelligence in many aspects. Then the question
is that what is the need to make machine learn? The most suitable reason for doing this is, “to
make decisions, based on data, with efficiency and scale”. Lately, organizations are investing
heavily in newer technologies like Artificial Intelligence, Machine Learning and Deep
Learning to get the key information from data to perform several real-world tasks and solve
problems. We can call it data-driven decisions taken by machines, particularly to automate
the process. These data-driven decisions can be used, instead of using programing logic, in
the problems that cannot be programmed inherently. The fact is that we can’t do without
human intelligence, but other aspect is that we all need to solve real-world problems with
efficiency at a huge scale. That is why the need for machine learning arises.

Challenges in Machine Learning :-

While Machine Learning is rapidly evolving, making significant strides in cybersecurity
and autonomous cars, this segment of AI as a whole still has a long way to go. The reason is
that ML has not yet been able to overcome a number of challenges. The challenges that ML
currently faces are −

Quality of data − Having good-quality data for ML algorithms is one of the biggest
challenges. Use of low-quality data leads to problems in data preprocessing and feature
extraction.

Time-consuming task − Another challenge faced by ML models is the time consumed, especially
by data acquisition, feature extraction and retrieval.

Lack of specialists − As ML technology is still in its infancy, finding expert resources is
difficult.

No clear objective for formulating business problems − Having no clear objective and
well-defined goal for business problems is another key challenge for ML because this
technology is not that mature yet.

Issue of overfitting & underfitting − If the model is overfitting or underfitting, it cannot
represent the problem well.

Curse of dimensionality − Another challenge ML models face is too many features in the data
points. This can be a real hindrance.

Difficulty in deployment − The complexity of an ML model makes it quite difficult to deploy
in real life.

4.2 Overview of DBMS


A Database Management System (DBMS) is a collection of interrelated data and a set of
programs to access those data. The primary goal of a DBMS is to provide a way to store and
retrieve database information.
4.2.1 Data Abstraction
Abstraction means providing the necessary information without exposing the background
details. There are three levels of abstraction for a DBMS.
• Physical level: The lowest level of abstraction, which describes how the data are
actually stored on secondary storage devices such as disks and tapes.
• Logical level: The second level of abstraction, which describes what data are stored
in the database and what relationships exist among those data. Database
administrators decide what data are to be kept in the database.
• View level: The highest level of abstraction, which describes only a part of the
entire database. The view level of abstraction exists to simplify users' interaction with
the system. The system may provide many views for the same database.
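
The logical and view levels can be illustrated with a short sketch. It uses Python's
built-in sqlite3 module as a stand-in for MySQL, and the table `news`, the view
`public_news` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL here

# Logical level: the full table, as a database administrator would define it.
conn.execute(
    "CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, source TEXT, editor_note TEXT)"
)
conn.executemany(
    "INSERT INTO news (title, source, editor_note) VALUES (?, ?, ?)",
    [
        ("Bank cuts rates", "example-wire", "verify figures"),
        ("Harvest report", "example-daily", "ok"),
    ],
)

# View level: readers see only part of the data; the internal note is hidden.
conn.execute("CREATE VIEW public_news AS SELECT id, title, source FROM news")

cursor = conn.execute("SELECT * FROM public_news ORDER BY id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()
```

The physical level does not appear in the code at all: how the rows are laid out on disk is
left to the database engine.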
4.2.2 Instances and Schema
The collection of information stored in a database at a particular moment is called an
instance. The overall design of a database is called a schema.


4.3 Data Models


A Data Model is a collection of conceptual tools for describing data, data relationship, data
semantic and consistency constraints. Various data models available are discussed below.
4.3.1The Entity Relationship Model
E-R model is a data model used to describe the data involved in a real world enterprise. It
describes the data in the form of entities and relationships. An entity is a ‘thing’ (or ‘object’)
in the real world that can be easily distinguishable from other things. A relationship is an
association among several entities.
4.3.2 Relational Model
The Relational Model uses a collection of tables to represent both data and the relationships
among the data. Each table has multiple columns, and each column has a unique name.
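A minimal sketch of the idea that tables represent both data and relationships is given below. It assumes two hypothetical tables, `source` and `article`, and uses sqlite3 in place of MySQL; the relationship "an article comes from a source" is itself stored as a column value.

```python
import sqlite3


def build_relational_demo() -> sqlite3.Connection:
    """Create two related tables with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (
            source_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE article (
            article_id INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            source_id  INTEGER NOT NULL REFERENCES source(source_id)
        );
        INSERT INTO source VALUES (1, 'Example Daily'), (2, 'Sample Times');
        INSERT INTO article VALUES
            (10, 'Monsoon arrives early', 1),
            (11, 'New rail line opens', 2),
            (12, 'Budget session begins', 1);
        """
    )
    return conn


def titles_by_source(conn: sqlite3.Connection, source_name: str) -> list:
    """Follow the relationship from a source to its articles with a join."""
    rows = conn.execute(
        "SELECT a.title FROM article AS a"
        " JOIN source AS s ON s.source_id = a.source_id"
        " WHERE s.name = ? ORDER BY a.article_id",
        (source_name,),
    )
    return [row[0] for row in rows]
```

Calling `titles_by_source(build_relational_demo(), "Example Daily")` follows the `source_id` column to collect the two articles attributed to that source.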

4.4 Database Languages


A database system provides a data definition language and a data manipulation language.
4.4.1 Data Definition Language
A Data Definition Language (DDL) consists of a set of definitions used to specify the database
schema. Execution of DDL statements results in a set of tables. These tables are stored in a
special area known as the data dictionary or data directory. The data dictionary contains metadata,
that is, data about data.
4.4.2 Data Manipulation Language
A Data Manipulation Language (DML) is a language that enables users to access or manipulate
data. Data manipulation covers:
• Retrieval of information stored in the database.
• Insertion of new information into the database.
• Deletion of information from the database.
• Modification of information stored in the database.
There are basically two types of DML.
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs require a user to specify what data are needed without specifying how to get those data.
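The sketch below runs one DDL statement and the four kinds of DML operation listed above, using sqlite3 as a stand-in for MySQL. The `news` table and its rows are assumptions for illustration. The final loop only loosely imitates the procedural style: it spells out how to find the rows, whereas the SQL query states only what is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute(
    "CREATE TABLE news ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " views INTEGER NOT NULL DEFAULT 0)"
)

# DML: insertion.
conn.executemany(
    "INSERT INTO news (title, views) VALUES (?, ?)",
    [("Flood update", 5), ("Market report", 50), ("Old story", 500)],
)

# DML: modification.
conn.execute("UPDATE news SET views = views + 1 WHERE title = ?", ("Flood update",))

# DML: deletion.
conn.execute("DELETE FROM news WHERE title = ?", ("Old story",))

# DML: retrieval, declarative style (say what is needed, not how to find it).
declarative = [
    row[0] for row in conn.execute("SELECT title FROM news WHERE views > 5 ORDER BY id")
]

# The same retrieval written step by step, spelling out how to get the data.
procedural = []
for title, views in conn.execute("SELECT title, views FROM news ORDER BY id"):
    if views > 5:
        procedural.append(title)
```

Both retrievals give the same titles; SQL, which MySQL uses, is the declarative kind.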

4.5 MySQL


MySQL is a relational database management system (RDBMS) that organizes data in the form of
tables. Like other database servers based on the relational model, it addresses three things:
data structures, data integrity and data manipulation. Its client/server technology lets
applications realize the benefits of open, relational systems. MySQL is designed to make
efficient use of system resources across hardware architectures in order to deliver good
performance, price-performance and scalability. Any DBMS that is to be called an RDBMS has to
satisfy Dr. E. F. Codd’s rules.

• MySQL is portable
MySQL is available on a wide range of platforms, ranging from PCs to large servers, and has
also been offered as a loadable module for Novell NetWare. If you develop an application on
one system, you can run the same application on other systems without modification.
• MySQL is compatible
MySQL uses SQL, so standard SQL statements written for MySQL can largely be used with other
SQL-based RDBMSs such as IBM DB2; in that sense MySQL is compatible with DB2. MySQL is a
high-performance DBMS suited to online transaction processing and to large database
applications.
• Multithreaded server architecture
MySQL's multithreaded server architecture delivers scalable performance for a very large
number of users on a range of hardware architectures, including symmetric multiprocessors
(SMPs) and loosely coupled multiprocessors. Performance is achieved by reducing CPU, I/O,
memory and operating system bottlenecks and by optimizing the DBMS server code to remove
internal bottlenecks.
4.5.1 Features of MySQL
MySQL is one of the most popular RDBMSs on the market, largely because of its ease of use. Its features include:
• Client/server architecture.
• Support for data integrity and data security.
• Parallel processing support to speed up data entry and online transaction processing.
• Stored procedures and functions.
Dr. E. F. CODD’S RULES


These rules are used for evaluating whether a product can be called a relational database
management system. Out of the 12 rules, an RDBMS product should satisfy at least 8, plus
Rule 0, which must always be satisfied.
RULE 0. FOUNDATION RULE
Any system that is advertised as, or claimed to be, a relational DBMS must be able to manage
databases entirely through its relational capabilities, without relying on an external language.
RULE 1. INFORMATION RULE
All information in a relational database is represented at the logical level in only one way: as
values in tables.
RULE 2. GUARANTEED ACCESS
Each and every data item in a relational database is guaranteed to be logically accessible by
using a combination of table name, primary key value and column name.
RULE 3. SYSTEMATIC TREATMENT OF NULL VALUES
Null values are supported for representing missing information and inapplicable information.
They must be handled in a systematic way, independent of data type.
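What systematic handling of nulls looks like in practice can be sketched with a few queries. The example uses sqlite3 in place of MySQL, and the `news` table, its columns and its rows are hypothetical. The same rules apply to the text column and the integer column alike.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author TEXT,"          # may be missing
    " word_count INTEGER)"   # may be missing
)
conn.executemany(
    "INSERT INTO news (title, author, word_count) VALUES (?, ?, ?)",
    [("Flood update", "R. Kumar", 420), ("Market report", None, None)],
)


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Comparing NULL with anything, even NULL, yields unknown (None in Python).
null_equals_null = scalar("SELECT NULL = NULL")

# IS NULL is the systematic way to test for a missing value.
missing_authors = scalar("SELECT COUNT(*) FROM news WHERE author IS NULL")

# '= NULL' never matches, because the comparison is unknown rather than true.
equals_null_matches = scalar("SELECT COUNT(*) FROM news WHERE author = NULL")

# Aggregates skip NULLs instead of treating them as zero.
average_words = scalar("SELECT AVG(word_count) FROM news")
```

Treating a missing word count as zero would have halved the average; skipping it keeps the result at 420.0.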
RULE 4. DYNAMIC ONLINE CATALOG BASED ON THE RELATIONAL MODEL
The database description is represented at the logical level in the same way as ordinary data,
so that authorized users can apply the same relational language to its interrogation as they
apply to the regular data.
RULE 5. COMPREHENSIVE DATA SUB LANGUAGE
A relational system may support several languages and various modes of terminal use.
However, there must be at least one language whose statements can express all of the
following: data definition, view definition, data manipulation, integrity constraints,
authorization and transaction boundaries.
RULE 6. VIEW UPDATING
Any view that is theoretically updatable must also be updatable by the system, that is,
changes can be made to the underlying tables that effect the desired changes in the view.
RULE 7. HIGH LEVEL UPDATE, INSERT and DELETE
The capability of handling a base relation or a derived relation as a single operand applies
not only to the retrieval of data but also to its insertion, updating and deletion.
RULE 8. PHYSICAL DATA INDEPENDENCE
Application programs and terminal activities remain logically unimpaired whenever any
changes are made in either storage representation or access methods.


RULE 9. LOGICAL DATA INDEPENDENCE
Application programs and terminal activities remain logically unimpaired when
information-preserving changes are made to the base tables.
RULE 10. INTEGRITY INDEPENDENCE
Integrity constraints specific to a particular database must be definable in the relational data
sub-language and stored in the catalog, not in the application programs.
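Integrity independence can be sketched by declaring the constraints in the table definition, so that the database rejects bad rows without any check in the application code. The example uses sqlite3 in place of MySQL; the `users` table, the allowed roles and the helper `try_insert` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraints live in the schema, not in the application program.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " role TEXT NOT NULL CHECK (role IN ('user', 'admin')))"
)


def try_insert(email: str, role: str) -> bool:
    """Attempt an insert; the database decides whether the row is valid."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email, role) VALUES (?, ?)", (email, role))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("asha@example.com", "user")       # accepted
duplicate = try_insert("asha@example.com", "admin")  # violates UNIQUE
bad_role = try_insert("ravi@example.com", "guest")   # violates CHECK
```

Because `try_insert` contains no validation logic of its own, any other program writing to the same table is held to the same rules.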
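RULE 11. DISTRIBUTION INDEPENDENCE
Whether or not a system supports database distribution, it must have a data sub-language that
can support distributed databases without changing the application programs.
RULE 12. NON-SUBVERSION
If a relational system has a low-level (record-at-a-time) language, that language cannot be
used to bypass the integrity rules and constraints expressed in the higher-level relational
language.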

MySQL supports Codd’s rules as follows:

• Rule 1: Information Rule (representation of information) - YES.
• Rule 2: Guaranteed Access - YES.
• Rule 3: Systematic Treatment of Null Values - YES.
• Rule 4: Dynamic Online Catalog Based on the Relational Model - YES.
• Rule 5: Comprehensive Data Sub Language - YES.
• Rule 6: View Updating - PARTIAL.
• Rule 7: High-level Update, Insert and Delete - YES.
• Rule 8: Physical Data Independence - PARTIAL.
• Rule 9: Logical Data Independence - PARTIAL.
• Rule 10: Integrity Independence - PARTIAL.
• Rule 11: Distribution Independence - YES.
• Rule 12: Non-subversion - YES.


5. TESTING
5.1 Software Testing Techniques
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing presents an interesting anomaly
for the software engineer: after working to build the software, the engineer now designs test
cases with the intent of breaking it.

5.1.1 Testing Objectives

1. Testing is a process of executing a program with the intent of finding an error.

2. A good test case is one that has a high probability of finding an as yet undiscovered error.

3. A successful test is one that uncovers an as yet undiscovered error.

5.1.2 Test cases

S.No. | Test Input | Expected Behavior | Observed Behavior | Status (P = Passed, F = Failed)
1 | Login as user or admin with correct login details | Home page for the administrator (manager) or user should be displayed | As expected | P
2 | Login as user or admin with wrong login details | Error message should be displayed | As expected | P
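The two login cases listed above could be automated roughly as follows. This is only a sketch: the function `authenticate`, the in-memory account store and the credentials are hypothetical stand-ins for the system's real login code, which is not shown in this report.

```python
import hashlib
import hmac
import unittest

# Fixed salt and low iteration count for the demo only; a real system
# would use a random per-user salt and a higher work factor.
_SALT = b"demo-salt"


def _hash(password: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), _SALT, 10_000)


# Hypothetical account store: user name -> password hash.
_ACCOUNTS = {"admin": _hash("admin-pass"), "reader": _hash("reader-pass")}


def authenticate(username: str, password: str) -> bool:
    """Return True only when the user exists and the password matches."""
    stored = _ACCOUNTS.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, _hash(password))


class LoginTests(unittest.TestCase):
    def test_correct_details_are_accepted(self):  # test case 1
        self.assertTrue(authenticate("admin", "admin-pass"))
        self.assertTrue(authenticate("reader", "reader-pass"))

    def test_wrong_details_are_rejected(self):  # test case 2
        self.assertFalse(authenticate("admin", "wrong"))
        self.assertFalse(authenticate("nobody", "admin-pass"))


def run_login_tests() -> unittest.TestResult:
    """Run both cases and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoginTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_login_tests().wasSuccessful()` corresponds to a status of P for both rows.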


5.2 Test case Reports

5.2.1 Test case Design


Any engineering product can be tested in one of two ways:

White Box Testing

This testing is also called glass box testing. Knowing the internal workings of a product,
tests can be conducted to ensure that “all gears mesh”, that is, that the internal operations
perform according to specification and all internal components have been adequately
exercised. It is a test case design method that uses the control structure of the procedural
design to derive test cases. Basis path testing is one white box testing technique.

Basis Path Testing
• Flow graph notation
• Cyclomatic complexity
• Deriving test cases
• Graph matrices

Control Structure Testing
• Condition testing
• Data flow testing
• Loop testing
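Cyclomatic complexity, listed above under basis path testing, can be computed from a flow graph as V(G) = E − N + 2, where E is the number of edges and N the number of nodes. The sketch below applies the formula to a flow graph for a hypothetical login routine; the node numbering and the routine itself are assumptions made for illustration.

```python
def cyclomatic_complexity(edges) -> int:
    """V(G) = E - N + 2 for a connected flow graph given as (from, to) pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Hypothetical login routine:
#   1 read credentials   2 user exists?      3 password matches?
#   4 show home page     5 show error        6 exit
LOGIN_FLOW = [
    (1, 2),
    (2, 3), (2, 5),  # decision at node 2
    (3, 4), (3, 5),  # decision at node 3
    (4, 6),
    (5, 6),
]

login_complexity = cyclomatic_complexity(LOGIN_FLOW)
```

With 7 edges and 6 nodes the value is 3, which matches the two decision nodes plus one, so three independent paths (1-2-5-6, 1-2-3-5-6 and 1-2-3-4-6) form a basis set of test cases.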
Black Box Testing
Knowing the specified functions that a product has been designed to perform, tests can be
conducted that demonstrate each function is fully operational while at the same time
searching for errors in each function. It fundamentally focuses on the functional
requirements of the software.
The methods used in black box test case design are:
• Graph based testing methods
• Equivalence partitioning
• Boundary value analysis
• Comparison testing
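Equivalence partitioning and boundary value analysis can be sketched for an input with a valid range. The example assumes a hypothetical rule that a password must be between 6 and 20 characters long; the report does not state the system's actual limits. The partitions are "too short", "valid" and "too long", and the boundary cases sit on either side of each edge.

```python
MIN_LEN, MAX_LEN = 6, 20  # hypothetical limits, chosen for illustration


def is_valid_password(password: str) -> bool:
    """Function under test: accept lengths within the closed range."""
    return MIN_LEN <= len(password) <= MAX_LEN


def boundary_lengths(low: int, high: int) -> list:
    """Lengths just below, on, and just above each boundary."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def run_boundary_cases() -> dict:
    """Map each boundary length to the observed result."""
    return {
        length: is_valid_password("x" * length)
        for length in boundary_lengths(MIN_LEN, MAX_LEN)
    }
```

Only the internal behavior visible through inputs and outputs is used here, which is what makes the method black box: six inputs cover all three partitions and both edges.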


5.2.2 Unit Testing


Unit testing focuses verification effort on the smallest unit of software design, that is, the
module. Using the procedural design description as a guide, important control paths are tested
to uncover errors within the boundaries of the module. The unit test is normally white box
oriented, and the step can be conducted in parallel for multiple modules.

5.2.3 Integration Testing

Integration testing is a systematic technique for constructing the program structure while
conducting tests to uncover errors associated with interfacing. The objective is to take
unit-tested modules and build a program structure that has been dictated by the design.

5.2.4 Validation Testing

At the end of integration testing, the software is completely assembled as a package. Validation
testing is the next stage, which can be defined as successful when the software functions in
the manner reasonably expected by the customer. Reasonable expectations are defined in the
software requirements specification, a document that describes all user-visible attributes of
the software. The specification contains a section titled “Validation Criteria”. Information
contained in that section forms the basis for a validation testing approach.


6. FUTURE ENHANCEMENT
If the crawler accesses a website too frequently, the website may detect and block it. To
address this problem, the system can adopt a strategy for coping with anti-crawler measures
so that collection does not fail. On the display side, the interface can be further optimized
to make it more concise and intuitive; on the functional side, the system's functions can be
further expanded. These are the goals and directions for this system, to be achieved through
step-by-step optimization.
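One possible reading of such a strategy is simply to slow the crawler down so that no single site is accessed too frequently. The sketch below is an assumption about how that might be done, not the system's implemented behavior: the class `HostThrottle`, its default delays and the injectable `clock` and `sleep` parameters are all hypothetical.

```python
import random
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum, slightly randomized gap between requests to one host."""

    def __init__(self, min_delay=2.0, jitter=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_delay = min_delay
        self._jitter = jitter
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}  # host -> earliest time of the next request

    def wait(self, url: str) -> float:
        """Pause if the host was contacted too recently; return the pause in seconds."""
        host = urlsplit(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        pause = max(0.0, ready_at - now)
        if pause > 0:
            self._sleep(pause)
        # Schedule the next permitted request after the gap plus random jitter.
        gap = self._min_delay + random.uniform(0.0, self._jitter)
        self._next_allowed[host] = now + pause + gap
        return pause
```

The crawler would call `throttle.wait(url)` immediately before each request. Hosts are tracked separately, so waiting on one site does not delay requests to another.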


7. CONCLUSION

This system makes every effort to facilitate the processing of news information for users,
and presents the news obtained from various websites to them. Its simple and efficient
interface enables users to read the news clearly. The system crawls and displays only the key
information of each news item and ignores unnecessary information, so that users can find
the content they are interested in or need more quickly. In short, this system, as a
comprehensive information analysis and retrieval tool, will facilitate people's lives to a
certain extent.

Certainly, this system is not perfect: there are still many functions that could be added and
some deficiencies that can be improved. For example, the system currently crawls only a few
sites, and the number of crawled sites can be expanded to make the news content richer and
more complete.


APPENDIX-A: PROJECT SCREENSHOTS

[Screenshot: Home page]
[Screenshot: Manager login]
[Screenshot: Manager home]
[Screenshot: View users]
[Screenshot: Add news]
[Screenshot: View news]
[Screenshot: User registration]
[Screenshot: User login]
[Screenshot: User home]
[Screenshot: View news]

