
BOHR International Journal of Computer Science

2022, Vol. 1, No. 1, pp. 6–10


https://doi.org/10.54646/bijcs.002
www.bohrpub.com

Implementation of Web Application for Disease Prediction Using AI


Manasvi Srivastava, Vikas Yadav and Swati Singh∗
IILM, Academy of Higher Learning, College of Engineering and Technology Greater Noida,
Uttar Pradesh, India
∗ Corresponding Author: [email protected]

Abstract. The Internet is the largest source of information created by humanity. It contains a variety of materials available in various formats such as text, audio, video and much more. Web scraping is one way of gathering this material: it is a set of strategies for obtaining information from a website automatically instead of copying the data manually. Many Web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. To enable Web scraping, a variety of tools and technologies have been created. Regrettably, the propriety and ethics of employing these Web scraping programmes are frequently neglected. There are hundreds of online scraping applications available today, most of which are written in Java, Python, or Ruby, and there is both commercial and open-source software. For novices in web scraping, web-based applications such as YahooPipes, Google Web Scrapers, and the OutWit Firefox plugin are the finest options. Web extraction is basically used to replace this manual extraction and editing process and to provide an easy and better way to collect data from a web page, convert it into the desired format, and save it to a local or archive directory. In this paper, among the various kinds of scraping, we focus on those techniques that extract the content of a Web page. In particular, we use scraping techniques for a variety of diseases with their own symptoms and precautions.
Keywords: Web Scraping, Disease, Legality, Software, Symptoms.

INTRODUCTION

Web scraping is a process for downloading and extracting important data by scanning a web page. Web scrapers work best when page content is to be transferred, searched, or modified. The collected information is then copied to a spreadsheet or stored in a database for further analysis. For the ultimate purpose of analysis, the data needs to be processed through progressively different stages, for example, starting with specification collection, then the editing process, the cleaning process, remodeling, and the application of different models and algorithms to produce the end result. There are two ways to extract data from websites: the first is the manual extraction process and the second is the automatic extraction process. Web scrapers compile site information in the same way that a person would, by accessing a web page of the site, finding relevant information, and moving on to the next web page. Each website has a different structure, which is why web scrapers are usually designed for the particular website they search. Web scraping can help in finding any kind of targeted information. We then have the opportunity to find, analyze, and use the information in the way we need. Web scraping therefore paves the way for data acquisition, speeds up automation, and makes it easier to access the extracted data by rendering it in a CSV pattern. Web scraping often retrieves a lot of data from websites, for example, for monitoring consumer interests, price monitoring (e.g., price checking), advancing AI models, data collection, tracking issues, and so on. So there is no doubt that web scraping is a systematic way to get more data from websites. It requires two stages, mainly crawling and extraction. A crawler is an algorithm, designed by a person, that goes through the web to look for the specific information needed by following links online. A scraper is a specific tool designed to extract data from sites.

In this project, the Web scraper works as follows: if the patient is suffering from any kind of illness, he or she adds the symptoms and problems; when the crawl starts, the scraper searches the disease database provided on the website and shows the disease that best matches the patient's symptoms. When those specific diseases show up, it will also show the precautionary measures that the patient needs to take care of in order to overcome them and to treat the infection.
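To make this workflow concrete, the following is a minimal sketch, not the authors' implementation, of how user-entered symptoms could be matched against a scraped disease–symptom table; the disease dictionary and the overlap-scoring rule are illustrative assumptions.

# Minimal illustration: rank diseases by how many of the user's symptoms
# overlap with each disease's known symptom list. The disease data below
# is a hypothetical stand-in for the scraped dataset described in the paper.
disease_symptoms = {
    "influenza": {"fever", "cough", "sore throat", "body ache"},
    "migraine": {"headache", "nausea", "light sensitivity"},
    "food poisoning": {"nausea", "vomiting", "diarrhea", "fever"},
}

def rank_diseases(user_symptoms):
    """Return diseases sorted by overlap with the user's symptoms."""
    user = {s.strip().lower() for s in user_symptoms}
    scores = {name: len(user & symptoms)
              for name, symptoms in disease_symptoms.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_diseases(["Fever", "Cough"]))  # influenza should rank first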
OVERVIEW OF WEB SCRAPING

Web scraping is an excellent method for extracting data from websites and organising that data so that it can be saved and examined in a database. Web scraping is also known as data extraction from the web, web harvesting, or screen scraping, and it is a type of data mining. The goal of web scraping is to collect information from websites and transform it into a usable format, such as spreadsheets, databases, or comma-separated values (CSV) files, as illustrated in Figure 1. With web scraping, data such as item pricing, stock prices, different reports, market prices, and product details may be gathered. Extracting information from websites allows you to make more informed business decisions.

Figure 1. Web scraping structure.

PRACTICES OF WEB SCRAPING

• Data scraping
• Research
• Web mash up—integrate data from multiple sources
• Extract business details from business directory websites such as Yelp and Yellow Pages
• Collect government data
• Market analysis

The Web data scraper process, a software agent also known as a Web robot, mimics the browsing communication between Web servers and a person using a normal Web browser. Step by step, the robot enters as many websites as it needs, parses their content to find and extract interesting data, and structures that content as desired. The following text describes how scraping APIs and frameworks meet the needs of online data scrapers in attaining various retrieval goals:

Hypertext Transfer Protocol (HTTP)

This approach extracts data from both static and dynamic web pages. Data may be obtained by utilising a socket system to send HTTP requests to a remote web server.
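As a brief illustration of this approach, the sketch below sends an HTTP GET request and reads back the raw HTML of a page; the use of the requests library and the placeholder URL are assumptions and not part of the original system.

import requests

# Fetch a page over HTTP; the URL is a placeholder, not the project's source.
url = "https://example.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()   # stop early if the server returns an error code
html = response.text          # the raw HTML document as a string
print(html[:200])             # preview the first 200 characters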
Hyper Text Markup Language (HTML)

Query languages for data, such as XQuery and Hyper Text Query Language (HTQL), can be used to scan HTML pages and to obtain and alter material on the page.

Release Structure

The main purpose of this stage is to convert the extracted content into a formal representation for further analysis and retention. Although this final stage is on the Web scraping side, some technologies take care of post-processing the results, including in-memory data formats and text-based solutions such as strings or files (XML or CSV files).
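For example, a minimal sketch of this release step, writing extracted records to a CSV file with Python's standard csv module, is shown below; the records themselves are illustrative assumptions.

import csv

# Hypothetical records produced by the extraction stage.
records = [
    {"disease": "influenza", "symptoms": "fever; cough; sore throat"},
    {"disease": "migraine", "symptoms": "headache; nausea"},
]

# Persist the records in CSV form for later analysis.
with open("diseases.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["disease", "symptoms"])
    writer.writeheader()
    writer.writerows(records)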
LITERATURE SURVEY

Python has a rich set of libraries available for downloading digital content from the web. Among the libraries available, the following three are the most popular: BeautifulSoup, LXML and RegEx. Statistical research performed on the available data sets indicates that RegEx was able to deliver the requested information at an average rate of 153.6 ms. However, RegEx has limitations when extracting data from web pages with nested HTML tags, so on its own it is not well suited to complex data extraction. Libraries such as BeautifulSoup and LXML are able to extract content from web pages in such complex environments, with response rates of 457.66 ms and 203 ms, respectively.

The main purpose of data analysis is to get useful information from data and to make decisions based on that analysis. Web scraping refers to the collection of data from the web and is also known as data scraping. For the purpose of analysis, the work can be divided into several steps, such as cleaning, editing, etc. Scrapy is one of the most widely used tools for obtaining the information needed by the user; the main purpose of using Scrapy is to extract data from its sources. Scrapy, which crawls the web and is based on the Python programming language, is very helpful in finding the data we need by following the URLs from which the data is to be scraped. A web scraper is a useful API to retrieve data from a website. Scrapy provides all the necessary tools to extract data from a website, process the data according to user needs, and store the data in a specific format as defined by the users.

The Internet largely consists of web pages that include a large number of descriptive elements, including text, audio, graphics, video, etc. The Web scraping process is mainly responsible for the collection of raw data from the website. It is a process in which data is extracted automatically and very quickly, and it enables us to extract the specific data requested by the user. The most popular method is to create an individual web data structure using any known language.
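As a point of reference for the comparison above, the following is a minimal sketch of content extraction with BeautifulSoup; the HTML fragment and tag names are illustrative assumptions.

from bs4 import BeautifulSoup

# A small hard-coded HTML fragment standing in for a downloaded page.
html = """
<html><body>
  <div class="disease"><h2>Influenza</h2><p>Fever, cough, sore throat</p></div>
  <div class="disease"><h2>Migraine</h2><p>Headache, nausea</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
for entry in soup.find_all("div", class_="disease"):
    name = entry.h2.get_text(strip=True)       # disease name
    symptoms = entry.p.get_text(strip=True)    # symptom description
    print(name, "->", symptoms)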

EXPERIMENTAL WORK

TECHNOLOGY USED

Firebase

For the database we have used Cloud Firestore from Firebase. It is a real-time NoSQL database that stores key-value data in the form of collections and documents.
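A minimal sketch of writing one disease record to Cloud Firestore with the firebase-admin Python SDK is shown below; the service-account path, collection name, and document fields are assumptions made for illustration (the project itself may use the JavaScript client instead).

import firebase_admin
from firebase_admin import credentials, firestore

# Authenticate with a service-account key (the file name is a placeholder).
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

# Store one disease record as a document in a "diseases" collection.
db.collection("diseases").document("influenza").set({
    "symptoms": ["fever", "cough", "sore throat"],
    "precautions": ["rest", "drink fluids", "consult a doctor"],
})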
Tensor Flow

TensorFlow is used to train the model on the database and to make predictions. There are various algorithms for model training; we use linear regression in our project.

JavaScript Frameworks

• NodeJS

Node.js is an open-source, cross-platform JavaScript runtime environment that runs on the V8 engine and executes JavaScript code outside a web browser.

Our code is written for Node.js, as it is a fast, cross-platform runtime.

• ElectronJS

Electron is a framework for developing native apps using web technologies such as JavaScript, HTML, and CSS. As Electron is used to create a small web application, it helps us write our code and thus reduces the development time.

• ReactJS

React makes it less painful to create interactive UIs. Design a simple view for each state in your app, and React will efficiently update and render the relevant sections as your data changes.

React may also be used to render on the server using Node and to power mobile applications using React Native.

PYTHON

Python is a high-level, interpreted programming language. In this project, various libraries such as pandas, NumPy, and BeautifulSoup are used to create our database. Pandas and NumPy are used to filter and process the data needed to train our model by extracting it from separate data sources.
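The sketch below illustrates the kind of pandas/NumPy filtering described here; the file name, column names, and cleaning rules are assumptions, not details taken from the paper.

import numpy as np
import pandas as pd

# Load a (hypothetical) raw symptom-to-disease table scraped earlier.
raw = pd.read_csv("raw_disease_data.csv")

# Basic cleaning: normalise text, drop incomplete rows, remove duplicates.
raw["disease"] = raw["disease"].str.strip().str.lower()
raw["symptom"] = raw["symptom"].str.strip().str.lower()
clean = raw.replace("", np.nan).dropna().drop_duplicates()

# One-hot encode symptoms so the table can later be fed to the model.
features = pd.crosstab(clean["disease"], clean["symptom"])
features.to_csv("training_table.csv")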
COMPATIBILITY

OS X

Only 64-bit binaries are provided for OS X, and the lowest version of OS X supported is OS X 10.9.

Windows

Electron supports Windows 7 and later; older versions of the OS are not supported. Both x86 and amd64 (x64) binaries are provided for Windows, and the ARM version of Windows is not supported.

Software Used

– VSCode
Microsoft Visual Studio Code is a freeware source code editor available for Windows, Linux, and macOS. Debugging assistance, syntax highlighting, intelligent code completion, snippets, code refactoring, and integrated Git are among its features.

– Google Colab Notebook
Colaboratory, or Colab for short, is a Google Research tool that allows developers to write and run Python code in their browser. Google Colab is an excellent tool for hands-on learning activities. It is a hosted Jupyter notebook that requires no installation and includes a free edition that gives you access to Google computing resources such as GPUs and TPUs.

– PyCharm
PyCharm is an integrated development environment for computer applications, mostly in the Python programming language. It was created by JetBrains, a Czech firm.

Data Source

As we could not obtain more than 40 diseases, we created our own dataset. The dataset that we have used for our training and testing process was compiled from various sources. One of them is listed below.

– https://github.com/DiseaseOntology/HumanDiseaseOntology

Use of Scrapy

Scrapy is a framework for crawling websites and extracting structured data that can be used for a wide range of useful applications, such as data mining, information processing, or archiving of reported data. Apart from the purpose Scrapy was originally designed for, scraping the web, it can also be used to extract data using APIs (for example, Amazon AWS) or as a general-purpose web crawler. Scrapy is written in Python. Let us take a Wiki example related to one of the problems crawlers face: a simple online photo gallery may provide three options to users, specified through HTTP GET parameters in the URL. If there are four ways to sort images, three choices of thumbnail size, two file formats, and an option to disable user-provided content, then the same set of content can be accessed with many different URLs, all of which may be linked from the site. This carefully crafted combination creates a problem for crawlers, as they have to work through an endless combination of minor URL changes to retrieve different content.
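A minimal Scrapy spider along these lines is sketched below; the spider name, start URL, and CSS selectors are illustrative assumptions rather than the project's actual crawler.

import scrapy

class DiseaseSpider(scrapy.Spider):
    """Toy spider that yields disease names with their symptom text."""
    name = "diseases"
    start_urls = ["https://example.com/diseases"]  # placeholder URL

    def parse(self, response):
        # Assume each disease sits in its own <div class="disease"> block.
        for entry in response.css("div.disease"):
            yield {
                "name": entry.css("h2::text").get(),
                "symptoms": entry.css("p::text").get(),
            }

Such a spider could be run with "scrapy runspider disease_spider.py -o diseases.json" to collect the yielded items into a JSON file.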
Methodology

The method used by the project to collect all the required data is to extract it from various sources, such as the CDC's database and Kaggle resources. The extracted data is then analyzed using scripts written in the Python language according to the project requirements. Pandas and NumPy are widely used to perform various functions on the database.

After sorting the data according to each need, it is uploaded to the database. For the database we have used Cloud Firestore, as it is a real-time NoSQL database with extensive API support.

Further, TensorFlow is used in the project to train our model according to our needs. In this project we predict the disease from the given symptoms.

Training data set – 70%

Test data set – 30%

TensorFlow supports Linear Regression, which is used to predict diseases based on the given indicators.
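One simple way to realise this 70/30 split with pandas is sketched below; the file and column layout are assumptions, as the paper does not specify them.

import pandas as pd

# Feature table produced earlier: one row per disease sample.
data = pd.read_csv("training_table.csv")

# Shuffle and split: 70% of the rows for training, the remaining 30% for testing.
train = data.sample(frac=0.7, random_state=42)
test = data.drop(train.index)

print(len(train), "training rows,", len(test), "test rows")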
Coding

The project front end is written using ReactJS and TypeScript, although we have used the MaterialUI kit, which implements Google's Material Design for ReactJS, to speed up our development process. To package our app, Electron is used; our application supports macOS and Windows. Many of today's desktop web applications are written with the help of ElectronJS.

Testing

The project is tested using an Electron-oriented test framework called Spectron. The project runs in the browser; the output generated turns out to be completely consistent, and the generated analysis is approximate. Electron's standard workflow with Spectron can involve engineers who write unit tests in the standard TDD format and then write integration tests to ensure that acceptance criteria are met before approving a feature for use. Continuous integration servers can ensure that all of these tests pass before changes are incorporated into production.

Algorithm Used

Linear Regression is a standard mathematical method that allows us to study a function or relationship in a given set of continuous data. For example, we are given some corresponding x and y data points and we need to study the relationship between them, called the hypothesis. In the case of linear regression, the hypothesis is a straight line, i.e., h(x) = w · x + b, where the vector w is called the weights and the scalar b is called the bias; the weights and bias are the model parameters.

All we need to do is estimate the values of w and b from the given data set such that the resulting hypothesis produces the minimum cost J, defined by the cost function J(w, b) = (1/2m) Σᵢ (h(xᵢ) − yᵢ)², where m is the number of data points in the data provided and the sum runs over all of them. This cost function is also called the Mean Squared Error.

To find the parameter values that minimize J, we use a widely used optimization algorithm called Gradient Descent, which repeatedly updates w and b in the direction of the negative gradient of J.
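The gradient descent listing itself does not survive in this copy of the paper, so the following NumPy sketch reconstructs the usual update rule for the cost J defined above; the learning rate, iteration count, and synthetic data are assumptions.

import numpy as np

# Synthetic one-dimensional data standing in for (feature, target) pairs.
rng = np.random.default_rng(0)
x = rng.random(100)
y = 3.0 * x + 2.0 + 0.1 * rng.standard_normal(100)

w, b = 0.0, 0.0             # model parameters
alpha, epochs = 0.1, 2000   # learning rate and number of iterations
m = len(x)

for _ in range(epochs):
    y_hat = w * x + b                         # hypothesis h(x) = w*x + b
    dw = (1.0 / m) * np.sum((y_hat - y) * x)  # dJ/dw
    db = (1.0 / m) * np.sum(y_hat - y)        # dJ/db
    w -= alpha * dw                           # gradient descent updates
    b -= alpha * db

print(w, b)  # should approach 3.0 and 2.0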
RESULT DISCUSSION

The overall results of the project are useful for predicting diseases from the given symptoms. The script that was written to extract the data can be reused later to compile and format data according to need.

Users can enter symptoms by typing them themselves or by selecting them from the given options, and the trained model will predict the disease accordingly. Users are able to create their own medical profile, where they can submit their medical records and prescribed medication; this greatly helps us to feed our database and better predict disease over time, as some of these diseases occur seasonally.

Moreover, the analysis performed showed very similar diseases, but the training model is limited by the size of the database.

CONCLUSIONS AND FUTURE SCOPE

The use of the Python program also emphasizes understanding the use of pattern matching and regular expressions for web scraping. The database is compiled from factual reports, from Government media outlets to local media where they are considered reliable. A team of experts and analysts who validate information from a continuous list of more than 5,000 items is likely to make this the site that collects data most effectively.
User-provided inputs are analyzed and matched against the website's scraped data, and the output is produced as the user enters them in the user interface. Output is generated in the form of text. This method is simple and straightforward for identifying the disease from the given symptoms and provides precautionary guidance against that disease. For future work, we plan features that aim to show the medication that a patient can take for treatment. We are also looking to link this website to various hospitals and pharmacies for ease of use.

REFERENCES

[1] Thivaharan. S, Srivatsun. G and Sarathambekai. S, "A Survey on Python Libraries Used for Social Media Content Scraping", Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020), IEEE Xplore Part Number: CFP20V90-ART, ISBN: 978-1-7281-5461-9, PP: 361–366.
[2] Shreya Upadhyay, Vishal Pant, Shivansh Bhasin and Mahantesh K Pattanshetti, "Articulating the Construction of a Web Scraper for Massive Data Extraction", 2017 IEEE.
[3] Amruta Kulkarni, Deepa Kalburgi and Poonam Ghuli, "Design of Predictive Model for Healthcare Assistance Using Voice Recognition", 2nd IEEE International Conference on Computational Systems and Information Technology for Sustainable Solutions, 2017, PP: 61–64.
[4] Dimitri Dojchinovski, Andrej Ilievski and Marjan Gusev, "Interactive home healthcare system with integrated voice assistant", MIPRO 2019, PP: 284–288, Posted: 2019.
[5] Mohammad Shahnawaz, Prashant Singh, Prabhat Kumar and Dr. Anuradha Konidena, "Grievance Redressal System", International Journal of Data Mining and Big Data, 2020, Vol. 1, No. 1, PP. 1–4.

