HTML Forms Built On User Trait Detection

This document discusses building adaptive HTML forms using machine learning. It proposes using predictive models trained on user data to cluster users and infer their traits in order to optimize the user interface and experience. The system architecture involves using scikit-learn, pandas and other libraries to analyze user data with algorithms like random forest classification and hierarchical clustering. Visualizations like dendrograms and cluster plots are used to validate the results. Future enhancements could include extracting more data from datasets and customizing features for different business categories.

Uploaded by

saikiran

HTML Forms Built on User Trait Detection
Contents
 Abstract
 Project Introduction
 Existing System
 Proposed System
 System Architecture
 Libraries
 Project Requirements
 Dendrograms
 Cluster Visualizations
 Future Enhancements
ABSTRACT
 The willingness of an individual to complete a questionnaire form decreases as the number of questions increases.
This project is put forward with the intention of improving users' ability to complete large questionnaires in web forms.

 Users' capacity to complete large questionnaires can be improved with pliable (adaptive) HTML forms. This pliability is
built on Machine Learning (ML) procedures.

 To estimate user behavior, preferences and traits and to understand the various types of users, predictive models based on
Machine Learning Algorithms (MLA) are trained on a large body of data from users who previously took part in the
questionnaires. The models infer the most relevant factors and cluster the users by their likely characteristics and
other factors. Based on these groups and the abilities of their users, the system provides each group with the most
suitable version of the form, improving User Interface and User Experience.

 To validate the approach and confirm the improvements, further tests of these classifications have been
started. The aim of this project is to understand Human-Computer Interaction (HCI), deduce dependable
outcomes from input data, improve the process and find its applicability in various fields.
Project Introduction

 A web form, also called an HTML form, is an online page that
allows for user input. It is an interactive page that mimics a paper
document or form, where users fill out particular fields. Web forms
can be rendered in modern browsers using HTML and related
web-oriented languages.
 These web forms require the utmost attention to optimize user input.
 Data entered into web forms is of the utmost use for understanding
specific needs and improvements.
Existing system

 Typically, a web form contains a combination of form elements such
as a checkbox, submit button, text box, etc. For added interactivity,
web designers may use elements such as "input" inside a form that sets
the "action" and "method" attributes. The "method" attribute selects
"GET" or "POST" for submitting the data.
 The fields in a web form cannot be removed unless they are
proven unnecessary, which is difficult to determine using traditional
methods.
Proposed system

 The goal of this paper is to present a new approach to enabling
adaptability in web-based systems. It combines A/B testing,
user tracking and machine-learning algorithms to improve
user performance in completing a (large) web form, and
validates the results obtained through statistical tests.
 As a secondary goal, the research presented in this paper also aims
to carry out all machine-learning processes in a white-box way, using
algorithms and techniques that allow researchers to understand what
is happening at every moment.
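The statistical validation mentioned above could, for instance, compare form completion rates between the original form (variant A) and an adaptive form (variant B). The sketch below uses a two-proportion z-test, which is one common choice and not necessarily the test the authors applied. The function name and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two completion rates.

    success_x: number of users who completed the form in variant x
    n_x:       number of users who were shown variant x
    Returns (z, p_value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled completion rate under the null hypothesis "no difference".
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("completion rates of 0% or 100% in both variants")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 120 of 400 users completed form A, 156 of 400 form B.
z, p = two_proportion_z_test(120, 400, 156, 400)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value would suggest that the difference in completion rates is unlikely to be due to chance alone. The significance threshold is left to the experimenter.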
System Architecture
Libraries
 scikit-learn library (imported as sklearn) version 0.19.1
 pandas software library version 0.22.0
 NumPy library
 SciPy library
 Matplotlib library
 IPython library
 seaborn library
 TextBlob library
 Beautiful Soup library
Sample Code
WORKING OF ALGORITHMS
 Random Forest Classifier : A random forest is a meta estimator that fits a number of decision
tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive
accuracy and control over-fitting.
 Hierarchical Clustering with Euclidean Distance : Hierarchical clustering is a set of methods that
recursively cluster two items at a time. There are basically two different types of algorithms,
agglomerative and partitioning (divisive). In agglomerative algorithms, each item starts in its own
cluster and the two closest clusters are merged repeatedly. In partitioning algorithms, the entire
set of items starts in one cluster, which is partitioned into two more homogeneous clusters. Then the
algorithm restarts with each of the new clusters, partitioning each into more homogeneous clusters
until each cluster contains only identical items.
 Reinforcement Learning : Reinforcement learning is learning what to do, that is, how to map
situations to actions. The goal is to maximize a numerical reward signal. The learner is
not told which action to take, but instead must discover which action will yield the maximum
reward.
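The reinforcement learning idea in the last item can be sketched with an epsilon-greedy bandit, one of the simplest reward-maximizing learners. The mapping to forms is an assumption made for illustration and is not described in the slides: each action is a hypothetical form variant, and the reward is 1 when a simulated user completes the form. The completion rates are invented and used only to drive the simulation.

```python
import random


def run_epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Learn which action yields the highest average reward.

    true_rates: hidden completion probability of each (hypothetical) form
                variant; the learner never reads these directly.
    Returns (counts, values): pulls per action and estimated reward per action.
    """
    rng = random.Random(seed)
    n_actions = len(true_rates)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # Simulated environment: reward 1 if the user completes the form.
        reward = 1.0 if rng.random() < true_rates[action] else 0.0

        # Incremental update of the running average.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return counts, values


counts, values = run_epsilon_greedy([0.3, 0.5, 0.8])
print("pulls per variant:", counts)
print("estimated completion rates:", [round(v, 2) for v in values])
```

The learner is never told which variant is best. It discovers this from the rewards, spending a fraction `epsilon` of its steps on exploration.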
Software Requirements
 OEEU Dataset/Yelp dataset
 Python version 3.6
 Jupyter Notebook version 5.0
 Git version control, version 2.16.1
 Visual Studio C++ 2015
 Windows 10 SDK
Hardware Requirements
 RAM : 4GB
 Processor : Core i3 (recommended), x64
 Storage : 10GB of free space (software: 4GB)
DENDROGRAMS
 A dendrogram is a type of tree diagram that shows hierarchical clustering, that is, the relationships
between similar sets of data. Dendrograms are frequently used in biology to show clustering
between genes or samples, but they can represent any type of grouped data.
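A dendrogram is drawn from the merge history of a hierarchical clustering: which clusters were joined, and at what distance. The sketch below computes such a history with a small agglomerative procedure using Euclidean distance. Single linkage (the distance between the closest members) is an assumption, since the slides do not name a linkage rule, and the sample points are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def agglomerative(points):
    """Single-linkage agglomerative clustering.

    Returns a list of merges (id_a, id_b, distance, new_cluster_size).
    Original points have ids 0..n-1; each merge creates the next free id.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the pair of clusters whose closest members are nearest.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = min(
                    euclidean(points[x], points[y])
                    for x in clusters[a]
                    for y in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)

        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1

    return merges


# Hypothetical 2-D user feature vectors.
sample = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d, size in agglomerative(sample):
    print(f"merge {a} + {b} at distance {d:.2f} -> cluster of {size}")
```

Each merge corresponds to one horizontal link in the dendrogram, drawn at the height of the merge distance. Cutting the tree at a chosen height yields the user groups.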
CLUSTER VISUALIZATION
FUTURE ENHANCEMENTS
 In conclusion, we have experimented with various feature selection and supervised learning
algorithms to predict star ratings in the Yelp dataset using review text alone. We evaluated the
effectiveness of the different algorithms using precision and recall. We conclude that a
Support Vector Machine combined with feature selection, stop-word removal and
stemming performs best in our context of trait analysis. A possible improvement would be to extract
additional information from the dataset, such as business categories, and to use customized feature
sets for each category, because different word features might be more or less relevant in
different business categories.
 The runtime of the algorithm could possibly be improved by training and testing within each
business category, because the feature set is smaller. We could also try using parts of speech in
the feature selection process to differentiate between identical words that are used as
different parts of speech.
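The per-category feature sets suggested above can be sketched as word counts with stop words removed, grouped by business category. This is only an illustration: the stop-word list is a tiny hypothetical sample, the reviews and category names are invented, and stemming is omitted because the standard library has no stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the one used in the project).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of", "but"}


def word_features(text):
    """Lower-case the text, split it into words and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def features_by_category(reviews):
    """Build one word-count feature set per business category.

    reviews: iterable of (category, review_text) pairs.
    Returns {category: Counter of word counts}.
    """
    per_category = {}
    for category, text in reviews:
        per_category.setdefault(category, Counter()).update(word_features(text))
    return per_category


# Hypothetical reviews from two categories.
reviews = [
    ("Restaurants", "The food was great and the service was great"),
    ("Auto Repair", "Great service but the wait was long"),
]
for category, counts in features_by_category(reviews).items():
    print(category, dict(counts))
```

Keeping a separate vocabulary per category yields smaller feature sets, which is the source of the runtime improvement proposed above.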
THANK YOU
EVERYONE