Cluster Based Analysis for Google YouTube Video Viewers
Abstract: In the modern scenario, a huge number of internet users are familiar with Google's YouTube apps. Online video is especially interesting as a potential vector for social communication, and it is also able to capture social experiences. YouTube is a platform where audio and video are viewed by users; anyone who wants to watch audio or video material of interest visits YouTube, which imposes no bar of age or gender, and everyone queries according to their own choice. YouTube hosts large collections of videos of different types, and large numbers of viewers watch different channels and videos, with many users following their favourite channels. A previous research paper, "Data Mining Techniques for Videos Subscribers Google YouTube", applied data mining classifiers to this data and reported the results. In this paper we analyse YouTube video views with clustering and achieve a cluster quality of 0.75.
1. Introduction
Classification proceeds based on classifier selection on a dataset, and we adopt a clustering-based classifier selection method. In this method, several clusters are selected for the ensemble process, the average performance of each classifier on the selected clusters is calculated, and the classifiers with the best average performance are used. Weight values are calculated according to the distances between the given data and each selected cluster. Online video, a universal, visual, and highly shareable medium, is well suited to crossing geographic, social, and semantic barriers. Trending videos in particular, by virtue of reaching a large number of viewers in a short span of time, are powerful both as influencers and as indicators of global communication flows. But are new communication technologies really being used to share ideas globally, or are they simply reflecting pre-existing social channels? Moreover, how do social, political, and geographic factors influence global communication? By analysing usage data from digital communication platforms, we can begin to answer these questions. In this paper, we focus on trending data from the YouTube video-sharing platform to examine the global consumption of online video.
2. Literature Survey
Rahul Deo Sah [1] proposed a system based on extracting a set of features from handwriting samples of male and female writers and training classifiers to learn to distinguish between the two. Images are segmented using the Otsu thresholding algorithm, and writing attributes such as slant, curvature, texture, and legibility are estimated by computing local and global features. Osama Abu Abbas [2] presented a comparison of data clustering algorithms: k-means, the self-organizing map (SOM), hierarchical clustering, and expectation maximization (EM). These four algorithms were chosen for their popularity, flexibility, and suitability for high-dimensional data, and were compared on the size and type of dataset, the number of clusters, and the software used. The experimental results show that k-means and EM produce better results than hierarchical clustering, while SOM shows better accuracy than both k-means and EM. Dr. S.K. Jayanthi et al. [3] proposed a clustering approach for the classification of research articles based on keyword search; in their paper, k-means, hierarchical clustering, and fuzzy c-means are used for clustering, and the experimental analysis shows that fuzzy c-means gives better results than k-means and hierarchical clustering. Bhagyashree Pathak et al. [4] presented a survey of clustering methods, discussing partition clustering, hierarchical clustering, density-based clustering, grid-based clustering, model-based clustering, and soft-computing clustering; the paper observes that hierarchical clustering (agglomerative or divisive) gives better results than partitioning clustering (the k-means and k-medoid algorithms), and that soft-computing techniques give better results on large datasets. Mythili S et al. [5] presented a research article on the analysis of clustering algorithms in data mining, giving a broad survey of different clustering techniques and their issues with respect to accuracy and algorithmic complexity on large datasets; in [6], the same authors provide an overview of clustering algorithms along with their advantages and disadvantages, covering partitioning, hierarchical, and density-based clustering. Tamilkili M [7] presented a paper on various clustering techniques, namely partitioning, density-based, hierarchical, grid-based, model-based, and constraint-based techniques, along with their specialties, advantages, and disadvantages. Amandeep Kaur Mann [8] discussed the different data mining techniques used in cloud computing, which help in evaluating the possible software services on the cloud by using clustering; the paper determines that the k-means algorithm is more efficient than the remaining algorithms and is suitable for large databases. Madura Phatak et al. [9] proposed new software using clustering-based and classification-based Knowledge Discovery in Databases (KDD), concluding that clustering-based KDD is suitable for larger datasets but that the software is more complex. Mihika Shah et al. [10] presented a paper discussing various algorithms such as k-means, providing a broad survey of the most basic techniques, including hierarchical and partition algorithms.
Clustering is an important class of unsupervised learning problems [14] that focuses on splitting data into groups; for example, the de-anonymization of Bitcoin addresses can be framed as a clustering problem. Hierarchical clustering merges or splits similar data objects by constructing a hierarchy of clusters, known as a dendrogram, forming clusters progressively [11]. Divisive clustering is the "top down" variant: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy [12]. In density-based clustering, a cluster is a dense region of points separated from other dense regions by regions of low density; such algorithms can be used when the clusters are irregular, and they work by finding core objects, i.e. objects that have dense neighborhoods [13].
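As a minimal illustration (not from the paper, and using a toy two-moons dataset rather than YouTube data), the following scikit-learn sketch contrasts the hierarchical and density-based methods described above on irregularly shaped clusters:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering, DBSCAN

# Two interleaving half-moons: irregular cluster shapes that spherical
# methods handle poorly but the methods above recover well.
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Agglomerative ("bottom up") clustering merges the closest clusters
# step by step, implicitly building the dendrogram described above.
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)

# DBSCAN grows clusters from core objects (points with dense
# neighborhoods); points in low-density regions are labelled -1 (noise).
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

print(sorted(set(agg.labels_)))  # the two moon-shaped clusters
print(sorted(set(db.labels_)))
```

Single linkage is used here because it chains along dense curves; complete or average linkage would tend to cut the moons into spherical pieces.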
3. Research Methodology
In the research methodology, clustering methods are applied when there is no class to be predicted in advance, but rather the instances are to be divided into natural groups. These clusters presumably reflect some mechanism at work in the domain from which the instances are drawn, a mechanism that causes some instances to bear a stronger resemblance to each other than to the remaining instances. An algorithm that produces good results with one type of data may produce poor results with a dataset of another type. A clustering algorithm should satisfy the following requirements:
1) Scalability: the algorithm must scale to large datasets; otherwise we may get wrong results.
2) The algorithm should be able to handle different types of attributes.
3) The algorithm should be able to discover clusters of arbitrary shape.
4) The clustering should be insensitive to noise and outliers.
5) Interpretability and usability: the results obtained should be interpretable and usable, so that maximum insight about the input parameters can be obtained.
6) The algorithm should be able to cope with datasets of high dimensionality.
Clustering algorithms can be broadly grouped into two categories:
1) Unsupervised linear clustering algorithms.
2) Unsupervised non-linear clustering algorithms.
K-means is an unsupervised linear clustering algorithm. SimpleKMeans clusters data using k-means; the number of clusters is specified by a parameter. The user can choose between the Euclidean and Manhattan distance metrics. In the latter case the algorithm is actually k-medians instead of k-means, and the centroids are based on medians rather than means, in order to minimize the within-cluster distance function. Running SimpleKMeans on the weather data with the default options uses two clusters and Euclidean distance. The result of clustering is shown as a table whose rows are attribute names and whose columns correspond to the cluster centroids; an additional column at the beginning shows the statistics for the entire dataset.
where ||x_i − v_j|| is the Euclidean distance between x_i and v_j, c_i is the number of data points in the i-th cluster, and the centroid of that cluster is

v_i = (1/c_i) Σ_{j=1}^{c_i} x_j
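A hedged sketch of the update rule above (on toy data, not the paper's dataset): assignment to the nearest center followed by the centroid update v_i = (1/c_i) Σ x_j, with medians replacing means when the Manhattan distance is chosen:

```python
import numpy as np

def k_centers(X, init_centers, metric="euclidean", n_iter=20):
    """K-means under Euclidean distance; k-medians under Manhattan."""
    centers = init_centers.astype(float).copy()
    for _ in range(n_iter):
        if metric == "euclidean":
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        else:  # "manhattan": the algorithm becomes k-medians
            d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = d.argmin(axis=1)      # assign each point to its nearest center
        for j in range(len(centers)):  # centroid update: v_i = (1/c_i) * sum x_j
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0) if metric == "euclidean" else np.median(pts, 0)
    return centers, labels

# Two tight blobs around (0, 0) and (3, 3); one seed point from each blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
centers, labels = k_centers(X, X[[0, 50]])
print(np.sort(centers[:, 0]).round(1))  # centers converge near 0 and 3
```

Using medians under Manhattan distance is what makes the variant minimize the within-cluster L1 distance, as noted above.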
Data: the data set after it has been transformed for modeling.
Text: only shown if feature extraction from text data was activated. It shows the words in the text columns that are used for the analysis, as a table and as a word cloud. In addition, we can inspect all the training and scoring documents in which those words have been highlighted. Finally, if the calculation of sentiment or language was activated, we can inspect the distribution of those values for all text columns as well.
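The word-feature extraction described above can be sketched as follows; this is an illustrative scikit-learn example with made-up video titles, not the paper's actual text pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy video titles standing in for the text columns of the dataset.
titles = ["funny cat video", "cute cat compilation",
          "python tutorial for beginners", "learn python programming"]

vec = TfidfVectorizer()        # turn each title into weighted word features
X = vec.fit_transform(titles)  # sparse document-term matrix

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)              # the cat titles and the python titles separate
```

The fitted vectorizer's vocabulary plays the role of the word table (and word cloud) shown in the Text view.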
Fig.2 Optimal features sets for k-Means Clustering
The plot on the left shows the result of the feature selection run. Each point represents a different feature set, i.e. a subset of the original columns. A feature set could, for example, have a complexity of 5 and achieve a cluster quality of 0.75. Note that this quality measure is the Davies-Bouldin index of the clustering, and smaller values are better. Unlike classification or regression, the goal of clustering is to describe the data; we therefore want to stay as close to the original data as possible and only remove noise. Typically, the most meaningful results are found in the middle area of the Pareto front on the left. The original feature space is shown as a square and typically lies in the top right corner. Using fewer features will improve the cluster quality, but may no longer accurately describe the underlying patterns; those feature sets are found toward the bottom left corner. The feature set that was used to build the final model is shown larger.
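The Davies-Bouldin index used as the quality measure above can be computed directly with scikit-learn; the sketch below uses synthetic blobs rather than the paper's YouTube features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic, well-separated data standing in for the selected feature set.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Smaller Davies-Bouldin values indicate compact, well-separated clusters.
print(round(davies_bouldin_score(X, labels), 2))
```

Comparing this score across feature subsets, as the feature selection run does, traces out the complexity-quality trade-off shown in the Pareto plot.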
Fig.3 Scatter plots for Cluster 0 and Cluster 1
Fig.6 Decision trees for different clusters (Cluster 0, Cluster 1, Cluster 2)
5. Conclusion:
All other sections in the results menu are reserved for the cluster models. Each cluster model gets a section of its own and in general provides the following entries. The first entry shows the size of all found clusters together with some information about the clusters and their quality.
Heat Map: identifies the most important attributes for each cluster.
Cluster Tree: displays a decision tree describing the main differences between the clusters.
Centroid Chart: shows the values of the cluster centroids in a parallel chart.
Centroid Table: shows the values of the cluster centroids in a table.
Scatter Plot: for a chosen cluster, displays a scatter plot in terms of the two most important attributes.
Clustered Data: displays a table with all the data, including the cluster label for each data point.
A further entry, shown only if feature selection was activated, presents all optimal trade-offs between feature set complexity and clustering quality; we can select any of the points in the trade-off plot and see the specific feature sets at the bottom.
General Data: the data set after it has been transformed for modeling.
Text: only shown if feature extraction from text data was activated; shows the words in the text columns used for the analysis as a table and as a word cloud. In addition, we can inspect all the training and scoring documents in which those words have been highlighted, and, if the calculation of sentiment or language was activated, the distribution of those values for all text columns.
Correlations: a matrix showing the correlations between the attributes.
In summary, YouTube video views were analysed with clustering, achieving a cluster quality of 0.75.
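The cluster sizes and centroid table described above can be reproduced in a few lines; this sketch uses invented viewer statistics and assumed column names ("views", "likes"), not the paper's actual schema:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy viewer statistics; the column names are illustrative assumptions.
df = pd.DataFrame({"views": [120, 150, 90, 5000, 5200, 4800],
                   "likes": [10, 12, 8, 400, 420, 390]})
df["cluster"] = KMeans(n_clusters=2, n_init=10,
                       random_state=0).fit_predict(df[["views", "likes"]])

print(df.groupby("cluster").size())           # cluster sizes
print(df.groupby("cluster").mean().round(1))  # centroid table
```

The per-cluster means are exactly the centroid-table entries, and the group sizes correspond to the cluster overview.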
6. References
[1] Rahul Deo Sah, "Review of Medical Disease Symptoms Prediction Using Data Mining Technique", IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 19, Issue 3, Ver. I (May-June 2017), pp. 59-70.
[2] Osama Abu Abbas, "Comparisons between Data Clustering Algorithms", The International Arab Journal of Information Technology, Vol. 5, No. 3, July 2008.
[3] Dr. S.K. Jayanthi, C. Kavi Priya, "Clustering Approach for Classification of Research Articles Based on Keyword Search", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 7, Issue 1, January 2018, ISSN: 2278-1323.
[4] Bhagyashree Pathak, Nilanjan Lal, "A Survey on Clustering Methods in Data Mining", International Journal of Computer Applications (0975-8887), Volume 159, No. 2, February 2017.
[5] Mythili S et al., International Journal of Computer Science and Mobile Computing, Vol. 3, Issue 1, January 2014, pp. 335-340.
[6] Mythili S, Madhiya E, "An Analysis on Clustering Algorithms in Data Mining", International Journal of Computer Science and Mobile Computing.
[7] M. Tamilkili, "A Survey on Recent Traffic Classification Techniques Using Machine Learning Methods", Journal of Advanced Research in Computer Science and Software Engineering.
[8] Amandeep Kaur Mann, "Survey Paper on Clustering Techniques", International Journal of Science, Engineering and Technology Research (IJSETR).
[9] Jasmine Irani, Nitin Pise, Madura Phatak, "Clustering Techniques and the Similarity Measures Used in Clustering: A Survey", International Journal of Computer Applications (0975-8887), Volume 134, No. 7, January 2016.
[10] Mihika Shah, Sindhu Nair, "A Survey of Data Mining Clustering Algorithms", International Journal of Computer Applications.
[11] Jun Zhang, Yang Xiang, Wanlei Zhou, Yu Wang, "Unsupervised Traffic Classification Using Flow Statistical Properties and IP Packet Payload", Journal of Computer and System Sciences 79 (2013), pp. 573-585.
[12] Jyoti Yadav, Monika Sharma, "A Review of K-mean Algorithm", International Journal of Engineering Trends and Technology (IJETT), Volume 4, Issue 7, July 2013.
[13] G. Sathiya, P. Kavitha, "An Efficient Enhanced K-Means Approach with Improved Initial Cluster Centers", Middle-East Journal of Scientific Research 20 (4), pp. 485-491, 2014.
[14] Ghahramani, Zoubin, "Unsupervised Learning", Advanced Lectures on Machine Learning, Springer Berlin Heidelberg, 2004, pp. 72-112.