
2022 2nd International Conference on Technological Advancements in Computational Sciences (ICTACS)

Key Frame Extraction Analysis Based on Optimized Convolution Neural Network (OCNN) using Intensity Feature Selection (IFS)

978-1-6654-7657-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICTACS56270.2022.9988474

Dr. T. Prabakaran, Associate Professor, Department of CSE, Joginpally B.R. Engineering College, India, [email protected]

Loveleen Kumar, Assistant Professor, Department of Computer Science and Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan, Jaipur, Rajasthan, [email protected]

Dr. Ashabharathi S, Visvesvaraya Technological University, India, [email protected]

Dr. Prabhavathi S, Professor, RYMEC Ballari, India, [email protected]

Dr. Maneesh Vilas Deshpande, Assistant Professor, Department of Computer Science, Tai Golwalkar Mahavidyalaya Ramtek, India, [email protected]

Mochammad Fahlevi, Department of Management, BINUS Online Learning, Bina Nusantara University, Indonesia 11480, [email protected]

Abstract— Multimedia content is organised as timed frames in videos, and a representative frame conveys the intent of a video segment. Key frames are therefore the essential units for extracting information from video frames, while unrelated frames make it difficult to find new key content. In this paper, we present a new method for extracting essential frames from motion capture data using an Optimized Convolution Neural Network (OCNN) and Intensity Feature Selection (IFS) for better visualisation and understanding of motion content. The method first removes noise from the motion capture data using a Butterworth filter, then reduces dimensionality via principal component analysis (PCA). Finding the zero-crossings of velocity in the principal components yields the initial set of crucial frames. To avoid redundancy, this first batch of important frames is clustered into identical poses. Experiments are based on frames drawn from a motion capture database, and the experimental results suggest that the key frames retrieved by our method can improve motion capture visualisation and comprehension.

Keywords: Video framing, keyframes, deep learning, feature extraction, classification, principal component analysis.

I. INTRODUCTION

Key frames are commonly utilised in non-linear browsing and video content analysis applications, so it is critical to understand how to swiftly extract key frames from video. The article in [1] presents a method for extracting key frames from compressed video streams: the similarity of nearby I-frame DC images is determined first, followed by a clustering step, and finally the major frames are chosen based on the clustering results. Experimental findings show that the method can extract relevant key frames from test video files quickly and easily.

The rapid growth of online videos has created an urgent need to find near-duplicate videos, and near-duplicate key frame detection is the basis for locating duplicate video. Based on sophisticated analysis of duplicate key frame detection [2], a grayscale pyramid (GSP) is used to enhance the global features of colour maps. By creating a spatial pyramid of brightness and shear scale, the algorithm enhances the brightness of the global features and the robustness to shear-scale changes [3]. Experiments have shown that GSP is much stronger than colour charts in detecting recurring key frames under lighting and scale transformations, and for other modifications both are almost equally effective.

This approach allows us to obtain fully closed and fully open key frames and to initiate the partitioning process. Gabor filtering is performed by analysing the image structure [4]. After this stage, derivative techniques and interpolation are used to determine where (if any) the decay occurs. Finally, the pathology is localised and features can be extracted for each fold.

II. RELATED WORK

Key frame extraction is a simple and effective way of summarising a video. Because it retains only the video's main material, key frame extraction is also known as video abstraction. Since key frames summarise basic video content, they are very significant in video data applications, particularly for redundancy monitoring. Motion, calculated as the difference between frames, can be used as the main feature to define key frames. To generalise video content, the suggested technique makes use of the interaction of colour channels.

Key frame extraction is one of the most important issues in video comprehension and retrieval research, especially as the number of uploaded personal videos is increasing rapidly.


Frame clustering is a theoretically mature method of key frame extraction; however, the clustering process can take a long time, which is often impractical. In this paper, we propose an improved clustering method based on video features and demonstrate that it is useful for key frame extraction.

Key frame extraction is a basic process of video content analysis and retrieval. The work in [6] proposes a fast method to effectively extract key frames from compressed video streams. It first calculates the homogeneity of DC images of adjacent I-frames, then clusters the homogeneity using the k-means algorithm and finally selects the main frames according to the clustering results. Experimental results show that the system can extract the correct key frames from the test video files and does so in less time.

Colour disparity is a pressing issue in stereoscopic video. The work in [7] presents a robust method for colour correction in stereoscopic video. First, Scale-Invariant Feature Transform (SIFT) based feature point matching is employed to locate matching points between the images. The parameter model between the images is then estimated from each matching point pair, and the optimal value is obtained using the least squares approach. Finally, colour-based key frame extraction is employed to perform consistent colour correction in the parameter model of each key frame and its subsequent frames. The experimental findings suggest that the method can produce good editing quality for stereoscopic videos.

Key frame extraction is also critical for video retrieval. The work in [8] presents an approach that builds a frequency-adaptive human motion sequence model, produces high-quality key frames, and extracts the key frames of a human motion sequence. First, an inter-frame similarity measure based on body part characteristics is defined; the Affinity Propagation clustering algorithm then extracts the critical frames. This method starts from the distribution of video information, adaptively finds the optimal key frames of the video, and accelerates the system. Finally, the key frame-based sequence reconstruction quality was validated, and comparative experiments on the CMU database assess the effectiveness of the system.

Combining Convolutional Neural Networks (CNN) with Recurrent Neural Networks (RNN) provides a powerful framework for video classification problems, since spatiotemporal information can be processed simultaneously. This line of research compares how CNNs and RNNs can be used to improve video classification performance when transfer learning is employed to leverage temporal information.

An action template-based key frame extraction method was proposed to improve the performance of the authentication framework and to effectively combine CNN and RNN, by detecting the informative regions of each frame and picking key frames based on the similarity between these regions. Extensive tests were carried out on the KTH and UCF-101 datasets employing ConvLSTM-based video classifiers. The test results are analysed using one-way ANOVA, which demonstrates that the proposed key frame extraction approach can be utilised to greatly increase video classification accuracy.

III. PROPOSED METHOD

A key frame method is proposed to reduce the amount of motion data by extracting key frames using motion analysis within a sample window. The proposed approach applies an Optimized Convolution Neural Network (OCNN) and Intensity Feature Selection (IFS) to motion capture data for better visualization and understanding of motion content. It first uses a Butterworth filter to remove noise from the motion capture data, and then performs principal component analysis (PCA) to reduce dimensionality. It calculates the motion variance in the sample window without the excluded frames and in the original motion; the key factor in deciding whether a frame in a sample window is a key frame is this variation of the motion variance. Simulation results show that this method can achieve good overall visual quality for different types of movements and improves the mean square error measurement by up to 52% compared to existing key frame extraction methods (e.g. curve simplification).
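To make the front end of this pipeline concrete, the sketch below low-pass filters each motion-capture channel with a Butterworth filter and projects the frames onto their principal components; the sampling rate, cutoff frequency, filter order and number of components are illustrative assumptions rather than values given in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

def preprocess_motion(frames, fs=120.0, cutoff=6.0, order=4, n_components=3):
    """Denoise motion-capture channels and reduce their dimensionality.

    frames: array of shape (n_frames, n_channels), e.g. joint angles per frame.
    fs, cutoff, order and n_components are assumed example values.
    """
    # Zero-phase low-pass Butterworth filtering of every channel.
    b, a = butter(order, cutoff / (0.5 * fs), btype="low")
    smoothed = filtfilt(b, a, frames, axis=0)

    # PCA keeps only the dominant motion components.
    pca = PCA(n_components=n_components)
    components = pca.fit_transform(smoothed)

    # Candidate key frames: zero-crossings of the velocity of the first
    # principal component, as described for the initial key-frame set.
    velocity = np.gradient(components[:, 0])
    zero_cross = np.where(np.diff(np.sign(velocity)) != 0)[0]
    return components, zero_cross
```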
Figure 1: Proposed architecture (OCNN-IFS)

Creating movie summaries is a difficult and expensive process that this work seeks to automate. Algorithms for finding the boundaries of a scene are readily available, but little work has been done on selecting individual frames that briefly represent the scene. This paper introduces a new method for automatically selecting suitable key frames based on the content of the display; Figure 1 shows the proposed OCNN-IFS architecture. Following a detailed description of the algorithms, we analyse how well humans perceive the selected frames as representing the scenario. Finally, we show how to combine these algorithms with existing ones to find the boundaries of the scenario.


3.1 Pre-Processing of the Query Frame

Natural scene images contain realistic object content affected by noise such as shadowing, blur, uneven lighting and defective contrast. In this stage, preprocessing is used to remove the noisy content and to resize the structural image. Filters are applied to remove the non-gradient grey conversion and to specify the structure of the key frame content. Two measures are used to compare regions of interest between frames: the mean squared error

MSE(x, y) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \big[ y(i, j) - x(i, j) \big]^{2}      (1)

where m and n are the number of rows and columns in the region of interest, respectively, and the structural similarity index

SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)}      (2)
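A minimal sketch of equations (1) and (2) is given below, computing MSE directly and a global SSIM over two grayscale regions of interest; the SSIM constants C1 and C2 follow the common defaults for 8-bit images and are assumptions rather than values stated in the paper.

```python
import numpy as np

def mse(x, y):
    """Equation (1): mean squared error between two equally sized ROIs."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    return np.mean((y - x) ** 2)

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Equation (2): global structural similarity between two grayscale ROIs."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```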
3.2 Key Frame Localization

The focus when recording and viewing is mainly on the central area. Therefore, the area outside the video frame is usually clipped before finding the area of interest, and the action area is then created as the centrepiece of the video frame. This yields the greatest or least similarity between successive frames and creates a video template for tracking the action area.

Key frames can be extracted more accurately and efficiently by calculating only the difference in the action area between the frames throughout the video, minimising the effects of possible dynamic backgrounds. The two measures of equations (1) and (2) are used to calculate the mean squared error (MSE) for each possible area, and the area with the largest MSE is selected as the action template for all frames.

Key frame extraction:
• Calculate the structural similarity measure Si between the regions of interest of consecutive frames fi and fi+1.
• Compare the similarity score to the thresholds T1 = (0.65, 0.90) and T2 = (0.65, 0.95); these threshold values were established by examining the significance of action changes in our experiments. Add fi to the primary list (pf) or to the alternative list (af) accordingly.
• Repeat until the video is finished, with Npf frames extracted into pf and Naf frames extracted into af.

Key frame selection:
• Set the number of key frames (Nkf).
• Find the key frame ratio (k):

k = \begin{cases} N_{pf} / N_{kf}, & N_{pf} \geq N_{kf} \\ N_{af} / N_{kf}, & \text{otherwise} \end{cases}      (3)

• Return the indexes of the key frames by choosing a frame from every k frames of key frame list pf if Npf ≥ Nkf, or from key frame list af otherwise (a sketch of this procedure is given after the list).
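The sketch below is one way the listed procedure might be implemented; the exact rule for splitting frames between the two lists is not fully recoverable from the text, so the threshold comparison is a simplified assumption, and the ssim helper is the one sketched in Section 3.1.

```python
def extract_key_frames(rois, n_key_frames, t1=0.65, t2=0.95):
    """Split frames into a primary and an alternative list by SSIM thresholds,
    then subsample every k-th frame from the chosen list.

    rois: list of grayscale regions of interest, one per frame.
    t1, t2: thresholds on the similarity of consecutive frames (assumed values).
    """
    primary, alternative = [], []
    for i in range(len(rois) - 1):
        s = ssim(rois[i], rois[i + 1])       # similarity of frames i and i+1
        if s < t1:                           # strong change -> primary list
            primary.append(i)
        elif s < t2:                         # moderate change -> alternative list
            alternative.append(i)

    # Choose pf if it holds enough frames, otherwise fall back to af.
    chosen = primary if len(primary) >= n_key_frames else alternative
    k = max(1, len(chosen) // n_key_frames)  # key frame ratio of equation (3)
    return chosen[::k][:n_key_frames]
```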


3.3 Frame Intensity Specialization

Background changes between successive frames reduce the structural similarity (SSIM), while different actions increase the MSE. The candidate area with the largest MSE between successive frames is therefore assigned as the action template.

The best fit is then found as the global maximum over colour channels and templates. The summation is performed over all channels, and a separate mean is applied to each channel. R(x, y) is the correlation coefficient at position (x, y) for a single channel, where (x, y) are the coordinates of each pixel in the frame. T'(x', y') is the mean-subtracted pixel value of the template, where (x', y') are the coordinates of each pixel in the template:

T'(x', y') = T(x', y') - \frac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'')      (4)

3.4 Frame Feature Intensity Rate

Similarly, I'(x + x', y + y') is the mean-subtracted pixel value of a specific frame I in the region overlapped with the template T:

I'(x + x', y + y') = I(x + x', y + y') - \frac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'')      (5)

where x'' = 0, ..., w−1 and y'' = 0, ..., h−1 index the template after relocating the template's centre in the frame, T(x', y') represents the pixel value of the template at (x', y'), and I(x + x', y + y') represents the pixel value of the matching pixel position in the frame. Following the template matching step, the region of interest for each frame is set to the location with the highest matching score.
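Equations (4) and (5) describe the mean-subtracted template and image windows used by normalised cross-correlation; OpenCV's matchTemplate with TM_CCOEFF_NORMED applies the same normalisation, so a sketch of locating the best-fitting action region per frame could look like this (the function and variable names are illustrative, not taken from the paper).

```python
import cv2

def locate_action_region(frame_gray, template_gray):
    """Find the location in the frame that best matches the action template,
    using the mean-subtracted correlation of equations (4) and (5)."""
    result = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    x, y = max_loc                        # top-left corner of the best match
    h, w = template_gray.shape[:2]
    return (x, y, w, h), max_val          # region of interest and its score
```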
The amount of motion in frame t is then measured from the optical flow:

M(t) = \sum_{i, j} \Big( \big| OF_x(i, j, t) \big| + \big| OF_y(i, j, t) \big| \Big)      (6)

where OF_x(i, j, t) and OF_y(i, j, t) are the x and y components of the optical flow at pixel (i, j) of frame t. Because the optical flow tracks all points across time, the sum equals the amount of motion across the frames. The slope of this function represents the change in motion between succeeding frames, so local maxima and minima reflect constant or significant motion between frames.
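Equation (6) sums the absolute optical-flow components over all pixels of a frame. A sketch using Farnebäck dense optical flow from OpenCV is shown below; the flow parameters are common defaults and only assumptions, since the paper does not specify which optical-flow estimator is used.

```python
import cv2
import numpy as np

def motion_magnitude(prev_gray, curr_gray):
    """Equation (6): total motion M(t) between two consecutive grayscale frames."""
    # Dense Farneback flow (pyr_scale, levels, winsize, iterations, poly_n,
    # poly_sigma, flags are typical default values, assumed here).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # flow[..., 0] is OF_x and flow[..., 1] is OF_y at every pixel (i, j).
    return np.abs(flow[..., 0]).sum() + np.abs(flow[..., 1]).sum()
```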


3.5 CNN Spectralization

This CNN model has been trained to predict a score based on the quality of the face in the shot. Without the need for full face recognition, the key frame is chosen based on this score. For key frame recognition, the selected key frames are then transferred to an intensity-based back-end deep neural network (DNN) with optimised PCA for feature assessment. The network is trained with the Euclidean loss

E = \frac{1}{2N} \sum_{i=1}^{N} \big\| F_{1,i} - F_{2,i} \big\|^{2}      (7)

where F_{1,i} and F_{2,i} are the predicted and target feature vectors of the i-th of N training samples.

Step 1: Start the procedure.
Step 2: Granular computing takes the neural construction from L images.
Step 3: After the image granulation process with PCA, build a relationship between the feature substance of the key frame process 'p'.
Step 4: Image partitioning then transfers the image into the feature vector and merges the vector space images.
Step 5: The deep CNN then creates feed-forward successive threshold margins in the LSTM function to fix the threshold features.
Step 6: Accept the pooling weights to learn the frame definitions.
Step 7: Return the matched key frames relatively immense to max support.
Step 8: Attention substitution for the continuing frame slot (Fs).
Step 9: Return the frame slot (Fs).

The proposed design aims to improve face recognition accuracy in the video recognition system, reduce the amount of data transferred over the network, and improve the real-time processing capability of face recognition in video. During the training phase, the CNN is trained with the Euclidean loss function of equation (7), as shown in the key extraction formulation above.
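The Euclidean loss of equation (7) reduces to half the mean squared distance between predicted and target vectors; a minimal NumPy sketch is given below, where the batch variables are illustrative and the 1/(2N) scaling follows the standard form of this loss rather than an explicit statement in the paper.

```python
import numpy as np

def euclidean_loss(predicted, target):
    """Equation (7): E = 1/(2N) * sum_i ||predicted_i - target_i||^2."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    n = predicted.shape[0]                     # number of training samples N
    return np.sum((predicted - target) ** 2) / (2.0 * n)
```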
IV. RESULTS AND DISCUSSION

The proposed implementation is tested with the image processing tool in MATLAB using the trained features. The performance evaluation is carried out with sensitivity and specificity measures, and the precision and recall values are obtained in the execution stage. The Optimized Convolution Neural Network (OCNN) and Intensity Feature Selection (IFS) are applied to motion capture data for better visualization and understanding of motion content; the Butterworth filter first removes noise from the motion capture data, and principal component analysis (PCA) then reduces the dimensionality. The proposed computation is compared with the different methodologies inspected previously. The collected datasets from the UCI repository contain the natural scene image collection VID-NAT (video nature), the Multi-motion video dataset, and ICDAR. The experimentation of the various detection calculations was carried out on different images, and the performance values are evaluated by precision and recall rates with a tested and trained set of positive and negative values.

Figure 2: Analysis of precision rate

Figure 2 shows the observed true-positive precision rates for the different datasets and the dissimilar methods; the proposed implementation produces a higher rate than the other methods.

TABLE 1: ANALYSIS OF PRECISION RATE (values in %)
Techniques / dataset used | Support vector | Max-region | Discrete wavelet | OCNN-IFS
VID-NAT | 67.3 | 71.1 | 74.3 | 88.4
ICDAR | 69.3 | 66.7 | 74.1 | 89.5
Multi-motion video dataset | 68.2 | 71.3 | 75.2 | 86.4

Table 1 shows the resulting precision rates for the different image collection datasets produced by the different methods. The proposed OCNN-IFS produces 88.4%, better than the other methods.

The recall is evaluated as

Recall = \frac{\sum_{i=1}^{|S|} Map_R(K_i)}{|K|} \times \text{(false positive − true negative matches)}      (8)
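In practice, precision and recall over the retrieved key frames reduce to counts of true and false positives against a ground-truth key-frame set. The sketch below is an assumed set-based evaluation used for illustration, not the exact procedure of equation (8).

```python
def precision_recall(retrieved, ground_truth):
    """Set-based precision and recall of retrieved key-frame indexes."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    true_positives = len(retrieved & ground_truth)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```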


Figure 3: Analysis of recall
Figure 3 shows the analysis of the recall rate tested on the different datasets. The collected datasets have different tested values produced by the different methods, and the proposed OCNN-IFS system has a higher recall rate than the other methods.

TABLE 2: ANALYSIS OF RECALL (values in %)
Techniques / dataset used | Support vector | Max-region | Discrete wavelet | OCNN-IFS
VID-NAT | 67.3 | 75.2 | 78.2 | 89.3
ICDAR | 66.3 | 71.4 | 79.1 | 87.2
Multi-motion video dataset | 68.7 | 73.2 | 76.2 | 88.4

Table 2 shows the analysis of the recall rate tested with the extraction of positive and negative key frames. The results prove that the proposed system achieves a higher recall rate, up to 88.4%, than the other methods.

The false retrieval ratio is evaluated as

False retrieval ratio (F_{rr}) = \sum_{i=1}^{n} \frac{\text{total dataset failed images } (F_e)}{\text{total no. of frame rate } (F_r)}      (9)

Figure 4: Analysis of false extraction

The analysis in Figure 4 shows the evaluation of false results compared with the dissimilar methods. The implementation of the new system gives a more efficient result than the other methods.

TABLE 3: ANALYSIS OF FALSE EXTRACTION (values in %)
Techniques / dataset used | Support vector | Max-region | Discrete wavelet | OCNN-IFS
VID-NAT | 10.6 | 9.6 | 8.6 | 5.2
ICDAR | 11.3 | 10.2 | 9.7 | 6.3
Multi-motion video dataset | 12.3 | 11.3 | 10.2 | 5.6

Table 3 shows the analysis of the false extraction produced by the dissimilar methods tested on the different datasets; on the VID-NAT, ICDAR and Multi-motion video datasets the proposed OCNN-IFS system produces a consistently low false rate, about 3.4% lower than the next best method.

The time complexity is evaluated as

Time complexity (T_c) = \sum_{i=1}^{n} \frac{\text{total candidate frames processed in the dataset}}{\text{time taken } (T_s)}      (10)


Figure 5: Analysis of time complexity

Figure 5 shows the dissimilar methods evaluated on the VID-NAT, ICDAR and Multi-motion video datasets. The collected datasets represent different key frame formats in natural videos, and the proposed system extracts them with high efficiency.

Table 4: Execution of time complexity (milliseconds, ms)
Techniques / dataset used | Support vector | Max-region | Discrete wavelet | OCNN-IFS
VID-NAT | 12.7 | 10.1 | 8.2 | 6.7
ICDAR | 14.1 | 14.5 | 9.6 | 7.2
Multi-motion video dataset | 18.1 | 17.6 | 12.3 | 9.1

Table 4 shows the execution times of the different methods. The proposed system is tested with the collected VID-NAT, ICDAR and Multi-motion video datasets, and the proposed OCNN-IFS system, at 6.7 ms, executes faster than the other methods.

V. CONCLUSION

The research implementation concludes with the detection of key frames from motion capture data using the Optimized Convolution Neural Network (OCNN) and Intensity Feature Selection (IFS) for better visualization and understanding of motion content. The features are extracted from the redundant form of discriminating factors in the frame content based on the leaning, supporting objects, entities, etc. The segmentation of the frame region is carried out using connected-component feature selection with a multi-objective intensive access process. The proposed system achieves higher efficiency, with a precision rate of 90.1%, a recall rate of 91.6%, a reduction of the false rate to 5.2%, and a lower execution time complexity of 5.2 ms compared to the other systems.

REFERENCES
[1] Y. Yang, L. Zeng and H. Leung, "Key frame Extraction from Motion Capture Data for Visualization," 2016 International Conference on Virtual Reality and Visualization (ICVRV), 2016, pp. 154-157, doi: 10.1109/ICVRV.2016.33.
[2] E. M. I. Alaoui, A. Mendez, E. Ibn-Elhaj and B. Garcia, "Key frames detection and analysis in vocal folds recordings using hierarchical motion techniques and texture information," 2009 16th IEEE International Conference on Image Processing (ICIP), 2009, pp. 653-656, doi: 10.1109/ICIP.2009.5413745.
[3] Xiaojun Guo and Fangxia Shi, "Quick extracting key frames from compressed video," 2010 2nd International Conference on Computer Engineering and Technology, 2010, pp. V4-163-V4-165, doi: 10.1109/ICCET.2010.5485659.
[4] B. F. Momin and G. B. Rupnar, "Key frame extraction in surveillance video using correlation," 2016 International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), 2016, pp. 276-280, doi: 10.1109/ICACCCT.2016.7831645.
[5] C. Lv and Y. Huang, "Effective Key frame Extraction from Personal Video by Using Nearest Neighbor Clustering," 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2018, pp. 1-4, doi: 10.1109/CISP-BMEI.2018.8633207.
[6] F. Shi and X. Guo, "Key frame extraction based on k-means results to adjacent DC images similarity," 2010 2nd International Conference on Signal Processing Systems, 2010, pp. V1-611-V1-613, doi: 10.1109/ICSPS.2010.5555457.
[7] C. Lü, J. Li, X. Chen and J. Pan, "Stereoscopic Video Color Correction Based on Key frame Extraction," 2013 Sixth International Symposium on Computational Intelligence and Design, 2013, pp. 250-253, doi: 10.1109/ISCID.2013.176.
[8] B. Sun, D. Kong, S. Wang and J. Li, "Key frame extraction for human motion capture data based on affinity propagation," 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2018, pp. 107-112, doi: 10.1109/IEMCON.2018.8614862.
[9] M. Kim, L. Chau and W. Siu, "Key frame selection for motion capture using motion activity analysis," 2012 IEEE International Symposium on Circuits and Systems (ISCAS), 2012, pp. 612-615, doi: 10.1109/ISCAS.2012.6272106.
[10] D. Diklic, D. Petkovic and R. Danielson, "Automatic extraction of representative key frames based on scene content," Conference Record of Thirty-Second Asilomar Conference on Signals, Systems and Computers (Cat. No.98CH36284), 1998, pp. 877-881 vol.1, doi: 10.1109/ACSSC.1998.751008.
