Real-Time Face Swapping System Using OpenCV

Abstract— Face swapping has been a thriving genre of work, primarily associated with the replacement or substitution of one reference face onto the face of another person, be it in still images or in real time. This paper presents research on a real-time face-swapping algorithm that takes a reference image from the user and treats it as the input image instead of using a data set, thereby working as a real-time model built on the open-source Python computer-vision library OpenCV. The facial features and attributes of the input image are extracted and replaced to produce the final output of our model. Training starts with learning facial alignments and features so that the model can recognize a face and extract its features before swapping it. The process then moves to image warping through the dissociation and partitioning of the user's face and its background. To increase the precision of the model, a second stage of face parsing is performed to cancel out other features of the live image, after which the output is obtained. In this research we tried to bring accuracy and precision to the model by improving the condition of the image, editing out borders, and training it on a set of data for face recognition and correction. On analysis, the model delivers subtle outputs while performing its tasks, including analysis of the input image and of its output. The delivered model is meant to improve conditions for security, privacy, image capture, and entertainment.

Keywords— Face Swapping, Real-time system, Computer Vision, Face arrangement, Face substitution, Masking, Warping

I. INTRODUCTION

Computer Vision is an ocean-deep topic of Artificial Intelligence that focuses primarily on the structured understanding of the visual world by an artificial system [1]. The initial works and findings in Computer Vision date back to the 1950s, when it arose as a discipline of study aiming to make a non-living system visualize the outside world just as the human eye interprets it [2]. The concept was initially considered rather vague, but if science never believed in the vague, we would still have a world running on assumptions and notions. So scientists began their work on neural networks and developed the first basic system for object detection, used to recognize the presence of objects and demarcate them as unique entities [3]. Gradually, as people developed an intent to explore the field further, a relatively advanced system came up in the 1970s, when computer vision was taken up by commercial industries and the public first understood that something of this sort existed. That system could read handwritten or typed content using Computer Vision recognition algorithms. In this way a written-content interpretation system was developed for the blind community, and the advancements began reaching the masses [4].

With these gradual results, many enthusiasts began taking leaps in this interesting field of study. Drastic advancements followed once the internet arrived: the internet was in itself a huge turning point, and this worldwide connection opened up a plethora of working environments in the field [5]. With cameras, webcams, and integrated webcams becoming household items, an era of extreme advancement in digital photography began, drawing in the interest of the masses, and a series of image manipulations followed [11]. Be it clicking pictures for memories, recording videos, applying filters, or editing pictures and videos, Computer Vision became limitless.

Similarly, face swapping is not a new sub-branch of study; work continues every day toward a system that shows the most efficient results and meets the demands of various sectors [12], be it image capture, entertainment, or security and privacy models. We too attempted a short step in this genre and worked on a model that carries out face swapping in a real-time system [13].

Our face-swapping model deals with a real-time camera, be it a webcam or any external camera integrated with the constructed software model. It first sets up the connected camera [14]. It then reads the reference image specified in the code, detects the necessary facial points, scrapes the face from the image, and keeps it ready. Finally, as output, we see our own face with the face scraped from the reference image warped onto the live output. In essence, we take a 2D image and integrate it onto a 3D live output [15].

The existing face-swapping software and algorithms developed by researchers around the world mainly serve entertainment purposes, be it the creation of filters or use in movies for stunt doubles [16]. But as we dig deeper into face swapping and create algorithms that need only a minimal amount of pre-set data to train the model, further uses of the proposed Computer Vision system emerge [17]. For example, in the current world scenario masking is a serious problem in fraudulent activities; if our software can warp a face onto our own face in a live camera, it should also be able to do the reverse [18,19]. Hence, with the assistance of our face-swapping approach, an algorithm could be set up such that, given a masked face as input, we can successfully recover the original face in real time [11].
1082
Authorized licensed use limited to: National Institute of Technology- Meghalaya. Downloaded on June 06,2024 at 05:50:33 UTC from IEEE Xplore. Restrictions apply.
Now with this we have obtained our primary requirement, i.e., the angle. Our basic task is simply to rotate the image by this angle, after which both frontal faces are aligned. The only difference is that alignment of a 3-D image is not the same as that of a 2-D image; for a 3-D matrix, image rotation takes place through a "Geometrical Transformation" [19]. We therefore just have to transform our matrix with the rotation matrix, given in (2):

R = [ cos θ   −sin θ
      sin θ    cos θ ]                                          (2)

It is a clear observation that if the center point of the image were at the origin, the rotation would be easy, but that is a rare and hypothetical situation, so we proceed with the generalized form: we first rotate and then carry out the translation [16]. After the final transformation the mapping looks like (3), or compactly (4):

[ x' ]   [ cos θ   −sin θ ] [ x ]   [ t_x ]
[ y' ] = [ sin θ    cos θ ] [ y ] + [ t_y ]                     (3)

x' = R x + t                                                    (4)

Although this seems a large chunk of calculation, all of it is guided and controlled by the NumPy module of Python. With this, the reference image and the live input-face frames are aligned and ready for warping.

As we know, a face detected on a live webcam can show two kinds of movement: facial-muscle movements and movements of the whole face. The warped image adjusts itself to the probable muscle movements of the face, but for movements of the whole face the detection must be strong enough, or the warped image will be placed somewhere else [23].

Fig. 2. Face warping and morphing

Fig. 3. The desired output that we expect from our system
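The alignment transform of (2)–(4) reduces to one matrix product and one addition on the landmark array. A small NumPy sketch, with made-up landmark coordinates rather than real detected points, looks like this:

```python
import numpy as np

def align_points(points, theta, t):
    """Apply x' = R x + t to an (N, 2) array of landmark points."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])      # the rotation matrix of (2)
    return points @ R.T + t      # rotate, then translate, as in (3)

# Rotating (1, 0) by 90 degrees gives (0, 1); translating by (10, 0)
# then gives (10, 1).
pts = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned = align_points(pts, np.pi / 2, np.array([10.0, 0.0]))
```

In practice the same affine warp can be applied to whole images with OpenCV's cv2.getRotationMatrix2D and cv2.warpAffine, which build and apply exactly this rotation-plus-translation matrix.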
B. Warping of the Aligned Face

Once the image has its landmark points, i.e., the set of all stable points, set and aligned as discussed previously, the next primary step is to warp the reference image. Warping the image means distorting it so that it becomes flexible enough to take in any possible changes; this way the aligned reference image can take on the live frontal face and thereby give us the swapped face [22]. Image warping thus changes the domain of the image, allowing it to fit other shapes and sizes without losing much of its individuality. Mathematically it can be shown as in (5),

g(x) = f(T(x))                                                  (5)

where g(x) is the original representation of the facial landmarks and f(T(x)) represents the distortion of the landmark points as a curve, allowing the image to make the minor stretches and flexing needed for facial replacement on any face. The triangular face warping and the feature lines obtained for the still images are represented in Fig. 2.

C. Replacement of the Warped Face

With the previous steps complete, only the last touch remains to get the desired output: the warped reference image is carefully placed over the face detected by the system on the live camera.

How well the placement holds depends on two primary factors: the strength and range of the camera used alongside our software, and the lighting provided. Both determine how efficiently we obtain the results.

Fig. 3 shows, in still format, exactly what we desire to obtain as output from our system in real time. Here parsing plays a major role, because without it there would be a distinct boundary between the reference image and the input face [24]. Parsing lets the reference image adjust to the landmark points of the input face and thereafter show a completely merged face as output, with no visible demarcation.

IV. APPROACH

The expected efficiency of our model is relatively higher than that of other existing models, since we use just two primary libraries, OpenCV and NumPy, as the base modules of our approach; they handle reading and mapping the face of the reference image and setting it up for training [14]. Fig. 4 shows the pseudo-code that gives a basic structured idea of the proposed solution. Delaunay triangulation is a mathematical concept stating that, given a series of point sets, we ought to assemble and organize the point-set data; the concise way to approach this assembling is to triangulate the point set.
Fig. 4. The setting of landmark points for the image

We post the possible landmarks of the reference image's face structure and start triangulating them, as in Fig. 2. This makes it possible to capture the slightest changes in posture or movement [12,13]. Warping the image over the landmarks is a Computer Vision component that focuses on triangulation: any existing frame, be it an image, a segment of a video, or a 3-D structure, can be reduced to smaller segments, i.e., triangles, by the process of warping [23]. The pseudo-code and the approach above are the base on which our model works; as mentioned, it starts by cropping the facial features of the image, performing triangulation using the Delaunay approach, and finally warping the image [24]. Sample code fragments for performing the triangulation, and for warping the image after triangulation, are illustrated in Fig. 5 and Fig. 6 below.

Fig. 5. Triangulation of the reference images with the help of landmark points

Fig. 6. Warping the image after triangulation is set

A. Algorithmic Approach

In terms of the algorithmic approach, we followed the pipeline structure for our system proposed in [12,13]:

- Detect the frontal face on the live camera
- Read the reference image, the only input we put into the system
- Set up the landmark points for the reference image
- Align both the reference image and the frontal face to understand the axes and make transformations if needed [13]
- Crop the image over these stable points and warp it so that it can easily be placed
- Place the aligned and warped reference image on the input frontal face
- The first output shows how well the reference image is placed and whether it sticks to the face at the desired location under movements of all sorts [20]
- Lastly, parsing takes place, removing the unwanted broadly demarcated boundaries and merging the two, such that no difference is observed, giving us the said output

V. RESULTS AND DISCUSSION

The results reported here were obtained during our research by choosing a prior algorithm to work on and testing which one best suits our output and processes sustainable, compatible images. We verify the attributes of our code and analyze the precision and accuracy of the merge of the two images. Our code uses the live image as its data, and the second, swapping image can be updated to whatever image one would like to use for swapping; in Fig. 3 we chose a celebrity's image as our reference. The images may differ in skin color, eye color, eye shape, forehead size, face shape, ear lobes, chin shape, and overall face structure. We also accept different accessories such as glasses, headbands, jewelry pieces, etc. The unfiltered frontal face observed on a live camera is depicted in Fig. 7. Our output images after swapping with the reference images are shown in Fig. 8 and Fig. 9. The output image merges factors such as our skin color, eye color, eye shape, ear lobes, nose shape, and other facial features with the reference image. We perform this task only after training our model, using a neural-network model, to provide precise output, because without training we lose the advantages of intricate details like background removal and borderline segregation.

The first output, in Fig. 8, is something of an attempt to check whether the reference image places well on the frontal face in the live camera and can take in the necessary movements. It is more as if
the warped image is worn as a mask over the input face. In this image a clear resemblance to the reference image is observed. The final output takes care of the final merging of both images, not allowing any demarcation between the reference and the input frontal face, as shown in Fig. 9; it is portrayed with precision, leaving no doubt about the said output.

Fig. 8. The warped reference image placed flatly on the frontal input face

Fig. 9. The desired and final output of our software

VI. CONCLUSION

The uniqueness of our paper lies in the numerous experiments performed to build on the prior model with close dedication to detail: selectively choosing our algorithm, working with trained reference images and a live dataset, and delivering as output a synchronized, sustainable, fast-running program with an undetectable swapping seam. A live dataset makes the process far more feasible by removing the need to train on another data set before producing the output, a step that would otherwise cost time. Considering the smallest details of different elements, such as accessories, hair color, or glasses, and adapting the swap by merging aspects such as color, features, and shape, plays an important role. We understand that our model still has many future avenues to work through: it takes time to adapt to differences in posture, and the spontaneity of the model trembles, but our future research will try to improve this condition. We want to integrate and explore different technologies and software to sharpen the results at deeper levels and improve the sustainability of the code.

REFERENCES

[1] Dongyue Chen, Qiusheng Chen, Jianjun Wu, Xiaosheng Yu, Tong Jia, "Face Swapping: Realistic Image Synthesis Based on Facial Landmarks Alignment," Mathematical Problems in Engineering, vol. 2019, Article ID 8902701, 11 pages, 2019.
[2] Y. Zhang, L. Zheng, and V. L. L. Thing, "Automated face swapping and its detection," 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Singapore, 2017, pp. 15-19, DOI: 10.1109/SIPROCESS.2017.8124497.
[3] I. Korshunova, W. Shi, J. Dambre, and L. Theis, "Fast Face-Swap Using Convolutional Neural Networks," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 3697-3705, DOI: 10.1109/ICCV.2017.397.
[4] Luming Ma and Zhigang Deng. 2020. "Real-time Face Video Swapping From A Single Portrait." In Symposium on Interactive 3D Graphics and Games (I3D '20). Association for Computing Machinery, New York, NY, USA, Article 3, 1-10.
[5] Zhang, W.; Zhao, C. "Exposing Face-Swap Images Based on Deep Learning and ELA Detection." Proceedings 2020, 46, 29.
[6] Sountharrajan, S., et al. "Automatic classification on biomedical prognosis of invasive breast cancer." Asian Pacific Journal of Cancer Prevention: APJCP 18.9 (2017): 2541.
[7] Sountharrajan, S., et al. "Automatic glioblastoma multiforme detection using hybrid-SVM with improved particle swarm optimisation." International Journal of Biomedical Engineering and Technology 26.3-4 (2018): 353-364.
[8] Suganya, E., et al. "Mobile cancer prophecy system to assist patients: Big data analysis and design." Journal of Computational and Theoretical Nanoscience 16.8 (2019): 3623-3628.
[9] Karthiga, M., et al. "Machine Learning Based Diagnosis of Alzheimer's Disease." International Conference on Image Processing and Capsule Networks. Springer, Cham, 2020.
[10] Karthiga, M., et al. "Malevolent Melanoma diagnosis using Deep Convolution Neural Network." Research Journal of Pharmacy and Technology 13.3 (2020): 1248-1252.
[11] S. Mahajan, L. Chen, and T. Tsai, "SwapItUp: A Face Swap Application for Privacy Protection," 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), Taipei, Taiwan, 2017, pp. 46-50, DOI: 10.1109/AINA.2017.53.
[12] Qian Zhang, Hao Zheng, Tao Yan, Jiehui Li, "3D Large-Pose Face Alignment Method Based on the Truncated Alexnet Cascade Network," Advances in Condensed Matter Physics, vol. 2020, Article ID 6675014, 8 pages, 2020.
[13] Mohammed Alghaili, Zhiyong Li, Hamdi A. R. Ali, "FaceFilter: Face Identification with Deep Learning and Filter Algorithm," Scientific Programming, vol. 2020, Article ID 7846264, 9 pages, 2020.
[14] Qian Zhang, Hao Zheng, Tao Yan, Jiehui Li, "3D Large-Pose Face Alignment Method Based on the Truncated Alexnet Cascade Network," Advances in Condensed Matter Physics, vol. 2020, Article ID 6675014, 8 pages, 2020.
[15] Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Niessner. 2018. "FaceVR: Real-Time Gaze-Aware Facial Reenactment in Virtual Reality." ACM Trans. Graph. 37, 2, Article 25 (June 2018), 15 pages.
[16] Thibaut Weise, Sofien Bouaziz, Hao Li, and Mark Pauly. 2011. "Realtime Performance-based Facial Animation." ACM Trans. Graph. 30, 4 (July 2011), 77:1-77:10.
[17] Chenglei Wu, Takaaki Shiratori, and Yaser Sheikh. 2018. "Deep Incremental Learning for Efficient High-fidelity Face Tracking." ACM Trans. Graph. 37, 6, Article 234 (Dec. 2018), 12 pages.
[18] Feng Xu, Jinxiang Chai, Yilong Liu, and Xin Tong. 2014. "Controllable High-fidelity Facial Performance Transfer." ACM Trans. Graph. 33, 4, Article 42 (July 2014), 11 pages.
[19] Li Zhang, Noah Snavely, Brian Curless, and Steven M. Seitz. 2004. "Spacetime Faces: High Resolution Capture for Modeling and Animation." ACM Trans. Graph. 23, 3 (Aug. 2004), 548-558.
[20] Michael Zollhöfer, Matthias Nießner, Shahram Izadi, Christoph Rehmann, Christopher Zach, Matthew Fisher, Chenglei Wu, Andrew Fitzgibbon, Charles Loop, Christian Theobalt, and Marc Stamminger. 2014. "Real-time Non-rigid Reconstruction Using an RGB-D Camera." ACM Trans. Graph. 33, 4 (July 2014), 156:1-156:12.
[21] Chen Cao, Yanlin Weng, Shun Zhou, Yiying Tong, and Kun Zhou. 2014. "FaceWarehouse: A 3D facial expression database for visual computing." IEEE Transactions on Visualization and Computer Graphics 20, 3 (2014), 413-425.
[22] Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, and Jinxiang Chai. 2013. "Accurate and robust 3D facial capture using a single RGBD camera." In 2013 IEEE International Conference on Computer Vision, 3615-3622.
[23] Kevin Dale, Kalyan Sunkavalli, Micah K. Johnson, Daniel Vlasic, Wojciech Matusik, and Hanspeter Pfister. 2011. "Video Face Replacement." ACM Trans. Graph. 30, 6, Article 130 (Dec. 2011), 10 pages.
[24] Deep Fakes. 2019. Retrieved May 6, 2019 from https://github.com/deepfakes/faceswap
[25] Graham Fyffe, Andrew Jones, Oleg Alexander, Ryosuke Ichikari, and Paul Debevec. 2014. "Driving High-Resolution Facial Scans with Video Performance Capture." ACM Trans. Graph. 34, 1 (Dec. 2014), 8:1-8:14.