Search | arXiv e-print repository

Downlink CCM Estimation via Representation Learning with Graph Regularization

Authors: Melih Can Zerin, Elif Vural, Ali Özgür Yılmaz

Abstract: In this paper, we propose an algorithm for downlink (DL) channel covariance matrix (CCM) estimation for frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) communication systems with a base station (BS) possessing a uniform linear array (ULA) antenna structure. We make use of the inherent similarity between the uplink (UL) CCM and the DL CCM due to angular reciprocity. We consider a setting where the UL CCM is mapped to the DL CCM by a mapping function. We first present a theoretical error analysis of learning a nonlinear embedding by constructing a mapping function, which points to the importance of the Lipschitz regularity of the mapping function for achieving high estimation performance. Then, based on the theoretical ground, we propose a representation learning algorithm as a solution for the estimation problem, where Gaussian RBF kernel interpolators are chosen to map UL CCMs to their DL counterparts. The proposed algorithm is based on the optimization of an objective function that fits a regression model between the DL CCM and UL CCM samples in the training dataset and preserves the local geometric structure of the data in the UL CCM space, while explicitly regulating the Lipschitz continuity of the mapping function in light of our theoretical findings. The proposed algorithm surpasses benchmark methods in terms of three error metrics as shown by simulations.

Submitted 26 July, 2024; originally announced July 2024.
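
The abstract above mentions Gaussian RBF kernel interpolators as the mapping from UL to DL CCMs. As a rough illustration of that one ingredient only, the sketch below fits a plain Gaussian RBF interpolator from input vectors to scalar targets. It is not the authors' algorithm: the function names, the `sigma` and `ridge` parameters, the scalar output, and the omission of the graph regularization and Lipschitz terms are all assumptions made for brevity.

```python
import math


def gaussian_kernel(x, c, sigma):
    """Gaussian RBF value exp(-||x - c||^2 / (2 * sigma^2)) for equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit_rbf_interpolator(X, y, sigma=1.0, ridge=1e-8):
    """Return predict(x) = sum_i w_i * k(x, X[i]), with weights fitted on (X, y).

    `ridge` is a small diagonal term (an assumption) that keeps the kernel
    matrix well conditioned; it is not taken from the paper.
    """
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve_linear_system(K, y)

    def predict(x):
        return sum(w * gaussian_kernel(x, c, sigma) for w, c in zip(weights, X))

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: three 1-D inputs and scalar targets.
    predict = fit_rbf_interpolator([[0.0], [1.0], [2.0]], [1.0, 3.0, 2.0])
    print(predict([1.0]), predict([1.5]))
```

A smaller `sigma` makes the fitted map follow the training samples more sharply, which is the kind of regularity trade-off the abstract's Lipschitz analysis is concerned with.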

arXiv:2406.10144 [pdf, other]

Improving rule mining via embedding-based link prediction

Authors: N'Dah Jean Kouagou, Arif Yilmaz, Michel Dumontier, Axel-Cyrille Ngonga Ngomo

Abstract: Rule mining on knowledge graphs allows for explainable link prediction. In contrast, embedding-based methods for link prediction are well known for their generalization capabilities, but their predictions are not interpretable. Several approaches combining the two families have been proposed in recent years. Most of the resulting hybrid approaches are trained within a unified learning framework, which often leads to convergence issues due to the complexity of the learning task. In this work, we propose a new way to combine the two families of approaches. Specifically, we enrich a given knowledge graph by means of its pre-trained entity and relation embeddings before applying rule mining systems on the enriched knowledge graph. To validate our approach, we conduct extensive experiments on seven benchmark datasets. An analysis of the results generated by our approach suggests that we discover new valuable rules on the enriched graphs. We provide an open source implementation of our approach as well as pretrained models and datasets at https://github.com/Jean-KOUAGOU/EnhancedRuleLearning

Submitted 14 June, 2024; originally announced June 2024.

Comments: 13 pages, 2 figures, 11 tables

arXiv:2406.07426 [pdf, other]

DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

Authors: Abdurrahim Yilmaz, Sirin Pekcan Yasar, Gulsum Gencoglan, Burak Temelkuran

Abstract: Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They aid the artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets only partially cover the subclassifications of skin lesions. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and help improve the failure analysis for skin lesions. This study presents a diverse dataset comprising 12,345 dermatoscopic images with 38 subclasses of skin lesions collected in Turkiye, which comprises different skin types in the transition zone between Europe and Asia. Each subgroup contains high-resolution photos and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research endeavors and enhances the depth of understanding regarding the skin lesions. This dataset distinguishes itself through a diverse structure with 5 super classes, 15 main classes, 38 subclasses and its 12,345 high-resolution dermatoscopic images.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 12 pages, 2 figures, 1 table

arXiv:2406.01029 [pdf, other]

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

Authors: Trong-Thuan Nguyen, Pha Nguyen, Xin Li, Jackson Cothren, Alper Yilmaz, Khoa Luu

Abstract: Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. To this end, we propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order. Therefore, it can effectively capture periodic and overlapping relationships while minimizing information loss. Extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model and show its potential to perform scene understanding on drone videos. Finally, the CYCLO method consistently achieves State-of-the-Art (SOTA) results on two in-the-wild scene graph generation benchmarks, i.e., PVSG and ASPIRe.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2404.10156 [pdf, other]

SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation

Authors: Shehan Perera, Pouyan Navard, Alper Yilmaz

Abstract: The adoption of Vision Transformer (ViT) based architectures represents a significant advancement in 3D Medical Image (MI) segmentation, surpassing traditional Convolutional Neural Network (CNN) models by enhancing global contextual understanding. While this paradigm shift has significantly enhanced 3D segmentation performance, state-of-the-art architectures require extremely large and complex architectures with large scale computing resources for training and deployment. Furthermore, in the context of limited datasets, often encountered in medical imaging, larger models can present hurdles in both model generalization and convergence. In response to these challenges and to demonstrate that lightweight models are a valuable area of research in 3D medical imaging, we present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. Additionally, SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features to produce highly accurate segmentation masks. The proposed memory efficient Transformer preserves the performance characteristics of a significantly larger model in a compact design. SegFormer3D democratizes deep learning for 3D medical image segmentation by offering a model with 33x fewer parameters and a 13x reduction in GFLOPS compared to the current state-of-the-art (SOTA). We benchmark SegFormer3D against the current SOTA models on three widely used datasets, Synapse, BRaTs, and ACDC, achieving competitive results.
Code: https://github.com/OSUPCVLab/SegFormer3D.git

Submitted 23 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR Workshop 2024

arXiv:2404.10140 [pdf, other]

doi 10.5194/isprs-archives-XLVIII-2-2024-297-2024

A Probabilistic-based Drift Correction Module for Visual Inertial SLAMs

Authors: Pouyan Navard, Alper Yilmaz

Abstract: Positioning is a prominent field of study, notably focusing on Visual Inertial Odometry (VIO) and Simultaneous Localization and Mapping (SLAM) methods. Despite their advancements, these methods often encounter dead-reckoning errors that lead to considerable drift in estimated platform motion, especially during long traverses. In such cases, the drift error is not negligible and should be rectified. Our proposed approach minimizes the drift error by correcting the estimated motion generated by any SLAM method at each epoch. Our methodology treats positioning measurements rendered by the SLAM solution as random variables formulated jointly in a multivariate distribution. In this setting, the correction of the drift becomes equivalent to finding the mode of this multivariate distribution which jointly maximizes the likelihood of a set of relevant geo-spatial priors about the platform motion and environment. Our method is integrable into any SLAM/VIO method as a correction module. Our experimental results show the effectiveness of our approach in minimizing the drift error by 10x in long traverses.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2312.05953 [pdf]

RadImageGAN -- A Multi-modal Dataset-Scale Generative AI for Medical Imaging

Authors: Zelong Liu, Alexander Zhou, Arnold Yang, Alara Yilmaz, Maxwell Yoo, Mikey Sullivan, Catherine Zhang, James Grant, Daiqing Li, Zahi A. Fayad, Sean Huver, Timothy Deyer, Xueyan Mei

Abstract: Deep learning in medical imaging often requires large-scale, high-quality data or initiation with suitably pre-trained weights. However, medical datasets are limited by data availability, domain-specific knowledge, and privacy concerns, and the creation of large and diverse radiologic databases like RadImageNet is highly resource-intensive. To address these limitations, we introduce RadImageGAN, the first multi-modal radiologic data generator, which was developed by training StyleGAN-XL on the real RadImageNet dataset of 102,774 patients. RadImageGAN can generate high-resolution synthetic medical imaging datasets across 12 anatomical regions and 130 pathological classes in 3 modalities. Furthermore, we demonstrate that RadImageGAN generators can be utilized with BigDatasetGAN to generate multi-class pixel-wise annotated paired synthetic images and masks for diverse downstream segmentation tasks with minimal manual annotation. We showed that using synthetic auto-labeled data from RadImageGAN can significantly improve performance on four diverse downstream segmentation datasets by augmenting real training data and/or developing pre-trained weights for fine-tuning. This shows that RadImageGAN combined with BigDatasetGAN can improve model performance and address data scarcity while reducing the resources needed for annotations for segmentation tasks.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2308.11983 [pdf, other]

doi 10.1109/LRA.2023.3295254

Multi-Modal Multi-Task (3MT) Road Segmentation

Authors: Erkan Milli, Özgür Erkent, Asım Egemen Yılmaz

Abstract: Multi-modal systems have the capacity of producing more reliable results than systems with a single modality in road detection due to perceiving different aspects of the scene. We focus on using raw sensor inputs instead of, as is typically done in many SOTA works, leveraging architectures that require high pre-processing costs such as surface normals or dense depth predictions. By using raw sensor inputs, we aim to utilize a low-cost model that minimizes both the pre-processing and model computation costs. This study presents a cost-effective and highly accurate solution for road segmentation by integrating data from multiple sensors within a multi-task learning architecture. A fusion architecture is proposed in which RGB and LiDAR depth images constitute the inputs of the network. Another contribution of this study is to use an IMU/GNSS (inertial measurement unit/global navigation satellite system) inertial navigation system whose data is collected synchronously and calibrated with a LiDAR-camera to compute aggregated dense LiDAR depth images. It has been demonstrated by experiments on the KITTI dataset that the proposed method offers fast and high-performance solutions. We have also shown the performance of our method on Cityscapes where raw LiDAR data is not available. The segmentation results obtained for both full and half resolution images are competitive with existing methods. Therefore, we conclude that our method is not dependent only on raw LiDAR data; rather, it can be used with different sensor modalities. The inference times obtained in all experiments are very promising for real-time experiments.

Submitted 23 August, 2023; originally announced August 2023.

Journal ref: in IEEE Robotics and Automation Letters, vol. 8, no. 9, pp. 5408-5415, Sept. 2023

arXiv:2306.16544 [pdf, other]

Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression

Authors: M. Akın Yılmaz, O. Ugur Ulas, A. Murat Tekalp

Abstract: The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine-tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance exceeding those of all prior art in learned video coding.

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 2023

arXiv:2306.04354 [pdf, other]

doi 10.1109/LCOMM.2024.3363878

Quasi-Newton FDE in One-Bit Pseudo-Randomly Quantized Massive MIMO-OFDM Systems

Authors: Gökhan Yılmaz, Ali Özgür Yılmaz

Abstract: This letter offers a new frequency domain equalization (FDE) scheme that can work with a pseudo-random quantization (PRQ) scheme utilizing non-zero threshold quantization in one-bit uplink multi-user massive multiple-input multiple-output (MIMO) systems to mitigate quantization distortion and support high-order modulation schemes. The equalizer is based on Newton's method (NM) and applicable to orthogonal frequency division multiplexing (OFDM) transmission under frequency-selective fading by exploiting the properties of massive MIMO. We develop a low-complexity FDE scheme to obtain a quasi-Newton method. The proposed detector outperforms the benchmark detector with comparable complexity.

Submitted 23 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.04329 [pdf, other]

doi 10.1109/TWC.2023.3318081

Pseudo-Random Quantization Based Two-Stage Detection in One-Bit Massive MIMO Systems

Authors: Gökhan Yılmaz, Ali Özgür Yılmaz

Abstract: Utilizing low-resolution analog-to-digital converters (ADCs) in uplink massive multiple-input multiple-output (MIMO) systems is a practical solution to decrease power consumption. The performance gap between the low and high-resolution systems is small at low signal-to-noise ratio (SNR) regimes. However, at high SNR and with high modulation orders, the achievable rate saturates after a finite SNR value due to the stochastic resonance (SR) phenomenon. This paper proposes a novel pseudo-random quantization (PRQ) scheme by modifying the quantization thresholds that can help compensate for the effects of SR and makes communication with high-order modulation schemes such as $1024$-QAM in one-bit quantized uplink massive MIMO systems possible. Moreover, modified linear detectors for non-zero threshold quantization are derived, and a two-stage uplink detector for single-carrier (SC) multi-user systems is proposed. The first stage is an iterative method called Boxed Newton Detector (BND) that utilizes Newton's Method to maximize the log-likelihood with box constraints. The second stage, Nearest Codeword Detector (NCD), exploits the first stage solution and creates a small set of most likely candidates based on sign constraints to increase detection performance. The proposed two-stage method with PRQ outperforms the state-of-the-art detectors from the literature with comparable complexity while supporting high-order modulation schemes.

Submitted 7 June, 2023; originally announced June 2023.
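
To give a feel for why non-zero quantization thresholds can help a one-bit receiver, the toy sketch below quantizes a constant real-valued sample against either zero thresholds or seeded pseudo-random thresholds. It is not the PRQ scheme, BND, or NCD from the abstract above: the uniform threshold distribution, the noiseless scalar signal, and all function names are assumptions chosen for illustration. With thresholds drawn uniformly from [-scale, scale] and a sample x inside that range, the expected one-bit output is x / scale, so the average of the bits retains amplitude information that zero-threshold quantization discards.

```python
import random


def pseudo_random_thresholds(n, scale, seed):
    """Draw n thresholds uniformly from [-scale, scale] with a fixed seed,
    so that the receiver can regenerate exactly the same sequence."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]


def one_bit_quantize(samples, thresholds):
    """One-bit ADC model: +1 if the sample is at or above its threshold, else -1."""
    return [1 if s >= t else -1 for s, t in zip(samples, thresholds)]


def estimate_amplitude(bits, scale):
    """Amplitude estimate scale * mean(bits), valid for uniform thresholds
    on [-scale, scale] and |x| <= scale (toy setting only)."""
    return scale * sum(bits) / len(bits)


if __name__ == "__main__":
    n, x = 20000, 0.3
    zero_bits = one_bit_quantize([x] * n, [0.0] * n)
    prq_bits = one_bit_quantize([x] * n, pseudo_random_thresholds(n, 1.0, seed=0))
    print("zero thresholds:", estimate_amplitude(zero_bits, 1.0))
    print("pseudo-random thresholds:", estimate_amplitude(prq_bits, 1.0))
```

With zero thresholds every positive sample maps to +1 regardless of its size, whereas the pseudo-random thresholds give an estimate close to the true amplitude.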

arXiv:2302.01041 [pdf, other]

TAPS Responsibility Matrix: A tool for responsible data science by design

Authors: Visara Urovi, Remzi Celebi, Chang Sun, Linda Rieswijk, Michael Erard, Arif Yilmaz, Kody Moodley, Parveen Kumar, Michel Dumontier

Abstract: Data science is an interdisciplinary research area where scientists are typically working with data coming from different fields. When using and analyzing data, the scientists implicitly agree to follow standards, procedures, and rules set in these fields. However, guidance on the responsibilities of the data scientists and the other involved actors in a data science project is typically missing. While literature shows that novel frameworks and tools are being proposed in support of open-science, data reuse, and research data management, there are currently no frameworks that can fully express responsibilities of a data science project. In this paper, we describe the Transparency, Accountability, Privacy, and Societal Responsibility Matrix (TAPS-RM) as a framework to explore social, legal, and ethical aspects of data science projects. TAPS-RM acts as a tool to provide users with a holistic view of their project beyond key outcomes and clarifies the responsibilities of actors. We map the developed model of TAPS-RM with well-known initiatives for open data (such as FACT, FAIR and Datasheets for datasets). We conclude that TAPS-RM is a tool to reflect on responsibilities at a data science project level and can be used to advance responsible data science by design.

Submitted 2 February, 2023; originally announced February 2023.

MSC Class: I.2.1

arXiv:2209.09857 [pdf, other]

Fine-grained Classification of Solder Joints with α-skew Jensen-Shannon Divergence

Authors: Furkan Ulger, Seniha Esen Yuksel, Atila Yilmaz, Dincer Gokcen

Abstract: Solder joint inspection (SJI) is a critical process in the production of printed circuit boards (PCB). Detection of solder errors during SJI is quite challenging as the solder joints have very small sizes and can take various shapes. In this study, we first show that solders have low feature diversity, and that the SJI can be carried out as a fine-grained image classification task which focuses on hard-to-distinguish object classes. To improve the fine-grained classification accuracy, penalizing confident model predictions by maximizing entropy was found useful in the literature. In line with this information, we propose using the α-skew Jensen-Shannon divergence (α-JS) for penalizing the confidence in model predictions. We compare the α-JS regularization with both existing entropy-regularization based methods and the methods based on attention mechanism, segmentation techniques, transformer models, and specific loss functions for fine-grained image classification tasks. We show that the proposed approach achieves the highest F1-score and competitive accuracy for different models in the fine-grained solder joint classification task. Finally, we visualize the activation maps and show that with entropy-regularization, more precise class-discriminative regions are localized, which are also more resilient to noise. Code will be made available here upon acceptance.

Submitted 20 September, 2022; originally announced September 2022.

Comments: Submitted to IEEE Transactions on Components, Packaging and Manufacturing Technology
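
For readers unfamiliar with the α-skew Jensen-Shannon divergence named in the abstract above, the sketch below computes one common form of it for discrete distributions: (1 - α) KL(p || m) + α KL(q || m) with m = (1 - α) p + α q, which reduces to the ordinary Jensen-Shannon divergence at α = 0.5. Whether the paper uses exactly this form, and the idea of measuring a prediction against the uniform distribution, are assumptions here; the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats, using the convention 0 * log(0 / q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def alpha_skew_js(p, q, alpha=0.5):
    """(1 - alpha) * KL(p || m) + alpha * KL(q || m), where
    m = (1 - alpha) * p + alpha * q is the skewed mixture."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    m = [(1.0 - alpha) * pi + alpha * qi for pi, qi in zip(p, q)]
    return (1.0 - alpha) * kl_divergence(p, m) + alpha * kl_divergence(q, m)


if __name__ == "__main__":
    uniform = [1.0 / 3.0] * 3
    confident = [0.98, 0.01, 0.01]   # hypothetical softmax output
    hesitant = [0.40, 0.30, 0.30]    # hypothetical softmax output
    print(alpha_skew_js(uniform, confident, alpha=0.5))
    print(alpha_skew_js(uniform, hesitant, alpha=0.5))
```

A confident prediction sits farther from the uniform distribution than a hesitant one, so adding such a term to a training loss would discourage over-confident outputs.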

arXiv:2208.12251 [pdf, other]

A Gis Aided Approach for Geolocalizing an Unmanned Aerial System Using Deep Learning

Authors: Jianli Wei, Deniz Karakay, Alper Yilmaz

Abstract: The Global Positioning System (GPS) has become a part of our daily life with the primary goal of providing geopositioning service. For an unmanned aerial system (UAS), geolocalization ability is an extremely important necessity which is achieved using Inertial Navigation System (INS) with the GPS at its heart. Without geopositioning service, UAS is unable to fly to its destination or come back home. Unfortunately, GPS signals can be jammed and suffer from a multipath problem in urban canyons. Our goal is to propose an alternative approach to geolocalize a UAS when GPS signal is degraded or denied. Considering UAS has a downward-looking camera on its platform that can acquire real-time images as the platform flies, we apply modern deep learning techniques to achieve geolocalization. In particular, we perform image matching to establish latent feature conjugates between UAS acquired imagery and satellite orthophotos. A typical application of feature matching suffers from high-rise buildings and new constructions in the field that introduce uncertainties into homography estimation, and hence results in poor geolocalization performance. Instead, we extract GIS information from OpenStreetMap (OSM) to semantically segment matched features into building and terrain classes. The GIS mask works as a filter in selecting semantically matched features that enhance coplanarity conditions and the UAS geolocalization accuracy. Once the paper is published our code will be publicly available at https://github.com/OSUPCVLab/UbihereDrone2021.

Submitted 25 August, 2022; originally announced August 2022.

Comments: Paper published at SENSORS 2022 Conference

arXiv:2208.12125 [pdf, other]

UAS Navigation in the Real World Using Visual Observation

Authors: Yuci Han, Jianli Wei, Alper Yilmaz

Abstract: This paper presents a novel end-to-end Unmanned Aerial System (UAS) navigation approach for long-range visual navigation in the real world. Inspired by the dual-process visual navigation system of human instinct, environment understanding and landmark recognition, we formulate the UAS navigation task into the same two phases. Our system combines the reinforcement learning (RL) and image matching approaches. First, the agent learns the navigation policy using RL in the specified environment. To achieve this, we design an interactive UASNAV environment for the training process. Once the agent learns the navigation policy, which means it has 'familiarized itself with the environment', we let the UAS fly in the real world to recognize the landmarks using an image matching method and take action according to the learned policy. During the navigation process, the UAS is equipped with a single camera as the only visual sensor. We demonstrate that the UAS can learn to navigate to a destination hundreds of meters away from the starting point along the shortest path in a real-world scenario.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2208.09063 [pdf, other]

How important are socioeconomic factors for hurricane performance of power systems? An analysis of disparities through machine learning

Authors: Alexys Herleym Rodríguez Avellaneda, Abdollah Shafieezadeh, Alper Yilmaz

Abstract: This paper investigates whether socioeconomic factors are important for the hurricane performance of the electric power system in Florida. The investigation is performed using the Random Forest classifier with Mean Decrease of Accuracy (MDA) for measuring the importance of a set of factors that include hazard intensity, time to recovery from maximum impact, and socioeconomic characteristics of the affected population. The data set (at county scale) for this study includes socioeconomic variables from the 5-year American Community Survey (ACS), as well as wind velocities, and outage data of five hurricanes including Alberto and Michael in 2018, Dorian in 2019, and Eta and Isaias in 2020. The study shows that socioeconomic variables are considerably important for the system performance model. This indicates that social disparities may exist in the occurrence of power outages, which directly impact the resilience of communities and thus require immediate attention.

Submitted 18 August, 2022; originally announced August 2022.

Comments: 6 pages, 5 figures, 4 tables, 2022 IEEE International Conference on Power Systems Technology (PowerCon)
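
The Mean Decrease of Accuracy (MDA) measure mentioned in the abstract above is commonly computed by permuting one feature at a time and recording how much the classifier's accuracy drops. The sketch below shows that permutation step for any fitted `predict` function. It is only an illustration: the paper's Random Forest and county-level data are not reproduced, the threshold rule in the demo is a hypothetical stand-in classifier, and the function names and the `repeats` and `seed` parameters are assumptions.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)


def mean_decrease_accuracy(predict, rows, labels, feature, repeats=10, seed=0):
    """Average drop in accuracy after shuffling one feature column.

    A large drop means the classifier relies on that feature; a drop near
    zero means the feature carries little information for the classifier.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)  # break the link between this feature and the labels
        permuted = [row[:feature] + [value] + row[feature + 1:]
                    for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 decides the label, feature 1 is noise.
    rng = random.Random(1)
    rows = [[rng.random(), rng.random()] for _ in range(200)]
    labels = [int(row[0] > 0.5) for row in rows]

    def predict(row):
        return int(row[0] > 0.5)  # stand-in for a fitted classifier

    for feature in (0, 1):
        print(feature, mean_decrease_accuracy(predict, rows, labels, feature))
```

In practice the accuracy would be measured on held-out or out-of-bag samples rather than on the training rows used here.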

arXiv:2206.13613 [pdf, other]

Flexible-Rate Learned Hierarchical Bi-Directional Video Compression With Motion Refinement and Frame-Level Bit Allocation

Authors: Eren Cetin, M. Akin Yilmaz, A. Murat Tekalp

Abstract: This paper presents improvements and novel additions to our recent work on end-to-end optimized hierarchical bi-directional video compression to further advance the state-of-the-art in learned video compression. As an improvement, we combine motion estimation and prediction modules and compress refined residual motion vectors for improved rate-distortion performance. As a novel addition, we adapt the gain unit proposed for image compression to flexible-rate video compression in two ways: first, the gain unit enables a single encoder model to operate at multiple rate-distortion operating points; second, we exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine-tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate that we obtain state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP 2022)

Report number: 1850

arXiv:2205.12128 [pdf, other]

Learning to Drive Using Sparse Imitation Reinforcement Learning

Authors: Yuci Han, Alper Yilmaz

Abstract: In this paper, we propose Sparse Imitation Reinforcement Learning (SIRL), a hybrid end-to-end control policy that combines sparse expert driving knowledge with a reinforcement learning (RL) policy for the autonomous driving (AD) task in the CARLA simulation environment. The sparse expert is designed based on hand-crafted rules, which are suboptimal but provide a risk-averse strategy by enforcing experience for critical scenarios such as pedestrian and vehicle avoidance, and traffic light detection. As has been demonstrated, training an RL agent from scratch is data-inefficient and time-consuming, particularly for the urban driving task, due to the complexity of situations stemming from the vast size of the state space. Our SIRL strategy addresses these problems by fusing the output distribution of the sparse expert policy and the RL policy to generate a composite driving policy. With the guidance of the sparse expert during the early training stage, the SIRL strategy accelerates the training process, keeps RL exploration from causing catastrophic outcomes, and ensures safe exploration. To some extent, the SIRL agent imitates the driving expert's behavior. At the same time, it continuously gains knowledge during training, so it keeps improving beyond the sparse expert and can surpass both the sparse expert and a traditional RL agent. We experimentally validate the efficacy of the proposed SIRL approach in a complex urban scenario within the CARLA simulator.
In addition, we compare the SIRL agent's performance in risk-averse exploration and learning efficiency with the traditional RL approach. We also demonstrate the SIRL agent's ability to generalize by transferring its driving skill to an unseen environment.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2205.02125 [pdf]

Engineering deep learning methods on automatic detection of damage in infrastructure due to extreme events

Authors: Yongsheng Bai, Bing Zha, Halil Sezen, Alper Yilmaz

Abstract: This paper presents a few comprehensive experimental studies for automated Structural Damage Detection (SDD) in extreme events using deep learning methods for processing 2D images. In the first study, a 152-layer Residual network (ResNet) is utilized to classify multiple classes in eight SDD tasks, which include identification of scene levels, damage levels, material types, etc. The proposed ResNet achieved high accuracy for each task, although the positions of the damage are not identifiable. In the second study, the existing ResNet and a segmentation network (U-Net) are combined into a new pipeline, cascaded networks, for categorizing and locating structural damage. The results show that the accuracy of damage detection is significantly improved compared to only using a segmentation network. In the third and fourth studies, end-to-end networks are developed and tested as a new solution to directly detect cracks and spalling in the image collections of recent large earthquakes. One of the proposed networks can achieve an accuracy above 67.6% for all tested images at various scales and resolutions, and shows its robustness for these human-free detection tasks. As a preliminary field study, we applied the proposed method to detect damage in a concrete structure that was tested to study its progressive collapse performance. The experiments indicate that these solutions for automatic detection of structural damage using deep learning methods are feasible and promising.
The training datasets and code will be made available to the public upon the publication of this paper.

Submitted 1 May, 2022; originally announced May 2022.

Comments: Thanks to the reviewers for their help in improving this paper. Structural Health Monitoring (2022)

arXiv:2202.03695 [pdf, other]

Network Comparison Study of Deep Activation Feature Discriminability with Novel Objects

Authors: Michael Karnes, Alper Yilmaz

Abstract: Feature extraction has always been a critical component of the computer vision field. More recently, state-of-the-art computer vision algorithms have incorporated Deep Neural Networks (DNN) in feature extracting roles, creating Deep Convolutional Activation Features (DeCAF). The transferability of DNN knowledge domains has enabled the wide use of pretrained DNN feature extraction for applications with novel object classes, especially those with limited training data. This study analyzes the general discriminability of novel object visual appearances encoded into the DeCAF space of six of the leading visual recognition DNN architectures. The results of this study characterize the Mahalanobis distances and cosine similarities between DeCAF object manifolds across two visual object tracking benchmark data sets. The backgrounds surrounding each object are also included as an object class in the manifold analysis, providing a wider range of novel classes. This study found that different network architectures led to different network feature focuses that must be considered in the network selection process. These results are generated from the VOT2015 and UAV123 benchmark data sets; however, the proposed methods can be applied to efficiently compare estimated network performance characteristics for any labeled visual data set.

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2112.09529 [pdf, other]

End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression

Authors: M. Akın Yılmaz, A. Murat Tekalp

Abstract: Conventional video compression (VC) methods are based on motion compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of nonlinear transform, motion and entropy model simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on R-D loss averaged over pairs of successive frames. It is well-known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results that are reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, the R-D performance of our end-to-end optimized codec exceeds that of both x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as that of the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction.
The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/

Submitted 17 December, 2021; originally announced December 2021.

Comments: Accepted for publication in IEEE Transactions on Image Processing on 15 Dec. 2021

arXiv:2110.12270 [pdf, other]

Benchmarking of Lightweight Deep Learning Architectures for Skin Cancer Classification using ISIC 2017 Dataset

Authors: Abdurrahim Yilmaz, Mucahit Kalebasi, Yegor Samoylenko, Mehmet Erhan Guvenilir, Huseyin Uvet

Abstract: Skin cancer is one of the deadly types of cancer and is common worldwide. Recently, there has been a huge jump in the rate of people getting skin cancer. For this reason, the number of studies on skin cancer classification with deep learning is increasing day by day. To support the growth of work in this area, the International Skin Imaging Collaboration (ISIC) organization was established, and it created an open dataset archive. In this study, images were taken from the ISIC 2017 Challenge. The skin cancer images were preprocessed and augmented. Deep learning models were then trained on these images using a transfer learning and fine-tuning approach. 3 different mobile deep learning models were selected, and 3 different batch size values were used for each, so a total of 9 models were created. Among these models, the NASNetMobile model with a batch size of 16 got the best result. The accuracy value of this model is 82.00%, the precision value is 81.77% and the F1 score value is 0.8038. Our method is to benchmark mobile deep learning models which have few parameters and compare the results of the models.

Submitted 23 October, 2021; originally announced October 2021.

Comments: 4 pages, supplementary with 9 figures

arXiv:2109.04960 [pdf, other]

Automatic Displacement and Vibration Measurement in Laboratory Experiments with A Deep Learning Method

Authors: Yongsheng Bai, Ramzi M. Abduallah, Halil Sezen, Alper Yilmaz

Abstract: This paper proposes a pipeline to automatically track and measure displacement and vibration of structural specimens during laboratory experiments. The latest Mask Regional Convolutional Neural Network (Mask R-CNN) can locate the targets and monitor their movement from videos recorded by a stationary camera. To improve precision and remove the noise, techniques such as Scale-invariant Feature Transform (SIFT) and various filters for signal processing are included. Experiments on three small-scale reinforced concrete beams and a shaking table test are utilized to verify the proposed method. Results show that the proposed deep learning method can achieve the goal to automatically and precisely measure the motion of tested structural members during laboratory experiments.

Submitted 10 September, 2021; originally announced September 2021.

Journal ref: IEEE Sensors 2021

arXiv:2109.03793 [pdf, other]

Adaptive Few-Shot Learning PoC Ultrasound COVID-19 Diagnostic System

Authors: Michael Karnes, Shehan Perera, Srikar Adhikari, Alper Yilmaz

Abstract: This paper presents a novel ultrasound imaging point-of-care (PoC) COVID-19 diagnostic system. The adaptive visual diagnostics utilize few-shot learning (FSL) to generate encoded disease state models that are stored and classified using a dictionary of knowns. The novel vocabulary-based feature processing of the pipeline adapts the knowledge of a pretrained deep neural network to compress the ultrasound images into discriminative descriptions. The computational efficiency of the FSL approach enables high diagnostic deep learning performance in PoC settings, where training data is limited and the annotation process is not strictly controlled. The algorithm performance is evaluated on the open source COVID-19 POCUS Dataset to validate the system's ability to distinguish COVID-19, pneumonia, and healthy disease states. The results of the empirical analyses demonstrate the appropriate efficiency and accuracy for scalable PoC use. The code for this work will be made publicly available on GitHub upon acceptance.

Submitted 8 September, 2021; originally announced September 2021.

Comments: Biomedical Circuits and Systems Conference (BioCAS) 2021

arXiv:2109.01235 [pdf, other]

DeepTracks: Geopositioning Maritime Vehicles in Video Acquired from a Moving Platform

Authors: Jianli Wei, Guanyu Xu, Alper Yilmaz

Abstract: Geopositioning and tracking a moving boat at sea is a very challenging problem, requiring boat detection, matching and estimating its GPS location from imagery with no common features. The problem can be stated as follows: given imagery from a camera mounted on a moving platform with known GPS location as the only valid sensor, we predict the geoposition of a target boat visible in images. Our solution uses recent ML algorithms, the camera-scene geometry and Bayesian filtering. The proposed pipeline first detects and tracks the target boat's location in the image with the strategy of tracking by detection. This image location is then converted to a geoposition in local sea coordinates referenced to the camera GPS location using plane projective geometry. Finally, the target boat's local coordinates are transformed to global GPS coordinates to estimate the geoposition. To achieve a smooth geotrajectory, we apply an unscented Kalman filter (UKF), which implicitly overcomes small detection errors in the early stages of the pipeline. We tested the performance of our approach using GPS ground truth and show the accuracy and speed of the estimated geopositions. Our code is publicly available at https://github.com/JianliWei1995/AI-Track-at-Sea.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.02800 [pdf]

A volumetric change detection framework using UAV oblique photogrammetry - A case study of ultra-high-resolution monitoring of progressive building collapse

Authors: Ningli Xu, Debao Huang, Shuang Song, Xiao Ling, Chris Strasbaugh, Alper Yilmaz, Halil Sezen, Rongjun Qin

Abstract: In this paper, we present a case study that performs an unmanned aerial vehicle (UAV) based fine-scale 3D change detection and monitoring of progressive collapse performance of a building during a demolition event. Multi-temporal oblique photogrammetry images are collected with 3D point clouds generated at different stages of the demolition. The geometric accuracy of the generated point clouds has been evaluated against both airborne and terrestrial LiDAR point clouds, achieving an average distance of 12 cm and 16 cm for roof and facade respectively. We propose a hierarchical volumetric change detection framework that unifies multi-temporal UAV images for pose estimation (free of ground control points), reconstruction, and a coarse-to-fine 3D density change analysis. This work has provided a solution capable of addressing change detection on full 3D time-series datasets where dramatic scene content changes are presented progressively. Our change detection results on the building demolition event have been evaluated against the manually marked ground-truth changes and have achieved an F-1 score varying from 0.78 to 0.92, with consistently high precision (0.92 - 0.99). Volumetric changes through the demolition progress are derived from change detection and have been shown to favorably reflect the qualitative and quantitative building demolition progression.

Submitted 5 August, 2021; originally announced August 2021.

Comments: 28 pages, 9 figures

arXiv:2108.02162 [pdf, other]

Mechatronic Investigation of Wound Healing Process by Using Micro Robot

Authors: Abdurrahim Yilmaz, Ali Anil Demircali, Serra Ozkasap, Leyla Yorgancioglu, Huseyin Uvet, Gizem Aydemir

Abstract: The purpose of this study is to find ideal forces for reducing cell stress in the wound healing process using micro robots. To this end, we ran two simulations in COMSOL Multiphysics with a micro robot to find the appropriate force. From these simulations, we created force curves to determine the minimum force and the friction force that could lift the cells from the surface. For two micro robots that have a 2 mm x 0.25 mm x 0.4 mm SU-8 body with 3 NdFeB of 0.25 thickness and diameter, the simulated maximum force in the x-axis was calculated as 4.640 mN at a distance of 150 um between the two robots.

Submitted 5 August, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 4 pages, 5 figures

arXiv:2106.16139 [pdf, other]

Deep Convolutional Neural Networks for Onychomycosis Detection

Authors: Abdurrahim Yilmaz, Fatih Goktay, Rahmetullah Varol, Gulsum Gencoglan, Huseyin Uvet

Abstract: The diagnosis of superficial fungal infections in dermatology is still mostly based on manual direct microscopic examination with Potassium Hydroxide (KOH) solution. However, this method can be time consuming and its diagnostic accuracy rates vary widely depending on the clinician's experience. With the increase of neural network applications in the field of clinical microscopy, it is now possible to automate such manual processes, increasing both efficiency and accuracy. This study presents a deep neural network structure that enables rapid solutions for these problems and can perform automatic fungi detection in grayscale images without dyes. 160 microscopic field photographs containing the fungal element, obtained from patients with onychomycosis, and 297 microscopic field photographs containing dissolved keratin obtained from normal nails were collected. Smaller patches containing 4234 fungi and 4981 keratin were extracted from these images. In order to detect fungus and keratin, VGG16 and InceptionV3 models were developed. The VGG16 model had 95.98% accuracy and an area under the curve (AUC) value of 0.9930, while the InceptionV3 model had 95.90% accuracy and an AUC value of 0.9917. By comparison, the average accuracy and AUC value of clinicians are 72.8% and 0.87, respectively. This deep learning model allows the development of an automated system that can detect fungi within microscopic images.

Submitted 5 January, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

arXiv:2106.04390 [pdf, other]

The Effect of Pore Structure in Flapping Wings on Flight Performance

Authors: Abdurrahim Yilmaz, Asli Tekeci, Meryem Ece Ozyetkin, Ali Anil Demircali, Kagan Unsal, Huseyin Uvet

Abstract: This study investigates the effects of porosity on the wings of flying creatures such as dragonflies, moths, and hummingbirds, and shows that pores can affect wing performance. The study was performed using 3D porous flapping wing flow analyses in Comsol Multiphysics. We analyzed wings with different numbers of pores at different pore angles of inclination in order to see the effect of pores on lift and drag forces. To compare the results, 9 different analyses were performed. In these analyses, airflow velocity was taken as 5 m/s, angle of attack as 5 degrees, frequency as 25 Hz, and flapping angle as 30 degrees. Keeping these values constant, the number of pores was set to 36, 48, and 60, and the pore angles of inclination to 60, 70, and 80 degrees. Analyses were carried out by applying laminar flow to the wing designed in the Comsol Multiphysics program. The importance of pores was investigated by comparing the results of these analyses.

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2105.12794 [pdf, other]

DFPN: Deformable Frame Prediction Network

Authors: M. Akın Yılmaz, A. Murat Tekalp

Abstract: Learned frame prediction is a current problem of interest in computer vision and video compression. Although several deep network architectures have been proposed for learned frame prediction, to the best of our knowledge, there is no work based on using deformable convolutions for frame prediction. To this effect, we propose a deformable frame prediction network (DFPN) for task oriented implicit motion modeling and next frame prediction. Experimental results demonstrate that the proposed DFPN model achieves state of the art results in next frame prediction. Our models and results are available at https://github.com/makinyilmaz/DFPN.

Submitted 26 May, 2021; originally announced May 2021.

Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 2021

arXiv:2105.12107 [pdf, other]

Self-Organized Variational Autoencoders (Self-VAE) for Learned Image Compression

Authors: M. Akın Yılmaz, Onur Keleş, Hilal Güven, A. Murat Tekalp, Junaid Malik, Serkan Kıranyaz

Abstract: In end-to-end optimized learned image compression, it is standard practice to use a convolutional variational autoencoder with generalized divisive normalization (GDN) to transform images into a latent space. Recently, Operational Neural Networks (ONNs) that learn the best non-linearity from a set of alternatives, and their self-organized variants, Self-ONNs, that approximate any non-linearity via Taylor series have been proposed to address the limitations of convolutional layers and a fixed nonlinear activation. In this paper, we propose to replace the convolutional and GDN layers in the variational autoencoder with self-organized operational layers, and propose a novel self-organized variational autoencoder (Self-VAE) architecture that benefits from stronger non-linearity. The experimental results demonstrate that the proposed Self-VAE yields improvements in both rate-distortion performance and perceptual image quality.

Submitted 28 May, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 2021

arXiv:2105.09913 [pdf]

POCFormer: A Lightweight Transformer Architecture for Detection of COVID-19 Using Point of Care Ultrasound

Authors: Shehan Perera, Srikar Adhikari, Alper Yilmaz

Abstract: The rapid and seemingly endless expansion of COVID-19 can be traced back to the inefficiency and shortage of testing kits that offer accurate results in a timely manner. An emerging popular technique, which adopts improvements made in mobile ultrasound technology, allows for healthcare professionals to conduct rapid screenings on a large scale. We present an image-based solution that aims at automating the testing process which allows for rapid mass testing to be conducted with or without a trained medical professional that can be applied to rural environments and third world countries. Our contributions towards rapid large-scale testing include a novel deep learning architecture capable of analyzing ultrasound data that can run in real-time and significantly improve the current state-of-the-art detection accuracies using image-based COVID-19 detection.

Submitted 20 May, 2021; originally announced May 2021.

arXiv:2104.14868 [pdf, other]

On the Computation of PSNR for a Set of Images or Video

Authors: Onur Keleş, M. Akın Yılmaz, A. Murat Tekalp, Cansu Korkmaz, Zafer Dogan

Abstract: When comparing learned image/video restoration and compression methods, it is common to report peak signal-to-noise ratio (PSNR) results. However, there does not exist a generally agreed upon practice to compute PSNR for sets of images or video. Some authors report the average of individual image/frame PSNR, which is equivalent to computing a single PSNR from the geometric mean of individual image/frame mean-square error (MSE). Others compute a single PSNR from the arithmetic mean of frame MSEs for each video. Furthermore, some compute the MSE/PSNR of the Y-channel only, while others compute MSE/PSNR for RGB channels. This paper investigates different approaches to computing PSNR for sets of images, single video, and sets of video and the relation between them. We show that the difference between computing the PSNR based on the arithmetic vs. geometric mean of MSE depends on the distribution of MSE over the set of images or video, and that this distribution is task-dependent. In particular, these two methods yield larger differences in restoration problems, where the MSE is exponentially distributed, and smaller differences in compression problems, where the MSE distribution is narrower. We hope this paper will motivate the community to clearly describe how they compute reported PSNR values to enable consistent comparison.

Submitted 30 April, 2021; originally announced April 2021.

Comments: accepted for publication in Picture Coding Symposium (PCS) 2021
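
The equivalence stated in the abstract above (averaging per-frame PSNR is the same as taking one PSNR of the geometric mean of the per-frame MSEs) can be checked with a minimal sketch. This is an illustration only, not code from the paper: the function names, the 8-bit peak value of 255, and the sample MSE values are all assumptions made for this example.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for one MSE value; peak=255 assumes 8-bit samples."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_of_psnrs(mses, peak=255.0):
    """Average of the per-frame PSNR values."""
    return sum(psnr_from_mse(m, peak) for m in mses) / len(mses)


def psnr_of_arithmetic_mean(mses, peak=255.0):
    """Single PSNR computed from the arithmetic mean of the frame MSEs."""
    return psnr_from_mse(sum(mses) / len(mses), peak)


def psnr_of_geometric_mean(mses, peak=255.0):
    """Single PSNR computed from the geometric mean of the frame MSEs."""
    log_mean = sum(math.log(m) for m in mses) / len(mses)
    return psnr_from_mse(math.exp(log_mean), peak)


# Hypothetical per-frame MSE values, chosen only to show a wide spread.
frame_mses = [4.0, 25.0, 100.0, 400.0]

avg_psnr = mean_of_psnrs(frame_mses)
geo_psnr = psnr_of_geometric_mean(frame_mses)
ari_psnr = psnr_of_arithmetic_mean(frame_mses)

print(f"mean of PSNRs:           {avg_psnr:.3f} dB")
print(f"PSNR of geometric mean:  {geo_psnr:.3f} dB")
print(f"PSNR of arithmetic mean: {ari_psnr:.3f} dB")
```

Because the arithmetic mean of positive numbers is never smaller than their geometric mean, the arithmetic-mean PSNR is never larger than the averaged PSNR, and the gap widens as the MSE values spread out.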

arXiv:2104.11927 [pdf, other]

doi 10.1109/TCPMT.2021.3121265

Anomaly Detection for Solder Joints Using $β$-VAE

Authors: Furkan Ulger, Seniha Esen Yuksel, Atila Yilmaz

Abstract: In the assembly process of printed circuit boards (PCB), most of the errors are caused by solder joints in Surface Mount Devices (SMD). In the literature, traditional feature extraction based methods require designing hand-crafted features and rely on the tiered RGB illumination to detect solder joint errors, whereas the supervised Convolutional Neural Network (CNN) based approaches require a lot of labelled abnormal samples (defective solder joints) to achieve high accuracy. To solve the optical inspection problem in unrestricted environments with no special lighting and without the existence of error-free reference boards, we propose a new beta-Variational Autoencoder (beta-VAE) architecture for anomaly detection that can work on both IC and non-IC components. We show that the proposed model learns a disentangled representation of data, leading to more independent features and improved latent space representations. We compare the activation and gradient-based representations that are used to characterize anomalies, and observe the effect of different beta parameters on accuracy and on untwining the feature representations in beta-VAE. Finally, we show that anomalies on solder joints can be detected with high accuracy via a model trained directly on normal samples without designated hardware or feature engineering.

Submitted 16 December, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

Comments: Published in IEEE Transactions on Components, Packaging and Manufacturing Technology

Journal ref: in IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 11, no. 12, pp. 2214-2221, Dec. 2021

arXiv:2104.05448 [pdf]

Deep Transformer Networks for Time Series Classification: The NPP Safety Case

Authors: Bing Zha, Alessandro Vanni, Yassin Hassan, Tunc Aldemir, Alper Yilmaz

Abstract: A challenging part of dynamic probabilistic risk assessment for nuclear power plants is the need for large amounts of temporal simulations given various initiating events and branching conditions from which representative feature extraction becomes complicated for subsequent applications. Artificial Intelligence techniques have been shown to be powerful tools in time-dependent sequential data processing to automatically extract and yield complex features from large data. An advanced temporal neural network referred to as the Transformer is used within a supervised learning fashion to model the time-dependent NPP simulation data and to infer whether a given sequence of events leads to core damage or not. The training and testing datasets for the Transformer are obtained by running 10,000 RELAP5-3D NPP blackout simulations with the list of variables obtained from the RAVEN software. Each simulation is classified as "OK" or "CORE DAMAGE" based on the consequence. The results show that the Transformer can learn the characteristics of the sequential data and yield promising performance with approximately 99% classification accuracy on the testing dataset.

Submitted 18 April, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

arXiv:2102.06922 [pdf, other]

A Theoretical Performance Bound for Joint Beamformer Design of Wireless Fronthaul and Access Links in Downlink C-RAN

Authors: Fehmi Emre Kadan, Ali Özgür Yılmaz

Abstract: It is known that data rates in standard cellular networks are limited due to inter-cell interference. An effective solution to this problem is to use the multi-cell cooperation idea. In Cloud Radio Access Network (C-RAN), which is a candidate solution in 5G and future communication networks, cooperation is applied by means of central processors (CPs) connected to simple remote radio heads with finite capacity fronthaul links. In this study, we consider a downlink C-RAN with a wireless fronthaul and aim to minimize total power spent by jointly designing beamformers for fronthaul and access links. We consider the case where perfect channel state information is not available in the CP. We first derive a novel theoretical performance bound for the problem defined. Then we propose four algorithms with different complexities to show the tightness of the bound. The first two algorithms apply successive convex optimizations with the semi-definite relaxation idea, whereas the other two are adapted from well-known beamforming design methods. The detailed simulations under realistic channel conditions show that as the complexity of the algorithm increases, the corresponding performance becomes closer to the bound.

Submitted 13 February, 2021; originally announced February 2021.

Comments: 30 pages, single column, 11 figures, submitted to Transactions on Wireless Communications on Oct. 20, 2020. A Major Revision decision was made on Jan. 16, 2021. After the revision, it will be resubmitted to the same journal by the end of February 2021

arXiv:2102.06916 [pdf, other]

Beamformer Design with Smooth Constraint-Free Approximation in Downlink Cloud Radio Access Networks

Authors: Fehmi Emre Kadan, Ali Özgür Yılmaz

Abstract: It is known that data rates in standard cellular networks are limited due to inter-cell interference. An effective solution to this problem is to use the multi-cell cooperation idea. In Cloud Radio Access Network, which is a candidate solution in 5G and beyond, cooperation is applied by means of central processors (CPs) connected to simple remote radio heads with finite capacity fronthaul links. In this study, we consider a downlink scenario and aim to minimize total power spent by designing beamformers. We consider the case where perfect channel state information is not available in the CP. The original problem includes discontinuous terms with many constraints. We propose a novel method which transforms the problem into a smooth constraint-free form, and a solution is found by the gradient descent approach. As a comparison, we consider the optimal method solving an extensive number of convex sub-problems, a known heuristic search algorithm and some sparse solution techniques. Heuristic search methods find a solution by solving a subset of all possible convex sub-problems. Sparse techniques apply some norm approximation ($\ell_0/\ell_1, \ell_0/\ell_2$) or convex approximation to make the objective function more tractable. We also derive a theoretical performance bound in order to observe how far the proposed method is from the optimal method when running the optimal method is prohibitive due to computational complexity. Detailed simulations show that the performance of the proposed method is close to the optimal one, and it outperforms the other methods analyzed.

Submitted 13 February, 2021; originally announced February 2021.

Comments: 18 pages, 12 figures, submitted to IEEE Access on Feb. 03, 2021. It is a revised version of the paper submitted to IEEE Access on Nov. 23, 2020. Revisions were made according to the reviewer comments

arXiv:2011.03098 [pdf]

End-to-end Deep Learning Methods for Automated Damage Detection in Extreme Events at Various Scales

Authors: Yongsheng Bai, Halil Sezen, Alper Yilmaz

Abstract: Robust Mask R-CNN (Mask Regional Convolutional Neural Network) methods are proposed and tested for automatic detection of cracks on structures or their components that may be damaged during extreme events, such as earthquakes. We curated a new dataset with 2,021 labeled images for training and validation and aimed to find end-to-end deep neural networks for crack detection in the field. With data augmentation and parameter fine-tuning, Path Aggregation Network (PANet) with spatial attention mechanisms and High-resolution Network (HRNet) are introduced into Mask R-CNNs. The tests on three public datasets with low- or high-resolution images demonstrate that the proposed methods can achieve a big improvement over alternative networks, so the proposed method may be sufficient for crack detection for a variety of scales in real applications.

Submitted 5 November, 2020; originally announced November 2020.

arXiv:2010.06117 [pdf, other]

Map-Based Temporally Consistent Geolocalization through Learning Motion Trajectories

Authors: Bing Zha, Alper Yilmaz

Abstract: In this paper, we propose a novel trajectory learning method that exploits motion trajectories on a topological map using a recurrent neural network for temporally consistent geolocalization of an object. Inspired by humans' ability to be aware of both distance and direction of self-motion in navigation, our trajectory learning method learns a pattern representation of trajectories encoded as a sequence of distances and turning angles to assist self-localization. We pose the learning process as a conditional sequence prediction problem in which each output locates the object on a traversable path in a map. Considering that the prediction sequence ought to be topologically connected in the graph-structured map, we adopt two different hypothesis generation and elimination strategies to eliminate disconnected sequence predictions. We demonstrate our approach on the KITTI stereo visual odometry dataset, which is a city-scale environment and can generate trajectories with metric information. The key benefits of our approach to geolocalization are that we 1) take advantage of the powerful sequence modeling ability of recurrent neural networks and their robustness to noisy input, 2) only require a map in the form of a graph and simply use an affordable sensor that generates motion trajectories, and 3) do not need an initial position. The experiments show that the motion trajectories can be learned by training a recurrent neural network, and temporally consistent geolocation can be predicted with both of the proposed strategies.

Submitted 12 October, 2020; originally announced October 2020.

arXiv:2008.06106 [pdf, other]

doi 10.1109/ICIP.2019.8803624

Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction

Authors: M. Akin Yilmaz, A. Murat Tekalp

Abstract: We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next frame prediction using the mean square loss. We performed both stateless and stateful training for recurrent networks. Experimental results show that the residual FCNN architecture performs the best in terms of peak signal to noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with an acceptable performance.

Submitted 13 August, 2020; originally announced August 2020.

Comments: Accepted for publication at IEEE ICIP 2019

arXiv:2008.05028 [pdf, other]

doi 10.1109/ICIP40778.2020.9190881

End-to-End Rate-Distortion Optimization for Bi-Directional Learned Video Compression

Authors: M. Akin Yilmaz, A. Murat Tekalp

Abstract: Conventional video compression methods employ a linear transform and block motion model, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to combinatorial nature of the end-to-end optimization problem. Learned video compression allows end-to-end rate-distortion optimized training of all nonlinear modules, quantization parameter and entropy model simultaneously. While previous work on learned video compression considered training a sequential video codec based on end-to-end optimization of cost averaged over pairs of successive frames, it is well-known in conventional video compression that hierarchical, bi-directional coding outperforms sequential compression. In this paper, we propose for the first time end-to-end optimization of a hierarchical, bi-directional motion compensated learned codec by accumulating cost function over fixed-size groups of pictures (GOP). Experimental results show that the rate-distortion performance of our proposed learned bi-directional GOP coder outperforms the state-of-the-art end-to-end optimized learned sequential compression as expected.

Submitted 26 May, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

Comments: This work is accepted for publication in IEEE ICIP 2020

arXiv:2007.08467 [pdf, ps, other]

A Reduced Complexity Ungerboeck Receiver for Quantized Wideband Massive SC-MIMO

Authors: Ali Bulut Üçüncü, Gökhan Muzaffer Güvensen, Ali Özgür Yılmaz

Abstract: Employing low resolution analog-to-digital converters in massive multiple-input multiple-output (MIMO) has many advantages in terms of total power consumption, cost and feasibility of such systems. However, such advantages come together with significant challenges in channel estimation and data detection due to the severe quantization noise present. In this study, we propose a novel iterative receiver for quantized uplink single carrier MIMO (SC-MIMO) utilizing an efficient message passing algorithm based on the Bussgang decomposition and Ungerboeck factorization, which avoids the use of a complex whitening filter. A reduced state sequence estimator with bidirectional decision feedback is also derived, achieving remarkable complexity reduction compared to the existing receivers for quantized SC-MIMO in the literature, without any requirement on the sparsity of the transmission channel. Moreover, the linear minimum mean-square-error (LMMSE) channel estimator for SC-MIMO under frequency-selective channels, which does not require any cyclic-prefix overhead, is also derived. We observe that the proposed receiver has significant performance gains with respect to the existing receivers in the literature under imperfect channel state information.

Submitted 18 August, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2002.07594 [pdf, other]

doi 10.1109/TCOMM.2019.2954512

Performance Analysis of Quantized Uplink Massive MIMO-OFDM With Oversampling Under Adjacent Channel Interference

Authors: Ali Bulut Üçüncü, Emil Björnson, Håkan Johansson, Erik G. Larsson, Ali Özgür Yılmaz

Abstract: Massive multiple-input multiple-output (MIMO) systems have attracted much attention lately due to the many advantages they provide over single-antenna systems. Owing to the many antennas, low-cost implementation and low power consumption per antenna are desired. To that end, massive MIMO structures with low-resolution analog-to-digital converters (ADC) have been investigated in many studies. However, the effect of a strong interferer in the adjacent band on quantized massive MIMO systems has not been examined yet. In this study, we analyze the performance of uplink massive MIMO with low-resolution ADCs under frequency selective fading with orthogonal frequency division multiplexing in the perfect and imperfect receiver channel state information cases. We derive analytical expressions for the bit error rate and ergodic capacity. We show that the interfering band can be suppressed by increasing the number of antennas or the oversampling rate when a zero-forcing receiver is employed.

Submitted 18 February, 2020; originally announced February 2020.

arXiv:2001.00470 [pdf, other]

The Mobile AR Sensor Logger for Android and iOS Devices

Authors: Jianzhu Huai, Yujia Zhang, Alper Yilmaz

Abstract: In recent years, commodity mobile devices equipped with cameras and inertial measurement units (IMUs) have attracted much research and design effort for augmented reality (AR) and robotics applications. Based on such sensors, many commercial AR toolkits and public benchmark datasets have been made available to accelerate hatching and validating new ideas. To lower the difficulty and enhance the flexibility in accessing the rich raw data of typical AR sensors on mobile devices, this paper presents the mobile AR sensor (MARS) logger for two of the most popular mobile operating systems, Android and iOS. The logger highlights the best possible synchronization between the camera and the IMU allowed by a mobile device, efficient saving of images at about 30 Hz, and recording of the metadata relevant to AR applications. This logger has been tested on a relatively large spectrum of mobile devices, and the collected data has been used for analyzing the sensor characteristics. We see that this application will facilitate research and development related to AR and robotics, so it has been open sourced at https://github.com/OSUPCVLab/mobile-ar-sensor-logger.

Submitted 21 December, 2019; originally announced January 2020.

Comments: 4 pages, 4 figures, submitted to IEEE Sensors 2019

arXiv:1911.09899 [pdf]

Knowledge Network and a Knowledge Network Example

Authors: Hilmi Bahadır Temur, Ahmet Serdar Yılmaz, Mehmet Tekerek

Abstract: Knowledge networks can be defined as social networks that enable knowledge, defined as the intellectual product formed as a result of the work of human intelligence, to be transferred to any other means of communication. A knowledge network represents a large number of people, resources and the relationships between them, and aims to create the highest value, primarily by accumulating and using knowledge through the process of generating and transmitting it. The general structure of a knowledge network consists of three basic stages: gathering, organizing and disseminating knowledge. In the first stage, knowledge collection, institutions and organizations enter the knowledge they hold into the network structure. The organizing stage is the structuring of irregular and unstructured knowledge in the network according to certain standards and recording it regularly in the structure. Knowledge dissemination can be expressed as the transfer of organized knowledge in accordance with user knowledge and needs. The purpose of the training knowledge network is to ensure communication between the student, the teacher and the guide, to store the knowledge that is formed in a course, to record it in a systematic way, to disseminate it according to the needs of the users and to update it. A dynamic website and content management system have been developed for the training knowledge network. The training knowledge network has a flexible structure to support web technologies. By producing, managing and publishing knowledge within the framework of specific needs, it provides a knowledge network for different problems, producing different and tailored solutions. With the example of the education knowledge network, a knowledge network that can be developed according to the innovative knowledge network structure is designed.

Submitted 22 November, 2019; originally announced November 2019.

Comments: 6 Pages, in Turkish language, 4 figures, Conference Proceeding

Journal ref: Proceedings of International Symposium on Advanced Engineering Technologies 2019, 1, 1350-1355 (2019)

arXiv:1911.06610 [pdf]

Design of Internet of Things Based Controller for Direct Current Motors

Authors: Zeynep Özdemir, Mehmet Tekerek, Ahmet Serdar Yılmaz

Abstract: It is known that the internet has become widespread in many areas of life and has enabled machines to communicate with each other. The term Internet of Things (IoT) has emerged as a result of networks consisting of machines and equipment that work independently of an operator and communicate with each other. Sensors, embedded systems, communication technologies and data storage systems like clouds create wide area networks that can communicate and share data. In this study, an IoT application has been designed for a low speed mechanical benchmark driven by a direct current motor. The aim is to keep the motor speed fixed at the desired value despite the changes that occur as the pressure acting on the actuators varies. In the presented study, speed control is performed by establishing a modeling and simulation mechanism that gives the closest results to the real system.

Submitted 21 October, 2019; originally announced November 2019.

Comments: in Turkish language. International Symposium on Advanced Engineering Technologies 2019

arXiv:1810.10438 [pdf, other]

UAVid: A Semantic Segmentation Dataset for UAV Imagery

Authors: Ye Lyu, George Vosselman, Guisong Xia, Alper Yilmaz, Michael Ying Yang

Abstract: Semantic segmentation has been one of the leading research interests in computer vision recently. It serves as a perception foundation for many fields, such as robotics and autonomous driving. The fast development of semantic segmentation is largely attributable to large-scale datasets, especially for deep learning related methods. There already exist several semantic segmentation datasets for comparison among semantic segmentation methods in complex urban scenes, such as the Cityscapes and CamVid datasets, where the side views of the objects are captured with a camera mounted on the driving car. There also exist semantic labeling datasets for airborne images and satellite images, where the top views of the objects are captured. However, only a few datasets capture urban scenes from an oblique Unmanned Aerial Vehicle (UAV) perspective, where both the top view and the side view of the objects can be observed, providing more information for object recognition. In this paper, we introduce our UAVid dataset, a new high-resolution UAV semantic segmentation dataset as a complement, which brings new challenges, including large scale variation, moving object recognition and temporal consistency preservation. Our UAV dataset consists of 30 video sequences capturing 4K high-resolution images in slanted views. In total, 300 images have been densely labeled with 8 classes for the semantic labeling task. We have provided several deep learning baseline methods with pre-training, among which the proposed Multi-Scale-Dilation net performs the best via multi-scale feature extraction. Our UAVid website and the labeling tool have been published at https://uavid.nl/.

Submitted 18 May, 2020; v1 submitted 24 October, 2018; originally announced October 2018.

Comments: Accepted by ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:1809.07257 [pdf, other]

MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description

Authors: Oliver Nina, Washington Garcia, Scott Clouse, Alper Yilmaz

Abstract: Learning visual feature representations for video analysis is a daunting task that requires a large amount of training samples and a proper generalization framework. Many of the current state of the art methods for video captioning and movie description rely on simple encoding mechanisms through recurrent neural networks to encode temporal visual information extracted from video data. In this paper, we introduce a novel multitask encoder-decoder framework for automatic semantic description and captioning of video sequences. In contrast to current approaches, our method relies on distinct decoders that train a visual encoder in a multitask fashion. Our system does not depend solely on multiple labels and allows for a lack of training data working even with datasets where only one single annotation is viable per video. Our method shows improved performance over current state of the art methods in several metrics on multi-caption and single-caption datasets. To the best of our knowledge, our method is the first method to use a multitask approach for encoding video features. Our method demonstrates its robustness on the Large Scale Movie Description Challenge (LSMDC) 2017 where our method won the movie description task and its results were ranked among other competitors as the most helpful for the visually impaired.

Submitted 19 September, 2018; originally announced September 2018.

Comments: This is a pre-print version of our soon to be released paper

arXiv:1804.00429 [pdf]

A Vehicle Detection Approach using Deep Learning Methodologies

Authors: Abdullah Asim Yilmaz, Mehmet Serdar Guzel, Iman Askerbeyli, Erkan Bostanci

Abstract: The purpose of this study is to successfully train our vehicle detector using the R-CNN and Faster R-CNN deep learning methods on a sample vehicle data set and to optimize the success rate of the trained detector by providing efficient results for vehicle detection by testing the trained vehicle detector on the test data. The working method consists of six main stages. These are, respectively: loading the data set, the design of the convolutional neural network, configuration of training options, training of the Faster R-CNN object detector and evaluation of the trained detector. In addition, in the scope of the study, the Faster R-CNN and R-CNN deep learning methods were mentioned and experimental analysis comparisons were made with the results obtained from vehicle detection.

Submitted 2 April, 2018; originally announced April 2018.

Comments: 7 pages, 8 Figures, 1 table

arXiv:1708.01023 [pdf, other]

Collusion-Secure Watermarking for Sequential Data

Authors: Arif Yilmaz, Erman Ayday

Abstract: In this work, we address the liability issues that may arise due to unauthorized sharing of personal data. We consider a scenario in which an individual shares his sequential data (such as genomic data or location patterns) with several service providers (SPs). In such a scenario, if his data is shared with other third parties without his consent, the individual wants to determine the service provider that is responsible for this unauthorized sharing. To provide this functionality, we propose a novel optimization-based watermarking scheme for sharing of sequential data. Thus, in the case of an unauthorized sharing of sensitive data, the proposed scheme can find the source of the leakage by checking the watermark inside the leaked data. In particular, the proposed scheme guarantees with high probability that (i) the malicious SP that receives the data cannot understand the watermarked data points, (ii) when several malicious SPs aggregate their data, they still cannot determine the watermarked data points, (iii) even if the unauthorized sharing involves only a portion of the original data or modified data (to damage the watermark), the corresponding malicious SP can be held responsible for the leakage, and (iv) the added watermark is compliant with the nature of the corresponding data. That is, if there are inherent correlations in the data, the added watermark still preserves such correlations. Watermarking typically means changing certain parts of the data, and hence it may have negative effects on data utility. The proposed scheme also minimizes such utility loss while it provides the aforementioned security guarantees. Furthermore, we conduct a case study of the proposed scheme on genomic data and show the security and utility guarantees of the proposed scheme.

Submitted 18 August, 2017; v1 submitted 3 August, 2017; originally announced August 2017.

Showing 1–50 of 67 results for author: Yilmaz, A