-
Time Series Foundation Models and Deep Learning Architectures for Earthquake Temporal and Spatial Nowcasting
Authors:
Alireza Jafari,
Geoffrey Fox,
John B. Rundle,
Andrea Donnellan,
Lisa Grant Ludwig
Abstract:
Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activities, remains a crucial and enduring objective aimed at reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive, long-term earthquake datasets. Despite significant advancements, existing literature on earthquake nowcasting lacks comprehensive evaluations of pre-trained foundation models and modern deep learning architectures. These architectures, such as transformers or graph neural networks, uniquely focus on different aspects of data, including spatial relationships, temporal patterns, and multi-scale dependencies. This paper addresses the mentioned gap by analyzing different architectures and introducing two innovative approaches called MultiFoundationQuake and GNNCoder. We formulate earthquake nowcasting as a time series forecasting problem for the next 14 days within 0.1-degree spatial bins in Southern California, spanning from 1986 to 2024. The earthquake time series is forecast as a function of the log energy released by quakes. Our comprehensive evaluation employs several key performance metrics, notably Nash-Sutcliffe Efficiency and Mean Squared Error, over time in each spatial region. The results demonstrate that our introduced models outperform other custom architectures by effectively capturing temporal-spatial relationships inherent in seismic data. The performance of existing foundation models varies significantly based on the pre-training datasets, emphasizing the need for careful dataset selection. However, we introduce a new general approach termed MultiFoundationPattern that combines a bespoke pattern with foundation model results handled as auxiliary streams. In the earthquake case, the resultant MultiFoundationQuake model achieves the best overall performance.
Submitted 21 August, 2024;
originally announced August 2024.
-
Advancements in Radiomics and Artificial Intelligence for Thyroid Cancer Diagnosis
Authors:
Milad Yousefi,
Shadi Farabi Maleki,
Ali Jafarizadeh,
Mahya Ahmadpour Youshanlui,
Aida Jafari,
Siamak Pedrammehr,
Roohallah Alizadehsani,
Ryszard Tadeusiewicz,
Pawel Plawiak
Abstract:
Thyroid cancer is an increasing global health concern that requires advanced diagnostic methods. The application of AI and radiomics to thyroid cancer diagnosis is examined in this review. A review of multiple databases was conducted in compliance with PRISMA guidelines until October 2023. A combination of keywords led to the discovery of English academic publications on thyroid cancer and related subjects. 267 papers were returned from the original search after 109 duplicates were removed. Relevant studies were selected according to predetermined criteria after 124 articles were eliminated based on an examination of their abstract and title. After the comprehensive analysis, an additional six studies were excluded. Among the 28 included studies, radiomics analysis, which incorporates ultrasound (US) images, demonstrated its effectiveness in diagnosing thyroid cancer. Various results were noted, with some of the studies presenting new strategies that outperformed the status quo. The literature has emphasized various challenges faced by AI models, including interpretability issues, dataset constraints, and operator dependence. The synthesized findings of the 28 included studies mentioned the need for standardization efforts and prospective multicenter studies to address these concerns. Furthermore, approaches to overcome these obstacles were identified, such as advances in explainable AI technology and personalized medicine techniques. The review focuses on how AI and radiomics could transform the diagnosis and treatment of thyroid cancer. Despite challenges, future research on multidisciplinary cooperation, clinical applicability validation, and algorithm improvement holds the potential to improve patient outcomes and diagnostic precision in the treatment of thyroid cancer.
Submitted 9 April, 2024;
originally announced April 2024.
-
Agonist-Antagonist Pouch Motors: Bidirectional Soft Actuators Enhanced by Thermally Responsive Peltier Elements
Authors:
Trevor Exley,
Rashmi Wijesundara,
Nathan Tan,
Akshay Sunkara,
Xinyu He,
Shuopu Wang,
Bonnie Chan,
Aditya Jain,
Luis Espinosa,
Amir Jafari
Abstract:
In this study, we introduce a novel Mylar-based pouch motor design that leverages the reversible actuation capabilities of Peltier junctions to enable agonist-antagonist muscle mimicry in soft robotics. Addressing the limitations of traditional silicone-based materials, such as leakage and phase-change fluid degradation, our pouch motors filled with Novec 7000 provide a durable and leak-proof solution for geometric modeling. The integration of flexible Peltier junctions offers a significant advantage over conventional Joule heating methods by allowing active and reversible heating and cooling cycles. This innovation not only enhances the reliability and longevity of soft robotic applications but also broadens the scope of design possibilities, including the development of agonist-antagonist artificial muscles, grippers that can manipulate through flexion and extension, and an anchor-slip style simple crawler design. Our findings indicate that this approach could lead to more efficient, versatile, and durable robotic systems, marking a significant advancement in the field of soft robotics.
Submitted 16 March, 2024;
originally announced March 2024.
-
TVIM: Thermo-Active Variable Impedance Module: Evaluating Shear-Mode Capabilities of Polycaprolactone
Authors:
Trevor Exley,
Rashmi Wijesundara,
Shuopu Wang,
Arian Moridani,
Amir Jafari
Abstract:
In this work, we introduce an advanced thermo-active variable impedance module which builds upon our previous innovation in thermal-based impedance adjustment for actuation systems. Our initial design harnessed the temperature-responsive, viscoelastic properties of Polycaprolactone (PCL) to modulate stiffness and damping, facilitated by integrated flexible Peltier elements. While effective, the reliance on compressing and the inherent stress relaxation characteristics of PCL led to suboptimal response times in impedance adjustments. Addressing these limitations, the current iteration of our module pivots to a novel 'shear-mode' operation. By conducting comprehensive shear rheology analyses on PCL, we have identified a configuration that eliminates the viscoelastic delay, offering a faster response with improved heat transfer efficiency. A key advantage of our module lies in its scalability and elimination of additional mechanical actuators for impedance adjustment. The compactness and efficiency of thermal actuation through Peltier elements allow for significant downsizing, making these thermal, variable impedance modules exceptionally well-suited for applications where space constraints and actuator weight are critical considerations. This development represents a significant leap forward in the design of variable impedance actuators, offering a more versatile, responsive, and compact solution for a wide range of robotic and biomechanical applications.
Submitted 16 March, 2024;
originally announced March 2024.
-
Streamlining the Selection Phase of Systematic Literature Reviews (SLRs) Using AI-Enabled GPT-4 Assistant API
Authors:
Seyed Mohammad Ali Jafari
Abstract:
The escalating volume of academic literature presents a formidable challenge in staying updated with the newest research developments. Addressing this, this study introduces a pioneering AI-based tool, configured specifically to streamline the efficiency of the article selection phase in Systematic Literature Reviews (SLRs). Utilizing the robust capabilities of OpenAI's GPT-4 Assistant API, the tool successfully homogenizes the article selection process across a broad array of academic disciplines. Implemented through a tripartite approach consisting of data preparation, AI-mediated article assessment, and structured result presentation, this tool significantly accelerates the time-consuming task of literature reviews. Importantly, this tool could be highly beneficial in fields such as management and economics, where the SLR process involves substantial human judgment. The adoption of a standard GPT model can substantially reduce potential biases and enhance the speed and precision of the SLR selection phase. This not only amplifies researcher productivity and accuracy but also denotes a considerable stride forward in the way academic research is conducted amidst the surging body of scholarly publications.
Submitted 14 January, 2024;
originally announced February 2024.
-
Social Recommendation through Heterogeneous Graph Modeling of the Long-term and Short-term Preference Defined by Dynamic Periods
Authors:
Behafarid Mohammad Jafari,
Xiao Luo,
Ali Jafari
Abstract:
Social recommendations have been widely adopted in substantial domains. Recently, graph neural networks (GNN) have been employed in recommender systems due to their success in graph representation learning. However, dealing with the dynamic property of social network data is a challenge. This research presents a novel method that provides social recommendations by incorporating the dynamic property of social network data in a heterogeneous graph. The model aims to capture user preference over time without going through the complexities of a dynamic graph by adding period nodes to define users' long-term and short-term preferences and aggregating assigned edge weights. The model is applied to real-world data to argue its superior performance. Promising results demonstrate the effectiveness of this model.
Submitted 21 December, 2023;
originally announced December 2023.
-
Towards a Unified Naming Scheme for Thermo-Active Soft Actuators: A Review of Materials, Working Principles, and Applications
Authors:
Trevor Exley,
Emilly Hays,
Daniel Johnson,
Arian Moridani,
Ramya Motati,
Amir Jafari
Abstract:
Soft robotics is a rapidly growing field that spans the fields of chemistry, materials science, and engineering. Due to the diverse background of the field, there have been contrasting naming schemes such as 'intelligent', 'smart' and 'adaptive' materials which add vagueness to the broad innovation among literature. Therefore, a clear, functional and descriptive naming scheme is proposed in which a previously vague name -- Soft Material for Soft Actuators -- can remain clear and concise -- Phase-Change Elastomers for Artificial Muscles. By synthesizing the working principle, material, and application into a naming scheme, the searchability of soft robotics can be enhanced and applied to other fields. The field of thermo-active soft actuators spans multiple domains and requires added clarity. Thermo-active actuators have potential for a variety of applications spanning virtual reality haptics to assistive devices. This review offers a comprehensive guide to selecting the type of thermo-active actuator when one has an application in mind. Additionally, it discusses future directions and improvements that are necessary for implementation.
Submitted 11 December, 2023;
originally announced December 2023.
-
Dependency Practices for Vulnerability Mitigation
Authors:
Abbas Javan Jafari,
Diego Elias Costa,
Ahmad Abdellatif,
Emad Shihab
Abstract:
Relying on dependency packages accelerates software development, but it also increases the exposure to security vulnerabilities that may be present in dependencies. While developers have full control over which dependency packages (and which version) they use, they have no control over the dependencies of their dependencies. Such transitive dependencies, which often amount to a greater number than direct dependencies, can become infected with vulnerabilities and put software projects at risk. To mitigate this risk, practitioners need to select dependencies that respond quickly to vulnerabilities to prevent the propagation of vulnerable code to their project. To identify such dependencies, we analyze more than 450 vulnerabilities in the npm ecosystem to understand why dependent packages remain vulnerable. We identify over 200,000 npm packages that are infected through their dependencies and use 9 features to build a prediction model that identifies packages that quickly adopt the vulnerability fix and prevent further propagation of vulnerabilities. We also study the relationship between these features and the response speed of vulnerable packages. We complement our work with a practitioner survey to understand the applicability of our findings. Developers can incorporate our findings into their dependency management practices to mitigate the impact of vulnerabilities from their dependency supply chain.
Submitted 11 October, 2023;
originally announced October 2023.
-
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
Authors:
Ehsan Kamalloo,
Aref Jafari,
Xinyu Zhang,
Nandan Thakur,
Jimmy Lin
Abstract:
The rise of large language models (LLMs) had a transformative impact on search, ushering in a new era of search engines that are capable of generating search results in natural language text, imbued with citations for supporting sources. Building generative information-seeking models demands openly accessible datasets, which currently remain lacking. In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations. Unlike recent efforts that focus on human evaluation of black-box proprietary search engines, we built our dataset atop the English subset of MIRACL, a publicly available information retrieval dataset. HAGRID is constructed based on human and LLM collaboration. We first automatically collect attributed explanations that follow an in-context citation style using an LLM, i.e. GPT-3.5. Next, we ask human annotators to evaluate the LLM explanations based on two criteria: informativeness and attributability. HAGRID serves as a catalyst for the development of information-seeking models with better attribution capabilities.
Submitted 31 July, 2023;
originally announced July 2023.
-
Dependency Update Strategies and Package Characteristics
Authors:
Abbas Javan Jafari,
Diego Elias Costa,
Emad Shihab,
Rabe Abdalkareem
Abstract:
Managing project dependencies is a key maintenance issue in software development. Developers need to choose an update strategy that allows them to receive important updates and fixes while protecting them from breaking changes. Semantic Versioning was proposed to address this dilemma but many have opted for more restrictive or permissive alternatives. This empirical study explores the association between package characteristics and the dependency update strategy selected by its dependents to understand how developers select and change their update strategies. We study over 112,000 npm packages and use 19 characteristics to build a prediction model that identifies the common dependency update strategy for each package. Our model achieves a minimum improvement of 72% over the baselines and is much better aligned with community decisions than the npm default strategy. We investigate how different package characteristics can influence the predicted update strategy and find dependent count, age, and release status to be the most influential features. We complement the work with qualitative analyses of 160 packages to investigate the evolution of update strategies. While the common update strategy remains consistent for many packages, certain events such as the release of the 1.0.0 version or breaking changes influence the selected update strategy over time.
Submitted 24 May, 2023;
originally announced May 2023.
-
GENIE-NF-AI: Identifying Neurofibromatosis Tumors using Liquid Neural Network (LTC) trained on AACR GENIE Datasets
Authors:
Michael Bidollahkhani,
Ferhat Atasoy,
Elnaz Abedini,
Ali Davar,
Omid Hamza,
Fırat Sefaoğlu,
Amin Jafari,
Muhammed Nadir Yalçın,
Hamdan Abdellatef
Abstract:
In recent years, the field of medicine has been increasingly adopting artificial intelligence (AI) technologies to provide faster and more accurate disease detection, prediction, and assessment. In this study, we propose an interpretable AI approach to diagnose patients with neurofibromatosis using blood tests and pathogenic variables. We evaluated the proposed method using a dataset from the AACR GENIE project and compared its performance with modern approaches. Our proposed approach outperformed existing models with 99.86% accuracy. We also conducted NF1 and interpretable AI tests to validate our approach. Our work provides an explainable approach model using logistic regression and explanatory stimulus as well as a black-box model. The explainable models help to explain the predictions of black-box models while the glass-box models provide information about the best-fit features. Overall, our study presents an interpretable AI approach for diagnosing patients with neurofibromatosis and demonstrates the potential of AI in the medical field.
Submitted 26 April, 2023;
originally announced April 2023.
-
Improved knowledge distillation by utilizing backward pass knowledge in neural networks
Authors:
Aref Jafari,
Mehdi Rezagholizadeh,
Ali Ghodsi
Abstract:
Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to better match the output of the student model to that of the teacher model based on the knowledge extracted from the forward pass of the teacher network. Although conventional KD is effective for matching the two networks over the given data points, there is no guarantee that these models would match in other areas for which we do not have enough training samples. In this work, we address that problem by generating new auxiliary training samples based on extracting knowledge from the backward pass of the teacher in the areas where the student diverges greatly from the teacher. We compute the difference between the teacher and the student and generate new data samples that maximize the divergence. This is done by perturbing data samples in the direction of the gradient of the difference between the student and the teacher. Augmenting the training set by adding these auxiliary samples improves the performance of KD significantly and leads to a closer match between the student and the teacher. Using this approach when data samples come from a discrete domain, such as applications of natural language processing (NLP) and language understanding, is not trivial. However, we show how this technique can be used successfully in such applications. We evaluated the performance of our method on various tasks in computer vision and NLP domains and got promising results.
Submitted 27 January, 2023;
originally announced January 2023.
-
Risk-aware Vehicle Motion Planning Using Bayesian LSTM-Based Model Predictive Control
Authors:
Yufei Huang,
Mohsen A. Jafari
Abstract:
Understanding the probabilistic traffic environment is a vital challenge for the motion planning of autonomous vehicles. To make feasible control decisions, forecasting future trajectories of adjacent cars is essential for intelligent vehicles to assess potential conflicts and react to reduce the risk. This paper first introduces a Bayesian Long Short-term Memory (BLSTM) model to learn human drivers' behaviors and habits from their historical trajectory data. The model predicts the probability distribution of surrounding vehicles' positions, which are used to estimate dynamic conflict risks. Next, a hybrid automaton is built to model the basic motions of a car, and the conflict risks are assessed for real-time state-space transitions based on environmental information. Finally, a BLSTM-based Model Predictive Control (MPC) is built to navigate vehicles through safe paths with the least predicted conflict risk. By merging BLSTM with MPC, the designed neural-based MPC overcomes the difficulty that traditional MPC has in modeling uncertain conflict risks. The simulation results show that our proposed BLSTM-based MPC performs better than human drivers because it can foresee potential conflicts and take action to avoid them.
Submitted 15 January, 2023;
originally announced January 2023.
-
Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization
Authors:
Aref Jafari,
Ivan Kobyzev,
Mehdi Rezagholizadeh,
Pascal Poupart,
Ali Ghodsi
Abstract:
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and facilitates the training compared to previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with the smoothed version of this objective and making it more complex as the training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).
Submitted 12 December, 2022;
originally announced December 2022.
-
NETpred: Network-based modeling and prediction of multiple connected market indices
Authors:
Alireza Jafari,
Saman Haratizadeh
Abstract:
Market prediction plays a major role in supporting financial decisions. An emerging approach in this domain is to use graphical modeling and analysis for prediction of the next market index fluctuations. One important question in this domain is how to construct an appropriate graphical model of the data that can be effectively used by a semi-supervised GNN to predict index fluctuations. In this paper, we introduce a framework called NETpred that generates a novel heterogeneous graph representing multiple related indices and their stocks by using several stock-stock and stock-index relation measures. It then thoroughly selects a diverse set of representative nodes that cover different parts of the state space and whose price movements are accurately predictable. By assigning initial predicted labels to such a set of nodes, NETpred makes sure that the subsequent GCN model can be successfully trained using a semi-supervised learning process. The resulting model is then used to predict the stock labels which are finally aggregated to infer the labels for all the index nodes in the graph. Our comprehensive set of experiments shows that NETpred improves the performance of the state-of-the-art baselines by 3%-5% in terms of F-score measure on different well-known data sets.
Submitted 2 December, 2022;
originally announced December 2022.
-
A Deep Learning Anomaly Detection Method in Textual Data
Authors:
Amir Jafari
Abstract:
In this article, we propose using deep learning and transformer architectures combined with classical machine learning algorithms to detect and identify anomalies in text. The deep learning model provides crucial context information about the textual data, with all textual context converted to a numerical representation. We used multiple machine learning methods such as Sentence Transformers, Auto Encoders, Logistic Regression and Distance calculation methods to predict anomalies. The methods are tested on text data, and we used syntactic data from different sources injected into the original text as anomalies or used them as targets. Different methods and algorithms in the field of outlier detection are explained, and the results of the best technique are presented. These results suggest that our algorithm could potentially reduce false positive rates compared with other anomaly detection methods that we are testing.
Submitted 25 November, 2022;
originally announced November 2022.
-
Comparison Study Between Token Classification and Sequence Classification In Text Classification
Authors:
Amir Jafari
Abstract:
Unsupervised machine learning techniques have been applied to Natural Language Processing tasks and have surpassed benchmarks such as GLUE with great success. The language-model-building approach achieves good results in one language, and the resulting model can be applied out of the box to multiple NLP tasks such as classification, summarization, and generation. Among all of the classical approaches used in NLP, masked language modeling is the most used. In general, the only requirement for building a language model is the presence of a large corpus of textual data. Text classification engines use a variety of models, from classical to state-of-the-art transformer models, to classify texts in order to save costs. Sequence Classifiers are mostly used in the domain of text classification. However, Token Classifiers are also viable candidate models. Sequence Classifiers and Token Classifiers both tend to improve classification predictions because they capture context information differently. This work aims to compare the performance of Sequence Classifiers and Token Classifiers and to evaluate each model on the same set of data. In this work, we use a pre-trained model as the base model with Token Classifier and Sequence Classifier heads, and the results of these two scoring paradigms will be compared.
Submitted 25 November, 2022;
originally announced November 2022.
-
Do we need Label Regularization to Fine-tune Pre-trained Language Models?
Authors:
Ivan Kobyzev,
Aref Jafari,
Mehdi Rezagholizadeh,
Tianda Li,
Alan Do-Omri,
Peng Lu,
Pascal Poupart,
Ali Ghodsi
Abstract:
Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model. Considering the ever-growing size of pre-trained language models (PLMs), KD is often adopted in many NLP tasks involving PLMs. However, it is evident that in KD, deploying the teacher network during training adds to the memory and computational requirements of training. In the computer vision literature, the necessity of the teacher network is put under scrutiny by showing that KD is a label regularization technique that can be replaced with lighter teacher-free variants such as the label-smoothing technique. However, to the best of our knowledge, this issue is not investigated in NLP. Therefore, this work concerns studying different label regularization techniques and whether we actually need them to improve the fine-tuning of smaller PLM networks on downstream tasks. In this regard, we did a comprehensive set of experiments on different PLMs such as BERT, RoBERTa, and GPT with more than 600 distinct trials and ran each configuration five times. This investigation led to a surprising observation that KD and other label regularization techniques do not play any meaningful role over regular fine-tuning when the student model is pre-trained. We further explore this phenomenon in different settings of NLP and computer vision tasks and demonstrate that pre-training itself acts as a kind of regularization, and additional label regularization is unnecessary.
Submitted 12 April, 2023; v1 submitted 24 May, 2022;
originally announced May 2022.
-
GCNET: graph-based prediction of stock price movement using graph convolutional network
Authors:
Alireza Jafari,
Saman Haratizadeh
Abstract:
The importance of considering related stocks data for the prediction of stock price movement has been shown in many studies; however, advanced graphical techniques for modeling, embedding and analyzing the behavior of interrelated stocks have not yet been widely exploited for the prediction of stock price movements. The main challenges in this domain are to find a way for modeling the existing relations among an arbitrary set of stocks and to exploit such a model for improving the prediction performance for those stocks. Most existing methods in this domain rely on basic graph-analysis techniques, with limited prediction power, and suffer from a lack of generality and flexibility. In this paper, we introduce a novel framework, called GCNET, that models the relations among an arbitrary set of stocks as a graph structure called influence network and uses a set of history-based prediction models to infer plausible initial labels for a subset of the stock nodes in the graph. Finally, GCNET uses the Graph Convolutional Network algorithm to analyze this partially labeled graph and predicts the next price direction of movement for each stock in the graph. GCNET is a general prediction framework that can be applied for the prediction of the price fluctuations of interacting stocks based on their historical data. Our experiments and evaluations on a set of stocks from the NASDAQ index demonstrate that GCNET significantly improves the performance of SOTA in terms of accuracy and MCC measures.
Submitted 31 August, 2022; v1 submitted 19 February, 2022;
originally announced March 2022.
-
Activity-based and agent-based Transport model of Melbourne (AToM): an open multi-modal transport simulation model for Greater Melbourne
Authors:
Afshin Jafari,
Dhirendra Singh,
Alan Both,
Mahsa Abdollahyar,
Lucy Gunn,
Steve Pemberton,
Billie Giles-Corti
Abstract:
Agent-based and activity-based models for simulating transportation systems have attracted significant attention in recent years. Few studies, however, include a detailed representation of active modes of transportation - such as walking and cycling - at a city-wide level, where dominating motorised modes are often of primary concern. This paper presents an open workflow for creating a multi-modal agent-based and activity-based transport simulation model, focusing on Greater Melbourne, and including the process of mode choice calibration for the four main travel modes of driving, public transport, cycling and walking. The synthetic population generated and used as an input for the simulation model represented Melbourne's population based on Census 2016, with daily activities and trips based on Victoria's 2016-18 travel survey data. The road network used in the simulation model includes all public roads accessible via the included travel modes. We compared the output of the simulation model with observations from the real world in terms of mode share, road volume, travel time, and travel distance. Through these comparisons, we showed that our model is suitable for studying mode choice and road usage behaviour of travellers.
Submitted 15 December, 2021;
originally announced December 2021.
-
An eXtended Finite Element Method Implementation in COMSOL Multiphysics: Thermo-Hydro-Mechanical Modeling of Fluid Flow in Discontinuous Porous Media
Authors:
Ahmad Jafari,
Mohammad Vahab,
Pooyan Broumand,
Nasser Khalili
Abstract:
This paper presents the implementation of the eXtended Finite Element Method (XFEM) in the general-purpose commercial software package COMSOL Multiphysics for multi-field thermo-hydro-mechanical problems in discontinuous porous media. To this end, an exclusive enrichment strategy is proposed in compliance with the COMSOL modeling structure. COMSOL modules and physics interfaces are adopted to take account of the relevant physical processes involved in thermo-hydro-mechanical coupling analysis, namely: the mechanical deformation, fluid flow in porous media and heat transfer. Essential changes are made to the internal variables of the physics interfaces to ensure consistency in the evaluation of enriched solution fields. The model preprocessing, level-set updates, coupling of the relevant physics and postprocessing procedures are performed adopting a coherent utilization of the COMSOL built-in features along with the COMSOL LiveLink for MATLAB functions. The implementation process, remedies for the treatment of the enriched zones, XFEM framework setup, multiphysics coupling, numerical integration and numerical solution strategy are described in detail. The capabilities and performance of the proposed approach are investigated by examining several multi-field thermo-hydro-mechanical simulations involving single/multiple discontinuities in 2D/3D porous rock settings.
Submitted 17 December, 2021;
originally announced December 2021.
-
An Activity-Based Model of Transport Demand for Greater Melbourne
Authors:
Alan Both,
Dhirendra Singh,
Afshin Jafari,
Billie Giles-Corti,
Lucy Gunn
Abstract:
In this paper, we present an algorithm for creating a synthetic population for the Greater Melbourne area using a combination of machine learning, probabilistic, and gravity-based approaches. We combine these techniques in a hybrid model with three primary innovations: 1. when assigning activity patterns, we generate individual activity chains for every agent, tailored to their cohort; 2. when selecting destinations, we aim to strike a balance between the distance-decay of trip lengths and the activity-based attraction of destination locations; and 3. we take into account the number of trips remaining for an agent so as to ensure they do not select a destination that would be unreasonable to return home from. Our method is completely open and replicable, requiring only publicly available data to generate a synthetic population of agents compatible with commonly used agent-based modeling software such as MATSim. The synthetic population was found to be accurate in terms of distance distribution, mode choice, and destination choice for a variety of population sizes.
Submitted 19 November, 2021;
originally announced November 2021.
-
Airport Taxi Time Prediction and Alerting: A Convolutional Neural Network Approach
Authors:
Erik Vargo,
Alex Tien,
Arian Jafari
Abstract:
This paper proposes a novel approach to predict and determine whether the average taxi-out time at an airport will exceed a pre-defined threshold within the next hour of operations. Prior work in this domain has focused exclusively on predicting taxi-out times on a flight-by-flight basis, which requires significant efforts and data on modeling taxiing activities from gates to runways. Learning directly from surface radar information with minimal processing, a computer vision-based model is proposed that incorporates airport surface data in such a way that adaptation-specific information (e.g., runway configuration, the state of aircraft in the taxiing process) is inferred implicitly and automatically by Artificial Intelligence (AI).
Submitted 17 November, 2021;
originally announced November 2021.
-
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Authors:
Mehdi Rezagholizadeh,
Aref Jafari,
Puneeth Salad,
Pranav Sharma,
Ali Saheb Pasand,
Ali Ghodsi
Abstract:
With the ever-growing scale of neural models, knowledge distillation (KD) attracts more attention as a prominent tool for neural model compression. However, there are counterintuitive observations in the literature showing some challenging limitations of KD. A case in point is that the best performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD. Therefore, one important question would be how to find the best checkpoint of the teacher for distillation? Searching through the checkpoints of the teacher would be a very tedious and computationally expensive process, which we refer to as the checkpoint-search problem. Moreover, another observation is that larger teachers might not necessarily be better teachers in KD, which is referred to as the capacity-gap problem. To address these challenging problems, in this work, we introduce our progressive knowledge distillation (Pro-KD) technique which defines a smoother training path for the student by following the training footprints of the teacher instead of solely relying on distilling from a single mature fully-trained teacher. We demonstrate that our technique is quite effective in mitigating the capacity-gap problem and the checkpoint-search problem. We evaluate our technique using a comprehensive set of experiments on different tasks such as image classification (CIFAR-10 and CIFAR-100), natural language understanding tasks of the GLUE benchmark, and question answering (SQuAD 1.1 and 2.0) using BERT-based models and consistently got superior results over state-of-the-art techniques.
Submitted 16 October, 2021;
originally announced October 2021.
-
Transfer Learning for Multi-lingual Tasks -- a Survey
Authors:
Amir Reza Jafari,
Behnam Heidary,
Reza Farahbakhsh,
Mostafa Salehi,
Mahdi Jalili
Abstract:
These days, different platforms such as social media give their clients from different backgrounds and languages the possibility to connect and exchange information. It is no longer surprising to see comments in different languages on posts published by international celebrities or data providers. In this era, understanding cross-language content and multilingualism in natural language processing (NLP) are hot topics, and multiple efforts have tried to leverage existing technologies in NLP to tackle this challenging research problem. In this survey, we provide a comprehensive overview of the existing literature with a focus on transfer learning techniques in multilingual tasks. We also identify potential opportunities for further research in this domain.
Submitted 28 August, 2021;
originally announced October 2021.
-
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Authors:
Tianda Li,
Ahmad Rashid,
Aref Jafari,
Pranav Sharma,
Ali Ghodsi,
Mehdi Rezagholizadeh
Abstract:
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one. Even though KD has shown promise on a wide range of Natural Language Processing (NLP) applications, little is understood about how one KD algorithm compares to another and whether these approaches can be complementary to each other. In this work, we evaluate various KD algorithms on in-domain, out-of-domain and adversarial testing. We propose a framework to assess the adversarial robustness of multiple KD algorithms. Moreover, we introduce a new KD algorithm, Combined-KD, which takes advantage of two promising approaches (better training scheme and more efficient data augmentation). Our extensive experimental results show that Combined-KD achieves state-of-the-art results on the GLUE benchmark, out-of-domain generalization, and adversarial robustness compared to competitive methods.
Submitted 20 September, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
An eXtended Finite Element Method Implementation in COMSOL Multiphysics: Solid Mechanics
Authors:
Ahmad Jafari,
Pooyan Broumand,
Mohammad Vahab,
Nasser Khalili
Abstract:
This paper presents the first implementation of the eXtended Finite Element Method (XFEM) in the general-purpose commercial software COMSOL Multiphysics. An enrichment strategy is proposed, consistent with the structure of the software. To this end, for each set of enrichment functions, an additional Solid Mechanics module is incorporated into the numerical framework, coupled with compatible modifications to the internal variables. Linear Elastic Fracture Mechanics (LEFM) is exclusively adopted for the crack analysis. The model pre-processing, level set update, stress intensity factor calculation and crack propagation analysis are conducted by employing COMSOL's built-in features in conjunction with external MATLAB functions through COMSOL LiveLink. All implementational aspects and suggested remedies for the treatment of enriched elements, framework setup, evaluation of stress intensity factors, and numerical integration are described in detail. The accuracy and robustness of the proposed method are examined by several numerical examples for stationary and propagating crack problems in 2D and 3D settings. The results show excellent agreement with available analytical, numerical and experimental observations in the literature.
Keywords: XFEM; COMSOL Multiphysics; Crack analysis; Fracture propagation
Submitted 3 September, 2021;
originally announced September 2021.
-
The effects of data size on Automated Essay Scoring engines
Authors:
Christopher Ormerod,
Amir Jafari,
Susan Lottridge,
Milan Patel,
Amy Harris,
Paul van Wamelen
Abstract:
We study the effects of data size and quality on the performance of Automated Essay Scoring (AES) engines that are designed in accordance with three different paradigms: a frequency and hand-crafted feature-based model, a recurrent neural network model, and a pretrained transformer-based language model that is fine-tuned for classification. We expect that each type of model benefits from the size and the quality of the training data in very different ways. Standard practices for developing training data for AES engines were established with feature-based methods in mind; however, since neural networks are increasingly being considered in a production setting, this work seeks to inform us as to how to establish better training data for neural networks that will be used in production.
Submitted 30 August, 2021;
originally announced August 2021.
-
Annealing Knowledge Distillation
Authors:
Aref Jafari,
Mehdi Rezagholizadeh,
Pranav Sharma,
Ali Ghodsi
Abstract:
Significant memory and computational requirements of large deep neural networks restrict their application on edge devices. Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model. The success of knowledge distillation is mainly attributed to its training objective function, which exploits the soft-target information (also known as "dark knowledge") besides the given regular hard labels in a training set. However, it is shown in the literature that the larger the gap between the teacher and the student networks, the more difficult is their training using knowledge distillation. To address this shortcoming, we propose an improved knowledge distillation method (called Annealing-KD) by feeding the rich information provided by the teacher's soft-targets incrementally and more efficiently. Our Annealing-KD technique is based on a gradual transition over annealed soft-targets generated by the teacher at different temperatures in an iterative process, and therefore, the student is trained to follow the annealed teacher output in a step-by-step manner. This paper includes theoretical and empirical evidence as well as practical experiments to support the effectiveness of our Annealing-KD method. We did a comprehensive set of experiments on different tasks such as image classification (CIFAR-10 and 100) and NLP language inference with BERT-based models on the GLUE benchmark and consistently got superior results.
Submitted 14 April, 2021;
originally announced April 2021.
-
Automated essay scoring using efficient transformer-based language models
Authors:
Christopher M Ormerod,
Akanksha Malhotra,
Amir Jafari
Abstract:
Automated Essay Scoring (AES) is a cross-disciplinary effort involving Education, Linguistics, and Natural Language Processing (NLP). The efficacy of an NLP model in AES tests its ability to evaluate long-term dependencies and extrapolate meaning even when text is poorly written. Large pretrained transformer-based language models have dominated the current state-of-the-art in many NLP tasks; however, the computational requirements of these models make them expensive to deploy in practice. The goal of this paper is to challenge the paradigm in NLP that bigger is better when it comes to AES. To do this, we evaluate the performance of several fine-tuned pretrained NLP models with a modest number of parameters on an AES dataset. By ensembling our models, we achieve excellent results with fewer parameters than most pretrained transformer-based models.
Submitted 25 February, 2021;
originally announced February 2021.
-
Enhanced Balancing GAN: Minority-class Image Generation
Authors:
Gaofeng Huang,
Amir H. Jafari
Abstract:
Generative adversarial networks (GANs) are one of the most powerful generative models, but always require a large and balanced dataset to train. Traditional GANs are not applicable to generating minority-class images in a highly imbalanced dataset. Balancing GAN (BAGAN) was proposed to mitigate this problem, but it is unstable when images in different classes look similar, e.g. flowers and cells. In this work, we propose a supervised autoencoder with an intermediate embedding model to disperse the labeled latent vectors. With the improved autoencoder initialization, we also build an architecture of BAGAN with gradient penalty (BAGAN-GP). Our proposed model overcomes the instability issue in the original BAGAN and converges faster to high quality generations. Our model achieves high performance on the imbalanced scale-down version of MNIST Fashion, CIFAR-10, and one small-scale medical image dataset.
Submitted 31 October, 2020;
originally announced November 2020.
-
Dependency Smells in JavaScript Projects
Authors:
Abbas Javan Jafari,
Diego Elias Costa,
Rabe Abdalkareem,
Emad Shihab,
Nikolaos Tsantalis
Abstract:
Dependency management in modern software development poses many challenges for developers who wish to stay up to date with the latest features and fixes whilst ensuring backwards compatibility. Project maintainers have opted for varied, and sometimes conflicting, approaches for maintaining their dependencies. Opting for unsuitable approaches can introduce bugs and vulnerabilities into the project, introduce breaking changes, cause extraneous installations, and reduce dependency understandability, making it harder for others to contribute effectively. In this paper, we empirically examine evidence of recurring dependency management issues (dependency smells). We look at the commit data for a dataset of 1,146 active JavaScript repositories to catalog, quantify and understand dependency smells. Through a series of surveys with practitioners, we identify and quantify seven dependency smells with varying degrees of popularity and investigate why they are introduced throughout project history. Our findings indicate that dependency smells are prevalent in JavaScript projects with two or more distinct smells appearing in 80% of the projects, but they generally infect a minority of a project's dependencies. Our observations show that the number of dependency smells tends to increase over time. Practitioners agree that dependency smells bring about many problems including security threats, bugs, dependency breakage, runtime errors, and other maintenance issues. These smells are generally introduced as developers react to dependency misbehaviour and the shortcomings of the npm ecosystem.
Submitted 18 August, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Segmentation Approach for Coreference Resolution Task
Authors:
Aref Jafari,
Ali Ghodsi
Abstract:
In coreference resolution, it is important to consider all members of a coreference cluster and decide about all of them at once. This technique can help to avoid losing precision and also in finding long-distance relations. The presented paper is a report of an ongoing study on an idea which proposes a new approach for coreference resolution which can resolve all coreference mentions to a given mention in the document in one pass. This has been accomplished by defining an embedding method for the position of all members of a coreference cluster in a document and resolving all of them for a given mention. In the proposed method, the BERT model has been used for encoding the documents and a head network has been designed to capture the relations between the embedded tokens. These are then converted to the proposed span position embedding matrix which embeds the position of all coreference mentions in the document. We tested this idea on the CoNLL 2012 dataset and although the preliminary results from this method do not quite meet the state-of-the-art results, they are promising and they can capture features like long-distance relations better than the other approaches.
Submitted 30 June, 2020;
originally announced July 2020.
-
Language models and Automated Essay Scoring
Authors:
Pedro Uria Rodriguez,
Amir Jafari,
Christopher M. Ormerod
Abstract:
In this paper, we present a new comparative study on automatic essay scoring (AES). The current state-of-the-art natural language processing (NLP) neural network architectures are used in this work to achieve above human-level accuracy on the publicly available Kaggle AES dataset. We compare two powerful language models, BERT and XLNet, and describe all the layers and network architectures in these models. We elucidate the network architectures of BERT and XLNet using clear notation and diagrams and explain the advantages of transformer architectures over traditional recurrent neural network architectures. Linear algebra notation is used to clarify the functions of transformers and attention mechanisms. We compare the results with more traditional methods, such as bag of words (BOW) and long short term memory (LSTM) networks.
Submitted 18 September, 2019;
originally announced September 2019.
-
Security Patterns: A Systematic Mapping Study
Authors:
Abbas Javan Jafari,
Abbas Rasoolzadegan
Abstract:
Security patterns are a means to encapsulate and communicate proven security solutions. They are well-established approaches for introducing security into the software development process. Our objective is to explore the research efforts on security patterns and discuss the current state of the art. This study will serve as a guideline for researchers, practitioners, and teachers interested in this field. We have conducted a systematic mapping study of relevant literature from 1997 until the end of 2017 and identified 403 relevant papers, 274 of which were selected for analysis based on quality criteria. This study derives a customized research strategy from established systematic approaches in the literature. We have utilized an exhaustive 3-tier search strategy to ensure a high degree of completeness during the study collection and used a test set to evaluate our search. The first 3 research questions address the demographics of security pattern research such as topic classification, trends, and distribution between academia and industry, along with prominent researchers and venues. The next 9 research questions focus on more in-depth analyses such as pattern presentation notations and classification criteria, pattern evaluation techniques, and pattern usage environments. The results and discussions of this study have significant implications for researchers, practitioners, and teachers in software engineering and information security.
Submitted 30 November, 2018;
originally announced November 2018.
-
Performance Evaluation of Spatial Complementary Code Keying Modulation in MIMO Systems
Authors:
A. H. Jafari,
T. O'Farrell
Abstract:
Spatial complementary code keying modulation (SCCKM) is proposed as a novel block coding modulation scheme. An input binary sequence is modulated based on the different lengths of complementary code keying (CCK) modulation and then spread across the transmit antennas (spatial domain) in a multiple input multiple output (MIMO) system exploiting orthogonal frequency division multiplexing (OFDM). At the receiver side, zero forcing equalization is applied to the OFDM modulated data to mitigate the effect of the multipath fast fading channel and then followed by maximum likelihood (ML) detection to retrieve the input sequence. The performance of SCCKM in different MIMO systems is compared to that of spatial modulation (SM) as a baseline scheme. Simulation results show that for the same spectral efficiency, SCCKM is able to substantially improve the bit error rate (BER).
Submitted 16 September, 2017;
originally announced September 2017.
-
Improvements on the k-center problem for uncertain data
Authors:
Sharareh Alipour,
Amir Jafari
Abstract:
In real applications, there are situations where we need to model some problems based on uncertain data. This leads us to define an uncertain model for some classical geometric optimization problems and propose algorithms to solve them. In this paper, we study the $k$-center problem, for uncertain input. In our setting, each uncertain point $P_i$ is located independently from other points in one of several possible locations $\{P_{i,1},\dots, P_{i,z_i}\}$ in a metric space with metric $d$, with specified probabilities and the goal is to compute $k$-centers $\{c_1,\dots, c_k\}$ that minimize the following expected cost $$Ecost(c_1,\dots, c_k)=\sum_{R\in Ω} prob(R)\max_{i=1,\dots, n}\min_{j=1,\dots k} d(\hat{P}_i,c_j)$$ here $Ω$ is the probability space of all realizations $$R=\{\hat{P}_1,\dots, \hat{P}_n\}$$ of given uncertain points and $$prob(R)=\prod_{i=1}^n prob(\hat{P}_i).$$
In the restricted assigned version of this problem, an assignment $A:\{P_1,\dots, P_n\}\rightarrow \{c_1,\dots, c_k\}$ is given for any choice of centers and the goal is to minimize $$Ecost_A(c_1,\dots, c_k)=\sum_{R\in Ω} prob(R)\max_{i=1,\dots, n} d(\hat{P}_i,A(P_i)).$$ In the unrestricted version, the assignment is not specified and the goal is to compute $k$ centers $\{c_1,\dots, c_k\}$ and an assignment $A$ that minimize the above expected cost.
We give several improved constant approximation factor algorithms for the assigned versions of this problem in a Euclidean space and in a general metric space. Our results significantly improve the results of \cite{guh} and generalize the results of \cite{wang} to any dimension. Our approach is to replace a certain center point for each uncertain point and study the properties of these certain points. The proposed algorithms are efficient and simple to implement.
Submitted 30 August, 2017;
originally announced August 2017.
-
Ultra-Dense Networks: A New Look at the Proportional Fair Scheduler
Authors:
Ming Ding,
David Lopez Perez,
Amir H. Jafari,
Guoqiang Mao,
Zihuai Lin
Abstract:
In this paper, we theoretically study the proportional fair (PF) scheduler in the context of ultra-dense networks (UDNs). Analytical results are obtained for the coverage probability and the area spectral efficiency (ASE) performance of dense small cell networks (SCNs) with the PF scheduler employed at base stations (BSs). The key point of our analysis is that the typical user is no longer a random user as assumed in most studies in the literature. Instead, a user with the maximum PF metric is chosen by its serving BS as the typical user. By comparing the previous results of the round-robin (RR) scheduler with our new results of the PF scheduler, we quantify the loss of the multi-user diversity of the PF scheduler with the network densification, which casts a new look at the role of the PF scheduler in UDNs. Our conclusion is that the RR scheduler should be used in UDNs to simplify the radio resource management (RRM).
Submitted 26 September, 2017; v1 submitted 26 August, 2017;
originally announced August 2017.
-
Diversity Pulse Shaped Transmission in Ultra-Dense Small Cell Networks
Authors:
Amir H. Jafari,
Vijay Venkateswaran,
David Lopez-Perez,
Jie Zhang
Abstract:
In ultra-dense small cell networks, spatial multiplexing gain is a challenge because of the different propagation conditions. The channels associated with different transmit-receive pairs can be highly correlated due to i) the high probability of line-of-sight (LOS) communication between user equipment (UE) and base station (BS), and ii) insufficient spacing between antenna elements at both UE and BS. In this paper, we propose a novel transmission technique titled Diversity Pulse Shaped Transmission (DPST) to enhance the throughput over the correlated MIMO channels in an ultra-dense small cell network. The fundamental idea of DPST is to shape transmit signals at adjacent antennas with distinct interpolating filters, introducing pulse shaping diversity. In DPST, each antenna transmits its own data stream with a relative deterministic time offset, which must be a fraction of the symbol period, with respect to the adjacent antenna. The delay is interpolated with the pulse shaped signal, generating a virtual MIMO channel that benefits from increased diversity from the receiver perspective. To extract the diversity, the receiver must operate in an over-sampled domain and hence a fractionally spaced equaliser (FSE) is proposed. The joint impact of DPST and FSE helps the receiver to sense a less correlated channel, eventually enhancing the UE's throughput. Moreover, in order to minimise the spatial correlation, we aim to optimise the deterministic fractional delay. Simulation results show that applying DPST to a correlated channel can enhance the UE throughput by approximately 1.93x and 3.76x in 2x2 and 4x4 MIMO systems, respectively.
Submitted 4 November, 2016;
originally announced November 2016.
-
Performance Impact of LOS and NLOS Transmissions in Dense Cellular Networks under Rician Fading
Authors:
Amir H. Jafari,
Ming Ding,
David Lopez-Perez,
Jie Zhang
Abstract:
In this paper, we analyse the performance of dense small cell networks (SCNs). We derive analytical expressions for both their coverage probability and their area spectral efficiency (ASE) using a path loss model that considers both line-of-sight (LOS) and non-LOS (NLOS) components. Due to the close proximity of small cell base stations (BSs) and user equipments (UEs) in such dense SCNs, we also consider Rician fading as the multi-path fading channel model for both the LOS and NLOS fading transmissions. The Rayleigh fading used in most existing works analysing dense SCNs is not accurate enough. Then, we compare the performance impact of LOS and NLOS transmissions in dense SCNs under Rician fading with that based on Rayleigh fading. The analysis and the simulation results show that in dense SCNs where LOS transmissions dominate the performance, the impact of Rician fading on the overall system performance is minor, and does not help to address the performance losses brought by the transition of many interfering signals from NLOS to LOS.
Submitted 28 October, 2016;
originally announced October 2016.
-
Pulse Shaping Diversity to Enhance Throughput in Ultra-Dense Small Cell Networks
Authors:
Amir H. Jafari,
Vijay Venkateswaran,
David Lopez-Perez,
Jie Zhang
Abstract:
Spatial multiplexing (SM) gains in multiple input multiple output (MIMO) cellular networks are limited when used in combination with ultra-dense small cell networks. This limitation is due to large spatial correlation among channel pairs. More specifically, it is due to i) line-of-sight (LOS) communication between user equipment (UE) and base station (BS) and ii) insufficient spacing between antenna elements. We propose to shape transmit signals at adjacent antennas with distinct interpolating filters, which introduces pulse shaping diversity, eventually leading to improved SINR and throughput at the UEs. In this technique, each antenna transmits its own data stream with a relative offset with respect to the adjacent antenna. The delay, which must be a fraction of the symbol period, is interpolated with the pulse shaped signal and generates a virtual MIMO channel that leads to improved diversity and SINR at the receiver. Note that non-integral sampling periods with inter-symbol interference (ISI) should be mitigated at the receiver. For this, we propose to use a fractionally spaced equalizer (FSE) designed based on the minimum mean squared error (MMSE) criterion. Simulation results show that for a 2x2 MIMO system with an inter-site distance (ISD) of 50 m, the median received SINR and throughput at the UE improve by 11 dB and 2x, respectively, which verifies that pulse shaping can overcome poor SM gains in ultra-dense small cell networks.
Submitted 12 May, 2016;
originally announced May 2016.
-
An improved Constant-Factor Approximation Algorithm for Planar Visibility Counting Problem
Authors:
Sharareh Alipour,
Mohammad Ghodsi,
Amir Jafari
Abstract:
Given a set $S$ of $n$ disjoint line segments in $\mathbb{R}^{2}$, the visibility counting problem (VCP) is to preprocess $S$ such that the number of segments in $S$ visible from any query point $p$ can be computed quickly. This problem can trivially be solved in logarithmic query time using $O(n^{4})$ preprocessing time and space. Gudmundsson and Morin proposed a 2-approximation algorithm for this problem with a tradeoff between the space and the query time. They answer any query in $O_ε(n^{1-α})$ with $O_ε(n^{2+2α})$ of preprocessing time and space, where $α$ is a constant $0\leq α\leq 1$, $ε> 0$ is another constant that can be made arbitrarily small, and $O_ε(f(n))=O(f(n)n^ε)$.
In this paper, we propose a randomized approximation algorithm for VCP with a tradeoff between the space and the query time. We will show that for arbitrary constants $0\leq β\leq \frac{2}{3}$ and $0<δ<1$, the expected preprocessing time, the expected space, and the query time of our algorithm are $O(n^{4-3β}\log n)$, $O(n^{4-3β})$, and $O(\frac{1}{δ^3}n^β\log n)$, respectively. The algorithm computes the number of visible segments from $p$, or $m_p$, exactly if $m_p\leq \frac{1}{δ^3}n^β\log n$. Otherwise, it computes a $(1+δ)$-approximation $m'_p$ with probability at least $1-\frac{1}{\log n}$, where $m_p\leq m'_p\leq (1+δ)m_p$.
Submitted 11 May, 2016;
originally announced May 2016.
-
The effect of network structure on innovation initiation process: an evolutionary dynamics approach
Authors:
Afshin Jafari,
S. Peyman Shariatpanahi,
Mohammad Mahdi Zolfagharzadeh,
Mehdi Mohammadi
Abstract:
In this paper we propose a basic agent-based model based on evolutionary dynamics for investigating the innovation initiation process. In our model, each agent represents a firm which interacts with other firms through a given network structure. We consider a two-hit process for presenting a potentially successful innovation in this model, and therefore at each time step each firm can be in one of three different stages, which are, respectively, Ordinary, Innovative, and Successful. We design different experiments in order to investigate how different interaction networks may affect the process of presenting a successful innovation to the market. In these experiments, we use five different network structures, i.e. Erdős and Rényi, Ring Lattice, Small World, Scale-Free and Distance-Based networks. According to the results of the simulations, for less frequent innovations like radical innovation, local structures show better performance compared to Scale-Free and Erdős and Rényi networks. However, as we move toward more frequent innovations, like incremental innovations, the difference between network structures becomes smaller and non-local structures show relatively better performance.
Submitted 16 April, 2016;
originally announced April 2016.
-
Study on Scheduling Techniques for Ultra Dense Small Cell Networks
Authors:
Amir H. Jafari,
David Lopez-Perez,
Ming Ding,
Jie Zhang
Abstract:
The most promising approach to enhance network capacity for the next generation of wireless cellular networks (5G) is densification, which benefits from the extensive spatial reuse of the spectrum and the reduced distance between transmitters and receivers. In this paper, we examine the performance of different schedulers in ultra dense small cell deployments. Due to the stronger line of sight (LOS) at low inter-site distances (ISDs), we argue that the Rician fading channel model is more suitable to study network performance than the Rayleigh one, and model the Rician K factor as a function of distance between the user equipment (UE) and its serving base station (BS). We also construct a cross-correlation shadowing model that takes into account the ISD, and finally investigate potential multi-user diversity gains in ultra dense small cell deployments by comparing the performances of proportional fair (PF) and round robin (RR) schedulers. Our study shows that as the network becomes denser, the LOS component starts to dominate the path loss model, which significantly increases the interference. Simulation results also show that multi-user diversity is considerably reduced at low ISDs, and thus the PF scheduling gain over the RR one is small, around 10% in terms of cell throughput. As a result, RR scheduling may be preferred for dense small cell deployments due to its simplicity. Despite both the interference aggravation and the multi-user diversity loss, network densification is still worthwhile from a capacity viewpoint.
Submitted 21 June, 2015;
originally announced June 2015.
-
Towards 1 Gbps/UE in Cellular Systems: Understanding Ultra-Dense Small Cell Deployments
Authors:
David Lopez-Perez,
Ming Ding,
Holger Claussen,
Amir H. Jafari
Abstract:
Today's heterogeneous networks, comprised mostly of macrocells and indoor small cells, will not be able to meet the upcoming traffic demands. Indeed, it is forecasted that at least a 100x network capacity increase will be required to meet the traffic demands in 2020. As a result, vendors and operators are now looking at using every tool at hand to improve network capacity. In this epic campaign, three paradigms are noteworthy, i.e., network densification, the use of higher frequency bands and spectral efficiency enhancement techniques. This paper aims at bringing further common understanding and analysing the potential gains and limitations of these three paradigms, together with the impact of idle mode capabilities at the small cells as well as the user equipment density and distribution in outdoor scenarios. Special attention is paid to network densification and its implications when transitioning to ultra-dense small cell deployments. Simulation results show that network densification with an average inter-site distance of 35 m can increase the cell-edge UE throughput by up to 48x, while the use of the 10 GHz band with a 500 MHz bandwidth can increase the network capacity up to 5x. The use of beamforming with up to 4 antennas per small cell base station lags behind with cell-edge throughput gains of up to 1.49x. Our study also shows how network densification reduces multi-user diversity, and thus proportional-fair-like schedulers start losing their advantages with respect to round robin ones. The energy efficiency of these ultra-dense small cell deployments is also analysed, indicating the need for energy harvesting approaches to make these deployments energy-efficient. Finally, the top ten challenges to be addressed to bring ultra-dense small cell deployments to reality are also discussed.
Submitted 12 March, 2015;
originally announced March 2015.
-
ABS-NET: Fully Decentralized Runtime Adaptation for Distributed Objects
Authors:
Karl Palmskog,
Mads Dam,
Andreas Lundblad,
Ali Jafari
Abstract:
We present a formalized, fully decentralized runtime semantics for a core subset of ABS, a language and framework for modelling distributed object-oriented systems. The semantics incorporates an abstract graph representation of a network infrastructure, with network endpoints represented as graph nodes, and links as arcs with buffers, corresponding to OSI layer 2 interconnects. The key problem we wish to address is how to allocate computational tasks to nodes so that certain performance objectives are met. To this end, we use the semantics as a foundation for performing network-adaptive task execution via object migration between nodes. Adaptability is analyzed in terms of three Quality of Service objectives: node load, arc load and message latency. We have implemented the key parts of our semantics in a simulator and evaluated how well objectives are achieved for some application-relevant choices of network topology, migration procedure and ABS program. The evaluation suggests that it is feasible in a decentralized setting to continually meet both the objective of a node-balanced task allocation and make headway towards minimizing communication, and thus arc load and message latency.
Submitted 16 October, 2013;
originally announced October 2013.