Project List
Immersive video is increasingly popular and has numerous applications in entertainment, education,
medicine, and marketing to name a few. It often takes the form of a 4K or 8K video offering a 360-degree
view of an environment. From that 360-degree video, a much smaller region (viewport) is selected and
displayed to a user, typically corresponding to the human field of view. The next step forward in making 360-
degree environments more realistic is to provide the depth perception (360-3D). Research is very active in
this field, but adequate research tools permitting processing and visualization of 360-3D videos are lacking.
In this project, we aim at developing a set of tools to process and visualize 360-3D videos to support
research activities. These tools will be integrated into a 360-3D video player as the end deliverable. Three
aspects of the player need to be addressed: the input data, the processing, and the rendering/visualization.
Regarding the first aspect, the player must be able to read content from various compression formats, video
configurations, and representations. For configurations based on several cameras, a stitching step should
be incorporated into the processing pipeline. For the second aspect, the processing tools will include
visualization modes for superimposing depth and quality information on the original video, toggling
between the original and processed video, viewing in 3D or with a single eye, and showing tile boundaries in
tiled-compressed videos. For the third aspect, the player will be able to control the viewport dimensions
and position to adjust to the viewing conditions. It should allow views to be selected from coordinates in a
file, interactively with a mouse, or in sync with a head-mounted display. Furthermore, the player will need to
support a wide range of display devices: autostereoscopic, polarized, and shutter glasses.
2.0
Project Title: A Drone System for Search and Inspection of Indoor Environments
The aim of the project is to develop a system of multi-rotor drones for search and inspection of unknown
indoor (GPS-denied) environments. Applications of interest are in search and rescue, disaster recovery,
law enforcement, surveillance, mining, and inspection of hazardous sites. In the proposed system, an expert
operator will guide a flock of drones to achieve the desired task using different levels of control autonomy
from direct teleoperation, to shared control, and eventually to full autonomous control. Machine learning
techniques will be employed to enable the drones to gradually learn the exploration task from the operator
and become increasingly autonomous as they collect more data during the operation. The project will be
carried out by a number of graduate and undergraduate students in the Telerobotics, Haptics and
Computational Vision Laboratory at McMaster University. The work will involve development of new drone
platforms and algorithms for control systems, computer vision, SLAM, and machine learning.
3.0
Project Title: A Statistical Framework for Image Color Constancy
Color can be an important cue for computer vision and image processing related topics, such as human-
computer interaction, color feature extraction and color appearance models. The colors that are present in
images are determined by the intrinsic properties of objects and surfaces as well as the color of the light
source. The effects of the light source should be filtered out as an important preprocessing step for robust
color-based computer vision algorithms. This ability to account for the color of the light source is called color
constancy. Previous work demonstrated empirically that Bayesian color constancy with appropriate non-
Gaussian models can outperform gamut mapping and Greyworld algorithms traditionally used for
computational color constancy. Human vision has the natural tendency to correct for the effects of the color
of the light source but the mechanism that is involved with this ability is not yet fully understood. Algorithms
for color constancy have fallen into three groups: static methods, gamut-based methods and learning-based
methods. Generally, the statistical approach is attractive, since it is more general and more automatic.
When using a single image taken with a regular digital camera, illuminant estimation is an under-constrained
problem; both the intrinsic properties of a surface and the color of the illuminant have to be
estimated, while only the product of the two (i.e. the actual image) is known. The goal is to propose an
algorithm for color constancy that can accurately and efficiently estimate the color of the illuminant. With
various data sets currently available, ranging from high-quality hyperspectral scenes to large-scale real-world
RGB images, the student will be able to evaluate the proposed computational color constancy method.
Implementation of a color constancy approach based on a statistical framework.
Basic statistical concepts and programming skills are required. Research skills will be developed during the
project, with special emphasis on statistical modeling and computer vision. Results-analysis skills will be
developed during the research, including good practices for discussion and comparison with previously used
methods.
4.0
Project Title: Activity recognition for remote healthcare and e-learning systems
Vision-based motion detection and tracking has led video surveillance into a new era: human activity
recognition. Many algorithms have been proposed to recognize human activities, but a number of
shortcomings limit their applicability in practical system design. The limiting factors include inaccurate
segmentation of moving objects, challenges in tracking the body parts of human motion, and a lack of
robust features and methods for activity classification. Edge-segment-based motion detection and tracking
offer a promising route to a robust solution. By using edge segments, it may be possible to exploit edge
contextual information and local geometric transformations to track human body parts independently with
higher accuracy. From the correlation of human body-part movements, robust features can be extracted to
detect human activities accurately. Different feature extraction and machine learning techniques will be
used to identify the correlation of body-part movements. Moreover, confident-frame-based recognition
algorithms will be used to increase the robustness of the activity recognition system. One major application
of this research will be telecare, which will offer remote care to elderly and physically less able people while
allowing them to remain living in their own homes. Another potential application will be to help computers
interact better with humans through nonverbal communication in online learning systems. Gaming and
animation are becoming interesting media in online learning, where more realistic motion synthesis is
possible through human activity recognition, making games and animations more realistic and engaging.
Many educational institutions that teach online will benefit from this research.
5.0
Project Title: Activity Recognition using Deep Neural Network
Image processing has been a popular research area for a long time but with the emergence of deep neural
networks and high-performance computing resources, we have seen a surge of interest in image processing and computer
vision research. Students at BAM Lab have worked on short term projects on image super-resolution, video
object recognition and autonomous vehicles using image recognition. This project will focus on extending
those deep neural network models to perform activity recognition from video clips. A small set of activities
will be used to validate the deep learning model. A public dataset will be used initially for the research and
later real-time and recorded video streams will be used for testing the model. There are many applications
of such a model in creating a video surveillance system, skill development and testing system for
manufacturing industries or nursing, and patient monitoring system at retirement homes. Convolutional and
generative adversarial neural networks have shown great promise in image recognition. Long Short-Term
Memory models are used to extend the networks to perform better. Computational cost is a big concern in
developing such deep learning models. Parallel programming using GPUs can expedite the training process
of the networks. Ultimately the neural network will perform classification to recognize activity patterns.
Interested students must have a keen interest in computer vision and willingness to explore the area of
activity recognition from videos.
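As an illustration of the CNN and LSTM combination mentioned above, a minimal Keras sketch (TensorFlow, the tool suggested in the work plan below) is given here: per-frame features from a small convolutional encoder are aggregated over time by an LSTM and classified into activities. Layer sizes, clip length and the number of classes are placeholder assumptions, not the project's prescribed architecture.

```python
from tensorflow.keras import layers, models

def build_activity_model(num_classes=5, frames=16, size=112):
    # Per-frame convolutional encoder, shared across all time steps.
    frame_encoder = models.Sequential([
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu",
                      input_shape=(size, size, 3)),
        layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
    ])
    clip = layers.Input(shape=(frames, size, size, 3))        # a short video clip
    x = layers.TimeDistributed(frame_encoder)(clip)           # per-frame features
    x = layers.LSTM(128)(x)                                   # temporal aggregation
    out = layers.Dense(num_classes, activation="softmax")(x)  # activity probabilities
    return models.Model(clip, out)

# model = build_activity_model()
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```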
The student will work at BAM Lab at the School of Computing at Queen's University for the majority of the
time. The student must attend research group and individual meetings (typically once a week); the
frequency and dates of such meetings will be decided later. The student should also try to attend other
relevant research talks and events at the School, which are announced by email. During the 3-month
project, the student will explore state-of-the-art research techniques on deep learning for activity
recognition by finding and reading research papers and relevant information sources (1st month);
preprocess the data as necessary for developing the machine learning models and design and implement
the model using a machine learning tool, preferably TensorFlow (2nd month); and validate the classification
accuracy of the model using established measures such as precision and recall and submit a written report
on the work (3rd month). The student will give presentations in the group to share the knowledge with other
members of the team.
6.0
Project Title: Affective computing with weak supervision
This research project aims at leveraging weakly supervised techniques for improving affective computing.
Affective computing is the automatic recognition of human emotions and feelings based on visual
appearance (mostly facial expressions, but also body posture and motion) and speech and other
measurements (e.g. heartbeat). Standard approaches for this task are based on fully supervised methods,
in which all data are fully annotated with the corresponding ground truth emotions. These approaches have
shown impressive results. However, they require a large set of annotated data, which is very expensive to
collect because it requires the manual annotation of every sample by one or multiple experts. In this project,
we aim to obtain excellent affective computing recognition with limited data and annotations. For
instance, often a video sequence is annotated with the expressed emotion, but no annotation of the exact
frames where the emotion is actually expressed is given. Using a weakly supervised approach such as
visual attention, we can not only recognize the expressed emotion on a new video, but also temporally
localize in which frames of the video this emotion has been shown. The same approach can also be used
for localizing which parts of a face are the most important for showing a certain emotion. These methods
can be very relevant for some specific applications, in which we want to know not only the presence of an
emotion, but also its spatial and temporal localization without the burden of collecting full annotations. The
developed techniques will be evaluated on modern affective computing datasets. In case of promising
results, these will be published at an international conference or in a journal.
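A minimal sketch of the visual-attention idea described above, assuming pre-extracted per-frame features: attention weights pool the frames into a video-level representation trained only with the video-level emotion label, and the same weights indicate which frames express the emotion. The feature dimension and number of emotion classes are illustrative assumptions (PyTorch is used here, as suggested in the requirements below).

```python
import torch
import torch.nn as nn

class TemporalAttentionClassifier(nn.Module):
    def __init__(self, feat_dim=512, n_emotions=7):
        super().__init__()
        self.attention = nn.Linear(feat_dim, 1)          # one relevance score per frame
        self.classifier = nn.Linear(feat_dim, n_emotions)

    def forward(self, frame_feats):                      # (batch, n_frames, feat_dim)
        scores = self.attention(frame_feats).squeeze(-1)
        weights = torch.softmax(scores, dim=1)           # which frames matter
        pooled = (weights.unsqueeze(-1) * frame_feats).sum(dim=1)  # attention pooling
        return self.classifier(pooled), weights          # emotion logits + frame localisation

# logits, frame_weights = TemporalAttentionClassifier()(torch.randn(2, 30, 512))
```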
- Strong mathematical background
- Strong programming (possibly Python and PyTorch) and algorithm development skills
- Knowledge and previous experience of basic Deep Learning and Computer Vision algorithms
- Ability to collaborate with other students in a project
- Basic knowledge of the Linux operating system
This project is about designing and testing artificial intelligence-enabled companion robots. We aim at
addressing human assistance through the means of facial expression-capable mobile robots. The project
requires the intern to develop (or improve existing) Android interfaces to be used by the robot. The backend
requires knowledge of Python and machine learning libraries including deep learning frameworks for
artificial intelligence. Furthermore, the project will require involvement in the design of a set of experiments
with human participants. We will test a companion robot in assisting human beings in their daily tasks,
including sentiment analysis with a number of artificial intelligence features such as voice and face
expression recognition, computer vision, and natural language processing. The overarching goal of this
project is to characterize and measure aspects of the interaction between a companion robot and a human
participant. For example, we are interested in knowing whether the robot can infer sentiment solely from a
few cues (intentional or otherwise) in the human's voice, and how the robot's capacity for empathy grows
the longer the human interacts with it.
The successful students should know Python, machine learning, and deep learning frameworks, and have
some familiarity with the Android operating system, which is the one used in the companion robots.
Experience with artificial intelligence and previous launching of Android apps in Google Play are considered
very valuable assets. Another valuable asset would be knowledge of, previous experience with, or at least
enthusiasm for positive psychology. Furthermore, the successful students must possess solid English
communication skills both verbal and written, have very good interpersonal skills, and be willing to work in
a dynamic team.
Corporations are increasingly interested in adopting Industry 4.0 in their production and services because
of its ability to boost their competitive strength in global markets. The provision of individualized mass
customization in Industry 4.0, on the one hand, satisfies personalized needs, quality and cost constraints
and, on the other hand, enables the businesses to respond to the environmental, energy and other global
challenges. In a manufacturing process, this combination requires a smart factory with remarkably flexible
operation of not only machines but also self-conscious products to swiftly respond to change in every step
of the process. The smart factory is composed of smart machines and product components, communication
networks, cloud services, and control terminals. These comprise the hardware for computation,
communication and control (3Cs) of smart production, but the autonomy of production in the smart factory
still requires the ability to analyze a great volume of data and determine the best possible actions in real
time. Artificial intelligence (AI) is the brain of the smart factory and serves two main purposes. First, it filters,
analyzes and ultimately learns from the big data mostly generated through machine to machine (M2M)
communication in the smart factory. Big data analytics can, for example, be used for digital twins or
predictive manufacturing to provide early warning and prevent catastrophic failures. Second, AI provides
reasoning and autonomous decision making tools that have the ability to analyze and adapt to new
situations. This is not a trivial task since decisions have to be made based on uncertain information and
imperfect data. This project will focus on some of the machine learning and approximate reasoning
techniques that can work for the AI component of a smart factory.
The successful candidate will need to have basic skills in programming, hardware, and electronics but
must also be interested in computer vision, machine learning, artificial intelligence, mechatronics, and
robotic systems.
Project Title: Augmented reality tools for communicating with collaborative robots
In Canada, manufacturing companies currently face a lack of specialized workers. To avoid decreased
productivity and maintain competitiveness with a shrinking workforce, many industries have turned to automation.
Industrial robots are already well integrated in large-scale manufacturing operations such as the automobile
industry. However, small and high-precision manufacturing facilities still rely on skilled manual labor. These
facilities produce low volume, but highly specialized products that are customized to their clients, and thus
require adaptable production facilities. Traditional automation solutions are not easily adaptable, which is why
they are seldom used. Collaborative robotics aim at solving this issue. By providing advanced tools that
can work directly with highly skilled workers, collaborative robotics combine the best of both worlds: the
repeatability and precision of robots with the intuitiveness, decisional skills and process intelligence of
specialized workers. Collaborative robots are also easier to adapt, as they provide features such as learning
from demonstration by physically guiding the robot through its task. However, collaborative robots still
require specialized robotics knowledge to properly integrate them in an existing process. One issue is the
lack of proper communication channels between domain experts (the skilled workers) and the machine,
especially when the internal state of the robot needs to be known. For instance, when a robot stops in its
tracks, possibly because of a false obstacle detection, it is not always simple to understand what it perceives.
Robot programming interfaces present the information in a very schematic way, and this information is
often difficult to transpose to the real world. For this problem, recent augmented reality developments can
help. By displaying information such as sensor readings directly in the environment of the robot, we believe
this could help the process of programming collaborative robots by experts in their application domain
instead of experts of the robot itself.
Faculty supervisor François Ferland
Specialization At IntRoLab, we specialize in developing and using novel methods in mechatronics and
artificial intelligence to build autonomous intelligent systems. Our research can be described as pragmatic,
as we strive to overcome the challenges faced when bringing robots into the real world, at home and at
work. We conduct projects ranging from telepresence robots for healthcare scenarios to 3D simultaneous
localization and mapping using both visual and rangefinding sensors, open-source hardware and software
solutions for 3D sound source localization and separation, and control architectures for autonomous robots
that integrate episodic memory, selective attention, and artificial emotions to regulate decisional processes.
Host Province Québec
Host University Université de Sherbrooke – Sherbrooke
Language English
Preferred start date 2020-05-01 (yyyy-mm-dd)
The candidate will be involved in developing prototypes of augmented reality applications for handheld
devices such as smartphones and tablet computers. The candidate will be involved in fields including, but
not limited to: - Visual 6 degrees-of-freedom odometry; - 3D recognition and localization of known patterns
and solid models, using machine learning and deep neural networks; - 3D rendering of information markers
and dynamic mechanical models; - Remote data processing; - Wireless communication for high fidelity
transport of dynamic visual content. The candidate will have to design, propose and implement solutions in
one or more of those fields. At the end of the internship, the candidate should be able to present their work
in a scientific context such as a robotics conference or workshop. The candidate will become a member of
IntRoLab, located at the Interdisciplinary Institute for Technological Innovation (3IT). This research facility,
situated within walking distance of the main campus of Université de Sherbrooke, brings together researchers
from diverse backgrounds such as microelectronics, nanotechnology, mechatronics and interactive
intelligent systems. The 3IT has facilities and working spaces to both design and test physical prototypes.
We are looking for highly motivated students with a computer science, computer engineering, and/or
software engineering background. Skills in computer graphics and computer vision are a strong plus.
Experience with robotic environments such as ROS is a plus, but not required. We work in a Linux-based
environment, mostly in Python and C++, but also integrate web-based technologies and mobile
environments such as iOS and Android. Good writing and speaking skills, in English or French, are
mandatory.
This research project aims at improving recognition tasks by learning an automatic form of data
augmentation. Deep learning models have shown impressive performance, however they are very data
hungry and might require millions of data samples in order to provide good performance. In many fields
(e.g. medical images) collecting large datasets is difficult (lack of patients) and very expensive (cost of
expert annotators). To reduce the impact of this problem, regularization techniques can be used.
Regularization is a broad spectrum of methods that aims to reduce the space of possible solutions by
adding known or reasonable constraints. In this project, we aim to obtain good visual recognition
performance when using a reduced set of training data by the use of data augmentation. This technique
consists of creating new data samples as combinations of the original training samples with well-defined
transformations that are known to not change the semantic content of the data. For instance, for images,
typical transformations are horizontal flip, rotations, scale and illumination changes. While some
transformations are discrete (e.g. image flip), others are defined by some parameters (e.g. angle of rotation).
In this case the standard way to find the right parameters is to use a validation set and test each value in a
brute-force approach. In this project we aim to learn those data augmentation parameters while learning in
order to avoid the slow iterative validation process. This will allow us to select a set of possible families of
transformations and let the learning select the right transformation parameters. The proposed technique
will be evaluated on classical computer vision tasks such as image classification and/or object detection.
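To make the idea concrete, the sketch below keeps a rotation angle as a learnable parameter and applies it through a differentiable warp, so the augmentation parameter can be optimised jointly with the network rather than tuned by the brute-force validation described above. This is one illustrative realisation under stated assumptions, not the project's prescribed method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableRotation(nn.Module):
    """Rotation augmentation whose angle is learned along with the model."""
    def __init__(self):
        super().__init__()
        self.angle = nn.Parameter(torch.zeros(1))       # rotation angle in radians

    def forward(self, images):                          # images: (N, C, H, W)
        cos, sin = torch.cos(self.angle), torch.sin(self.angle)
        row1 = torch.cat([cos, -sin, torch.zeros(1)])
        row2 = torch.cat([sin, cos, torch.zeros(1)])
        theta = torch.stack([row1, row2]).unsqueeze(0).expand(images.size(0), -1, -1)
        grid = F.affine_grid(theta, images.size(), align_corners=False)
        return F.grid_sample(images, grid, align_corners=False)  # differentiable warp

# augment = LearnableRotation()
# loss = criterion(model(augment(x)), y); loss.backward()  # gradients also reach the angle
```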
Faculty supervisor Marco Pedersoli
Specialization My research focuses on efficient methods for learning with reduced supervision. The
objective of my research is to leverage the exponentially growing data coming from the internet (e.g. social
media, internet of things, etc.) to obtain stronger deep models, while reducing their computational complexity
and the need for costly annotations. My research is applied to visual recognition tasks such as object
detection, image understanding and captioning, and emotion recognition, among others.
Host Province Québec
Host University École de technologie supérieure – Montréal
Language English
Preferred start date 2020-06-01 (yyyy-mm-dd)
The student will be developing and evaluating algorithms for the proposed
project. For doing that, he/she will be expected to read papers from related work
to have a clear overview of the literature in the field. He/she will have to learn
how to use and share the computational resources available in my lab in order
to run experiments. Finally, he/she will be expected to write a report/article
about the project and results. The student will be hosted in my lab and
surrounded by other MSc and PhD students working on similar topics. The
student will be under my supervision and he/she will be associated with a more
experienced student of my group that can help him/her to successfully
accomplish the main objectives of the project. The student will have the
possibility to participate in the regular activities of the lab, such as reading
groups, invited presentations and other activities.
This project focuses on analyzing the visual behaviors of articulated objects (humans and animals) from video
feeds, a problem that plays a crucial role in many real-life applications ranging from natural user interfaces to
autonomous driving. As a member of the Vision and Learning Lab at the ECE department
(http://www.ece.ualberta.ca/~lcheng5/), you are expected to work with a graduate student/Postdoc
researcher, get familiar with state-of-the-art deep learning techniques, and gain hands-on research
experience on benchmark and home-grown datasets. There are ample opportunities to be involved in
exciting research topics in computer vision and to publish research findings at top-tier conferences and
journals.
Project Title: Cardiac MRI Analysis Using Deep Convolutional Neural Networks
Cardiac MRI generates a large number of images per patient in each scan. For instance, a typical short-
axis sequence of an adult MRI scan consists of more than 200 images, and manual segmentation of the left
ventricle from all these images might take more than 20 minutes. Only a limited set of measurements
associated with the scan is computed in regular clinical practice. Automating the analysis could lead to a
comprehensive analysis of the heart function. However, automating the analysis using traditional image
processing approaches poses a number of challenges including accurately identifying the regions of the
heart chamber walls which have more or less the same intensities as other tissues such as papillary
muscles. Recent studies have shown that deep learning approaches have the ability to accurately
delineate the regions of interest provided that they are trained with sufficient data. The objective of this
study is to delineate cardiac structures and analyze their function automatically. We intend to segment the
ventricles and atria from sequences of cardiac magnetic resonance images. We further plan to investigate
the cardiac function through a number of measurements including volumetric filling rate, strain, strain rates
and E and A peak filling rates. Deep learning approaches require a large number of datasets for training to
produce accurate results. Two hundred MR scans will be utilized in this study, where the first hundred scans
will be used as training set and the second hundred will serve as the testing set. The results of the
automated segmentation will be compared with manual delineation in terms of Dice score, root mean
squared error and Hausdorff distance. The segmentation results of the deep learning approaches will be
used for generating quantitative clinical measurements such as ejection fraction, stroke volume, strain,
strain rates, volumetric filling rates.
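For reference, here is a small sketch of two of the evaluation measures named above: the Dice overlap and the symmetric Hausdorff distance between an automatic and a manual segmentation. The 2D binary-mask layout is an assumption for illustration.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(auto_mask, manual_mask):
    """Dice overlap between two binary masks (1.0 means perfect agreement)."""
    auto, manual = auto_mask.astype(bool), manual_mask.astype(bool)
    return 2.0 * np.logical_and(auto, manual).sum() / (auto.sum() + manual.sum())

def hausdorff_distance(auto_mask, manual_mask):
    """Symmetric Hausdorff distance between the foreground pixel sets, in pixels."""
    a = np.argwhere(auto_mask).astype(float)
    b = np.argwhere(manual_mask).astype(float)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
```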
The project requires programming skills in Python and MATLAB. The successful candidate will be an
undergraduate student from computer science, mathematics, electrical engineering, computer engineering
or any other related discipline with a background in image processing & visualization, and computer
graphics. Previous coding experience with deep learning frameworks such as Tensorflow, Theano or Caffe,
and image processing software packages such as Insight Segmentation and Registration Toolkit (ITK), or
OpenCV is preferable.
Option pricing is one of the central activities in computational finance. The well-known Black-Scholes
model of pricing the European option leads to a time dependent partial differential equation (PDE). While
analytic solutions exist for special cases, numerical computation is necessary to approximate the solution.
Beyond the standard Black-Scholes model for European option, many numerical methods have been
developed for pricing American and exotic options such as Asian option. Furthermore, new features such
as jumps in underlying asset prices and nonconstant volatility have been incorporated in jump-diffusion,
Heston's, and regime-switching models. In practice, it is important that one can obtain the solutions quickly
and accurately. However, the complexity of the solution process depends on the number of underlying assets.
When the number of assets is limited to 1 or 2, efficient PDE methods have been developed. In practice,
the number of assets is often much higher, on the order of tens or even hundreds. Statistical methods such
as Monte Carlo have been proposed, but they are still inefficient when the asset number is very high. The
research project is to explore machine learning technology in order to solve option pricing equations from
computational finance in high dimensions. The project will particularly focus on applying the state-of-the-
art machine learning models to improve the efficiency of financial computation. Machine learning has been
showing success in different applications such as computer vision, but relatively little in computational
finance. This project is to investigate the possibility of solving finance equations using machine learning
technology that would be more effective than traditional approaches.
Students working on this project should preferably have the following background: 1. Calculus and linear
algebra; 2. Numerical computation; 3. Programming in C/C++. (Background in finance/machine learning is
not required, but would be useful.)
Project Title: Computer vision and machine learning (deep learning) to assess orofacial deficit
Use of facial expression/movement analysis is becoming popular in clinical applications. Examples include
detecting and assessing signs of Parkinson's disease, depression, agitation, pain, etc. We are
developing computer vision and machine learning models to automatically analyse facial movements in
patient populations and to estimate clinically relevant information related to orofacial impairment. Once
developed, this will enable the automated and objective assessment of symptoms. The project involves the
application of deep learning techniques to analyse images and videos of patients' faces as they talk or as
they undergo a clinical assessment. We have a large dataset of such images and videos, and the
project will focus on algorithm development and evaluation (and not on data collection). The two students
will be working closely together, and also closely with other members of our research team, including
research engineers, graduate students, and postdoctoral fellows.
We have two openings. For both positions, we are looking to recruit senior (final-year) students majoring in
computer science with a concentration in computer vision/machine learning (or at least an
introductory course in both). Experience with a deep learning package (PyTorch or TensorFlow/Keras,
introductory course in both). Experience with a deep learning package (PyTorch or TensorFlow/Keras,
preferably both) is required. Programming proficiency in Python is required. Facility with implementing
algorithms from papers is desired.
In this project, we want to tackle open problems in Machine Learning, Artificial Intelligence (AI) and Robotic
Perception. The fusion of these research fields enables self-driving robots, autonomous swarms of UAVs,
and even robotic systems for planetary exploration: robotic systems that are available to us. At MIST Lab,
we want to enable multi-robot systems and swarms to operate autonomously in an open-world/open-set
context, e.g. search and rescue or planetary exploration. In addition, if time allows, we want to investigate the
field of Human-Robot Interaction (HRI) and see how a robotic system can apply machine learning and AI
to interact with humans in the loop. The MIST Laboratory has developed the Buzz
(http://the.swarming.buzz) language and has 10 Khepera IV wheeled robots, 6 Matrice 100 flying drones,
6 3DRobotics Solo drones, 3 Spiri, 25 Zooids, 1 Intel Aero, 60 Kilobots, and a state-of-the-art arena with
infrared tracking technology. All these are used for this project.
- Work with multi-robot systems or swarms of terrestrial or aerial robots
- Investigate current (unsolved) Machine Learning and AI problems
- Implement and compare a variety of algorithms from current research
- Possibly help with the execution of an HRI user study
- If successful: contribute to a journal or conference research paper
Programming skills in C/C++ are necessary; additional languages are a plus. Experience with Machine
Learning, AI, Computer Vision and/or Robotics is desirable.
The advance of high- or super-resolution microscopy hardware nowadays enables biologists to see tiny
biological objects such as cells, cell nuclei, and even smaller protein structures. We have collaborated with
biologists and clinicians on analyzing cellular or medical images for answering important scientific
questions. In this project, you are expected to conduct 2D/3D biomedical image understanding work in an
interdisciplinary research setting. As a member of the Vision and Learning Lab at the ECE department
(http://www.ece.ualberta.ca/~lcheng5/), you are expected to work with a graduate student/Postdoc
researcher, get familiar with state-of-the-art deep learning techniques, and gain hands-on research
experience on benchmark and home-grown datasets. There are ample opportunities to be involved in
exciting research topics in computer vision and to publish research findings at top-tier conferences and
journals.
Necessary image processing background and working knowledge of calculus and linear algebra; good
programming skills in Python/C++.
Project Title: Deep Learning Architectures for Visual Recognition in Video Surveillance
Applications
The ability to automatically recognise activities, cars, events, faces, license plates, people, and other
objects of interest in video streams recorded across a distributed network of surveillance cameras can
greatly enhance security and situational awareness. However, recognising objects in unconstrained real-
world videos is among the most challenging tasks in video surveillance because accurate and timely
responses are required for analysis in complex real-world environments. Indeed, the performance of state-
of-the-art recognition systems is severely affected by changes in capture conditions (pose, illumination,
blur, scale, etc.) and camera interoperability. Moreover, since object recognition models are typically
designed a priori using a limited number of labelled reference samples that are captured under controlled
and/or specific conditions, they are often poor representatives of a target object to be recognised during
operations. Adaptation of these systems to multiple non-stationary operational environments (defined by
varying environmental factors, camera viewpoints and individual behaviours) remains a fundamental issue
for object recognition in real-world surveillance applications. The objective of this project is to develop new
adaptive spatiotemporal systems for accurate recognition of faces and actions across a network of
surveillance cameras. Despite the limited amount of labelled reference data, these systems will benefit from
new cross-domain adaptation approaches that rely on unlabeled data and contextual information from the
operational environment in order to sustain a high level of performance. Our particular interest is with Trunk-
Branch Ensemble CNNs, where a trunk network extracts features from the global holistic appearance of
object ROIs, and branch networks effectively embed local asymmetrical and complex object
representations. For spatiotemporal recognition, hybrid deep learning architectures will allow combining
information from CNN and RNN layers over object trajectories. Finally, since time and memory complexity
are key issues for real-time video surveillance applications, efficient CNN architectures, like PVANET and
Deep Fried FreshNets, will be considered.
We are looking for highly motivated students who are interested in participating in cutting-edge research on
specialized techniques for the detection, tracking and recognition of actions, cars, faces, people, etc., in
video surveillance applications, with a particular focus on deep learning (e.g., CNN and LSTM) architectures,
information fusion and domain adaptation. A prospective applicant should have: • Strong academic record
in computer science, applied mathematics, or electrical engineering, preferably with emphasis on one of
the following areas: machine learning and computer vision; • A good mathematical background; • Good
programming skills in languages such as C, C++, Python and/or MATLAB.
The analysis of people’s nutrition habits is one of the most important topics in food-related studies. Many
people would like to keep track of their diets to achieve weight loss goals or manage their diabetes or food
allergies. Moreover, tracking human nutrition diaries also support services to improve human health as well
as help food companies target products and advertisements to customers more effectively. However, most
current applications which requires manual data entry are tedious and less efficient. Hence, methods
automatically logging and analyzing human meals could not only make the understanding of human eating
behaviour easier, but also boost multifarious applications in the healthcare industry. Moving along this
direction, our research aims to leverage smart sensing techniques to analyze the food of daily meals, and
then estimate the nutrition contents, e.g., calories, fat and carbohydrates. Specifically, we adopt computer
vision algorithms to detect and recognize the food contents from dish images captured by users’
smartphones or wearable cameras. Thanks to the rapid development of mobile networks and Internet of
Things (IoT), large-scale food data become more accessible since people always share, upload and record
what they eat. We collect such large-scale food data and develop several novel image processing (e.g.,
segmentation) and deep learning algorithms, trained to locate and identify different food items in dish
pictures. The proposed system will recognize major food contents of a meal and predict/estimate accurate
nutrition contents from a few images (e.g., less than 3). The evaluation and comparison between our
proposed methods and existing solutions will be provided. In this project, the participating students will be
supervised and guided to accelerate the design and implementation of the platform and the comparison of
the aforementioned methods. The unified platform and machine learning methods will be open-sourced at
the end of the project.
List 2
The segmentation of medical images (e.g., brain, spine, heart, etc.) is of key importance to many clinical
applications for the diagnosis and tracking of various diseases, as well as for selecting optimal treatment
plans. Recently, methods based on deep neural networks have led to significant improvements in terms of
accuracy, in a wide range of segmentation tasks. However, these methods usually require a large amount
of expert-labelled data for training, which is rarely available in medical applications. The goal of this project
is to develop segmentation methods that use deep neural networks in a semi-supervised or weakly-
supervised manner to enhance the performance of current approaches, when a limited amount of training
data is available. In the semi-supervised setting, a large set of images is available, but only few of them
have been labelled by experts. Weakly supervised segmentation extends this setting by including
incomplete or noisy information from the expert. In clinical applications, weak annotations are used to
reduce the work required by the expert for labelling images. Various deep learning models may be
investigated for this project, among which are fully-convolutional neural networks for the segmentation of
3D images, and adversarial neural networks for exploiting unlabeled or weakly labeled images. The
research intern is expected to work in close collaboration with graduate students (Ph.D. and postdoc)
involved in ongoing projects related to this topic.
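As one hedged illustration of the semi-supervised setting described above (not the specific models named in the text, which include fully-convolutional and adversarial networks), the sketch below uses pseudo-labelling: the network is trained on the few expert-labelled scans, and its confident predictions on unlabelled scans are reused as extra targets. All names and thresholds are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, optimiser, labelled, unlabelled, threshold=0.9):
    images, masks = labelled                           # expert-annotated batch
    loss = F.cross_entropy(model(images), masks)       # supervised segmentation loss

    with torch.no_grad():                              # pseudo-labels from the current model
        probs = torch.softmax(model(unlabelled), dim=1)
        confidence, pseudo = probs.max(dim=1)          # per-pixel confidence and label
    keep = confidence > threshold                      # trust only confident pixels
    if keep.any():
        pixel_loss = F.cross_entropy(model(unlabelled), pseudo, reduction="none")
        loss = loss + pixel_loss[keep].mean()

    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```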
Host University École de technologie supérieure – Montréal
Language English
Preferred start date 2020-05-01 (yyyy-mm-dd)
In collaboration with graduate students, the intern will be responsible for coding and testing deep learning
models for the segmentation of medical images (e.g., 3D brain MRI). The intern will also write a technical
report summarizing his/her work and, possibly, will be involved in writing scientific papers resulting from
this project.
- Programming in Matlab and/or Python
- A good understanding of machine learning techniques, in particular those related to deep learning and
computer vision (e.g., CNN, adversarial networks, etc.); experience with a deep learning library like Pytorch
or TensorFlow is a plus
- Comfortable working in a team
Project Title: Deep Learning for Mental State Detection from EEG
Electroencephalography (EEG) is a monitoring method for recording the electrical activity of the brain. EEG
devices have been used in medicine for various conditions including seizure disorders, head injuries, brain
tumors, sleep disorders, and stroke. In recent years, advancement of consumer-grade devices together
with their more reasonable price have created possibilities for new applications such as EEG-based brain-
computer interfaces. The objective of this research is to detect emotional and mental states such as focus, stress,
calm, and boredom from EEG signal and consequently enable exploration of reactions to various media
(video, music) or real-life events. Artifacts, undesirable electrical potentials which come from sources other
than the brain such as blinking, jaw clenching, squinting, tongue movement, eyeball movement, and
perspiration, are examples of factors that make this task difficult. Such artifacts need to be either filtered
out or accounted for in the emotional state detection system. Placement of the EEG device on the subject
may also alter the EEG signal. Moreover, demographic variations are present; several studies have noted
gender and age effects on EEG readings and observed effects of meditation. EEG readings even differ for
the same subject over different sessions, depending on factors such as time of day. This creates an obstacle
for creating a universal EEG categorizer, and the emotion/mental state detection system may need to be
calibrated for each person and even for each recording session.
Deep learning has shown success in areas such as computer vision, speech recognition, and natural
language processing; here it will be used for mental state detection. Specifically,
Recurrent Neural Networks (RNNs) including GRU, LSTM and Seq2Seq will be explored because of their
ability to capture time dependencies present in time-series signals. The focus will be on training the system
on one set of subjects and using it for different subjects while keeping the calibration to a minimum.
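A minimal sketch of the recurrent models mentioned in this description: an LSTM over multi-channel EEG windows followed by a mental-state classifier. The number of channels, hidden size and set of states are illustrative assumptions, not a prescribed configuration.

```python
import torch
import torch.nn as nn

class EEGStateClassifier(nn.Module):
    def __init__(self, n_channels=14, hidden=64, n_states=4):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_states)

    def forward(self, eeg):                 # eeg: (batch, time_steps, n_channels)
        _, (h, _) = self.lstm(eeg)          # final hidden state summarises the window
        return self.head(h[-1])             # logits over e.g. focus / stress / calm / boredom

# logits = EEGStateClassifier()(torch.randn(8, 256, 14))
```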
This project focuses on developing deep learning techniques to interpret images/videos. As a member of
the Vision and Learning Lab at the ECE department (http://www.ece.ualberta.ca/~lcheng5/), you are
expected to work with a graduate student/Postdoc researcher, get familiar with state-of-the-art deep
learning techniques, and gain hands-on research experience on benchmark and home-grown datasets.
There are ample opportunities to be involved in exciting research topics in computer vision and to publish
research findings at top-tier conferences and journals.
Deep learning is currently the pervasive neural network technique given its phenomenal performance
in computer vision (Russakovsky, et al., 2015), speech (Amodei, et al., 2015) and natural language
processing (Vinyals et al., 2015). In most cases, the performance of these models is fueled by the presence
of large datasets and availability of computational power. Recurrent Neural Networks (RNNs) and
Convolutional Neural Networks (CNNs) are two of the most powerful and popular frameworks for modeling
sequential data such as speech and text, and images, respectively. We propose to develop an
implementation of a model that utilizes a combination of these models and is scalable to large-scale
industrial datasets. These models can be tweaked to suit the application at hand. We will test the
performance of these models on data obtained from safety-critical systems.
Python, C/C++, and R are required. Basic understanding of machine learning will be an asset. The
candidate should be familiar with TensorFlow, Caffe, and Keras.
Medical imaging is revolutionizing medicine. Images allow clinicians and researchers to peer inside the
human body. In order to quantify properties of tissue, e.g. to diagnose or track disease, computer software
and algorithms must be used to help analyze the data. The goal is to develop graphical-user-interface
medical image analysis software based on the state-of-the-art research that is done at the Medical Image
Analysis Lab at Simon Fraser University.
Develop GUI software that allows the user to read 2D/3D medical images (e.g. microscopy or MRI) saved
using medical image file formats (e.g. DICOM, MetaImage). Display the images. Allow the user to specify
points and curves on the image as well as set values of different parameters. Use the input from the previous
steps to guide the segmentation/registration of medical images (methods for this step will be provided).
Allow the user to change parameters and update the results quickly.
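A minimal sketch of the first two steps (loading a medical image file and displaying a slice); SimpleITK and matplotlib are assumptions chosen for illustration, and the actual GUI toolkit and I/O layer remain open design choices.

```python
import SimpleITK as sitk
import matplotlib.pyplot as plt

def load_volume(path):
    """Read a 2D/3D image stored in a medical format such as MetaImage or a DICOM file."""
    image = sitk.ReadImage(path)               # e.g. "scan.mha"
    return sitk.GetArrayFromImage(image)       # numpy array; (slices, rows, cols) for 3D

def show_middle_slice(volume):
    """Display the central slice of a 3D volume, or the image itself if it is 2D."""
    slice_2d = volume[volume.shape[0] // 2] if volume.ndim == 3 else volume
    plt.imshow(slice_2d, cmap="gray")
    plt.axis("off")
    plt.show()

# show_middle_slice(load_volume("scan.mha"))
```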
This internship project is part of a larger, collaborative project with the department of linguistics at UQAM.
The goal of this larger project is to develop and assess articulatory biofeedback methods in the context of
speech therapy for speech-impaired children. While tongue motion biofeedback will be provided using real-
time ultrasound imaging (the group's specialty), lip motion biofeedback (which is also very important to
speech and is the topic of this internship) can easily be provided using real-time imagery from a video camera.
The goal of the internship project is to develop a software prototype that allows real-time tracking of the
shape taken by the lips during speech. The software will be developed for a Linux platform and will exploit
existing open-source libraries for general computer vision (e.g., OpenCV) and face analysis (e.g. dlib).
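A hedged sketch of real-time lip landmark extraction with the libraries named above (OpenCV and dlib). The 68-point shape predictor model is an external download and its path below is a placeholder; in dlib's 68-point scheme, landmarks 48 to 67 outline the mouth.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # placeholder path

cap = cv2.VideoCapture(0)                              # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        lips = [(shape.part(i).x, shape.part(i).y) for i in range(48, 68)]  # mouth landmarks
        for x, y in lips:
            cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
    cv2.imshow("lip tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):              # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```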
Preferred start date 2020-05-01 (yyyy-mm-dd)
The student will develop a first prototype of the lip tracking software, possibly by integrating and adapting
existing open-source code to our needs. The student will then test the prototype software with input from a
standard webcam in a variety of imaging conditions and document the results in an internship report. The
student will also be required to present the work to the research group towards the end of the internship.
Required: Strong programming skills in C++ and/or Python (preferably both), basic knowledge of the Linux
operating system and its software development environments, basic knowledge in computer vision/image
processing.
Project Title: Early dropout prediction in e-learning courses in Moodle using machine learning
A high dropout rate is a major concern for many universities that offer e-learning courses. If dropout-prone
students can be identified at the early stages of learning, the dropout rate can be reduced by
providing individualized care to the students at risk. Due to the electronic nature of learning
management systems (LMS), various attributes of the students’ progress can be monitored and analyzed
over time. The main objective of this project is to explore what progress information about a student can be
extracted from Moodle in a course and how this information can be used as features with a classifier to
predict dropout prone students at an early stage to provide individualized support. Combination of different
features as well as classifiers will also be analyzed in this project.
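A minimal sketch of the prediction step described above, assuming per-student activity features extracted from Moodle logs (the feature names and values here are purely hypothetical) and a scikit-learn classifier that flags dropout-prone students early in the course.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per student: logins per week, assignments submitted,
# forum posts, and average quiz score during the first weeks of the course.
X = np.array([[12, 3, 5, 0.8], [2, 0, 0, 0.3], [9, 2, 1, 0.7], [1, 1, 0, 0.4],
              [15, 4, 7, 0.9], [3, 0, 1, 0.2], [10, 3, 2, 0.6], [0, 0, 0, 0.1]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])                  # 1 = dropped out

model = LogisticRegression().fit(X, y)                  # any classifier could be swapped in
new_student = np.array([[2, 0, 1, 0.35]])               # early-course activity of one student
print("estimated dropout risk:", model.predict_proba(new_student)[0, 1])
```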
Students are expected to have the following skills: -- Feature selection -- Pattern classification -- Moodle --
Python
The student will develop a demo course with an instructor and a few students in Moodle. -- Analyze in depth
what activities of a student can be monitored during learning and how that information can be used as
features for a classifier. -- The student will do programming in Python to use those features with a classifier
or a combination of classifiers for dropout prediction.
Earprints are widely used as a biometric parameter for the identification of individuals. In addition, they are
used in the study of crime scenes in the field of forensic science. An ear impression is the trace left by an
individual pressing their ear against a surface such as a wall, door, or window. These impressions are sampled
in a similar way to fingerprints. An image of an ear impression is inherently fuzzy and consists of several areas
from different prominent parts of the ear. In the literature, several techniques have been developed to
extract relevant information on discriminant characteristics for identification. The purpose of the internship
will be to apply the most relevant image processing algorithms to automate this task using a database of
ear impressions available at the University of Quebec at Trois-Rivières's Forensic Laboratory.
Language Either French or English
Preferred start date 2020-05-04 (yyyy-mm-dd)
First, the student will familiarize themselves with the work already carried out by our team. The student will
then become familiar with the computer programs we have developed and participate in the development of
new software tools.
This research project aims at improving the computational efficiency of computer vision techniques, with a
special focus on image and video recognition. Computer vision approaches based on deep learning
techniques such as convolutional and recurrent neural networks have shown impressive results in the past
years, sometimes even reaching human-level performance. However, these approaches are very demanding
in terms of computation. They require specialized hardware such as GPUs and, even so, their training can last
several days and they cannot be executed in real time. Additionally, these deep learning techniques need to be
deployed on mobile phones and embedded devices. For instance the internet of things expects every
device to be connected to the internet and to be intelligent. However, at the moment, due to hardware
limitations, these embedded devices cannot really run intelligent algorithms. The other option of sending
the data to the cloud is often not feasible, due to the need for low-latency answers or privacy issues. In
this project we aim at reducing the computational cost of convolutional neural networks. The main idea of
the project is to use conditional computation. That is, instead of computing every part of the network for
every input data, our approach will build a set of networks or branches that will specialize to a specific
recognition task and only some specific input will be evaluated by the specific network. This allows us to
save computation and, at the same time, to produce better-performing models because we can increase the
capacity of the models without increasing their computational cost. For doing that we will use different
techniques such as attention mechanism and reinforcement learning. The proposed method will be
evaluated in terms of classification accuracy as well as computational cost. This approach can also be
extended to a full tree of specialized recognition branches.
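A minimal sketch of the conditional-computation idea described above: a cheap gating layer selects one specialised branch per input, so only that branch is evaluated. Hard argmax selection is shown for clarity; making the selection trainable (for instance with attention or reinforcement learning, as the text mentions) is the actual research question, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalNet(nn.Module):
    def __init__(self, in_dim=512, n_branches=4, n_classes=10):
        super().__init__()
        self.n_classes = n_classes
        self.gate = nn.Linear(in_dim, n_branches)       # cheap branch selector
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, n_classes))
            for _ in range(n_branches)
        ])

    def forward(self, x):                               # x: (batch, in_dim) features
        choice = self.gate(x).argmax(dim=1)             # branch index per sample
        out = x.new_zeros(x.size(0), self.n_classes)
        for b, branch in enumerate(self.branches):
            idx = (choice == b).nonzero(as_tuple=True)[0]
            if idx.numel():                             # run only the samples routed here
                out[idx] = branch(x[idx])
        return out

# logits = ConditionalNet()(torch.randn(8, 512))
```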
The student will be developing and evaluating algorithms for the proposed project. For doing that, he/she
will be expected to read papers from related work to have a clear overview of the literature in the field.
He/she will have to learn how to use and share the computational resources available in my lab in order to
run experiments. Finally, he/she will be expected to write a report/article about the project and results. The
student will be hosted in my lab and surrounded by other MSc and PhD students working on similar topics.
The student will be under my supervision and he/she will be paired with a more experienced student of my
group that can help him/her to successfully accomplish the main objectives of the project. The student will
have the possibility to participate in the regular activities of the lab, such as reading groups, invited
presentations and other activities.