Research & Publication

Our research interests focus on the intersection of robotics, machine learning and machine vision. We are interested in developing algorithms for an adaptive perception system based on interactive environment exploration and open-ended learning, which enables robots to learn from past experiences and interact with human users. We have evaluated our work on different robotic platforms, including PR2, robotic arms, and humanoid robots. Our up-to-date list of publications and corresponding BibTeX files can be found on this Google Scholar account. In particular, our research is summarized by the following projects:

Simultaneous Multi-View Object Grasping and Recognition in Open-Ended Domains

Most state-of-the-art approaches tackle object recognition and grasping as two separate problems, even though both use visual input. Such approaches are not suitable for task-informed grasping, where the robot should recognize a specific object first and then grasp and manipulate it to accomplish a task. In this work, we propose a multi-view deep learning approach to handle simultaneous object grasping and recognition in open-ended domains. In particular, our approach takes multiple views of the object as input and jointly estimates a pixel-wise grasp configuration and a deep scale- and rotation-invariant representation. The obtained representation is then used for open-ended object category learning and recognition. Experimental results on benchmark datasets show that our approach outperforms state-of-the-art methods by a large margin in terms of grasping and recognition.

MVGrasp: Real-Time Multi-View 3D Object Grasping in Highly Cluttered Environments

Nowadays, service robots are becoming an increasingly common part of our daily lives. In such a dynamic environment, a robot frequently faces piled, packed, or isolated objects. Therefore, it is necessary for the robot to know how to grasp and manipulate various objects in different situations to help humans in everyday tasks. Most state-of-the-art grasping approaches address four degrees-of-freedom (DoF) object grasping, where the robot is forced to grasp objects from above based on grasp synthesis of a given top-down scene. Although such approaches show very good performance in predefined industrial settings, they are not suitable for human-centric environments, as the robot will not be able to grasp a range of household objects robustly. In this work, we propose a multi-view deep learning approach to handle robust object grasping in human-centric domains. In particular, our approach takes a partial point cloud of a scene as input and then generates multiple views of the existing objects. The obtained views of each object are used to estimate pixel-wise grasp synthesis for each object.

Self-Imitation Learning by Planning

Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, and is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. In this work, we solve this problem using our proposed approach called self-imitation learning by planning (SILP), where demonstration data are collected automatically by planning on the visited states from the current policy. SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search-based motion planner, so we can plan and relabel the robot’s own trials as demonstrations for policy learning. Thanks to these self-generated demonstrations, we relieve the human operator from the laborious data preparation process required by IL and RL methods in solving complex motion planning tasks.
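
The idea of planning on visited states can be illustrated with a minimal sketch. It is not the SILP implementation: the function name, the `max_edge` threshold, and the assumption that two nearby visited states can be connected by a collision-free straight line are all illustrative choices made here. Visited states become graph nodes, nearby states are linked, and a breadth-first search returns a path that could be relabeled as a demonstration.

```python
import math
from collections import deque


def plan_on_visited_states(states, start_idx, goal_idx, max_edge=1.0):
    """Breadth-first search over visited states.

    Two states are linked when they are at most `max_edge` apart
    (assumption: such short connections are collision-free).
    Returns the list of states along the path, or None if unreachable.
    """
    def neighbors(i):
        return [j for j in range(len(states))
                if j != i and math.dist(states[i], states[j]) <= max_edge]

    parent = {start_idx: None}
    queue = deque([start_idx])
    while queue:
        node = queue.popleft()
        if node == goal_idx:
            path = []
            while node is not None:  # walk back to the start
                path.append(states[node])
                node = parent[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical 2D states visited by the current policy.
visited = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (5.0, 5.0)]
demo = plan_on_visited_states(visited, start_idx=0, goal_idx=3)
unreachable = plan_on_visited_states(visited, start_idx=0, goal_idx=4)
```

Breadth-first search keeps the sketch short; a weighted graph search would be needed to prefer geometrically shorter paths.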

3D_DEN: Open-ended 3D Object Recognition using Dynamically Expandable Networks

Service robots have to work independently and adapt to dynamic changes in real time. One important aspect in such scenarios is to continually learn to recognize new object categories as they become available. This combines two main research problems, namely continual learning and 3D object recognition. Most existing approaches use deep Convolutional Neural Networks (CNNs) and focus on image datasets. A modified approach might be needed for continually learning 3D object categories. A major concern in using CNNs is the problem of catastrophic forgetting when a model tries to learn a new task. Despite various proposed solutions to mitigate this problem, such solutions still have downsides, e.g., computational complexity, especially when learning a substantial number of tasks. These downsides can pose major problems in robotic scenarios where real-time response plays an essential role. In this work, we propose a new deep transfer learning approach based on a dynamic architectural method to make robots capable of open-ended learning about new 3D object categories.

OrthographicNet: A Deep Transfer Learning Approach for 3D Object Recognition in Open-Ended Domains

We present OrthographicNet, a deep transfer learning based approach for 3D object recognition in open-ended domains. In particular, OrthographicNet generates a rotation- and scale-invariant global feature for a given object, making it possible to recognize the same or similar objects seen from different perspectives. Experimental results show that our approach yields significant improvements over state-of-the-art approaches concerning scalability, memory usage and object recognition performance. Moreover, OrthographicNet demonstrates the capability of learning new categories from very few examples on-site. Regarding real-time performance, three real-world demonstrations validate the promising performance of the proposed architecture.

Combining Shape Features with Multiple Color Spaces in Open-Ended 3D Object Recognition

Considering the expansion of robot applications in more complex and dynamic environments, it is evident that it is not possible to pre-program all object categories and anticipate all exceptions in advance. Therefore, robots should be able to learn about new object categories in an open-ended fashion while working in the environment. Towards this goal, we propose a deep transfer learning approach to generate a scale- and pose-invariant object representation by considering shape and texture information in multiple color spaces. The obtained global object representation is then fed to an instance-based object category learning and recognition module, where a non-expert human user is in the learning loop and can interactively guide the process of experience acquisition by teaching new object categories, or by correcting insufficient or erroneous categories. In this work, shape information encodes the common patterns of all categories, while texture information is used to describe the appearance of each instance in detail. Multiple color space combinations and network architectures are evaluated to find the most descriptive system.
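
Instance-based learning with a human in the loop can be sketched roughly as follows. This is a minimal illustration, not the system described above: the class name, the Euclidean nearest-neighbor rule, and the toy feature vectors are assumptions. Teaching or correcting simply stores a labeled instance, so new categories can be added at any time without retraining.

```python
class InstanceBasedLearner:
    """Open-ended nearest-neighbor learner (illustrative sketch)."""

    def __init__(self):
        self.memory = {}  # category name -> list of stored feature vectors

    def teach(self, category, features):
        """Store a labeled instance; creates the category if it is new."""
        self.memory.setdefault(category, []).append(list(features))

    def recognize(self, features):
        """Return the category of the closest stored instance, or None."""
        best_category, best_dist = None, float("inf")
        for category, instances in self.memory.items():
            for inst in instances:
                dist = sum((a - b) ** 2 for a, b in zip(inst, features)) ** 0.5
                if dist < best_dist:
                    best_category, best_dist = category, dist
        return best_category


learner = InstanceBasedLearner()
before_teaching = learner.recognize([0.9, 0.1])  # nothing learned yet

# Hypothetical object representations taught by a user.
learner.teach("mug", [1.0, 0.0])
learner.teach("bowl", [0.0, 1.0])
first_guess = learner.recognize([0.9, 0.1])

# A new category is introduced later, without retraining.
learner.teach("fork", [5.0, 5.0])
second_guess = learner.recognize([4.8, 5.1])
```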

Few-Shot Visual Grounding for Natural Human-Robot Interaction

Natural Human-Robot Interaction (HRI) is one of the key components for service robots to be able to work in human-centric environments. In such dynamic environments, the robot needs to understand the intention of the user to accomplish a task successfully. To address this point, we propose a software architecture that segments a target object, indicated verbally by a human user, from a crowded scene. At the core of our system, we employ a multi-modal deep neural network for visual grounding. Unlike most grounding methods, which tackle the challenge using pre-trained object detectors via a two-step process, we develop a single-stage zero-shot model that is able to provide predictions on unseen data. We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets. Experimental results show that the proposed model performs well in terms of accuracy and speed, while showcasing robustness to variation in the natural language input.

Investigating the Importance of Shape Features, Color Constancy, Color Spaces and Similarity Measures in Open-Ended 3D Object Recognition

Despite the recent success of state-of-the-art 3D object recognition approaches, service robots frequently fail to recognize many objects in real human-centric environments. For these robots, object recognition is a challenging task due to the high demand for accurate and real-time response under changing and unpredictable environmental conditions. Most recent approaches use either shape information only, ignoring the role of color information, or vice versa. Furthermore, they mainly use the Ln Minkowski family of functions to measure the similarity of two object views, although various other distance measures are applicable for comparing two object views. In this paper, we explore the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition.
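
To make the contrast between distance families concrete, here is a minimal sketch comparing Minkowski distances with one non-Minkowski alternative, the chi-squared distance, on two histogram-style object-view descriptors. The function names and the toy histograms are illustrative assumptions, not the measures or data evaluated in the paper.

```python
def minkowski(a, b, p=2.0):
    """L_p Minkowski distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chi_squared(a, b):
    """Chi-squared distance between two non-negative histograms."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if x + y > 0:  # skip empty bins to avoid division by zero
            total += (x - y) ** 2 / (x + y)
    return 0.5 * total


# Hypothetical normalized histograms describing two object views.
view_a = [0.5, 0.5, 0.0, 0.0]
view_b = [0.5, 0.0, 0.5, 0.0]

manhattan = minkowski(view_a, view_b, p=1)  # L1 distance
euclidean = minkowski(view_a, view_b, p=2)  # L2 distance
chi2 = chi_squared(view_a, view_b)
```

Unlike the Minkowski family, the chi-squared distance weights each bin difference by the bin's total mass, so differences in sparsely populated bins count for more.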

Local-HDP: Interactive Open-Ended 3D Object Categorization

We introduce a non-parametric hierarchical Bayesian approach for open-ended 3D object categorization, named the Local Hierarchical Dirichlet Process (Local-HDP). This method allows an agent to learn independent topics for each category incrementally and to adapt to the environment in time. Hierarchical Bayesian approaches like Latent Dirichlet Allocation (LDA) can transform low-level features to high-level conceptual topics for 3D object categorization. However, the efficiency and accuracy of LDA-based approaches depend on the number of topics that is chosen manually. Moreover, fixing the number of topics for all categories can lead to overfitting or underfitting of the model. In contrast, the proposed Local-HDP can autonomously determine the number of topics for each category. Furthermore, an inference method is proposed that results in a fast posterior approximation.

*This research was done in collaboration with Hamed Ayoobi.

The State of Service Robots: Current Bottlenecks in Object Perception and Manipulation

Nowadays, robots are able to recognize various objects and quickly plan a collision-free trajectory to grasp a target object. Despite these successes, the robot usually has to be painstakingly coded in advance to perform a set of predefined tasks. Besides, in most cases, there is a reliance on large amounts of training data. Therefore, these approaches are still too rigid for real-life applications in unstructured environments, where a significant portion of the environment is unknown and cannot be directly sensed or controlled. In this paper, we review advances in service robots from object perception to complex object manipulation and shed light on the current challenges and bottlenecks.

Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning

Reinforcement learning has shown great promise in the training of robot behavior due to its sequential decision-making characteristics. However, the enormous amount of interactive and informative training data it requires is the major stumbling block for progress. In this study, we focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching tasks. Specifically, we propose a precision-based continuous curriculum learning (PCCL) method in which the precision requirements are gradually adjusted during the training process, instead of fixing the parameter in a static schedule. To this end, we explore various continuous curriculum strategies for controlling the training process. This approach is tested using a Universal Robot 5e in both simulation and real-world scenarios.
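
A precision-based curriculum can be sketched as a goal tolerance that shrinks as training progresses. The sketch below is only one possible continuous strategy: the geometric decay, the function names, and the default tolerances (`eps_start`, `eps_end`, in metres) are assumptions for illustration, not the schedules or values used in PCCL.

```python
def precision_requirement(step, total_steps, eps_start=0.15, eps_end=0.01):
    """Goal tolerance (metres) that shrinks smoothly as training progresses."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Geometric interpolation: loose tolerance early, tight tolerance late.
    return eps_start * (eps_end / eps_start) ** progress


def reached(distance_to_goal, step, total_steps):
    """A reach counts as a success if it is within the current tolerance."""
    return distance_to_goal <= precision_requirement(step, total_steps)


# The same 5 cm error is a success early in training but a failure at the end.
early_success = reached(0.05, step=0, total_steps=100)
late_success = reached(0.05, step=100, total_steps=100)
```

Loosening the requirement early gives the agent more successful episodes to learn from; tightening it later pushes the policy towards the final precision.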

Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition

In human-centric environments, fine-grained object categorization is as essential as basic-level object categorization. In this work, each object is represented using a set of general latent topics and category-specific dictionaries. The general topics encode the common patterns of all categories, while the category-specific dictionary describes the content of each category in detail. Both the general and the specific representations are discovered in an unsupervised fashion and updated incrementally using new object views.

Interactive Open-Ended Learning Approach for Recognizing 3D Object Category and Grasp Affordance Concurrently

This paper presents an interactive open-ended learning approach to recognize multiple objects and their grasp affordances concurrently. This is an important contribution in the field of service robots since no matter how extensive the training data used for batch learning, a robot might always be confronted with an unknown object when operating in human-centric environments. Our approach has two main branches. The first branch is related to open-ended 3D object category learning and recognition. The second branch is associated with learning and recognizing the configuration of grasps in a reasonable amount of time.

Learning to Grasp 3D Objects using Deep Residual U-Nets

In this study, we present a new deep learning approach to detect object affordances for a given 3D object. The method trains a Convolutional Neural Network (CNN) to learn a set of grasping features from RGB-D images. We named our approach Res-U-Net since the architecture of the network is based on the U-Net structure and residual network-styled blocks. It is devised to be robust and efficient to compute and use. A set of experiments has been performed to assess the performance of the proposed approach regarding grasp success rate in simulated robotic scenarios. The experiments validate the promising performance of the proposed architecture on the ShapeNetCore dataset and in simulated robot scenarios.

Coping with Context Change in Open-Ended Object Recognition without Explicit Context Information

To deploy a robot in a human-centric environment, it is important that the robot is able to continuously acquire and update object categories while working in the environment. Therefore, autonomous robots must have the ability to continuously execute learning and recognition in a concurrent or interleaved fashion. One of the main challenges in unconstrained human environments is to cope with the effects of context change. This paper presents two main contributions: (i) an approach for evaluating open-ended object category learning and recognition methods in multi-context scenarios; (ii) evaluation of different object category learning and recognition approaches regarding their ability to cope with the effects of context change. Off-line evaluation approaches such as cross-validation do not comply with the simultaneous nature of learning and recognition. A teaching protocol, supporting context change, was therefore designed and used in this work for experimental evaluation. Seven learning and recognition approaches were evaluated and compared using the protocol. The best performance, in terms of number of learned categories, was obtained with a recently proposed local variant of Latent Dirichlet Allocation (LDA), closely followed by a Bag-of-Words (BoW) approach. In terms of adaptability, i.e. coping with context change, the best result was obtained with BoW, immediately followed by the local LDA variant.

Perceiving, Learning, and Recognizing 3D Objects: An Approach to Cognitive Service Robots

This paper proposes a cognitive architecture designed to support concurrent 3D object category learning and recognition in an interactive and open-ended manner. In particular, this cognitive architecture provides automatic perception capabilities that allow robots to detect objects in highly crowded scenes and learn new object categories from the set of accumulated experiences in an incremental and open-ended way. Moreover, it supports constructing the full model of an unknown object in an on-line manner and predicting the next best view to improve object detection and manipulation performance.

Active Multi-View 6D Object Pose Estimation and Camera Motion Planning in the Crowd

In this project, we developed a novel unsupervised Next-Best-View (NBV) prediction algorithm to improve object detection and manipulation performance. In particular, the ability to predict the NBV point is important for mobile robots performing tasks in everyday environments. In active scenarios, whenever the robot fails to detect or manipulate objects from the current viewpoint, it is able to predict the next best view position, go there, and capture a new scene to improve its knowledge of the environment. This may increase the object detection and manipulation performance.

Hierarchical Object Representation for Open-Ended Object Category Learning and Recognition (Local LDA)

This paper proposes an open-ended 3D object recognition system which concurrently learns both the object categories and the statistical features for encoding objects. In particular, we propose an extension of Latent Dirichlet Allocation to learn structural semantic features (i.e. topics) from low-level feature co-occurrences for each category independently. Moreover, topics in each category are discovered in an unsupervised fashion and are updated incrementally using new object views. In this way, the advantages of both the local hand-crafted features and the structural semantic features are exploited efficiently.

GOOD: A Global Orthographic Object Descriptor for 3D Object Recognition and Manipulation

The Global Orthographic Object Descriptor (GOOD) has been designed to be robust, descriptive and efficient to compute and use. The GOOD descriptor has two outstanding characteristics: (1) it provides a good trade-off among descriptiveness, robustness, computation time and memory usage; (2) it allows concurrent object recognition and pose estimation for manipulation. The performance of the proposed object descriptor is compared with the main state-of-the-art descriptors. Experimental results show that the overall classification performance obtained with GOOD is comparable to the best performances obtained with the state-of-the-art descriptors. Concerning memory and computation time, GOOD clearly outperforms the other descriptors. The current implementation of the GOOD descriptor supports several functionalities for 3D object recognition and object manipulation.
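
The general idea of describing an object through orthographic projections can be sketched as follows. This is a simplified illustration and not the GOOD algorithm itself: it assumes the points are already expressed in an object-centred frame and scaled to [-1, 1], and it leaves out any reference-frame construction or ordering of the projections. The function names and bin count are hypothetical.

```python
def projection_histogram(points, axes, bins=3, lo=-1.0, hi=1.0):
    """Normalized 2D histogram of points projected onto one coordinate plane."""
    u, v = axes
    width = (hi - lo) / bins
    hist = [0.0] * (bins * bins)
    for p in points:
        # Clamp indices so points on or outside the border fall in edge bins.
        col = max(0, min(int((p[u] - lo) / width), bins - 1))
        row = max(0, min(int((p[v] - lo) / width), bins - 1))
        hist[row * bins + col] += 1.0
    n = len(points)
    return [h / n for h in hist] if n else hist


def orthographic_descriptor(points, bins=3):
    """Concatenate the histograms of the XY, XZ and YZ projections."""
    descriptor = []
    for axes in [(0, 1), (0, 2), (1, 2)]:
        descriptor.extend(projection_histogram(points, axes, bins))
    return descriptor


# Hypothetical two-point object in an object-centred, normalized frame.
cloud = [(-0.9, -0.9, -0.9), (0.9, 0.9, 0.9)]
descriptor = orthographic_descriptor(cloud)
```

With `bins=3` the descriptor has 3 x 3 x 3 = 27 values, which shows why this kind of representation is compact in memory.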

Towards Lifelong Assistive Robotics: A Tight Coupling between Object Perception and Manipulation

In this work, we propose a cognitive architecture designed to create a tight coupling between perception and manipulation for assistive robots. This is necessary for assistive robots, not only to perform manipulation tasks in a reasonable amount of time and in an appropriate manner, but also to robustly adapt to new environments by handling new objects. In particular, this cognitive architecture provides perception capabilities that allow robots to incrementally learn object categories from the set of accumulated experiences and to reason about how to perform complex tasks.

Interactive Open-Ended Learning for 3D Object Recognition: An Approach and Experiments

This work presents an efficient approach capable of learning and recognizing object categories in an interactive and open-ended manner. In particular, we mainly focus on two state-of-the-art questions: (1) How to automatically detect, conceptualize and recognize objects in 3D scenes in an open-ended manner? (2) How to acquire and use high-level knowledge obtained from the interaction with human users, namely when they provide category labels, in order to improve the system performance?

Learning to Grasp Familiar Objects using Object View Recognition and Template Matching

In this work, interactive object view learning and recognition capabilities are integrated in the process of learning and recognizing grasps. The object view recognition module uses an interactive incremental learning approach to recognize object view labels. The grasp pose learning approach uses local and global visual features of a demonstrated grasp to learn a grasp template associated with the recognized object view. A grasp distance measure based on Mahalanobis distance is used in a grasp template matching approach to recognize an appropriate grasp pose.
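
Template matching with a Mahalanobis-based distance can be sketched as below. To keep the example dependency-free, it assumes a diagonal covariance matrix (one variance per feature), which is a special case of the Mahalanobis distance. The template names, feature vectors and variances are hypothetical, and the grasp distance measure used in this work may differ.

```python
import math


def mahalanobis_diag(x, mean, variances):
    """Mahalanobis distance under a diagonal covariance (variances must be > 0)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, variances)))


def match_grasp_template(features, templates):
    """Return the name of the template closest to `features`.

    `templates` maps a name to a (mean, variances) pair.
    """
    return min(templates,
               key=lambda name: mahalanobis_diag(features, *templates[name]))


# Hypothetical grasp templates: (mean feature vector, per-feature variances).
templates = {
    "top_grasp": ([0.0, 0.0], [1.0, 4.0]),
    "side_grasp": ([3.0, 3.0], [1.0, 1.0]),
}
best = match_grasp_template([1.0, 2.0], templates)
```

Dividing by the variance means that a deviation along a feature that varied a lot during demonstrations is penalized less than the same deviation along a stable feature.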

Humanoid Robots (RoboCup-HL)

After gaining extensive knowledge of real-time intelligent robotic systems in the Middle-Size League, I started building humanoid robots and formed two new robotic teams, Persia and BehRobot, to participate in the RoboCup humanoid leagues. We worked on three different types of humanoid robots: kid-size (height = 59 cm, weight = 4 kg), teen-size (height = 93 cm, weight = 7 kg) and adult-size (height = 155 cm, weight = 11.5 kg) robots. We were one of the successful teams in the humanoid leagues and achieved several ranks in national and international competitions.

Middle Size Soccer Robots (RoboCup-MSL)

During the second year of my undergraduate program, I became familiar with the RoboCup competitions. In 2006, I formed a Middle Size Soccer Robots (RoboCup-MSL) team named ADRO. We built five player robots and one goalkeeper robot with a similar structure but equipped with some additional accessories and sensors. Within the team, I took an active role in the development of the robots’ software. Furthermore, I worked on the mechanical design of the robot using Autodesk Inventor. We achieved several ranks in national and international RoboCup competitions.


Dr. Hamidreza Kasaei
Artificial Intelligence Department,
University of Groningen,
Bernoulliborg building,
Nijenborgh 9, 9747 AG Groningen,
The Netherlands.
Office: 340
Tel: +31-50-363-33926