Research

RESEARCH GRANTS

I have received several research grants as listed below.
• Thermal Aware 3-Dimensional Multi-Core Systems Design, Singapore MoE Tier 2 Grant, Principal Investigator, S$571,000, 11/2014~10/2017
• Development of Real-Time FPGA-Based Brain Computer Interface Applications, Singapore MoE ARC Grant, Principal Investigator, S$179,900, 03/2014~02/2017
• Fault-tolerant Multi-processor Systems for nano-satellites, Singapore DSO, Principal Investigator, S$429,600, 05/2012~05/2015
• Partially Reconfigurable Heterogeneous Multi-processor System-on-Chip, Singapore MoE ARC Grant, Principal Investigator, S$179,990, 08/2011~07/2014
• A Power-Efficient Heterogeneous Architecture and Run-Time Manager for Data Center Servers, Singapore A*Star Grant, Co-Investigator, S$496,555, 08/2011~04/2013
• Secured Large Scale Shared Storage System, Singapore A*Star Grant, Co-Investigator, S$1,815,515, 08/2011~10/2014

ON-GOING RESEARCH PROJECTS

Thermal Aware 3-Dimensional Multi-Core Systems Design

Multi-core systems are increasingly used to meet the performance constraints of high-performance computing (HPC) applications. However, most of these systems are implemented as 2D ICs, which are becoming complex, inefficient and uneconomical as process technology advances, because interconnects scale poorly compared with logic. A viable alternative that addresses these interconnect limitations is the 3D IC, in which multiple layers of logic are stacked vertically and connected by short, high-speed vertical interconnects. Integrating various types of cores in a 3D IC provides several advantages, but at the cost of increased power density within the chip, which causes serious thermal problems that affect the performance and reliability of the system. While a few works aim to mitigate the thermal issues in 3D multi-core architectures, very few target real-time applications with strict timing deadlines. In this project, we will investigate and identify a promising 3D multi-core architecture to support a set of real-time applications, and devise techniques for thermal-aware mapping of multiple simultaneously active applications while guaranteeing their performance (throughput) constraints. The 3D architecture is expected to integrate different types of cores, such as general-purpose processors and accelerators. The real-time applications to be mapped onto the architecture will be streaming multimedia applications (e.g., MPEG-4 decoder, JPEG decoder, MP3 decoder) that are omnipresent in modern electronic systems such as smartphones, tablets and PDAs.

Development of Real-Time FPGA-Based Brain Computer Interface Applications

A brain-computer interface (BCI) is a system capable of utilizing the brain’s electrical signals for direct communication with a computer system, without reliance on the usual neuromuscular pathways. In a BCI system, the control signals to a computer or a prosthetic device are derived from the electrical activity of the brain (EEG). These signals are amplified, digitized and sent to a computing system, which determines the symbol or character the user is thinking of. In this project, we plan to explore efficient implementations of various BCI applications on FPGAs. Example BCI applications include an SSVEP BCI Multimedia Control System, a P300 Brain Computer Interface Speller Application, and P300 Wave based Person Identification. Implementing such applications on FPGAs will provide the high performance needed to fulfil end-user requirements. The main goal of the project is to investigate various implementation possibilities and then identify the best alternative. The best implementation alternative may contain different types of processing elements within the FPGA in order to perform the different kinds of required computations efficiently. The required area and power of the implementation will be optimized while trying to achieve high performance.

Fault-tolerant Multi-processor Systems for nano-satellites

The main objective is to develop a fault-tolerant multi-processor architecture for nano-satellites and map the final design onto an FPGA. Nano-satellites usually require innovative communication and computation systems for a number of reasons. First of all, single event effects (SEEs) are especially common in space. SEEs mostly affect digital devices. When a high-energy particle travels through a semiconductor, it leaves an ionized trail behind. This can cause a glitch in the output that leads to corrupt values. Two of the most common SEEs are Single Event Upsets (SEUs) and Single Event Latchups (SELs). While the former is a transient effect, the latter may be permanent and render the chip unusable. One of the most common approaches to circumvent the problem is hardening. However, this approach is often very expensive. (A typical hardened processor like Sparc V8 costs about $20,000.) Further, procurement delays significantly slow down project development. In this project, a scalable and cheaper way of building fault-tolerant systems that can withstand transient faults will be developed. The developed system will be mapped onto an FPGA platform.
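
As a rough illustration of how redundancy can mask a transient fault such as an SEU, the sketch below runs the same computation on three replicas and takes a majority vote. This is a generic, well-known technique shown under assumptions; it is not a description of the architecture developed in this project, and the names `majority_vote`, `run_redundant`, `healthy` and `upset` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(results: Sequence[T]) -> T:
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: too many replicas disagree")
    return value


def run_redundant(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Feed the same input to every replica (hypothetical processors) and vote."""
    return majority_vote([replica(x) for replica in replicas])


def healthy(x: int) -> int:
    """A replica that computes the intended function."""
    return x * x


def upset(x: int) -> int:
    """A replica whose result has bit 3 flipped, modelling a single SEU."""
    return (x * x) ^ 0b1000


if __name__ == "__main__":
    # The single corrupted replica is outvoted by the two healthy ones.
    print(run_redundant([healthy, upset, healthy], 5))
```

Voting only masks the fault while a majority of replicas remain correct, which is why it suits transient upsets better than permanent failures.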

COMPLETED RESEARCH PROJECTS

Partially Reconfigurable Heterogeneous Multi-processor System-on-Chip

In this project, a partially reconfigurable heterogeneous multi-processor system-on-chip (MPSoC) architecture will be developed that can be customized at run-time depending on the application requirements. Such an architecture brings together the benefits of both worlds: FPGA fabric and heterogeneous MPSoC. A tile-based heterogeneous MPSoC platform will be used in which some tiles consist purely of reconfigurable fabric that can be configured at run-time. While most tasks of an application can be performed by the different types of processors available in the MPSoC, tasks that take a long time in software can be sped up by configuring the programmable tiles appropriately. Another benefit of such an architecture is fault tolerance. If some processor tiles in the MPSoC platform are rendered unusable at any point in time, the reconfigurable tiles can be configured to take over the functionality of the failed processors.

A short video describing the features of the developed architecture can be found at this link.

A Power-Efficient Heterogeneous Architecture and Run-Time Manager for Data Center Servers

The idea of this project is to develop a heterogeneous multi-processor architecture targeted at servers in data centers. Our target architecture will have processors with different performance-power trade-offs, including some processors operating at lower voltage. In addition, it will contain some reconfigurable logic, some application-specific processors and some ASIC components. A heterogeneous architecture allows processing elements of different granularities to be integrated in one platform and gives the middleware/operating system the opportunity to match each payload job to the right processor or computing element. For example, a bit-level encryption/decryption payload is extremely inefficient to execute on a word-level homogeneous processor, but can be extremely efficient, in terms of power and timing, on a fine-grained bit-level FPGA platform. A heterogeneous server architecture makes this matching possible and thus greatly reduces the energy consumed by server computing.
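
The matching of a job to a computing element can be pictured with a small sketch: among the elements that finish the job before its deadline, pick the one that consumes the least energy. This is only an illustration of the idea, not the run-time manager developed in the project; the `Element` type, the `pick_element` function and all timing and power figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Element:
    name: str
    time_ms: float  # hypothetical execution time of the job on this element
    power_w: float  # hypothetical average power drawn while running the job

    @property
    def energy_mj(self) -> float:
        # watts * milliseconds = millijoules
        return self.time_ms * self.power_w


def pick_element(elements: Sequence[Element], deadline_ms: float) -> Optional[Element]:
    """Return the lowest-energy element that meets the deadline, or None."""
    feasible = [e for e in elements if e.time_ms <= deadline_ms]
    return min(feasible, key=lambda e: e.energy_mj, default=None)


# Made-up figures for a bit-level encryption job.
WORD_LEVEL_CPU = Element("word-level CPU", time_ms=40.0, power_w=20.0)
LOW_VOLTAGE_CPU = Element("low-voltage CPU", time_ms=120.0, power_w=5.0)
FPGA_TILE = Element("bit-level FPGA", time_ms=10.0, power_w=8.0)

if __name__ == "__main__":
    choice = pick_element([WORD_LEVEL_CPU, LOW_VOLTAGE_CPU, FPGA_TILE], deadline_ms=50.0)
    print(choice.name if choice else "no element meets the deadline")
```

With a tight deadline and no FPGA available, the same function falls back to the faster but more power-hungry processor, which is the trade-off a heterogeneous platform exposes.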

Secured Large Scale Shared Storage System

The main objective is to develop a fault-tolerant architecture for scalable storage security and map the final design onto an FPGA. A data center requires high security standards together with reliability. As transistor sizes shrink, yield has become a major problem. Falling yield calls for fault-tolerant mechanisms to ensure that high performance can be achieved even when part of the hardware has succumbed to errors. The work will involve a number of tasks. First, a thorough analysis of the security algorithms to be implemented will be carried out to understand the data flow and the performance bottlenecks of the system. A suitable architecture (possibly multi-core with accelerators) will then be determined to deliver the required performance. The designed architecture will be scalable to accommodate the future demands of the data center.

GENERAL RESEARCH THEME

The focus of my research is to design predictable multi-processor systems - predictable in terms of both the architecture and the applications. Multiprocessor systems-on-chip (MPSoCs) have been proposed as a solution to the rising power consumption and area of modern embedded systems. These systems are becoming increasingly heterogeneous with the use of dedicated IP blocks and application-domain-specific processors. To achieve high performance in such systems, the limited computational resources must be shared. The concurrent execution of dynamic applications on shared resources is a potential source of interference. Modelling and analyzing this interference is key to building cost-effective systems that deliver the desired application performance. However, four main issues remain, and they have been the focus of my research over the last five years.

• Design such platforms automatically from a high-level description, so that the process takes less time and is less error-prone than existing manual techniques.
• Program these heterogeneous multiprocessor platforms such that the functional requirements are met.
• Design a run-time resource manager analogous to an operating system for such heterogeneous MPSoCs.
• Analyze the behaviour of multiple applications that share resources and thereby suffer from resource contention.
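
The last issue, resource contention, can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that each competing actor occupies a shared processor for a fraction `exec_time / period` of the time, and that an actor arriving while the processor is busy waits, on average, for half of the blocking actor's execution time. This is a simplified first-order illustration under those assumptions, not the technique published from this research; the `Actor` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Actor:
    name: str
    exec_time: float  # time the actor occupies the shared processor per firing
    period: float     # time between successive firings of the actor

    @property
    def busy_probability(self) -> float:
        # Fraction of time the processor is occupied by this actor.
        return self.exec_time / self.period


def expected_wait(others: Sequence[Actor]) -> float:
    """First-order expected waiting time caused by the competing actors.

    Each competitor contributes (probability it holds the processor) times
    (its mean remaining execution time, taken as half its execution time).
    Overlaps between competitors are ignored in this approximation.
    """
    return sum(a.busy_probability * a.exec_time / 2 for a in others)


def estimated_response_time(actor: Actor, others: Sequence[Actor]) -> float:
    """Execution time of the actor plus its expected waiting time."""
    return actor.exec_time + expected_wait(others)


if __name__ == "__main__":
    a = Actor("a", exec_time=10.0, period=100.0)
    b = Actor("b", exec_time=20.0, period=100.0)
    c = Actor("c", exec_time=30.0, period=120.0)
    print(estimated_response_time(a, [b, c]))
```

An estimate of this kind is cheap to evaluate for every combination of active applications, which is what makes probabilistic models attractive compared with exhaustive analysis.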

Some key achievements during this research are listed below:

• A detailed analysis of the problems encountered when mapping multiple applications on an MPSoC platform was carried out, and measures were proposed to overcome these problems. The findings and solutions are published in DSD 2006, and later in Journal of Systems Architecture 2008.

• A resource manager was developed for admission control and budget enforcement to ensure that all applications can meet their performance requirements even when sharing platforms with other concurrent applications. The results are published in ESTIMEDIA 2006. The work is now being implemented on an FPGA.

• The first ever working prototype of Silicon Hive VLIW cores integrated with Æthereal network-on-chip (developed by Philips Research) on an FPGA was created. This integrated multiprocessor platform is now used for a Masters course at TUe. The results are published in DATE 2007.

• A probabilistic technique was developed to model resource contention when multiple applications are executing simultaneously on a multi-processor platform. The results are published in the proceedings of DAC 2007. The technique was later refined and published in TCAD 2010.

• A flow was designed to generate an MPSoC platform, both hardware and software, directly from a very high-level (SDF) description of multiple applications. This is the first design flow that allows such automated synthesis. The flow, MAMPS, was released in 2007; the details are described in the proceedings of FPL 2007, and in ToDAES 2008 with support for multiple use-cases. MAMPS is now being used by 8 universities. Many students have made various enhancements since.
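
To give a feel for what an SDF (synchronous dataflow) description contains, the sketch below represents a small graph as actors plus channels with fixed token production and consumption rates, and solves the balance equations to find how often each actor fires per graph iteration. This is a standard SDF analysis step shown for illustration; it is not code from MAMPS, and the `Edge` layout and the `repetition_vector` function are hypothetical.

```python
from fractions import Fraction
from math import lcm
from typing import Dict, List, Tuple

# (producer, consumer, tokens produced per firing, tokens consumed per firing)
Edge = Tuple[str, str, int, int]


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive firing counts q with q[src] * prod == q[dst] * cons."""
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        current = pending.pop()
        for src, dst, prod, cons in edges:
            if src == current:
                other, value = dst, rate[src] * prod / cons
            elif dst == current:
                other, value = src, rate[dst] * cons / prod
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                raise ValueError("inconsistent SDF graph: balance equations have no solution")
    if len(rate) != len(actors):
        raise ValueError("SDF graph is not connected")
    # Scale the rational rates to the smallest all-integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


if __name__ == "__main__":
    # A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
    graph = [("A", "B", 2, 3), ("B", "C", 1, 2)]
    print(repetition_vector(["A", "B", "C"], graph))
```

The firing counts tell a synthesis flow how much work each actor contributes per iteration, which is the kind of information needed before actors can be assigned to processors and buffers can be sized. The sketch needs Python 3.9 or later for `math.lcm`.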

TOOLS FROM RESEARCH PROJECTS

1. PRGEN

This tool has been developed as part of the research done in our group. It allows generation of a Partially Reconfigurable Heterogeneous Multiprocessor platform. More details about the architecture can be found in the following publication. Download

PR-HMPSoC: a Versatile Partially Reconfigurable Heterogeneous Multiprocessor System-on-Chip for Dynamic FPGA-based Embedded Systems (PDF ~617kB)

Tuan D. A. Nguyen and Akash Kumar
In: Proceedings of International Conference on Field Programmable Logic and Applications (FPL), 2-4 Sep 2014
Munich, Germany. IEEE.

2. MAMPS

This tool allows users to generate a multiprocessor platform for multiple applications. Details can be found in the following publications. Online link

Multi-processor Systems Synthesis for Multiple Use-Cases of Multiple Applications on FPGA (PDF ~663kB)

Akash Kumar, Shakith Fernando, Yajun Ha, Bart Mesman and Henk Corporaal
In: ACM Transactions on Design Automation of Electronic Systems. Vol 13, Issue 3, July 2008, pp. 1-27, ISSN:1084-4309.

Multi-processor System-level Synthesis for Multiple Applications on Platform FPGA (PDF ~157 kB)

Akash Kumar, Shakith Fernando, Yajun Ha, Bart Mesman, and Henk Corporaal
Proceedings of Field Programmable Logic (FPL) Conference, Aug 2007, pp. 92-97. ISBN: 1-4244-1060-6
Amsterdam, The Netherlands, 2007. IEEE Circuit and Systems Society.

3. NoC-Gen

This tool allows users to generate an SDM-based NoC. More details can be found in the following publication. Online link

An Area-efficient Dynamically Reconfigurable Spatial Division Multiplexing Network-on-Chip with Static Throughput Guarantee (PDF ~543kB)

Zhiyao Joseph Yang, Akash Kumar and Yajun Ha.
In: Proceedings of the International Conference on Field-Programmable Technology, 8-10 Dec 2010.
Beijing, 2010. IEEE.