Algorithms, Design Methods, and Many-core Execution Platform for Low-Power Massive Data-Rate Video and Image Processing

Artemis 2013 GA 621439


Societal Impact

The project provides core solutions for major societal challenges such as affordable healthcare and wellbeing, green and safe transportation, and reduced power consumption.

1. Enable cross-domain re-use and interoperability for different product categories and application domains, thus promoting cross-fertilization and reuse of technology results.

2. Facilitate predictable system and product properties, and robust solutions.

3. Develop joint hardware-software techniques for resource and power management, while still providing massive data-rate processing and supporting interoperability over cross-domain platforms.

Technical Innovation

Advanced image and video processing systems are becoming a crucial and resource-consuming part of embedded applications in many sectors. ALMARVI aims to facilitate the transition from a vertically structured market to a horizontally structured one. In particular, it focuses on reducing overall system design cost and time-to-market, enabling low-cost solutions for high-volume markets in different industrial domains, creating new market opportunities, and supporting SMEs.

The demonstrators developed under this project for the healthcare, security/surveillance/monitoring, and mobile use cases will directly lead to marketable applications and products in their relevant domains. Integrated releases of the image/video processing algorithm libraries, reference design tools and platforms, and system software stack solutions will be made available along with their evaluation for the demonstrated use cases. Cross-domain applicability will reduce fragmentation, thus increasing the market share of the European supplier industry.


1. Reduce the cost of system design by 20% to 30% through modularity, flexible interfacing, an adaptive architecture, an execution platform with well-developed tool chains, adaptability, and run-time configurability.

2. Reduce development cycles by 25% to 35% through seamless scalability and integration of hardware and software components, cross-domain component reuse, a cross-domain system software stack, design tools, and an understanding of the relevant system layers.

3. Manage increased complexity with a 30% to 60% effort reduction through novel algorithms, architectures, design tools, execution platforms, and a system software stack with run-time adaptive resource and power management techniques.

4. Reduce the effort and time for re-validation and re-certification by 15% to 20% through incremental cycles of design, development, testing, integration, and validation.

5. Achieve 20% to 50% cross-sectoral re-usability of embedded systems through a system architecture that accounts for the common requirements of different sectors and application domains.

The key is to leverage the properties of image/video content while jointly adapting algorithms and hardware in order to achieve a much higher potential for power savings and to enable massive data-rate processing. At the Application Layer, the goal is to adapt algorithms towards the architectures. At the System Software Stack Layer, the adaptive run-time system allocates resources in an energy-efficient way to different applications executing simultaneously. At the Hardware Layer, ALMARVI's many-core execution platform provides the compute capabilities for diverse image/video processing applications.

Start time: 01.04.2014
Duration: 36 months
Budget: EUR 8.789 M
Countries involved: Netherlands, Turkey, Czech Republic, Finland


  1. M. Koskela, T. Viitanen, P. Jääskeläinen, and J. Takala, “Half-Precision Floating-Point Ray Traversal,” in Proc. Joint Conf. Comput. Vision Imaging Comput. Graphics Theory Appl., Rome, Italy, 2016.
  2. M. Hendriks, J. Verriet, T. Basten, B. Theelen, M. Brassé, and L. Somers, “Analyzing execution traces: critical-path analysis and distance analysis,” accepted for publication in Springer International Journal on Software Tools for Technology Transfer, 2016.
  3. F. Šroubek, J. Kamenický, and Y. M. Lu, “Decomposition of Space-Variant Blur in Image Deconvolution,” IEEE Signal Processing Letters, Vol. 23, Issue 3, pp. 346-350, 2016.
  4. Hadi Alizadeh Ara, Marc Geilen, Twan Basten, Amir Behrouzian, Martijn Hendriks and Dip Goswami, “Tight Temporal bounds for dataflow applications mapped onto shared resources,” accepted for publication and presentation in the Proceedings of the 11th IEEE International Symposium on Industrial Embedded Systems, 23-25 May 2016.
  5. Amir Behrouzian, Dip Goswami, Marc Geilen, Martijn Hendriks, Hadi Alizadeh Ara, Eelco Horssen, Maurice Heemels and Twan Basten, “Sample-Drop Firmness Analysis of TDMA-Scheduled Control Applications,” accepted for publication and presentation in the Proceedings of the 11th IEEE International Symposium on Industrial Embedded Systems, 23-25 May 2016.
  6. A. A. C. Brandon, J. J. Hoozemans, J. Van Straten, A. F Lorenzon, A. L. Sartor, A. C. S. Beck, S. Wong, “A Sparse VLIW Instruction Encoding Scheme Compatible with Generic Binaries” in Proc. International Conference on ReConFigurable Computing and FPGAs (ReConFig), Mayan Riviera, Mexico, 2015.
  7. J. J. Hoozemans, J.  Johansen, J. Van Straten, A. A. C. Brandon, S. Wong, “Multiple Contexts in a Multi-ported VLIW Register File Implementation” in Proc. International Conference on ReConFigurable Computing and FPGAs (ReConFig), Mayan Riviera, Mexico, 2015.
  8. T. Äijö, P. Jääskeläinen, T. Elomaa, H. Kultala, and J. Takala, “Integer Linear Programming Based Scheduling for Transport Triggered Architecture,” ACM Trans. Architecture and Code Optimization, Vol. 12, Issue 4, pp. 59:1-59:22, 2015.
  9. P. Jääskeläinen, C.S. de La Lama, E. Schnetter, K. Raiskila, J. Takala and H. Berg: “pocl: A Performance-Portable OpenCL Implementation,” Int. J. Parallel Programming, Vol. 43, Issue 5, pp. 752 – 785, 2015.
  10. H. Yviquel, A. Sanchez, P. Jääskeläinen, J. Takala, and M. Raulet, “Embedded Multi-Core Systems Dedicated to Dynamic Dataflow Programs,” J. Signal Processing Systems, Vol. 80, Issue 1, pp. 121 – 136, 2015.
  11. P. Jääskeläinen, H. Kultala, T. Viitanen, and J. Takala, “Code Density and Energy Efficiency of Exposed Datapath Architectures,” J. Signal Processing Systems, Vol. 80, Issue 1, pp. 49-64, 2015.
  12. V. Korhonen, P. Jääskeläinen, M. Koskela, T. Viitanen, and J. Takala, “Rapid Customization of Image Processors Using Halide,” in Proc. IEEE Global Conf. Signal Inf. Process., Orlando, FL, USA, 2015.
  13. J. Glossner, P. Blinzer, and J. Takala, “HSA-Enabled DSPs and Accelerators,” in Proc. IEEE Global Conf. Signal Inf. Process., Orlando, FL, USA, 2015.
  14. T. Viitanen, M. Koskela, P. Jääskeläinen, H. Kultala, and J. Takala, “MergeTree: A HLBVH Constructor for Mobile Systems,” in ACM SIGGRAPH Asia, Kobe, Japan, 2015.
  15. H. Kultala, J. Multanen, P. Jääskeläinen, and J. Takala, “Impact of Operand Sharing to the Processor Energy Efficiency,” in Proc. CSI Int. Symp. Comput. Arch. & Digital Syst., Tehran, Iran, 2015.
  16. M. Koskela, T. Viitanen, P. Jääskeläinen, J. Takala, and K. Cameron, “Using Half Floating-Point Numbers for Storing Bounding Volume Hierarchies,” in Computer Graphics International Conference, Strasbourg, France, 2015.
  17. J. Kotera, F. Šroubek and B. Zitová, “PSF accuracy measure for evaluation of blur estimation algorithms,” in International Conference on Image Processing (ICIP), Canada, 2015 (accepted for publication)
  18. I. Szentandrási, M. Zachariáš, J. Tinka, M. Dubská, J. Sochor and A. Herout, “INCAST”, in International Symposium on Mixed and Augmented Reality (ISMAR), Fukuoka, Japan, 2015
  19. M. J. Turnquist, M. Hiienkari, J. Makipaa and L. Koskinen, “A Fully Integrated Self-Oscillating Switched-Capacitor DC-DC Converter for Near-Threshold Loads” in Asian Solid-State Circuits Conference (A-SSCC), 2015 (accepted for publication)
  20. M. Hradiš, J. Kotera, P. Zemčík and F. Šroubek, “Convolutional Neural Networks for Direct Text Deblurring”, in Proceedings of The British Machine Vision Association and Society for Pattern Recognition BMVC, Swansea, UK, 2015
  21. T. Viitanen, H. Kultala, P. Jääskeläinen and J. Takala, “Heuristics for greedy transport triggered architecture interconnect exploration,” in International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), India, 2014
  22. I. Pöllänen, B. Braithwaite, T. Ikonen, H. Niska, K. Haataja, P. Toivanen, and T. Tolonen, “Computer-Aided Breast Cancer Histopathological Diagnosis – Comparative Analysis of three DTOCS-based Features: SWDTOCS, SW-WDTOCS, and SW-3-4-DTOCS”, In 4th International Conference on Image Processing Theory, Tools, and Applications (IPTA), France, 2014
  23. H. Kultala, T. Viitanen, P. Jääskeläinen, J. Helkala, J. Takala, “Compiler optimizations for code density of variable length instructions,” in IEEE Workshop on Signal Processing Systems (SiPS), Ireland, 2014
  24. T. Ikonen, H. Niska, B. Braithwaite, I. Pöllänen, K. Haataja, P. Toivanen, J. Isola, and T. Tolonen, “Computer-Assisted Image Analysis of Histopathological Breast Cancer Images Using Step-DTOCS”, In 14th International Conference on Hybrid Intelligent Systems (HIS), Kuwait, 2014
  25. D. Goswami, D. Müller-Gritschneder, T. Basten, U. Schlichtmann and S. Chakraborty, “Fault-tolerant Embedded Control Systems for Unreliable Hardware,” In International Symposium on Integrated Circuits (ISIC), Singapore, 2014
  26. B. Braithwaite, H. Niska, I. Pöllänen, T. Ikonen, K. Haataja, P. Toivanen, and T. Tolonen, “Optimized Curve Design for Image Analysis Using Localized Geodesic Distance Transformations”, In IS&T SPIE Electronic Imaging, California, USA, 2015
  27. I. Zliobaite, J. Hollmén, J. Teittinen and L. Koskinen, “Towards hardware-driven design of low-energy algorithms for data analysis” in ACM SIGMOD Record archive, 2014
  28. K. van Gend, “Cut Power Consumption by 5x Without Losing Performance”, in LinuxCon, Düsseldorf, Germany, 2014
The SAMOS XV (2015) special session on “Mid-Term Results of the ALMARVI ARTEMIS project,” organized by J. Takala and Z. Al-Ars, includes the following publications:
  • “Multi-Constraint Multi-Processor Resource Allocation” by A. R. B. Behrouzian, D. Goswami, T. Basten, M. Geilen and H. Alizadeh Ara (TUE)
  • “GPU Implementation of an Anisotropic Huber-L1 Dense Optical Flow Algorithm Using OpenCL” by D. Buyukaydin and T. Akgun (ASEL)
  • “Using VLIW Softcore Processors for Image Processing Applications” by J. Hoozemans, S. Wong and Z. Al-Ars (TUD)
  • “Power Optimizations for Transport Triggered SIMD Processors” by J. Multanen, T. Viitanen, H. Linjamäki, H. Kultala, P. Jääskeläinen, J. Takala, L. Koskinen, J. Simonsson, H. Berg, K. Raiskila and T. Zetterman (Multi-partner collaboration: TUT, UTU, NOK)
  • “Current Analysis Approaches and Performance Needs for Whole Slide Image Processing in Breast Cancer Diagnostics” by I. Pöllänen, B. Braithwaite, K. Haataja, T. Ikonen and P. Toivanen (UEF)
  • “Performance evaluation of image noise reduction computing on a mobile platform” by J. Hannuksela, M. Niskanen and M. Turtinen (VIS)
  • “Video Chain Demonstrator on Xilinx Kintex7 FPGA with EdkDSP Floating Point Accelerators” by J. Kadlec (UTIA)


2014 ALMARVI. All Rights Reserved