NAVE: An Autoencoder-based Video Quality Metric

NAVE is an autoencoder-based video quality model

This month, our latest research will be presented at the 2019 IEEE International Conference on Image Processing (ICIP) in Taipei, Taiwan. This is the premier event for image and video processing, featuring international researchers and experts in the field.

We will present a Video Quality Metric that uses a deep autoencoder to train a model for video quality prediction. The presentation will be on Tuesday, 24 September at noon as part of a wider session on Novel Approaches for Image & Video Quality Assessment.
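The autoencoder idea at the heart of the metric can be sketched in a few lines. The snippet below is a toy linear autoencoder on random vectors, not the NAVE architecture itself (the layer sizes, input data and training loop are all illustrative assumptions); it only shows the core loop of encoding to a low-dimensional code, reconstructing, and minimising reconstruction error, which is the representation-learning step a quality predictor can then build on.

```python
import numpy as np

# Toy linear autoencoder: compress 8-dim inputs to a 3-dim code and
# reconstruct. Dimensions and data are illustrative, not the NAVE model.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))          # stand-in "feature" vectors, not real video

n_in, n_code = 8, 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_code))
W_dec = rng.normal(scale=0.1, size=(n_code, n_in))
lr = 0.01

def forward(X):
    code = X @ W_enc                   # linear encoder (no nonlinearity, for brevity)
    recon = code @ W_dec               # linear decoder
    return code, recon

_, recon = forward(X)
err_before = np.mean((X - recon) ** 2)

for _ in range(500):                   # plain gradient descent on reconstruction MSE
    code, recon = forward(X)
    grad_recon = 2 * (recon - X) / len(X)        # d(loss)/d(recon)
    W_dec -= lr * code.T @ grad_recon
    W_enc -= lr * X.T @ (grad_recon @ W_dec.T)

_, recon = forward(X)
err_after = np.mean((X - recon) ** 2)
print(err_before, err_after)           # reconstruction error should drop
```

In a quality metric the learned code (or the reconstruction error itself) becomes a feature describing the content; the real model is of course deep and trained on video data.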

My paper, along with the other accepted papers, is currently available as a free download on IEEE Xplore until 25 September 2019. I hope you’ll download it and share any insights, feedback, and questions with me.


3D Sound and Vision with the Oculus Quest

Matthew Parker from Texas Tech University spent the summer at QxLab, hosted by Insight and the School of Computer Science. Over eight weeks, he developed a virtual reality environment using the newest-generation Oculus Quest wireless VR headset to explore audio-visual fusion.

Matthew presenting his poster titled “Perception Deception: Exploring Audio Localization in Virtual Reality Using The McGurk Effect” at the UCD Science poster session.

The audio and video immersion provided by virtual reality headsets is what makes virtual reality so enticing. A better understanding of this immersion is needed, particularly in streaming applications where bandwidth is limited. The full 360-degree spaces that are the trademark of virtual reality require large amounts of data to render at high quality. His work examined audio localization as a possible candidate for data compression by exploiting an audio-visual phenomenon known as the McGurk Effect. The McGurk Effect occurs when someone experiences mismatched audio and video stimuli and the resulting perception of the sound differs from either stimulus. The common example is a video of a talker saying /ga/ dubbed with the audio of someone saying /ba/, which usually results in the perception of /da/.

He used Ambisonics for the audio: a spatial audio format that can be rendered over headphones, using head-related transfer functions and acoustic environment modelling to mimic how humans naturally hear sounds.
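For context, a minimal sketch of how a mono source is placed into the first-order ambisonic (B-format) channels that such a pipeline starts from. These are the standard first-order panning equations; the function name and test signal are illustrative, and a real renderer would then convolve the channels with head-related transfer functions for headphone playback.

```python
import numpy as np

def encode_first_order(signal, azimuth, elevation):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    Standard first-order ambisonic panning equations with the
    conventional 1/sqrt(2) weighting on the omnidirectional W channel.
    """
    w = signal * (1.0 / np.sqrt(2.0))                  # omnidirectional
    x = signal * np.cos(azimuth) * np.cos(elevation)   # front-back
    y = signal * np.sin(azimuth) * np.cos(elevation)   # left-right
    z = signal * np.sin(elevation)                     # up-down
    return np.stack([w, x, y, z])

# A source directly in front (azimuth 0, elevation 0) excites only W and X.
s = np.sin(2 * np.pi * 440 * np.arange(0, 0.01, 1 / 48000))  # 10 ms, 440 Hz tone
b_format = encode_first_order(s, azimuth=0.0, elevation=0.0)
```

Moving the azimuth or elevation redistributes the signal across X, Y and Z, which is what lets a decoder reconstruct the source direction at the listener's ears.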


Seeking Words of Wisdom

QxLab took a couple of days away from our desks for a co-located writing workshop. Basing ourselves in one of UCD’s new University Club meeting rooms, we spent two days working on our research writing.  A starter session reflected on writing style and discussed where to publish. We practiced with some free writing and structured writing exercises and reviewed Brown’s 8 questions as a set of prompts.

  1. Who are the intended readers? (3-5 names)
  2. What did you do? (50 words)
  3. Why did you do it? (50 words)
  4. What happened? (50 words)
  5. What do results mean in theory? (50 words)
  6. What do results mean in practice? (50 words)
  7. What is the key benefit for readers? (25 words)
  8. What remains unresolved? (no word limit)

These questions, originally devised by Robert Brown but popularized by Rowena Murray, are a great way to get a writing retreat going. The rest of the sessions were spent progressing our writing towards our personal writing objectives – a bit like a natural language “Hackathon”.

For the final session we each chose a short piece of our own writing and shared it for a non-judgemental peer review session where the author could choose the scope of their own feedback. A lot of the feedback followed common themes, as we had fallen into similar traps with our writing.

A recent Twitter thread offered a lot of advice we could relate to, and a key take-home message was that when you read published papers you only see the finished article. The papers you read have been through countless iterations and rounds of feedback from co-authors, reviewers and copy editors. Don’t compare your first draft to a published paper!


Useful references

Murray, R. (2005) Writing for Academic Journals. Maidenhead: Open University Press/McGraw-Hill.

Murray, R. & Moore, S. (2006) The Handbook of Academic Writing: A Fresh Approach. Maidenhead: Open University Press/McGraw-Hill.

Fusion Confusion in Massachusetts

Today QxLab’s Dr Abubakr Siddig presented collaborative work on immersive multimedia at the International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE 2019), celebrating its 11th edition as part of the ACM MMSys conference on the University of Massachusetts Amherst campus.

The paper, Fusion Confusion: Exploring Ambisonic Spatial Localisation for Audio-Visual Immersion Using the McGurk Effect, looked at the relationship between visual cues and spatial localisation for speech sounds.

Abubakr rehearsing his presentation at the QxLab group meeting

The paper found that the McGurk Effect, where visual cues for sounds override what you hear, occurs for spatial audio but is not sensitive to whether the speech sound is aligned in space with the lips of the speaker.

The research was carried out by QxLab’s UCD-based researchers and funded by two SFI centres, CONNECT and INSIGHT.

Well done to Abubakr; the presentation and demo were well received by the workshop attendees.

Demo of the fusion confusion at MMVE.

QxLab at ISSC 2019

QxLab has two papers at the Irish Signals and Systems Conference in Maynooth University today. MSc student Tong Mo presented work on speech Quality of Experience. Her research investigated computer models for speech quality prediction in systems such as Skype or Google Hangouts, and she developed an algorithm to minimise prediction errors in the presence of jitter buffers.

A second paper was presented by PhD candidate Hamed Jahromi entitled, “Establishing Waiting Time Thresholds in Interactive Web Mapping Applications for Network QoE Management.” Hamed’s work looked at the perception of time in web applications. Is an additional delay of half a second noticeable if you have already waited 5 seconds for a Google Map page to load? Time is not absolute, and Hamed wants to understand the impact of delays on web applications in order to optimise network resources for interactive applications other than speech and video streaming. This work was co-authored with Declan T. Delaney from UCD Engineering and Brendan Rooney from UCD Psychology.

This research was sponsored by UCD School of Computer Science and the SFI CONNECT Centre for Future Networks.


Today is World Archives Day

Today is UNESCO World Archives Day, highlighting the important work of archives and archivists in preserving our cultural heritage. The date commemorates the founding of the International Council on Archives (ICA) on 9 June 1948 under the auspices of UNESCO. According to the ICA, “[a]rchives represent an unparalleled wealth. They are the documentary product of human activity and as such constitute irreplaceable testimonies of past events. They ensure the democratic functioning of societies, the identity of individuals and communities and the defense of human rights.”

Following quickly after the 6 June D-Day commemorations, today is a good day to highlight the important work that has been taking place to digitise and preserve the audio archives of the Nuremberg trials. Witnesses, lawyers and judges were recorded in their native tongues together with recordings of the live translations. This resulted in 775 hours of original trial audio recorded on 1,942 Presto gramophone discs, with the translations on embossed tape, a clear-coloured film also known as Amertape. While the tape degraded, the discs survived. The digitisation will be published next year, but the fascinating story of the recordings was recently told in articles for The Verge and PRI by Christopher Harland-Dunaway. The University of Fribourg’s Ottar Johnsen worked with Stefano Cavaglieri, a colleague at the Swiss National Sound Archives, and the International Court of Justice’s archivists, using imaging and audio digital signal processing to capture the archive material.

Last week, at the 11th International Conference on Quality of Multimedia Experience (QoMEX), QxLab PhD student Alessandro Ragano presented our work on how audio archive stakeholders perceive quality in archive material. By examining the lifecycle from digitisation through restoration and consumption, the paper highlights the influence factors and stakeholders involved. At QxLab we are interested in how audio digital signal processing techniques can be used in conjunction with data-driven machine learning to capture, enhance and explore audio archives.

Alessandro’s research is supported in part by a research grant from Science Foundation Ireland (SFI) and is co-funded under the European Regional Development Fund under Grant No. 17/RC-PhD/3483. This publication has emanated from research supported by Insight which is supported by SFI under grant number 12/RC/2289. EB is supported by RAEng Research Fellowship RF/128 and a Turing Fellowship.


PhD opportunities in Machine Learning and Digital Content QoE

QxLab is participating in two Science Foundation Ireland centres for research training. These centres will recruit cohorts of students into innovative, industry-partnered research training programmes.

If you are interested in machine learning for multimedia quality of experience or health applications for quality of life, apply to the ML-Labs Centre. If you are interested in speech and audio applications for augmented or virtual reality, take a look at the D-Real Centre.

The ML-Labs and D-Real centres for research training are recruiting now, with the first cohorts starting in September 2019.

This is the biggest single funding scheme for a cohort-focused PhD training programme in Ireland, with an investment by SFI of €100 million, and QxLab is part of two of the five training centres.

Media Coverage: Silicon Republic | Irish Tech News | Business World


AES Ireland Section AGM at UCD

The first AES Ireland Section meeting will take place in room B1.09 in the Computer Science building at UCD on Friday, February 15th. This meeting will begin with a lecture at 17:00 by Dr. Andrew Hines (details below) followed by the first AGM at 18:00.

If anyone would like to put themselves forward for election to any of the committee roles, please contact Enda Bates. Please note: you must be an AES member to qualify for these roles, but non-members are welcome to attend.

Speaker: Dr Andrew Hines, Assistant Professor, School of Computer Science, University College Dublin

Title: Quality Assessment for Compressed Ambisonic Audio


Spatial audio with a high degree of sound quality and accurate localization is essential for creating a sense of immersion in a virtual environment. VR content creators can use spatial audio to attract the audience’s attention in relation to their story, or to guide the audience through a narrative in VR, relying on hearing something to focus our attention before we see it. Delivering spatial audio over networks requires efficient encoding techniques that can compress the raw audio content without compromising quality. Lossy compression schemes such as Opus typically reduce the amount of data to be sent by discarding some information. This discarded information can be important for ambisonic spatial audio with regard to listening quality or localization accuracy. Streaming service providers such as YouTube typically transcode uploaded content into various bit rates and need a perceptually relevant audio quality metric to monitor users’ perceived quality and spatial localization accuracy. This talk will present subjective listening test experiments that explore the effect of Opus codec compression on the quality as perceived by listeners. It will also introduce a full-reference objective spatial audio quality metric, AMBIQUAL, which derives both listening quality and localization accuracy metrics directly from B-format ambisonic audio.

What do you tell a room full of PhD students?

When I was asked to give the talk I went through many of the stages of the “PhD roller-coaster” compressed into several hours. I accepted the request without reflection (other than a “sure, that needs no preparation…”) and then panicked that my PhD experiences were stale and possibly no longer relevant. Then I realised that I wasn’t being asked for advice through the lens of a student; the actual question was what advice I could offer as someone who has experienced both sides of the student-advisor relationship. Getting the research question right was an important first step. Next I read a few other blogs, papers and tweets. There is already a large body of work in the area of PhD advice, so I decided to skip the exhaustive literature review and take a case-study-style approach focusing purely on my own experience.

Having “mastered the topic” (or at least as much as I was going to!), I scribbled a few notes on areas I thought I might want to cover: literature review, self-management, research network building, and developing your identity as a researcher. I then wondered how to present it. I considered what might make it engaging – neat slides, video examples – and decided to deliver the advice without aids, as an example of how, if the content of your talk interests the audience, they will remain engaged even if they have nothing more interesting to look at than the speaker. To tie it together (and to help me remember what I planned to say) I decided to present it in the format of twelve tips. If you are interested in reading them, they were recently posted on the school blog.



It was a lot of Hot Air compared to Quantum Computers

I was back at the RDS in Dublin visiting the BT Young Scientist and Technology Exhibition. Beginning in 1963, the exhibition concept was created by two UCD academics from the School of Physics. Fast forward to 30 years ago, when I participated for the first of my two visits. Arriving again thirty years later, I was struck by the professional finish of the posters. So much has improved, but I still love the handmade stands and eye-catching props that lure you into a project. As you can see from the newspaper clipping, our project may have involved a lot of hot air, but I recall there was some scientific rigour to our methodology!

I met the 2019 winner of the BT Young Scientist and Technology Exhibition, Adam Kelly, while judging the national finals of SciFest 2018, where he also won first prize. As a judge I was struck by his demonstration of all the attributes of a quality scientist: imagination, methodology and a great ability to communicate the work. He knew what he had done and was able to explain what he had not done, and why. Adam’s project for SciFest was entitled ‘An Open Source Solution to Simulating Quantum Computers Using Hardware Acceleration’ and was the overall winner out of the more than 10,000 students who competed in the regional heats to progress to the national SciFest 2018 final.

Adam Kelly (Photo: Irish Times)

The event is an inspiring way to start the year: seeing the curiosity and scientific rigour on display by second level students who are motivated not by the prizes but by the desire to explore interesting questions.