PhD Position in Long-term Action Recognition and Tracking

University of Amsterdam Informatics Institute


Publication date 23 April 2021

Closing date 25 May 2021

Level of education Master's degree

Hours 38 hours per week

Salary indication €2,395 to €3,061 gross per month

Vacancy number 21-281


In recent years, with the advent of deep learning, video understanding, be it action or activity classification, video object recognition or object tracking, has advanced significantly. Interestingly, most of this progress has focused on stylized, short video segments that are at most a few seconds long. However, this is an artificial constraint in more ways than one, beyond the obvious and trivial fact that the input is simply ‘longer’.

  • For one, ordinary videos can be minutes or hours long, or even endless streams, featuring actions that might span the entirety of the video (for instance, riding a bicycle in the Tour de France).
  • With short videos one can basically treat the stylized video as an ‘extended image’, such that standard image-based deep neural networks can be used. Doing the same with long videos is nonsensical, both from a computational and a learning point of view. Any model must be able to learn spatial and temporal correlations at different scales and in the presence of significant noise.
  • With short videos it is rather straightforward how to employ standard learning objectives, be they supervised or unsupervised. In the presence of labels one can simply rely on maximum likelihood. In the unsupervised setting one can rely on standard contrastive or autoencoding methodologies. In long videos, however, supervised labels are diluted by the spatial and temporal complexity. And autoencoding or contrastive objectives would likely fail due to the great variation in the inputs and, therefore, the anticipated outputs as well.
  • The amount of added spatiotemporal complexity increases exponentially, such that accurate classification requires qualitatively different models and architectures, ones that are fundamentally more sample-efficient and more computationally efficient.
  • In tracking, continuous model updates add to model bias and eventually cause drift. Longer videos mean more updates and, ultimately, catastrophic drift.

These are only a few of the basic differences between long and short videos. These differences alone already show that long-term action classification and tracking require fundamentally different and more nuanced models.


In this PhD position, we will research long-term action recognition and tracking, that is, automatically classifying and localizing specific actions, objects and activities as they happen in long and complex videos and spatiotemporal sequences. Our lab has significant experience in long-term video action classification and tracking, having shown that the decomposition of long spatiotemporal convolutions (Hussein, Gavves, Smeulders, 2019a), spatiotemporal graphs (Hussein, Gavves, Smeulders, 2019b), and Siamese networks (Tao, Gavves, Smeulders, 2015) are key to highly scalable long-term action recognition and tracking. Inspired by this prior work, and given the aforementioned (subset of) fundamental challenges, the research includes overarching questions like:

  • What are optimal deep spatiotemporal architectures for long-term action classification and tracking?
  • What are optimal deep learning representations for long-term action classification and tracking at multiple spatiotemporal scales?
  • What are optimal supervised and unsupervised learning objectives for long-term action classification and tracking?
  • Is there interplay between long-term action classification and neighboring learning tasks like video object recognition, object tracking and pose estimation? How can this interplay be leveraged?
  • Is it possible to perform highly accurate long-term action classification while still maintaining a reasonable computational budget?

In this position you will be supervised by Dr. E. Gavves, Associate Professor at the University of Amsterdam. This project is financed by the winning H2020 ERC Starting Grant ‘EVA: Expectational Visual Artificial Intelligence’ and NWO VIDI Grant ‘TIMING: Learning Time in Videos’.

What are you going to do

You will carry out research and development in the area of Deep Machine Learning and Vision. The research is embedded in the VISlab group at the University of Amsterdam. Your tasks will be to:

  • develop new deep machine learning and/or computer vision methods on Long-term Action Recognition and Tracking;
  • collaborate with other researchers within the lab;
  • regularly present internally on your progress;
  • regularly present intermediate research results at international conferences and workshops, and publish them in proceedings (CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) and journals (PAMI, IJCV, CVIU);
  • assist in relevant teaching activities;
  • complete and defend a PhD thesis within the official appointment duration of four years.

What do we require

  • An MSc degree in Artificial Intelligence, Computer Science, Engineering, (Applied) Mathematics or Physics, or related field;
  • a strong background/knowledge in computer vision, machine learning, and deep learning;
  • excellent programming skills preferably in Python;
  • solid mathematics foundations, especially statistics, calculus and linear algebra;
  • a highly motivated, passionate, creative and independent attitude;
  • strong communication, presentation and writing skills and excellent command of English.

Prior publications in relevant machine learning, vision or dynamical systems conferences or journals (NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, JMLR, PAMI, IJCV, CVIU) are advantageous.

Our offer

A temporary contract for 38 hours per week for the duration of 4 years (the initial contract will be for a period of 18 months and after satisfactory evaluation it will be extended for a total duration of 4 years). This should lead to a dissertation (PhD thesis). We will draft an educational plan that includes attendance of courses and (international) meetings. We also expect you to assist in teaching.

The salary will be €2,395 to €3,061 (scale P) gross per month, based on a full-time contract (38 hours a week). This is exclusive of the 8% holiday allowance and the 8.3% end-of-year bonus. A favourable tax agreement, the ‘30% ruling’, may apply to non-Dutch applicants. The Collective Labour Agreement of Dutch Universities is applicable.

Are you curious about our extensive package of secondary employment benefits like our excellent opportunities for study and development? Take a look here.

About us

The University of Amsterdam (UvA) is the Netherlands' largest university, offering the widest range of academic programmes. At the UvA, 30,000 students, 6,000 staff members and 3,000 PhD candidates study and work in a diverse range of fields, connected by a culture of curiosity.

Curious about our organisation and attractive fringe benefits such as a generous holiday arrangement and development opportunities? Here you can read more about working at the UvA.

The Faculty of Science has a student body of around 7,000, as well as 1,600 members of staff working in education, research or support services. Researchers and students at the Faculty of Science are fascinated by every aspect of how the world works, be it elementary particles, the birth of the universe or the functioning of the brain.

The mission of the Informatics Institute is to perform curiosity-driven and use-inspired fundamental research in Computer Science. The main research themes are Artificial Intelligence, Computational Science and Systems and Network Engineering. Our research involves complex information systems at large, with a focus on collaborative, data driven, computational and intelligent systems, all with a strong interactive component.

The position is with Dr. Efstratios Gavves, Associate Professor in the Video & Image Sense lab (VISlab), led by Prof. C. Snoek. VISlab is a world-leading lab on Computer Vision and Machine Learning, and has over 40 PhD students, postdoctoral researchers and faculty members working on a broad variety of core computer vision and core machine learning subjects: from action and object recognition or efficient spatiotemporal deep learning, to stochastic probabilistic models, temporal causality and graph neural networks. In the lab we strongly encourage collaboration. Other labs on Machine Learning and Computer Vision at the Informatics Institute include the Amsterdam Machine Learning Lab, led by Prof. M. Welling, and the Computer Vision Lab, led by Prof. T. Gevers.


Do you have questions about this vacancy? Or do you want to know more about our organisation? Please contact:

  • Efstratios Gavves, Associate Professor, tel. +31 (0)20 525 8701

Job Application

The UvA is an equal-opportunity employer. We prioritize diversity and are committed to creating an inclusive environment for everyone. We value a spirit of enquiry and perseverance, provide the space to keep asking questions, and promote a culture of curiosity and creativity. The Informatics Institute strives for a better gender balance in its staff. We therefore strongly encourage women to apply for this position.

Do you recognize yourself in the job profile? Then we look forward to receiving your application by 25 May 2021. You can apply online by using the link below. 

Applications in .pdf should include:

  • CV (max 2 pages) - including a list of publications if applicable and preferred starting date
  • Motivation letter (max 1 page) - motivating your choice for this position
  • Research statement (max 2 pages) - describing your thoughts/ideas about the project; a fully-fledged description is not needed, but a sketch of creative approaches will be appreciated
  • MSc thesis - if still studying, a short summary of up to 4 pages is also possible
  • Record of MSc and BSc courses - including grades and explanation of the grading system
  • Names and contact addresses of two academic references.

Please mention the months (not just years) in your CV when referring to your education and work experience.

We will invite potential candidates for interviews soon after the closing date of the vacancy, 25 May 2021.


In your application, please refer to