Visual saliency and eye movements have been well studied, mostly in the context of predicting topographical spatial saliency maps. In this thesis, we examine in detail the problem of sequential selection and sampling of image content. We scrutinize existing metrics for measuring the success of sequential selection strategies and propose a new family of metrics that has an intuitive interpretation and provides more discriminative power in revealing differences between viewing patterns or computational models. This is accompanied by experiments based on classic strategies for simulating sequential selection from traditional representations of saliency, and on deep neural networks that produce sequences by construction. The experiments provide strong support for the necessity of sequential analysis of attention and offer a roadmap for moving forward.