I agree completely with your assessment of human vision! The fovea centralis spans only about 2 degrees — a tiny fraction of our field of view. If a neural network were to ‘see’ with human-like vision, it would need something like a fish-eye lens: greatly magnified detail at its center, with peripheral vision squished into a narrow band along its edge. There are actually neural networks that ‘see’ this way — attention networks, which glance at small patches of an image and perceive their periphery only as a fuzzy blur.
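To make that "fish-eye" idea concrete, here is a minimal sketch of foveated sampling in plain Python. It is only an illustration (the grid size, ring count, and function name are my own choices, not from any particular attention model): sample points are placed on rings whose radii grow exponentially from the fixation point, so the center is sampled densely while the periphery gets only sparse coverage.

```python
import math

def log_polar_samples(width, height, n_rings=8, n_angles=16, r_min=1.0):
    """Return (x, y) sample points around the image center.

    Ring radii grow exponentially (log-polar spacing), mimicking a
    fovea: tightly packed samples near the center, coarse ones at
    the edge of the field of view.
    """
    cx, cy = width / 2, height / 2
    r_max = min(cx, cy)
    growth = (r_max / r_min) ** (1 / (n_rings - 1))  # exponential step
    points = []
    for i in range(n_rings):
        r = r_min * growth ** i
        for j in range(n_angles):
            t = 2 * math.pi * j / n_angles
            points.append((cx + r * math.cos(t), cy + r * math.sin(t)))
    return points

pts = log_polar_samples(224, 224)
# Half of the 8 rings land within ~8 pixels of the center, while the
# outer half must cover everything out to radius 112: dense "foveal"
# detail, blurry periphery.
```

With 8 rings and 16 angles you get 128 sample points, but 64 of them sit inside a tiny central disc — exactly the magnified-center, squished-periphery layout described above.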
Yet those attention networks still make vague classifications about their periphery, as we do. Image recognition software used in self-driving cars actually classifies almost every object in a scene simultaneously. Fundamentally, human vision carries only an Ethernet cable's worth of data (about 10 Mb per second). So any neural network that receives an equal data rate should be able to recognize objects at least as well as we do. What I show in the article is that after that data goes in, our brains use about 10,000 times as much ‘computation’ as artificial neural networks do! :) And on some benchmarks, those image recognition systems are already more accurate than humans.
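As a back-of-envelope check on that comparison (using the ~10 Mb/s optic-nerve figure from the text, and assuming a 224x224 RGB frame, a common classifier input size that I've picked for illustration):

```python
# How many classifier-sized frames fit in a ~10 Mb/s "optic nerve" budget?
OPTIC_NERVE_BPS = 10e6               # ~10 megabits/s, per the article
BITS_PER_FRAME = 224 * 224 * 3 * 8   # 224x224 RGB, 8 bits per channel

frames_per_second = OPTIC_NERVE_BPS / BITS_PER_FRAME
print(f"{frames_per_second:.1f} frames/s fit in that budget")
# Roughly 8 frames per second -- enough for near-video-rate input.
```

So a network fed at the same raw rate as the optic nerve could still process several full frames every second, which is why the data rate itself isn't the bottleneck; the downstream computation is.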