I’ve read the topics about senses and noticed that hearing gets overlooked (no, I don’t mean echolocation, I mean plain hearing). Why stop at just the speakers when all the other senses are going to be visual? It could work like this: when your creature hears a sound, a marker appears showing where the sound came from and how far it is from the player, stays for a while, and then disappears. Evolving a stronger short-term memory could make the marker stay longer (which would also make short-term memory matter), and the more complex your hearing mechanism is, the farther away a sound can be and still produce a marker.
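
To make the idea concrete, here’s a rough sketch of how it could work. Everything in it is made up for illustration: the names `hear`, `Marker` and `is_visible`, the constants, and the assumption that range and duration scale linearly with hearing and memory level. It isn’t based on any actual game code.

```python
import math
from dataclasses import dataclass

# Hypothetical tuning values, chosen only for this example.
BASE_RANGE = 10.0     # hearing range gained per level of hearing complexity
BASE_DURATION = 2.0   # seconds a marker lasts per level of short-term memory


@dataclass
class Marker:
    direction_deg: float  # direction from the creature to the sound, 0-360
    distance: float       # distance from the creature to the sound
    expires_at: float     # game time at which the marker disappears


def hear(listener_pos, sound_pos, hearing_level, memory_level, now):
    """Return a Marker if the sound is within hearing range, else None."""
    dx = sound_pos[0] - listener_pos[0]
    dy = sound_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)

    # A more complex hearing mechanism lets more distant sounds produce a marker.
    if distance > BASE_RANGE * hearing_level:
        return None

    direction = math.degrees(math.atan2(dy, dx)) % 360
    # A stronger short-term memory keeps the marker on screen longer.
    return Marker(direction, distance, now + BASE_DURATION * memory_level)


def is_visible(marker, now):
    """A marker is shown until its expiry time is reached."""
    return now < marker.expires_at
```

With these numbers, a creature with hearing level 1 would miss a sound 50 units away, while one with hearing level 6 would get a marker for it, and memory level 3 would keep that marker around three times longer than memory level 1.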
Edit: please don’t turn this into a black hole discussion like you did with the other topic.