BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models