VTN-EG: CLIP-Based Visual and Textual Fusion Network for Entity Grounding | IEEE Conference Publication | IEEE Xplore