Currently, judging by the request headers, there does not seem to be any way to differentiate between string loading and image loading requests for the same URL. Given the limitations of VRCUrl, it would be very useful to have a way to tell them apart.
Some potential solutions include specifying a value for the Accept header, extending the User-Agent header, or adding a new VRChat-specific header. I believe the last option would be the least likely to break any existing worlds, but each has its own pros and cons.
An example use case would be a system that downloads a mesh (binary data, via string loading) and its texture (via image loading) using a single URL. The server could use the request headers to decide which content to respond with.
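To illustrate the server side of that use case, here is a rough sketch using Python's standard library. It assumes the third option (a new VRChat-specific header). The header name `X-VRChat-Loader` and its values `image` and `string` are made up for this example and are not something VRChat sends today. The payloads are placeholders too.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical header name; VRChat does not currently send anything like this.
LOADER_HEADER = "X-VRChat-Loader"

# Placeholder payloads standing in for a real texture and real mesh data.
TEXTURE_BYTES = b"<png bytes would go here>"
MESH_BYTES = b"<encoded mesh data would go here>"


def choose_payload(headers):
    """Return "image" or "string" based on the (hypothetical) loader header."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    loader = lowered.get(LOADER_HEADER.lower(), "").strip().lower()
    if loader == "image":
        return "image"
    # Missing or unknown values fall back to the string payload, so clients
    # that never send the header keep getting what they get today.
    return "string"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        kind = choose_payload(dict(self.headers.items()))
        if kind == "image":
            body, content_type = TEXTURE_BYTES, "image/png"
        else:
            body, content_type = MESH_BYTES, "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        # Tell caches that the response differs depending on this header.
        self.send_header("Vary", LOADER_HEADER)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally; call serve() to start it."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

The same dispatch would work with the Accept or User-Agent options; only the lookup in `choose_payload` would change. Falling back to the string payload when the header is absent is one way to avoid breaking existing worlds.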
This would also be useful for decreasing the size of VRCUrl pools as used in YTS and other worlds. We already serve different responses depending on the User-Agent for video player and string loading requests, and we'd like to do the same for image loading too.