Even if it was 4s, you can always parallelize the frames to do it “realtime”, just the latency for the output will be 4s (provided you can get a cluster with 120 or 240 GPUs to do 4s of frames going in parallel (if it’s 30ms per image then you only need 2 GPUs to do 60fps on a video stream).