Vision Models

Contains from-scratch implementations of:

- Image transformations
- ViT and custom convolution
- CLIP
- FCOS detection
- Diffusion