Current-generation astronomical sky surveys have collected multi-band imagery for billions of extragalactic sources and hold the promise of an economical, imaging-based route to redshift determination, at the cost of some precision relative to spectroscopy. In this work, we estimate photometric redshifts using convolutional neural networks trained directly on multi-band images with only a small number of spectroscopic redshifts. Our model follows a self-training paradigm: we first train a teacher network to generate pseudo-redshifts, which a separate student network then learns from. This allows our implementation to make use of every detected object in multi-band surveys, with or without an associated spectroscopic redshift. In addition, our networks include a density-function projection head that lets us generate probabilistic predictions and evaluate network performance on non-representative training samples. We illustrate our model on data from the Hyper Suprime-Cam Subaru Strategic Program. With as few as 2,500 spectroscopic redshifts, we determine redshifts with a scatter of dz/(1+z) < 0.03 out to z < 4 and an outlier fraction of 3.6%. Our results demonstrate a novel method with errors well below the weak-lensing requirements of the Euclid and Vera Rubin surveys, one that can also be adapted to more heterogeneous imaging surveys.
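
As a rough illustration of the teacher-student self-training idea summarized above, the following is a minimal sketch, not the authors' implementation: the toy `SmallCNN` architecture, the synthetic tensors standing in for labeled and unlabeled multi-band cutouts, and all hyperparameters are assumptions chosen only to make the example self-contained and runnable.

```python
# Hypothetical sketch of self-training with pseudo-redshifts (not the paper's code).
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy multi-band CNN; the real networks are far deeper and end in a density head."""
    def __init__(self, n_bands=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bands, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)  # point estimate of redshift for simplicity

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)

def train_step(model, opt, images, targets):
    # One gradient step on a simple regression loss (placeholder for the paper's objective).
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(images), targets)
    loss.backward()
    opt.step()
    return loss.item()

# 1) Train the teacher on the small spectroscopic sample.
teacher = SmallCNN()
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)
spec_images = torch.randn(64, 5, 32, 32)   # stand-in for labeled multi-band cutouts
spec_z = torch.rand(64) * 4.0              # stand-in spectroscopic redshifts
for _ in range(10):
    train_step(teacher, opt_t, spec_images, spec_z)

# 2) The teacher assigns pseudo-redshifts to objects without spectroscopy.
unlabeled_images = torch.randn(256, 5, 32, 32)
with torch.no_grad():
    pseudo_z = teacher(unlabeled_images)

# 3) The student learns from spectroscopic and pseudo-labeled objects combined,
#    so every detected object contributes to training.
student = SmallCNN()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
all_images = torch.cat([spec_images, unlabeled_images])
all_z = torch.cat([spec_z, pseudo_z])
for _ in range(10):
    train_step(student, opt_s, all_images, all_z)
```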