Operating Principle of the Image Encoder for Advanced Imaging Systems

The realm of optical image compression is undergoing a significant transformation, driven by innovative approaches leveraging auto-encoder neural network frameworks. At the heart of this technique lies the concept of naturally forming a compressed image within the “bottleneck” layer of a neural network [20]. This methodology has garnered considerable attention in recent years due to its inherent capability to simultaneously achieve data compression, dimensionality reduction, and noise mitigation [20, 21, 22]. Our implementation advances this field by introducing a hybrid optoelectronic auto-encoder, strategically partitioning the neural network processing. The initial stage, responsible for mapping raw image data to a compressed format at the bottleneck layer, is executed optically, capitalizing on the speed and efficiency of photonics. Conversely, the subsequent stage, dedicated to image reconstruction, is performed digitally, leveraging the flexibility and processing power of electronics. This synergistic approach is detailed in Fig. 1, which illustrates the architecture of our optical encoder and the corresponding neural network structure.

A key advantage of our hybrid optoelectronic auto-encoder design stems from the inherent characteristics of neural encoding networks. These networks typically exhibit a degree of insensitivity to the intricate details of the feature map, particularly concerning the weights and connections within the initial layers. Research has demonstrated that the first few layers can often function effectively even with randomly assigned weights, without significantly compromising overall performance [23]. This principle enables us to employ a pre-designed, passive photonic device to implement the transform required for the first layer of the auto-encoder network. In our system, the photonic layer is engineered to perform local kernel-like random transforms on small, discrete blocks of the image. This specific random encoding strategy was selected based on the principles of compressive measurement theory, which posits that random transforms are exceptionally well-suited for a variety of dimensionality reduction and compression tasks [24, 25, 26, 27, 28]. However, a key distinction should be noted: unlike conventional compressed sensing measurements, which are typically applied during the initial data acquisition phase [11], our approach addresses dimensionality reduction and data compression at a later stage, specifically after the image has already been formed. This approach paves the way for highly efficient and rapid image processing [20].

Fig. 1: Working principle of the photonic image encoder. The diagram illustrates the hybrid optoelectronic auto-encoder, in which the optical encoder performs the initial compression stage and the digital decoder completes the image reconstruction.

The silicon photonics-based all-optical image encoder is composed of a series of N single-mode input waveguides. These waveguides serve as conduits for pixel information from the images, transmitting this data in the optical domain and representing a √N × √N pixel block of the input image. These input waveguides are connected to a multimode silicon waveguide region, which is followed by a disordered scattering region. This scattering region is crucial as it encodes the input data through a local random transformation, facilitating image compression. The output of this encoding process is a spatially varying intensity pattern that appears random. This pattern is then binned into M non-overlapping spatial regions, each corresponding to one of M detectors. The system is designed such that M is less than N, achieving image compression through a sequence of these block-wise transforms. While the compression itself is executed optically, the subsequent steps of reconstruction and image conditioning are performed electronically at the backend, ensuring a balanced and efficient processing pipeline.

As further depicted in Fig. 1, the silicon photonics-based image encoder incorporates several key components: N single-mode input waveguides, each paired with a dedicated modulator; a multimode waveguide region; a random encoding layer; and an array of M photodetectors. A laser source (not explicitly shown) provides a constant light amplitude to all N input waveguides. The N modulators then encode a √N × √N pixel block of the input image onto the amplitude of light transmitted through each waveguide. Light from each waveguide is channeled into the multimode waveguide region before it undergoes scattering within the random encoding layer and ultimately reaches the photodetectors. The random encoding layer is composed of numerous randomly positioned scattering centers, created by etching air holes into the silicon waveguiding layer, a process detailed further in the “Methods” section. Because the optical device operates within the linear regime, the encoding process can be described by a single transmission matrix T, which relates the input I to the transmitted output O through the equation O = TI. Here, I is an N × 1 vector, O is an M × 1 vector, and T is an M × N matrix. By design, M is set to be less than N, which allows the device to perform a matrix multiplication that compresses an N-pixel block of the original image into a smaller M-pixel output representation. Since the random encoding layer is entirely passive, this compression can proceed at high speed, operating on N pixels in parallel; the speed is limited primarily by the response times of the modulators and photodetectors. Furthermore, the energy consumption of the system scales linearly with N, the number of modulators, even though the device performs M × N operations, highlighting its efficiency.
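
For illustration, the block-wise linear encoding O = TI can be modeled in a few lines of NumPy. This is only a numerical sketch of the idealized intensity model described above, not the fabricated device's measured response; the block size, compression ratio, and matrix statistics are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64   # pixels per block (an 8 x 8 kernel)
M = 8    # detectors per block; M < N gives 8x compression
T = rng.uniform(0.0, 1.0, size=(M, N))   # fixed, passive random transmission matrix

def encode_block(block: np.ndarray) -> np.ndarray:
    """Flatten an 8 x 8 pixel block into an N x 1 intensity vector I
    and return the M x 1 detector readout O = T @ I."""
    I = block.reshape(N)
    return T @ I

block = rng.uniform(0.0, 1.0, size=(8, 8))  # stand-in for one image block
O = encode_block(block)
print(O.shape)  # (8,): N = 64 pixels compressed to M = 8 outputs
```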

To fully understand the role of our data compression system, it’s beneficial to break down the entire image acquisition and compression process into a sequence of four distinct steps:

  (1) Conventional Imaging Optics: The initial step involves standard imaging optics that form an image onto the focal plane array of a camera.
  (2) Focal Plane Array Detection: Next, conventional focal plane array detectors convert the analog optical image into the electrical domain, capturing the image information.

At this juncture, we have two potential paths forward, offering flexibility in system design:

  (3a) Digitization and Re-encoding (Conventional Camera Approach): Most commercially available cameras are designed to digitize the image data as recorded on the focal plane array. In this scenario, a digital-to-analog converter (DAC) would then be used to drive the optical modulators on-chip. This process re-encodes the image information back into the optical domain onto an optical carrier.

  (3b) Analog Output and Direct Re-encoding (Advanced Approach): Alternatively, focal plane arrays are also available commercially that offer a direct analog output [29]. This analog output can be used to directly drive the optical modulators, re-encoding the image information without the need for an intermediate digitization step. A transimpedance amplifier (TIA), with appropriate amplification, can convert the analog photocurrent directly into a voltage signal suitable for driving the re-encoding modulation.

  (4) Photonic Encoding and Digital Storage: In the final step, the on-chip photonic encoder performs high-speed, low-power compression. The output from the detectors integrated on the chip is then digitized and stored for offline image reconstruction.

Opting for focal plane arrays that provide an analog output (option 3b) presents a significant opportunity to reduce overall power consumption and enhance throughput, primarily by eliminating the intermediate analog-to-digital conversion (ADC) and DAC steps. However, it is important to emphasize that our core approach, compression on a photonic chip, remains compatible with both options. This versatility is particularly advantageous given the widespread prevalence of cameras equipped with integrated ADCs, and it ensures that the photonic image encoder can be integrated into a broad spectrum of imaging systems.

The local kernel size, denoted as N, is a critical parameter that significantly influences the performance characteristics of the photonic image processing engine. While employing smaller kernel-like transforms inherently reduces the data throughput, as the device can only compress N pixels at a time, it also confers several notable advantages. Firstly, local transforms are effective in preserving the spatial structure inherent in the original image. This preservation tends to result in improved image reconstruction quality, a point we will elaborate on in the subsequent section. Secondly, the kernel-based approach provides scalability, enabling the compression of arbitrarily large images without necessitating a proportional increase in the number of modulators and detectors. Thirdly, the use of local transforms aids in isolating noise originating from a specific pixel, such as a hot pixel. Without this isolation, such noise could potentially propagate and affect the entire compressed image. Finally, because this compression scheme essentially maps input image blocks to speckle patterns, employing excessively large kernels could lead to low-contrast speckles. These low-contrast speckles can, in turn, degrade the quality of image reconstruction, mirroring the trend observed in speckle-based spectrometers [30]. The optimal selection of kernel size therefore represents a critical design consideration for achieving balanced performance.

Effect of kernel size and kernel type on image compressibility

To rigorously evaluate the impact of kernel size on image compression, we conducted a series of numerical simulations. These simulations modeled the entire image compression and reconstruction process, utilizing a dataset of images sourced from the DIV2K and Flickr2K datasets [31, 32]. In conjunction with these real-world images, we also employed synthetically generated random T matrices to simulate the compression process under controlled conditions. To manage computational demands, the images were pre-processed by converting them to grayscale and cropping them to a uniform resolution of 512 × 512 pixels. The dataset comprised 4152 images, partitioned into a training set of 3650 images and a validation set of 502 images. In the simulation, the compression process was implemented by multiplying each √N × √N block of an image with a numerically generated random matrix whose entries were real, positive numbers uniformly distributed between 0 and 1. Following the simulation of compression, we trained a neural network specifically designed to reconstruct the original image from its compressed representation. The detailed architecture of this neural network and the specifics of the training routine are provided in the “Methods” section. Finally, to quantitatively assess the fidelity of the reconstructed images, we used test images from the DIV2K and Flickr2K datasets. These images were subjected to compression using kernels of varying sizes, and the quality of reconstruction was evaluated using standard metrics.
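
A minimal sketch of how this block-wise compression step might be simulated in NumPy is shown below (the reconstruction network itself is omitted). The function names and the placeholder image are ours; the block size, compression ratio, and uniform [0, 1] matrix statistics follow the description above.

```python
import numpy as np

def compress_image(img: np.ndarray, k: int, ratio: int, rng=np.random.default_rng(1)):
    """Compress an (H, W) grayscale image with k x k random kernels.

    Each k x k block (N = k*k pixels) is multiplied by an M x N random matrix
    with entries uniform in [0, 1], where M = N // ratio, producing an
    M x (H/k) x (W/k) compressed datacube."""
    H, W = img.shape
    N, M = k * k, (k * k) // ratio
    T = rng.uniform(0.0, 1.0, size=(M, N))
    blocks = (img.reshape(H // k, k, W // k, k)   # split into k x k tiles
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, N))                 # (number_of_blocks, N)
    out = blocks @ T.T                            # apply O = T I to every block
    return out.reshape(H // k, W // k, M).transpose(2, 0, 1), T

img = np.random.default_rng(2).uniform(0, 1, size=(512, 512))  # placeholder image
cube, T = compress_image(img, k=8, ratio=8)
print(cube.shape)  # (8, 64, 64) for an 8 x 8 kernel at a 1:8 compression ratio
```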

To illustrate the effects of kernel size, Fig. 2a presents an example image from our test set. This original image is juxtaposed with its compressed versions obtained using two different kernel sizes: 8 × 8 pixels (Fig. 2b) and 32 × 32 pixels (Fig. 2e). In these simulations, the compression ratio (M:N) was held constant at 1:8. This resulted in compressing the original 512 × 512 images into compressed datacubes of dimensions 8 × (64 × 64) pixels and 128 × (16 × 16) pixels, respectively. The images reconstructed from these compressed representations are shown in Fig. 2c, f. A visual comparison clearly indicates that using a smaller kernel size (8 × 8 pixels) retains more of the spatial structure of the original image in the compressed domain (Fig. 2b). This enhanced spatial preservation translates directly to a higher fidelity reconstruction. To quantify this observation, we calculated the average peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) for the reconstructed images across the entire test image dataset. These metrics are plotted in Fig. 2d as a function of kernel size. The results consistently demonstrate that smaller kernel sizes generally lead to superior image reconstruction quality, achieving higher PSNR and SSIM values. This finding underscores a fundamental characteristic of image data: unlike spatially uncorrelated and sparse data, which can be efficiently compressed using large random matrices, image data inherently possesses spatial structure and correlations that are crucial to maintain for high-quality reconstruction. Therefore, the optimal kernel size is not universal but rather contingent on the specific characteristics of the images being compressed, including factors such as sparsity and spatial frequency content.
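
The PSNR and SSIM metrics used throughout this evaluation can be computed with standard library routines; the sketch below uses scikit-image and assumes grayscale images scaled to [0, 1].

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reconstruction_quality(original: np.ndarray, reconstructed: np.ndarray):
    """Return (PSNR in dB, SSIM) for two grayscale images scaled to [0, 1]."""
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=1.0)
    ssim = structural_similarity(original, reconstructed, data_range=1.0)
    return psnr, ssim

# Averaging over a test set (with `compress` and `reconstruct` standing in for
# the photonic encoder and the decoder network) would then look like:
# scores = [reconstruction_quality(x, reconstruct(compress(x))) for x in test_images]
# mean_psnr, mean_ssim = np.mean(scores, axis=0)
```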

Fig. 2: Effect of kernel size on image compression and reconstruction using images from the DIV2K and Flickr2K datasets [31, 32]. a Original 512 × 512 grayscale image. b, e Compressed images using 8 × 8 (b) and 32 × 32-pixel kernels (e). c, f Reconstructed 512 × 512 images from (b) and (e), respectively. d Mean PSNR and SSIM as a function of kernel size.

For benchmarking purposes, we compared our compression technique, utilizing both 8 × 8 and 16 × 16 kernels, against the widely used standard JPEG compression. This comparison was performed on images from the same test dataset (DIV2K and Flickr2K) to ensure a fair evaluation. Figure 3a, b presents a comparative analysis of the average PSNR (Fig. 3a) and SSIM (Fig. 3b) achieved by JPEG compression and our encoding scheme across the test image dataset, plotted as a function of compression ratio. As shown in Fig. 3a, b, our photonic compression approach exhibits slightly lower PSNR/SSIM values than JPEG at lower compression ratios, while maintaining PSNR values of roughly 20 dB up to compression ratios of 1:256. However, it is important to note a limitation of JPEG compression in this context: unlike photonic compression, which allows for fixed compression ratios, not all images within the test dataset could be compressed to a ratio of 1:64 using JPEG. In fact, only approximately 400 out of the 500 test images could be compressed to this extent. Figure 3c, d further illustrates this comparison by showing the compression ratio dependence for the same example image used in Fig. 2a. The images were reconstructed after compression using both the photonic approach and JPEG. For this specific image, the highest compression ratio achievable with the JPEG algorithm was approximately 1:45 (see Supplementary Information, Section S6: Comparison of JPEG and photonic compression, for more examples). At lower compression ratios of 1:8 and 1:16, both techniques yield images of excellent quality. However, at higher compression ratios (exceeding 1:32), JPEG compression begins to introduce noticeable pixelation artifacts. In contrast, the photonic compression scheme tends to lose some of the finer, higher spatial frequency content. These observed differences in image degradation characteristics stem from fundamental distinctions in the underlying compression mechanisms of the two approaches.
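
The image dependence of JPEG can be seen by sweeping its quality setting and checking whether a target compression ratio is ever reached. The sketch below is one plausible way to do this with Pillow; the function name and the 8-bit grayscale assumption are ours, and real JPEG tooling offers more control than this.

```python
import io
import numpy as np
from PIL import Image

def jpeg_at_ratio(img_u8: np.ndarray, target_ratio: float):
    """Lower the JPEG quality setting until the file is at least `target_ratio`
    times smaller than the raw 8-bit grayscale image.

    Returns (achieved_ratio, decoded_image), or None if the target cannot be
    reached for this particular image -- the image dependence discussed above."""
    raw_bytes = img_u8.size  # one byte per pixel for 8-bit grayscale
    for quality in range(95, 0, -5):
        buf = io.BytesIO()
        Image.fromarray(img_u8, mode="L").save(buf, format="JPEG", quality=quality)
        ratio = raw_bytes / buf.tell()
        if ratio >= target_ratio:
            buf.seek(0)
            return ratio, np.asarray(Image.open(buf))
    return None
```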

Fig. 3: Comparison of photonic image compression to digital JPEG image compression using images from the DIV2K and Flickr2K datasets [31, 32]. a Mean PSNR comparison. b Mean SSIM comparison. c Reconstructed images using photonic compression at varying ratios. d Compressed images using JPEG at varying ratios.

The JPEG compression algorithm, a cornerstone of digital image compression, operates by applying a discrete cosine transform (DCT) to every 8 × 8-pixel block within an image [19]. Following the DCT, a thresholding operation is performed to selectively retain and store the most significant basis functions, effectively discarding less crucial information. While this nonlinear, data-dependent transformation can achieve high-quality compression, particularly at lower compression ratios or for images that are inherently sparse, it presents several drawbacks compared to the photonic compression scheme introduced here. These drawbacks are particularly relevant in high data-rate imaging applications. Firstly, JPEG compression ratios are inherently image-dependent. This variability poses a challenge for high-data-rate image acquisition systems, which would need to dynamically allocate variable-sized memory blocks to accommodate fluctuations in compression ratio. Secondly, JPEG compression entails significantly more computational operations per pixel than our photonic scheme, because JPEG performs a full DCT on each 8 × 8 pixel block before selecting basis functions for retention. The increased computational burden translates to higher power consumption and slower throughput, as multiple clock cycles are required for processing. Finally, JPEG compression in its standard form is not equipped to perform denoising or image conditioning concurrently with compression. In contrast, as detailed in the experimental section below, the photonic scheme can simultaneously achieve image compression, image conditioning (addressing pixel linearity, hot pixels, and other sensor imperfections), and denoising. This is accomplished through a neural-network-based non-linear decoding scheme at the backend, providing a more integrated and efficient approach to image processing.
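
To make the contrast concrete, the snippet below sketches the data-dependent per-block step that JPEG-style compression performs: it keeps only the largest DCT coefficients of each 8 × 8 block. Real JPEG uses quantization tables and entropy coding rather than this hard top-k threshold, so treat it purely as an illustration.

```python
import numpy as np
from scipy.fft import dctn

def dct_threshold_block(block: np.ndarray, keep: int = 8) -> np.ndarray:
    """JPEG-like treatment of one 8 x 8 block: 2-D DCT, then zero out all but
    the `keep` largest-magnitude coefficients (a data-dependent selection)."""
    coeffs = dctn(block, norm="ortho")
    cutoff = np.sort(np.abs(coeffs).ravel())[-keep]
    return np.where(np.abs(coeffs) >= cutoff, coeffs, 0.0)

# The photonic encoder instead applies one fixed M x N random matrix per block,
# so the number of retained values (M) and the memory footprint never vary
# from image to image.
```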

Beyond kernel size, we also investigated the impact of kernel type by considering two distinct types of random kernels. In Figs. 2 and 3, the simulations presented utilized synthesized random T matrices that were constrained to be real and positive. This configuration simulates the compression process under conditions where light coupled to each input waveguide is effectively incoherent. Such incoherence can be realized, for example, by employing a frequency comb or another multi-wavelength source to couple light at different wavelengths into each waveguide. To ensure effective incoherence, the frequency separation between light in adjacent waveguides should be at least approximately ten times the detector bandwidth to minimize interference effects [33]. Under these conditions, the speckle patterns generated by light from each waveguide would sum incoherently at the detectors. Consequently, the compression process can be accurately modeled using a random T matrix that is real-valued and non-negative. The second case we explored involved using a complex-valued field T matrix. In this scenario, each element within the T matrix was assigned a random amplitude and phase. The compressed image was then obtained as the square-law detector response, mathematically expressed as O = (T√I)(T√I)*. This case simulates the scenario where a single, coherent laser source is coupled to all input waveguides simultaneously. In this configuration, the measured speckle pattern is formed by the interference between light originating from each waveguide, resulting in a complex-valued transmission matrix.
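
The two detection models can be written side by side as follows. This is a schematic NumPy sketch of the incoherent (real, non-negative T) and coherent (complex T with square-law detection) cases described above; the matrix statistics are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 64, 8
I = rng.uniform(0.0, 1.0, size=N)          # one flattened pixel block

# Incoherent case (multi-wavelength source): intensities add, O = T I.
T_real = rng.uniform(0.0, 1.0, size=(M, N))
O_incoherent = T_real @ I

# Coherent case (single laser): fields interfere, and the square-law
# detectors record O = |T sqrt(I)|^2 = (T sqrt(I)) (T sqrt(I))*.
amplitude = rng.uniform(0.0, 1.0, size=(M, N))
phase = rng.uniform(0.0, 2.0 * np.pi, size=(M, N))
T_complex = amplitude * np.exp(1j * phase)
O_coherent = np.abs(T_complex @ np.sqrt(I)) ** 2

print(O_incoherent.shape, O_coherent.shape)  # both (8,)
```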

To evaluate the performance trade-offs between real and complex transforms, we assessed the reconstructed image quality across varying levels of noise. Noise in imaging systems can originate from multiple sources. It can be introduced during the initial image formation process, for instance, due to low-light conditions or imperfections in the imaging optics. Noise can also arise during the camera’s opto-electronic conversion process, potentially due to pixel non-linearity or the limited bit depth of camera pixels. Furthermore, the optical compression process itself can introduce noise, possibly from laser intensity fluctuations, environmental variations affecting the T matrix, or fundamental shot noise at the detection stage. To simulate the impact of noise on the reconstruction of compressed images, we numerically added Gaussian white noise to the compressed representations. Figure 4a, d displays the same test image previously used in Fig. 2. This image was compressed by a factor of 8X (compression ratio 1:8) using both real-valued and complex-valued 8 × 8 pixel T matrices. In these simulations, Gaussian white noise with an amplitude equivalent to 2% of the average signal level in the image (corresponding to a signal-to-noise ratio (SNR) of 50) was added to each compressed image. The reconstructed images obtained using the real and complex T matrices are shown in Fig. 4b, e. At this 2% noise level (SNR = 50), the reconstructed images exhibit only a marginal degradation in quality compared to the noise-free reconstruction presented in Fig. 2c (PSNR = 25.1 dB with noise versus PSNR = 26.9 dB without noise for the real transform case). This observation reinforces the inherent noise resilience of the autoencoder framework, a characteristic that aligns with prior applications of autoencoders in denoising tasks. This robustness to noise has significant implications. It potentially allows the system to forgo energy-intensive image conditioning steps by directly encoding raw image data and relying on the backend neural network to compensate for noise arising from factors like pixel non-uniformity. Figure 4c, f presents a more comprehensive analysis. It shows the average PSNR and SSIM for reconstructed test images that were compressed using either real-valued or complex-valued T matrices, plotted as a function of the SNR of the compressed images. These simulation results reveal that at relatively high SNR levels (greater than 50), both real and complex-valued T matrices provide comparable performance in terms of image reconstruction quality. However, as the SNR decreases to lower levels, the complex-valued T matrices demonstrate superior robustness in image compression. This enhanced robustness is attributed to the higher contrast achievable in compressed images when using a complex T matrix, which aids in preserving image information even in the presence of increased noise.
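
The noise injection used in these simulations amounts to adding white Gaussian noise scaled to the mean signal of the compressed block. A minimal sketch, assuming SNR is defined here as mean signal divided by noise standard deviation (so SNR = 50 corresponds to the 2% level quoted above):

```python
import numpy as np

def add_gaussian_noise(compressed: np.ndarray, snr: float,
                       rng=np.random.default_rng(4)) -> np.ndarray:
    """Add white Gaussian noise with standard deviation mean(signal) / snr."""
    sigma = compressed.mean() / snr
    return compressed + rng.normal(0.0, sigma, size=compressed.shape)

# Example: degrade a compressed datacube to SNR = 50 before reconstruction.
# noisy_cube = add_gaussian_noise(cube, snr=50)
```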

Fig. 4: Effect of kernel type (real vs complex) on image compression and reconstruction using images from the DIV2K and Flickr2K datasets [31, 32]. a Compressed image using a real 8 × 8 kernel with 2% noise. b Reconstructed image from (a). c Mean PSNR and SSIM for real kernels at varying SNR. d Compressed image using a complex 8 × 8 kernel with 2% noise. e Reconstructed image from (d). f Mean PSNR and SSIM for complex kernels at varying SNR.

Experimental image compression and denoising

Building upon the simulation results, we proceeded to experimental validation to confirm key predictions of the photonic image encoder concept. The primary objectives of these experiments were twofold:

  (1) Validation of Photonic Compression Quality: To experimentally verify that our proposed approach—employing an analog photonics-based fixed linear random matrix for compression and a non-linear neural network for decompression—can achieve image compression quality comparable to JPEG. This is particularly significant given that JPEG relies on an image-dependent compression scheme that is substantially more energy and time-intensive.
  (2) Demonstration of Simultaneous Denoising and Compression: To experimentally demonstrate that our photonic technique can be effectively used for both image denoising and compression concurrently, showcasing its multi-functional capabilities.

For experimental validation, we fabricated a prototype device using a silicon photonics platform. The experimental device incorporated N = 16 single-mode input waveguides. These waveguides were connected to the scattering layer through a multimode waveguide region. The inclusion of the multimode waveguide region was crucial. It allowed light from each single-mode waveguide to spread out along the transverse axis before reaching the random scattering layer. This design ensured that we obtained a uniformly distributed random transmission matrix, which is essential for effective compression, without requiring an excessively long random scattering medium. An excessively long scattering medium would introduce undesirable excess loss due to out-of-plane scattering. To illustrate the impact of the multimode waveguide, we performed full-wave numerical simulations. These simulations compared two configurations: single-mode waveguides connected directly to the scattering layer, and single-mode waveguides connected through an intermediate multimode waveguide region. In the first configuration, depicted in Fig. 5a, the scattering layer’s thickness was insufficient to allow light to fully diffuse along the transverse axis. This resulted in a high concentration of transmitted light in proximity to the input waveguide’s position. In terms of the T matrix, this manifested as stronger coefficients along the diagonal, deviating from the desired uniformly distributed random matrix. We then simulated the effect of introducing a 32 μm multimode waveguide between the single-mode input waveguide and the scattering layer. As shown in Fig. 5b, the multimode waveguide effectively facilitated the lateral spreading of light from the single-mode waveguide across the scattering layer. This resulted in a transmitted speckle pattern that was uniformly distributed, indicating a more desirable random transmission matrix.

Fig. 5: Numerical simulations and experimental characterization. a, b Numerical simulations showing the impact of a multimode waveguide on light diffusion. c Scanning electron micrograph of the fabricated photonic image encoder. d Experimental measurement setup for characterizing the transmission matrix.

The device fabrication was carried out using a standard silicon-on-insulator wafer with a 250 nm thick silicon layer. The fabricated device comprised 16 single-mode input waveguides, designed to connect the device to the chip’s edge for input coupling. These waveguides were 450 nm in width and spaced 3 μm apart, corresponding to a spacing of approximately 2λ at a wavelength of 1550 nm. This spacing was chosen to minimize evanescent coupling between adjacent waveguides. All 16 waveguides were connected to a 55.2 μm wide, 120 μm long multimode waveguide region, followed by a 30 μm long scattering region. The scattering region was meticulously designed and fabricated to induce random scattering. It consisted of randomly positioned cylinders with a radius of 50 nm and a filling fraction of 3%, etched into the silicon waveguiding layer. The parameters of the scattering layer, including cylinder size, density, and region length, were empirically optimized to achieve a target transmission of approximately 30% [see Supplementary Information: Section S5. Additional Experimental Characterization for experimental results]. To mitigate light leakage at the edges of the scattering layer and ensure efficient confinement of light within the active region, we incorporated a full band-gap photonic crystal layer along the sides of the scattering region [34, 35, 36]. Experimental validation confirmed that the transmission through the fabricated device was indeed approximately 30%, as detailed in the Supplementary Information [Supplementary Information: Section S5. Additional Experimental Characterization]. As this initial prototype did not include integrated photodetectors, we implemented a ridge etch in the silicon waveguiding layer immediately after the scattering region. This ridge served as a means to extract and record the light scattered out-of-plane. By measuring the optical power scattered from this ridge, we could effectively simulate the optical power that would be recorded if detectors were integrated directly into the device. Scanning electron microscope images of the fabricated device, showcasing the intricate details of the waveguide structures and scattering region, are presented in Fig. 5c.

It is important to highlight that the design objectives for the random scattering medium in this image compression application are significantly different from those in prior applications of on-chip scattering media, such as the speckle spectrometer reported in Ref. 36. While speckle spectrometers rely on generating distinct random projections for different input wavelengths, necessitating a relatively large and strongly scattering region, our image compression application requires a scattering medium with a broadband response (to minimize temperature dependence) but distinct, uniformly distributed random projections for different spatial modes. Furthermore, the compression device design prioritizes minimizing both footprint and optical loss while still achieving the fully distributed random transmission matrix necessary for high-quality image compression. As demonstrated in Fig. 5 a, b, we found that a design incorporating a multimode waveguide region followed by a short scattering region effectively met these requirements. This configuration allowed each spatial input mode to overlap before reaching the scattering medium, resulting in a uniformly distributed transmission matrix without necessitating an extended scattering region that would introduce significant optical loss.

To characterize the device’s performance, we first experimentally measured the T matrix. This was achieved by coupling an input laser operating at a wavelength of 1550 nm into each single-mode waveguide sequentially. For each input waveguide, we recorded the speckle pattern scattered from the detection ridge located after the scattering layer using an optical microscope setup. A representative image captured using this optical setup, illustrating the speckle pattern generated by the device, is shown in Fig. 5d.

To accurately account for experimental noise inherent in the image compression and recovery process, we recorded two sequential T matrices, as shown in Fig. 6a, b. The high degree of repeatability of the T matrix measurements is evident in Fig. 6c, which displays the difference between the two independently acquired matrices. A histogram of the element-wise differences between the two matrices is presented in Fig. 6d. This histogram reveals a Gaussian-like random noise distribution with an amplitude of approximately 1% of the average signal value. This noise level corresponds to a measurement SNR of approximately 100. As previously shown in Fig. 4, at this SNR level, both real and complex transformations yield comparable results in terms of image reconstruction quality. This finding implies that we can effectively utilize the experimentally measured intensity transmission matrix for image compression without significant performance degradation due to measurement noise.
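
One simple way to turn two repeated T-matrix measurements into an SNR figure of this kind is sketched below; the estimator (mean signal over the standard deviation of the element-wise difference) is our assumption about how such a number might be computed, not a prescription from the original analysis.

```python
import numpy as np

def measurement_snr(T1: np.ndarray, T2: np.ndarray) -> float:
    """Treat the element-wise difference of two repeated transmission-matrix
    measurements as noise, and return mean(signal) / std(difference).
    A ~1% difference level then corresponds to an SNR of roughly 100."""
    signal = np.mean((T1 + T2) / 2.0)
    noise = np.std(T1 - T2)
    return float(signal / noise)
```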

Fig. 6: Experimental demonstration of denoising and image compression using images from the DIV2K and Flickr2K datasets [31, 32]. a, b Sequential measurements of the transmission matrix. c Difference between the two measured transmission matrices. d Histogram of the difference matrix elements, showing the noise distribution. e Compressed image using the experimental transmission matrix. f Reconstructed image from (e).

It is important to note that the experimental noise observed in our measurements primarily arises from noise sources inherent in the experimental setup, such as laser noise and electronic noise. These noise sources are representative of the noise expected in real-world applications. In contrast, the photonic encoder itself exhibits remarkable stability and provides a highly repeatable response. To rigorously assess the device’s long-term stability, we monitored the transmission matrix over a period of 60 hours. The results of this long-term stability test are detailed in the Supplementary Information [Supplementary Information: Section S5. Additional Experimental Characterization]. The findings demonstrate that the device is indeed highly stable, exhibiting negligible fluctuations in its transmission characteristics over the 60-hour period, and without requiring any active temperature stabilization. This level of stability in an integrated photonic device is not unexpected, given the short transit time of light through the scattering region, which corresponds to a low effective quality factor and minimal temperature dependence. Regarding temperature sensitivity, based on our prior work [36] and assuming a thermo-optic coefficient in silicon (Si) of approximately dn/dT ≈ 1.8 × 10⁻⁴ K⁻¹ [37], we estimate that the generated speckle pattern at the output will remain correlated for temperature variations of up to ±4 K. This inherent stability is a significant advantage of the compact scattering device structure employed in this work. Moreover, as we will discuss later, our unique approach of combining image compression with denoising offers the potential to mitigate some of the noise introduced by thermal fluctuations during image compression. This can be achieved by training the backend image reconstruction neural network using data acquired across a range of temperatures, making the system more robust to thermal variations.

To process the raw measured transmission matrix into the T matrix used for image compression, we selected four non-overlapping spatial regions along the output ridge, as illustrated in Fig. 5d. This selection effectively corresponds to choosing four columns from the matrix depicted in Fig. 6a. The resulting updated T matrix had dimensions of 16 × 4, providing a compression ratio of 1:4. We then utilized this experimentally derived matrix to train the backend neural network responsible for reconstructing the original image. Crucially, we incorporated noise into the training process by adding Gaussian noise with the same 1% variance that we experimentally measured. This noise injection during training enhances the robustness of the neural network to real-world noise conditions. Finally, we proceeded to compress the test images from the DIV2K and Flickr2K datasets using our experimental setup. During this compression process, we again added random noise with a variance of 1% to simulate realistic operating conditions. A representative example of a compressed image obtained using the experimentally measured T matrix is shown in Fig. 6e, and the corresponding reconstructed image is shown in Fig. 6f. The reconstructed image demonstrates excellent agreement with the original, achieving a PSNR of 26.02 dB and an SSIM of 0.91. For comparison, applying 1:4 JPEG compression to the same image, under similar SNR conditions, yields a PSNR of 29.63 dB and an SSIM of 0.83. We extended this evaluation to the entire set of test images and obtained average performance metrics of 26 ± 4 dB for PSNR and 0.9 ± 0.07 for SSIM. Further examples of compressed and reconstructed images, along with detailed comparisons to JPEG compression, are available in the Supplementary Information [Section S1: Additional Experimental results: Compressed and reconstructed images and their statistics; Section S6: Comparison of Digital JPEG and Photonic Compression].
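
A schematic version of this post-processing and noise-aware compression step is sketched below. The binning of the measured output columns into four regions and the helper names are our illustrative assumptions; only the shapes (16 inputs, 4 outputs) and the 1% noise level follow the text.

```python
import numpy as np

def bin_measured_T(T_measured: np.ndarray, num_regions: int = 4) -> np.ndarray:
    """Sum the measured output columns into `num_regions` non-overlapping spatial
    regions, turning a (16, num_output_positions) measurement into a 4 x 16
    compression matrix so that O = T I maps 16 inputs onto 4 outputs."""
    n_in, n_out = T_measured.shape
    regions = np.array_split(np.arange(n_out), num_regions)
    return np.stack([T_measured[:, r].sum(axis=1) for r in regions], axis=0)

def compress_with_noise(T: np.ndarray, I: np.ndarray, noise_frac: float = 0.01,
                        rng=np.random.default_rng(5)) -> np.ndarray:
    """Apply the experimental T and add Gaussian noise at 1% of the mean signal,
    matching the level injected during training and testing."""
    O = T @ I
    return O + rng.normal(0.0, noise_frac * np.mean(O), size=O.shape)
```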

In addition to demonstrating robustness against noise introduced during the analog photonic image compression step, this experimental validation also underscores the potential of this technique for image denoising. From the perspective of the backend image reconstruction neural network, noise introduced during the original image acquisition process (e.g., due to pixel noise, non-uniform responsivity, or low light levels) is effectively indistinguishable from noise added during the image compression step. Our experimental testing, which explicitly added noise during compression, therefore effectively simulates the system’s ability to handle noise originating from various sources within the imaging pipeline. Thus, this work highlights the potential of our photonic image compression technique to shift the computationally and energy-intensive image conditioning and denoising steps to the backend image reconstruction stage. This paradigm shift could lead to significant improvements in the energy efficiency and processing speed of future imaging systems.

While our initial experimental demonstration was conducted at a relatively low speed due to the absence of integrated detectors and modulators in the prototype device, the underlying approach is inherently compatible with high-speed operation, potentially exceeding 10 GHz. In essence, our compression device can be viewed as a photonic communication link. In this analogy, image data is encoded onto an optical carrier using the input modulators, transmitted through the scattering region (analogous to signal propagation along a bus waveguide or through an optical fiber in a communications link), and finally recorded by a high-speed photodetector. Given that the optical loss through the scattering region is modest (the measured transmission is approximately 30%), we anticipate that the compression device can operate with a comparable SNR to photonic links operating at similar speeds. One potential challenge is the introduction of noise during the data encoding step, specifically at the high-speed integrated modulators. However, our approach offers the advantage of simultaneous compression and denoising. As demonstrated in Fig. 4f, the image compression process remains effective even at relatively low SNR levels, down to approximately 10. Furthermore, we conducted simulations to evaluate the compression quality as a function of noise added directly to the input image. These simulations, detailed in the Supplementary Information (S4: Denoising Images), effectively simulate the impact of noise introduced by the modulators during input encoding. The simulation results confirm that our approach exhibits significant robustness to noise originating from the input modulators, further validating its practical viability for high-speed, low-noise image compression applications.

Predicted energy consumption and operating speed for the photonic image processor

As previously described, our encoding and compression technique fundamentally relies on a matrix multiplication operation. To provide a quantitative comparison of the energy efficiency of our photonic approach against traditional electronic schemes, we estimated the energy per multiply-accumulate (MAC) operation for both methodologies. It is well-established that electronic hardware accelerators have undergone extensive optimization to minimize the power consumption per MAC operation.

The total power consumed by the photonic image processing engine comprises contributions from several key components: the laser source, the optical modulators, and the photodetectors. To estimate the required laser power, we first determined the detected optical power necessary to ensure a sufficient signal-to-noise ratio for accurate image compression. Assuming shot-noise limited detection, the required optical power reaching each photodetector can be expressed as [38]:

$$P_{Rx}=2^{2\,\mathrm{ENOB}}\,q\,f_{0}/\mathscr{R}$$

(1)

In this equation, ENOB represents the required effective number of bits, q is the elementary charge (1.6 × 10⁻¹⁹ C), f₀ denotes the operating frequency of the modulator (and the detector baud rate), and ℛ is the responsivity of the photodetector, typically measured in units of A/W. The ENOB is directly related to the measurement SNR in dB through SNR = 6.02 × ENOB + 1.76 [38]. For the energy consumption calculations presented below, we assumed a required ENOB of 6. This corresponds to a measurement SNR of 38 dB, providing a considerable margin compared to the experimentally measured SNR of 17 dB. Based on the calculated required power at the detector, we can then work backward to estimate the necessary laser power using the following equation:

$$P_{laser}=\frac{N\times P_{Rx}}{T_{\mathrm{mod}}\,T_{scatter}}$$

(2)

Here, N is the number of pixels in an image block, T_mod represents the transmission efficiency of the optical modulators, and T_scatter is the transmission efficiency of the scattering medium. The electrical power required to drive the laser is then given by P_laser/η, where η is the wall-plug efficiency of the laser. The factor of N in Eq. (2) reflects the fact that the multimode waveguide and scattering region are designed to support N spatial modes. This is the minimum number of modes required to efficiently couple light from N single-mode input waveguides. In our initial experimental demonstration, we employed a slightly larger multimode waveguide than strictly necessary to simplify the experimental setup. As a result, the optical power was distributed over more than N modes in our initial demonstration. Future optimized designs would involve adiabatically coupling the single-mode input waveguides into an N-mode multimode waveguide to maximize power efficiency.

The power consumption of the optical modulators can be estimated using the equation [39]:

$$P_{Mod}=\frac{1}{2}C_{Mod}V_{pp}^{2}f_{0}$$

(3)

where C_Mod is the capacitance of the modulator, and V_pp is the peak-to-peak driving voltage applied to the modulator. The power required by the photodetectors can be approximated as:

$$P_{PD}\approx V_{bias}\,\mathscr{R}\,P_{0}$$

(4)

where V_bias is the bias voltage applied to the PN junction of the photodetector. The total electrical power consumed by the photonic image-processing engine can then be calculated by summing the power contributions from each component:

$$P_{total}=P_{laser}/\eta+N\times P_{\mathrm{mod}}+M\times P_{PD}$$

(5)

Since the total number of MACs per second is given by N × M × f₀, the energy consumption per MAC can be calculated as P_total/(N × M × f₀). Importantly, by substituting Eq. (1) into the expressions for P_laser (Eq. (2)) and P_PD (Eq. (4)), we observe that the total energy consumption per MAC operation is independent of the modulation frequency f₀. This frequency independence is a significant characteristic of the photonic approach.

To provide a quantitative comparison of the energy per MAC required by an optimized photonic processing engine with that of a conventional electronic GPU, we adopted typical specifications for commercially available optoelectronic components: C_Mod is on the order of 1 fF, V_pp is approximately 1 V, V_bias is typically 3.3 V, and ℛ is typically around 1 mA/mW at a wavelength of 1550 nm [40, 41]. Furthermore, typical insertion loss for high-speed optical modulators is approximately 6.4 dB (corresponding to T_mod = 0.27), and the wall-plug efficiency for distributed feedback lasers is assumed to be η = 0.2 [42, 43]. The transmission efficiency through the experimental scattering medium is taken as T_scatter = 0.2. This value accounts for both the intrinsic transmission of the scattering medium and the coupling efficiency to the integrated photodetectors. While our scattering medium exhibits a transmission of 30%, as experimentally measured and detailed in the Supplementary Information [Section S5: Additional Experimental Characterization], we assumed an overall transmission of 20% to conservatively account for approximately 67% coupling efficiency to the photodetectors (further details in Supplementary Information, Section S3: Integration of photonic encoder with silicon photonics and CMOS components).
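
Putting Eqs. (1)–(5) together with the component values listed above gives a compact energy budget. The sketch below is our own worked estimate under the stated assumptions (in particular, it takes the received power P₀ at each detector to equal P_Rx); small differences from the figures quoted in the text reflect those assumptions rather than the underlying model.

```python
def photonic_energy_budget(N, M, ENOB=6, f0=10e9,
                           C_mod=1e-15, V_pp=1.0, V_bias=3.3, R=1.0,
                           T_mod=0.27, T_scatter=0.2, eta=0.2):
    """Estimate energy per MAC and per pixel for the photonic encoder,
    following Eqs. (1)-(5). Units: Hz, F, V, A/W; returns joules."""
    q = 1.602e-19
    P_rx = (2 ** (2 * ENOB)) * q * f0 / R            # Eq. (1), per detector
    P_laser = N * P_rx / (T_mod * T_scatter)         # Eq. (2)
    P_mod = 0.5 * C_mod * V_pp**2 * f0               # Eq. (3), per modulator
    P_pd = V_bias * R * P_rx                         # Eq. (4), assuming P0 ~ P_rx
    P_total = P_laser / eta + N * P_mod + M * P_pd   # Eq. (5)
    return P_total / (N * M * f0), P_total / (N * f0)

e_mac, e_pixel = photonic_energy_budget(N=64, M=8)
print(f"{e_mac * 1e15:.0f} fJ/MAC, {e_pixel * 1e15:.0f} fJ/pixel")
# Roughly 8 fJ/MAC and 60-70 fJ/pixel for an 8 x 8 kernel -- about two orders of
# magnitude below the ~1 pJ/MAC typical of electronic accelerators, and
# independent of f0 since every term above scales linearly with f0.
```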

The estimated energy consumption per MAC operation as a function of the image block size N is graphically presented in Fig. 7. The results clearly demonstrate that the energy required by the photonic image processor decreases rapidly as the image block size increases. Our analysis also indicates that the laser source is the dominant contributor to the overall power consumption. Specifically, P_laser is estimated to be 9.2 mW for a kernel size of 8 × 8 pixels and an ENOB of 6. While this power level is readily achievable with most commercially available lasers operating in the 1550 nm wavelength regime, further reductions in power consumption could be realized if a lower ENOB is acceptable for specific applications. A lower ENOB requirement would enable the use of a lower power laser source (see Fig. 4c, f for an analysis of the trade-off between image reconstruction quality and the SNR of the compressed image). Nevertheless, our analysis reveals that for an image block size of 8 × 8 pixels, the photonic image processor has the potential to achieve a 100-fold reduction in power consumption compared to a typical GPU. While the photonic processor exhibits even greater energy efficiency with larger image blocks, it’s important to note that increasing the kernel size can potentially degrade the quality of image reconstruction, as shown in Fig. 2d. Future research directions may explore the use of alternative inverse-designed transforms. Such transforms could potentially enable the use of larger pixel blocks without compromising image reconstruction fidelity, further enhancing the efficiency of the photonic image compression approach.

Fig. 7: Comparison of energy consumption for electronic and all-optical encoding approaches for image compression. The graph shows the energy consumption per MAC for both approaches as a function of the number of input pixels N. Electronic methods (GPU, SoC, ASIC) are represented by the black line, and the photonic approach is shown in blue, with component-specific breakdowns (laser, modulators, detectors).

We can also utilize this framework to estimate the energy consumption per pixel, calculated as P_total/(N × f₀). This metric, energy per pixel, is independent of both the modulation frequency and the size of the pixel blocks. For an ENOB of 6 and the component parameters detailed above, the energy per pixel can be as low as 72 fJ. This value is significantly lower than the approximately 0.1 μJ per pixel consumed by existing image processing systems. However, it’s important to acknowledge that the latter figure typically includes the power required to operate the pixels themselves and the analog-to-digital conversion process necessary to extract the signal recorded by each pixel. Nevertheless, given that image compression and conditioning account for more than 50% of the total energy consumption in standard electronic image processing systems, our optoelectronic approach has the potential to contribute substantially to reducing overall energy consumption in imaging applications.

Finally, we can estimate the device throughput, in terms of pixels processed per second, as N × f₀. Assuming an image block size of 8 × 8 (N = 64), this approach can achieve a throughput of 1 Terapixel/second using a clock speed of approximately 16 GHz. This clock speed is readily achievable with current high-speed optical modulators and photodetectors [16, 17, 18]. While achieving such compression rates would necessitate significant temporal multiplexing to provide the compression engine with 64-pixel kernels at a 16 GHz rate, current on-board memory technology offers the capability to store up to 1 Gigapixel of data in a buffer to feed the compression engine for real-time processing. Since the compression engine is capable of processing 1 Tpixel/sec (or 1 Gpixel/msec), this throughput is sufficient to keep pace with the data rates acquired by a Gigapixel camera operating at 1 kHz, which is well beyond current state-of-the-art capabilities. This processing speed significantly surpasses the compression speeds achievable with conventional state-of-the-art digital electronics schemes such as JPEG compression, which can compress at most around 1 Gigapixel/sec.

While the energy consumption analysis presented above compares our photonic approach to GPUs implementing the same random transform-based compression algorithm, another relevant comparison is the energy efficiency of JPEG compression when implemented on various processors commonly found in digital cameras or smartphones. To estimate the energy consumption for JPEG compression [19], we first analyzed the computational complexity of the JPEG algorithm. The standard JPEG compression algorithm applies a discrete cosine transform (DCT) followed by thresholding (to retain significant basis functions) for every 8 × 8 non-overlapping pixel kernel. To encode an image with N × N pixels, N²/64 DCTs are required, with each DCT operation entailing 64 × 64 multiply-accumulate (MAC) operations. This results in a total of 64N² MAC operations for JPEG compression. The more advanced JPEG 2000 compression algorithm [44] decomposes the image into wavelet representations by iteratively passing it through 2D n-tap filters and applying 2 × 2 downsampling. For an image with N × N pixels undergoing K-level decompositions, the total number of MAC operations is 4n∑_{j=0}^{K−1} N/2^j. Notably, the computational complexity of both JPEG and JPEG 2000 algorithms scales as O(N²). This scaling is the same as the computational complexity that our algorithm would exhibit if the random projections were implemented using digital electronics rather than analog photonics.
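
Written out explicitly, the baseline JPEG operation count quoted above follows as:

$$\underbrace{\frac{N^{2}}{64}}_{8\times 8\ \text{blocks}}\times\underbrace{64\times 64}_{\text{MACs per DCT}}=64\,N^{2}\ \text{MACs per }N\times N\ \text{image}$$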

Modern digital cameras and smartphones typically employ an embedded system-on-chip (SoC) as the central processing unit. These SoCs typically integrate ARM cores for general-purpose computing, memory interface, and device control, along with dedicated hardware codecs (e.g., JPEG/HEIF encoder in Canon DIGIC processors) or image-processing ASICs (e.g., Apple A16 Bionic) specifically designed for JPEG compression and decoding. The power consumption of these dedicated codecs or ASICs typically ranges from 0.5 to 20 pJ/MAC [45]. Because digital hardware codecs are optimized for streaming, pipelined, sequential input of image blocks, the total power consumption also scales as O(N²). Consequently, the power consumption per MAC operation remains relatively constant regardless of the image size, exhibiting a similar scaling trend to GPUs. Moreover, the energy per MAC for a GPU (approximately 1 pJ/MAC) is comparable to that of SoCs and ASICs, as illustrated in Fig. 7.

In conclusion, the power consumption for JPEG compression on smartphones or digital cameras is comparable to that of GPUs and exhibits similar scaling characteristics. The primary advantage of utilizing a GPU for compression lies in its potential to achieve higher throughput by leveraging parallel processing capabilities. However, our analysis demonstrates that the photonic compression engine significantly outperforms both GPUs and dedicated digital electronics solutions in terms of both processing speed and energy efficiency. The photonic approach offers the potential for orders-of-magnitude lower power consumption compared to digital electronics-based solutions, whether implemented on GPUs, ASICs, or SoCs. Table 1 in the Supplementary Information (Section S2: Energy consumption typical of mainstream electronic architectures) provides a summary of typical energy consumption values for mainstream electronic architectures, including desktop processors, with the most energy-efficient schemes achieving approximately 1 pJ/MAC.
