Medical Device – Katrin Mentl, Boris Mailhe, Mariappan S. Nadar, Siemens Healthcare GmbH

Abstract for “Denoising medical imagery by learning sparse representations of images with a deep unfolding method”

“The present embodiments concern denoising medical pictures. The following embodiments include methods and apparatuses for machine learning sparse images with deep unfolding, and deploying the machine-learnt network to denoise the medical images. Iterative thresholding can be performed by a deep neural network. Each layer is trained as an iterative shrinkage algorithm. Randomly initialize the deep neural network and train it independently using a patch-based method to learn sparse representations of image data for denoising. The layers of the deep neural networks are rolled up into a feed forward network that is trained from end to end.

Background for “Denoising medical imagery by learning sparse representations of images with a deep unfolding method”

Noise is an inevitable part of image acquisition. In X-ray imaging for example, the reduction of radiation exposure of patients comes at the expense of increased noise in the image. This tradeoff is particularly apparent when multiple images are taken at once, as in the case of monitoring interventional surgery (e.g., cardiac catheterization using X-ray fluoroscopy). A high-quality reconstruction is required for monitoring interventional surgery. It also provides an efficient way to reduce noise levels in real-time. Computed tomography (CT), imaging is used to monitor surgery in real time. CT imaging reconstructs medical images using multiple X-ray projections of the patient in multiple orientations.

“As we have discussed, CT image acquisition and other xray imaging modalities require that images are acquired for CT or other x-ray imaging modities. This is because there is a balance between radiation dose and signal-to noise ratio. The acquired images can be denoised if low-dose radiation has been used. Different image reconstruction and denoising methods may produce clearer, more understandable images during interventional surgery. Simple averaging filters can be used to process real-time data, but blurred edges or other details are common. Advanced algorithms can also be used to reduce signal dependent noise (e.g. block-matching, block-matching, and 3D filtering clipped BM3Dc), among others. Independent additive noise (e.g. adaptive variational denoising, block-matching, and 3D filtering BM3D), dictionary learn (K-SVD), etc .).”

“The BM3D algorithm, for example, achieves excellent image denoising results. The BM3D method is based upon providing a sparse representation of images in a transform domain. This sparse representation can be enhanced by grouping like 2D image fragments (e.g. image blocks) into 3D dataarrays that are filtered with collaborative filtering. The collaborative filtering produces a 3D estimate that contains filtered image blocks. The filtered blocks are repositioned and averaged over any overlapped blocks. The BM3D algorithm can be extended and improved further, including the BM4D algorithm that uses the same approach to 3D image data.

“Pure data-driven deep learning has been used for CT denoising and other imaging modalities. However, pure data-driven approaches suffer from a lack in flexibility due to their learned dependency to acquisition parameters (e.g. the noise level). Deep learning can also become too computationally costly when applied to large 3D volumes in close real-time.

“The present embodiments concern denoising medical pictures. The following embodiments include methods and apparatuses for machine learning sparse images with deep unfolding, and deploying the machine-learnt network to denoise the medical images. Iterative thresholding can be performed by a deep neural network. Each layer is trained as an iterative shrinkage algorithm. Randomly initialize the deep neural network and train it independently using a patch-based method to learn sparse representations of image data for denoising. The layers of the deep neural networks are rolled into a feed forward network that is trained from end to end. Machine learning sparse image representations using deep unfolding may reduce computational cost, allowing denoising images to be done in real-time.

“In a first embodiment, a method for denoising medical images in a computer-tomography (CT), system is disclosed. This method involves scanning a patient using a CT system to create CT image data. The CT image data is then denoised with an image processor. A deep-learnt multiscale filter network is applied to the CT data to decompose the CT data into sparse representations at different scales. Deep-learnt multiscale network filters, including a number of trained sparse decoding autoencoders. The CT image data is recursively downloaded to generate CT image data. After that, the denoised image data at each scale are resampled to full resolution. Finally, the CT data set with the final denoised image CT data sets is summed. This method also displays an image created from the final denoised CT data set.

“In a second aspect, there is a method for training a deep learning based network for denoising medical images using multiscale sparse representations. The method involves receiving a plurality training image data set at a first scale, and downsampling them into a plurality training image data set at a second. The image processor creates a first deep neural networks with the plurality training image sets at the first and second scales, respectively, to collectively denoize medical images using sparse representations at multiscales. The method also involves upsampling denoised data from the second-scale back to the first and applying a learned linear filter to denoize image data at the first and second scales. To obtain the final denoised image data, a summation is done on the denoised data. Randomly initializing the weights for deep neural networks during training is done. The method then compares the final denoised images with target data to update the weights for the first and second deep networks by backpropagation. Deep-learning-based networks are saved as the trained deep neural network.

“In a third aspect, the system is used to denoise medical images. The system comprises a scanner that can scan an image of a patient, and an image processor that denoises the image using machine-learnt multiscale sparse representations. Multiscale sparse representations of image data include layers of sparse autoencoders that have been trained with image data at various resolutions in an unfolded independent feed forward network. A display is also included to display the denoised patient image.

“The following claims define the invention. Nothing in this section should be construed as limiting those claims. Additional aspects and benefits of the invention will be discussed below, in conjunction with the preferred embodiments. They may be claimed later independently or together.

“Embodiments can be used to denoise image data by machine learning sparse representations of the data using a deep-learning and unfolding approach. Iterative thresholding can be achieved using a deep neural net. This is done by dividing a shrinkage algorithm into layers within the deep neural networks. Each layer of the network corresponds to an iteration in the iterative shrinkage algorithm. Each layer has its own parameters that are initialized and trained separately from the others. Each layer can be used as a multiscale denoising autoencoder, which is a coder that works on different resolutions and scales of image data. Each layer takes the image data and decomposes it into a sparse representation. The threshold coefficients of this representation remove noise from the image data. Finally, the reconstruction is done back into the original denoised representation. Each layer or iteration is then rolled into a feed forward network of multiscale autoencoders. Each layer or scale is independently initialized using image patches.

Iterative thresholding is the basis of deep learning image denoising techniques, such as the multiscale autoencoders discussed above. Iterative thresholding assumes that the learned transform domain (i.e. a dictionary D) contains essential image information. This image information is represented by a small amount of high-magnitude coordinators and that noise is distributed uniformly over the transform domain in a large number low-magnitude coordinators. A non-linearity shrinkage function that applies element-wise to the transform domain is used to denoise the image data. The low-magnitude coefficients are set to zero. The process is repeated iteratively in order to create a clean proximal map of the image data using a set sparse representations.

Traditional deep learning approaches are slow to convergence making them inapplicable for real time applications (e.g., surgical interventions and the like). The Dictionary D parameters are used to initialize the datasets. This makes it difficult for the trained network to adapt to the specific datasets. These embodiments offer a method of deep learning that overcomes some of the limitations of traditional deep learning methods. They do this by dividing the thresholding iterations into independent, trainable and randomly initialized layers. This could be used to create networks multiscale denoising automaticencoders or other deep learning networks. To reduce the computational cost of denoising image data, a multiscale patch based sparse representation is learned. Image processor speed is improved by reducing the computational cost, which allows for real-time scanning or denoising. Furthermore, images can be displayed and reconstructed with greater accuracy. This allows for more precise diagnosis and treatment. Patients are safer because radiation doses are lower.

The following description is given with reference to 3D CT volumes for denoising, but the same principles can be applied to any 2D or 3D imaging modalities. Machine learning techniques can be applied to 2D imaging modalities, such as processing image data slide-by-slide using 2D decomposition filter. 3D decomposition filter is used to learn the sparse representations of 3D volumes. The 3D decomposition filter can provide better results than using volumetric information, but it is computationally more expensive. The machine-learnt network’s denoising capabilities are described in relation to medical images (e.g. X-Ray fluoroscopy images and 2D CT slices, 3D volumes of CT volumes, and the like). However, these embodiments can be applied to all types of image data.

“Referring back to the example of 3D CT volume. The denoising problem can be expressed as the estimation of hidden image x in function of noisy scan y:ny=x (1)\nwhere ? (1)nwhere? represents the noise introduced to the scan. The noise could be, for example, The noise is not just white noise, as a low-pass kernel is used during CT reconstruction to transform the noise into a texture. It is also difficult to get a statistical description of noise in the image domain because it is not Gaussian in the raw measurement domain.

Deep-learning-based networks can be used to remove noise from 3D CT images and solve denoising problems. The network learns a sparse representation base (i.e., Dictionary D with image decomposition filters) by mapping corrupted input data to the corresponding optimal features for detecting denoising in a transform domain. To further strengthen network learning, adaptively learned threshold function values are used to denoising in the transform domain. The network is trained from real high-dose CT scans. Synthetic noise is used to simulate low-dose scans. Multiscale or layered approaches are used to capture important features on different scales and process large CT volumes quickly. Recursively, the CT volume data are downsampled, used with a denoising operation of a constant size per layer, and then trained at each scale independently.

“In different embodiments, each layer of a network is trained with a demoising autoencoder to determine the layer’s scale. A denoising autorcoder is generally a neural network (N), which has been trained using image pairs (y,x) through supervised learning. The denoising self-encoder is trained in order to convert noisy input y into transform domains and reconstruct input as close as possible to ground truth image (x) by removing noise?. The denoising self-encoder extracts the relevant features from noisy input y to reconstruct ground truth image (x). The mapping can be expressed as follows: ncircumflex (x) (y),?x? (2)nA denoising autorecoder is an example supervised learning. This allows the network to learn how to reconstruct noisy inputs using the ground truth images (i.e. a clean image). The network algorithm uses supervised learning to learn the actual statistics of noise, rather than using an approximate model. If the ground-truth image data is from a high-dose clinical scan, there will be noise in the ground data. This allows the network algorithm learn to denoise an output while keeping the noise texture of ground truth data. The preservation of the noise texture allows for reconstruction of natural-looking images that have a higher perceived quality.

Traditional methods of learning noise models may have the disadvantage of being tied to a particular dose or scanner setting. It is difficult to deploy a trained network in a clinical environment, where doses can be adjusted routinely (e.g. to adjust a dose for the patient’s mass etc.). The network adjusts to noise levels by denoising sparse images and changing the threshold values for the coefficients of the transform domain. An autoencoder that transforms domain denoisers may be described as: ncircumflex above (x) =W?h(Wy). (3)nWhere W is a convolutional decomposition operator that can be trained, and W? W is a trainable reconstruction operation and h is a sparsity-inducing activation operator. By requiring that the reconstruction operator W=WT be used, the number of parameters available is decreased. This drives W towards a narrow frame. The trained network can operate on any scan setting and dose by removing sparse representations of images. One network can be used for many scans and different patients.

“FIG. 1. illustrates an example for a spare denoising self-encoder to denoising sparse images. Refer to FIG. 1. Referring to FIG.

FIG. 1. Input 101 is a noisy image In, and output 103 a denoised Ir image. The input 101 is corrupted to train the denoising coder 100. This could be done by adding noise such as modeling noise from low-dose CT scans or a noise distribution that mimics real-world noise. Ground truth data is used in a pair of training images to represent the uncorrupted input. Decomposition block 101 maps the corrupted input 101 to a sparse representation of the image using a variety of trainable weights. To remove noise, the sparse representation of the image is thresholded using thresholding block107’s trainable thresholding filter. A reconstruction block 109 maps to output 103. This reconstructs the sparse image representation that has been denoised. Reconstructed output 101 is in the same format as input 101, using the reconstruction block109 of the same shape and decomposition block105. This results in denoised image Ir. To better adapt the parameters to 101 input data, the Dictionary D and thresholding functions of thresholding block107 are randomly initialized during training.

“The decomposition block105 is used as an edge filter to generate sparse images in a transfer domain. The initial filter weights for the edge filter are generated from a random zero mean Gaussian distribution. They do not have any distinct structure, and they are trained to adapt to the training datasets (i.e. noisy images In), as long as the loss decreases over training. Nearly all of the filter coefficients are trained and adjusted, with clear edge structures visible in the transfer domain.

“The thresholding block107 is used to remove noise. As discussed below, the thresholding functions can be provided as shrinkage function, such as a nonnegative garrote function. The shrinkage function takes the sparse representation for each input and shrinks filter coefficients according the noise level (e.g., each input has a different noise level as measured by the standard deviation of the ground truth). The denoising network is able to adapt to different noise levels. In the transfer domain, for example, there is a small number of strong coefficients that correspond to edge structures, and a larger number of weaker coefficients that correspond to noise. The thresholding function reduces noise by setting the smaller coefficients at zero.

“Referring to FIG. 1. The denoising autoencoder 100 has been trained on the noise image input 101. The decomposition, thresholding, and reconstruction blocks 105 are then trained for each patch. This is done by repeating training for each patch for multiple training images within a training dataset. Sharp edges are better reconstructed when the autoencoder 100 is trained with image patches. Image patches can be a compromise. Although the denoising algorithm 100 can be trained on larger images (e.g. the entire image), it will reduce noise better. However, processing large patches requires more computation and may not be suitable for real-time applications. To increase processing speed, any size patch can be used with the denoising autoencoder100, such as 5×5 or 5×5.

A multiscale transform can be used to reduce noise and computational cost by using smaller patches. An architecture with multiscale decomposition is created to speed 3D processing and adapt to noise levels.

“FIGS. 2A-2B are examples of networks of multiscale autoencoders for denoising. Referring to FIG. FIG. 2A shows a multiscale architecture with multiple denoising automaters 217A. This network 200 is referred to. Each denoising autoencoder 217A is equipped with a decomposition, thresholding and reconstruction block as described above in FIG. 1. Each autoencoder for denoising is trained independently on image patches at a different scale. As a first layer in the network 200, you will find, for example, reconstruction block 209, thresholding block 217B, and decomposition block 205. Recursive block 217B contains one or more levels of denoising self-encoders. Recursive block 217B can be used to represent any number of denoising autoencoders. Each level has a separate trained decomposition block, thresholding and reconstruction block.

“Each level in the network 200 downsamples image data using a low pass filter (LPF), such LPF 213 by a factor two as shown in block 215. Other factors can be used. The same patch size is used for each level of subsampling. This allows the downsampled levels of denoising to cover a larger area of the image. An input noise level (i.e., noise std.) The thresholding blocks at each level of the multiscale architecture are provided with an input noise level (i.e., noise std.). This allows the network to adapt to different noise levels through deeper learning and incorporating additional information into its learning process.

“Due to downsampling the network is trained using relatively large image patches while still using a simple sparse representation for computational efficiency. The downsampling also means that image regions with lower gradient energy might show stronger edges at lower scales. This makes the filters trained for the different scales very different. While sharing filter weights among scales would reduce computational complexity and trainable parameters, it is better to not share filter parameters among scales. This allows different layers to adapt to each scaling separately, resulting in more accurate denoising, image reconstructions, and so on. The thresholding functions for each scale can also be trained separately.

“To generate the reconstruction output 203 (i.e. the denoised Ir), the summation block 223, which combines the outputs from each scale, is used. Each scale’s outputs are upsampled in the same way as the downsampling operations. Network 200 can train the summation block 223, which is a weighted sum. Each scale’s outputs are subject to additional high-pass and low-pass filters HPF211 and LPF221. After upsampling to block 219 with the same factor, LPF 221 passes low-spatial frequency image data that was denoised at scale original, HPF 211 passes the high spatial frequency data.

Refer to FIG. 2B, the recursive blocks 217B and 217B are expanded to show a three-level network 200 that includes three denoising automatencoders who have been trained at different levels. Recursive block 217B, for example, is replaced with two denoising and low-pass filtering autoencoders. After the input 101 has been downsampled at LPF 213, it is fed into an intermediate-scale autoencoder, which includes a separate trainable decomposition block 223, thresholding blocks 225 and reconstruction blocks 227. The 101-bit downsampled input is then downsampled at LPF 213. It is fed into an intermediate scale autoencoder, which includes another separately trainable block 223, thresholding block 235, and reconstruction block 237. The intermediate output can be reconstructed by adding the low and intermediate outputs to summation block 241 following the same procedure as above. Network 200 can train the summation block 241 as a weighted sum. The HPF 229 passes intermediate spatial frequencies image data that has been denoised at a downsampled level, while LPFT 239 passes low spatial frequency data that has been denoised at a downsampled level. This is after upsampling to a downsampled size by the same factor LPF 231.”

“Specifically, low-pass filtering is done by lowpass wavelet degradation performed by convolution at LPF 213, followed by further downsampling to LPF 231. Wavelet reconstruction is achieved by successive upsampling, transposed convolution using LPFT 239 or LPFT 221. The two lower scales are summed using a trainable weighted amount 241, then a summation using the highest scale at the trainable weighted amount 223. The sum of LPFT 221, the HPF reconstruction 211 and the thresholding function realizes almost perfect reconstruction.

Referring to the 3D CT scan example, we will compare traditional CT denosizing to learn the sparse multiscale image representations. Traditional 2D CT denoising was done with filters measuring 17 by 17 pixels. These filters are too computationally costly to be applied to large 3D CT volumes (e.g., at most 5123 voxels). Autoencoders that do not denoize, such as those with multiscale decompositions as shown in FIGS. 2A-2B use recursive downsampling rather than the larger filter size. The traditional 2D filter size would be applied to a 3D CT scan. This would result in a larger filter size (17 by 17 by 17 pixels) Furthermore, each scale can be trained independently, allowing for greater accuracy than traditional CT denoising. This is because each scale uses different filter parameters.

“FIG. 2C shows the smaller patch sizes at three scales. The multiscale sparse code network is used to code at the three scale levels shown in FIG. 2C. Downsampling allows the same size patch to be able to control a greater portion of the input 201. This is illustrated in scales 1-3 in FIG. 2C. 3D CT Example: The multiscale sparse code network uses three levels of decomposition, such as blocks 205, 223, and 233. Each sparse-denoising autoencoder maps input data to hidden representations using a convolutional layer in patches (5 by 5 by 5-voxel patches in 3D). A 2D example uses 25 filter Kernels (5 by 5 pixels in 2D). These 25 filters correspond with the elements of W to learn’s decomposition operator. After obtaining 25 feature maps (e.g. the representation coefficients), they are thresholded using a non-negative garrote operation hgarrote. The input is then reconstructed to the same shape by using a transposed Convolutional Layer with the same dictionary filter elements. (e.g. corresponding to the convolution with 25 filters of 5 by 5) Each scale level has a different operator with the same size filter kernels W1. . . Wn is the learned value, where n is the number of scale levels. Filter kernels of size 5 x 5 correspond to the processing of the original network input using filters of 10 x 10 on scale level 2, and 20 x 20 on scale 3 which have a scaling factor of 2.

“The networks in the above-described embodiments are only provided with one thresholding block per branch. The network may be made deeper by integrating interscale or interfeature correlations. However, this will improve performance while still maintaining low noise levels and low computational cost. Interscale denoising can be done after the output of 200 multiscale autoencoders.

“FIG. “FIG. The network 200 is extended by an additional autoencoder 300. This includes an additional filtering block and thresholding block as well as a reconstruction block. Based on the summated output of the network 200, the additional autoencoder 300 applies an extra thresholding block between different scales (e.g. interscaling). Additional thresholding functions denoise again based on input noise level and learn an appropriate scale to the interscaling.

“Additionally the networks described above in FIGS. The layers 1-3 can be trained by deep networks using deep unfolding. This involves stacking multiple layers of a chain and using a deep folding approach. Unfolding is the process of replacing a fixed number of iterations or cycles with a finite structure. FIG. FIG. 4 shows an example of how to learn sparse image representations using deep unfolding. FIG. FIG. 4 shows three iterations, or cycles, of a network multiscale autoencoders that are connected into a feed forward network. A feed-forward network is composed of three layers of autoencoders connected by weighted average between them. The architecture of each iteration 400 to 402 corresponds to a layer in a neural network such as FIG. 200, which is the multiscale decoding network. 2B. 2B. The weighted sum, or recombination, after each layer is calculated between the weighted input of the previous layer, the weighted output of the current layer and the weighted noise input to the entire network.

“Image denoising can be greatly improved by sparse image representation-based algorithms for denoising, formulated as an unfolded-feed-forward neural network. The network can learn more denoising parameters by using a randomly-initialized dictionary and thresholding functions, as discussed above. The 3D CT data example shows how the denoising algorithms are being rolled out into a feed forward network. To run as few iterations possible (e.g. realize convergence after n=3iterations instead of? The n iterations are rolled into a feed forward network as shown in FIG. 4. The feed-forward network then learns to optimize weights and thresholds for each iteration. As a trained convolutional neural net (CNN), the feed-forward network can be used to perform new CT scans.

“FIG. “FIG.5” illustrates an example Garrote thresholding function. The threshold function can be used in accordance with the embodiments described herein as an empirical Wiener filter, which is a Wiener-like filter that does not know the target variance. FIG. FIG. 5 shows a plot of the non-negative garrote function that is used to threshold sparse image representation coefficients. FIG. FIG. 5 illustrates both soft and hard thresholding functions. These functions can also be used in other embodiments. Based on the observation that noise is a major factor in the image data’s sparse representation, the threshold function applies to the image. Large coefficients are primarily responsible for the main image features, such as edges and borders. The thresholding function will denoise the image data by setting the small coefficients at zero. FIG. 4 shows an example of a feed-forward neural network. 4. A separately trained thresholding function is used to set the small transform coefficients to zero for each architecture iteration 402, 406 and keep the large transform coefficients. This results in a denoised estimate of input X0.

FIG. 5 shows the non-negative garrote function. 5. This function may be able to overcome the weaknesses of both the hard and soft thresholding functions. The soft thresholding shrinkage has a larger bias because of the shrinkage large coefficients. Hard thresholding shrinkage, on the other hand, is not continuous and incompatible with backpropagation training. The noise level is a factor that affects the non-negative garrote function. A thresholding level of k is used to force sparsity on each representation transformation coefficient zj by:

“z ^ j = h garrote ? ( z j ) = ( z j 2 – k ? ? ? 2 ) + Z j?, ( 4?)nin to get the thresholded coefficients circumflex above (z)j. The positive part function + can be defined as: nx +=max x, 0? (5)nThe input noise variance?2 allows training and testing at multiple dose settings. The thresholding value, k, is a trainable parameter. To avoid training in the flat region of 0 where backpropagation fails to produce gradients, it should be set very low as an initial value. The non-negative garrote function is used to train neural networks. This allows for the reconstruction and display of clean image data using denoised estimates. The network can adapt to various doses of scanning by using a learnable threshold value proportional the input noise level.

“FIG. FIG. 6 shows an example of how a trained network can be applied to achieve the same results as those discussed previously. FIG. FIG. 6 illustrates how networks that have been trained to denoise CT datasets using multiscale sparse representations of images do not produce texture artifacts as compared with previous methods (e.g. the prior art BM3D technique, which is discussed in the background). Rows A, B, and C show different images taken with a CT scanner. Column 1 contains ground truth data that can be used for evaluation and training. Ground truth data is captured using high dose scans (e.g., 100% dose), which results in high signal-to noise (SNR). Column 2 shows a lower dose scan (e.g. 30% dose), which results in noisy inputs with a lower ratio of SNR. The low dose simulation is done by using the raw high dose scan, and then introducing noise to the image based upon the imaging modality (e.g. Poisson’s noise or the like). Multiscale networks can learn sparse image mappings from high-dose imaging data that has been artificially corrupted without the need for prior information about the noise models.

Effective denoising algorithms are designed to reconstruct sharp edges of organ borders, medical instruments, and other objects. The reconstructed edges can often provide valuable information to physicians regarding the patient. Another goal of efficient denoising algorithms are to avoid artifacts in the image data. These include ringing effects along edges and splotchy or other artificial structures. Image artifacts can be a problem because they may hinder a physician’s ability or decision making based on the reconstructed images. FIG. FIG. 6 shows the results of applying both 3D and 2D multiscale sparse code networks to denoising synthetically-corroded CT slices/volumes. Columns 3, 4, and 5 show the results of applying a 2-dimensional multiscale network (e.g. FIG. 4), a 3D multiscale network (e.g., FIG. 4) and the prior art BM3D approach, respectively. The results of the 2D/3D multiscale network and prior art BM3D approach are comparable. Multiscale networks produce sharper edges and less artifacts in the reconstructed images. Row C column 4 arrows indicate more obvious boundaries/edges that were reconstructed using the 3D multiscale networks. Row B, column 5 also shows texture artifacts that were created by the prior art BM3D approach (i.e. vertical streak artifacts). These artifacts are not present in images reconstructed using the 2D/3D multiscale network (columns 3, and 4).

“In further embodiments based on the disclosure, the networks can be trained using an additional noise modeling for the imaging modeality. FIG. FIG. 7A shows a network consisting of autoencoders who have been trained using a noise model. FIG. FIG. 7A shows three autoencoders 700 to 702 704 in a feed forward network. Each autoencoder is weighted averaged. Other iterations are possible. FIG. FIG. 7A shows that a noise model is used as an input to each threshold block of the network of automatic encoders. The dose settings can be modified from scan to scan after the network has been trained. The noise model is included in training so that the autoencoders can be noise-aware and learn denoising parameters based upon the scan dose from the training datasets. The noise model can be used as an input to deeper learning. Based on real-world physics, the noise model can be presented as Poisson’s Noise. You can also use Gaussian noise, input scan, and dose settings.

“In further embodiments the networks can be trained using scan-specific metadata for the input image data. FIG. FIG. 7B shows a deep learning network that is trained with image acquisition metadata. This includes scan settings such as dose and x-ray source voltage. FIG. FIG. 7B shows a multiscale deep neural system, each layer being trained using scan metadata and input image data. The multiscale deep neural networks provide deeper learning, allowing the network adapt to different scan parameters. The deeper learning allows for the addition of fully connected controller networks 707, 709 and 711 to replace each autoencoder (e.g. FIG. 2B). FIG. FIG. 7B shows the output thresholds of each filter. For example, 75 trained filters with 25 filters for each scale. The fully connected controller network 705 inputs include the three image reconstruction kernel parameters (e.g. rho50 rho10 and and rho2), and the dose level. Other inputs include scanner geometry, through plan, slice size and spacing, as well as other metadata. For deeper learning and improved adaptability to a wider range imaging parameters such as region-of-interest (e.g. affecting image contrast), reconstruction resolution (e.g. affecting noise texture), and tube voltage (e.g.), additional or different inputs can be provided.

“In embodiments of training networks with or with the additional noise model imaging modality deployed and scan particular metadata for the input data set as discussed above with regard to FIGS. Further learning can be achieved by using 7A-7B. FIG. FIG. 8 shows an example of deep learning for fully connected controller networks. This example shows how a deeper network can be created by learning additional features in transform domains before reconstructing denoised image data at each scale. FIG. 8 shows an example of this. 8 shows an autoencoder that transforms noisy input 1 into output 2, with only one layer in its transform domain. The right side is FIG. FIG. 8 shows a deeper network that transforms noisy input 1 into output 3. An additional layer 3 is added to the transform domain. The transform domain may contain additional layers. Learning ability of the network is enhanced by adding layers to transform domains. FIG. 8 shows an example. 8 shows how the controller learns thresholds at a second level. The network learns 25 features per layer in this example. Therefore, the second layer doubles its learning abilities. The network’s ability to adapt to scans of different scans can be improved by adding layers to deepen learning.

“Each of these embodiments can be provided as a network trained in residual correction as shown in FIG. 9. The network is now trained to correct the image data, rather than directly on image data. This means that instead of training the network on how to extract the image data from the image, the network is taught to remove the noise from the image data. If there is no noise in a dataset, then it is not necessary to train the network to reconstruct image data. Training on corrections may be a faster option.

“FIG. 10. This is a method to denoise medical images using a computed-tomography (CT). FIG. 12. (discussed below), and/or another system. You may also be provided with additional, different or fewer acts.

“Act 1002 is when a CT scanner scans a patient to create CT image data. CT scans are performed using radiation that results in noise in the CT data. A CT scan can be performed in two or three dimensions, depending on whether it is done during an interventional surgery. The CT scan may also be captured using different doses and scan settings.

“At act 1004, CT image data are decomposed by an image processor. An image processor uses a deep-learnt network of filters to decompose the CT image data into sparse representations at multiple scales. As a trained network of sparse-denoising autoencoders and other deep learning networks, the deep-learnt multiscale filter network is used. Lower levels of deep-learnt network apply learned filters to image data that has been recursively reduced from CT images. This is in contrast to the previous levels of deep-learnt multiscale neural networks.

“The filters were independently trained and initialized randomly prior to training. Additionally, the threshold values of sparse-denoising autoencoders could be learned separately at each scale. For example, the filter weights were initialized using a random zero-mean Gaussian distribution. This provided no structure that could be used to better adapt the filter weights for training datasets. Independent learning enabled the parameters to be optimally adjusted to each scale’s training datasets. The filters were further trained using patches at each scale.

“The denoising learned filter maps the CT image data patch from the patient to sparse representations. The sparse representations are thresholded using a thresholding function in order to remove noise. Each autoencoder has a separate trained decomposition filter to transform an input into a sparse representation at various scales. A separately trained thresholding function is also available for each scale to remove any noise from the sparse representations. Finally, each block contains a corresponding reconstruction block to reconstruct a clean version. As inputs, the autoencoders might receive concatenation data from CT pixel and metadata.

“At act 1006, an image from the denoised CT data is rendered and displayed to a user. To update an image displayed during surgery, acts 1002, 1004 or 1006 can be repeated.

“FIG. 11. This illustrates how to train a deep-learning-based network for denoising medical images using multiscale sparse representations. FIG. 12. (discussed below), and/or another system. You may also be offered additional, different or fewer acts.

“A plurality of training data sets are received at a first scale by an image processor at act 1102. Data capture is done with various scan settings and doses. The training image data sets include a database of high dose CT and MR scans. A copy of the data sets is also modified with synthetic noise to simulate low-dose. A ground truth is a copy of the image data sets that has not been subject to any noise.

“At act 1104, a plurality training data sets with noise is downsampled and converted by the image processor to a plurality training image sets at a second level and into a plurality training image sets at a third level. Downsampling, for example, reduces the scale by two. For example, a first scale is 512; a second scale is 256; a third scale is 128, etc. You may also use downsampling, factors or scales in other ways.

“Act 1106 allows the image processor to train a first deep neural system with a plurality training image sets at the first level, a second neural network with a plurality training image sets at second scale, and finally a third neural network with a plurality training image sets at third scale. This is used to collectively denoise an image using sparse representations at multiscale scales. The parameters of deep neural networks are, for example, randomly initialized before they go to work. One embodiment trains the lowest scale first, then the intermediate and final scales. This embodiment focuses on increasing the scales from the lower ones. In another embodiment, all three scales can be trained simultaneously. In various embodiments, deep neural networks include multiple layers in deep learning. This includes multiple layers for feature decomposition. An embodiment of deep neural networks is trained to learn noise corrections to the training image data sets rather than the noise coefficients.

“At act 1108, an image processor upsamples denoised data obtained at the second and third scales. The denoised data can be upsampled back at the first scale, for example.

“At act 1101, the image processor applies a learned linear filter to denoised data at the first scale, and denoised data upsampled at the second and third levels. A high-pass filter may be applied to image data at the first and second levels, while a low pass filter will be applied to image data at the third level. You can also use other filters, such as a low pass filter to image data at the second scale.

“At act 11, the image processor adds the filtered data to get final denoised images. The summation is a learned weighted sum. You can also use other summations. You can use other summations, for example, to combine the denoised data from the third and second scales.

“At act 1114 the image processor compares final denoised images with target data to update the weights of deep neural networks. Backpropagation is used to update the weights. All weights in the neural networks can be updated based upon the comparison.

“At act 1116 the trained deep neural network are stored in a memory by the image processor.”

“FIG. “FIG.12” illustrates a block diagram for an example system for denoising medical images. The system comprises a medical scanner 1202, an imaging processor 1204, and a display 1206. The medical scanner 1202 includes the image processor 1204 (or the display 206). Alternately, the image processor 1204 or the display 1202 can be part of a server, workstation, or computer that is connected to the medical scanner 1202.

The medical scanner 1202 can scan a patient. The scanner 1202 contains controllers, pulse generators and a radio frequency system. It also includes coils for CT, MR, or other scanning modalities. The medical scanner can generate a 2-D or 3-D scan of the patient.

The image processor 1204 can be described as a general processor, digital signal processor, graphics processor unit, application specific integrated device, field programmable array, digital circuit or analog circuit. It also includes other devices for image reconstruction and denoising that are now or later developed. The image processor 1204 can be used as a single device or as a part of a group of devices. Parallel or sequential division of processing can be used for implementations that use more than one device. The processor 1204 could have different functions. For example, one device may do denoising and another performs reconstruction. The image processor 1204 in an embodiment is a control processor, or another processor of the medical scanner 1202. The stored instructions that allow the image processor 1204 to perform various acts are what make it work. The configuration of the image processor 1204 can be done by software, hardware, or firmware.

“The image processor 1204 can denoise an image of a patient using a machine-learnt system that uses multiscale sparse representations. Multiscale sparse representations of the image are created using parameters that were trained on layers of sparse autoencoders with image data at different resolutions. In a feed forward network, the layers of sparse-denoising autoencoders had to be unfolded and then trained separately. The sparse denoising 3D scanners have three-dimensional decomposition filters. You may also have trained the image processor 1204 with deeper learning such as using the scan’s receive noise level or any other metadata to the scan as input for training.

The display 1206 can be used as a CRT or LCD, plasma, projector or printer. Display 1206 is used to display denoised images of patients after reconstruction and denoising.

“Various improvements described in this invention may be combined or used separately. While the accompanying drawings have been used to illustrate the invention, it should be understood that one skilled in the art may make other modifications or changes to these embodiments without departing from its scope or spirit.

Summary for “Denoising medical imagery by learning sparse representations of images with a deep unfolding method”

Noise is an inevitable part of image acquisition. In X-ray imaging for example, the reduction of radiation exposure of patients comes at the expense of increased noise in the image. This tradeoff is particularly apparent when multiple images are taken at once, as in the case of monitoring interventional surgery (e.g., cardiac catheterization using X-ray fluoroscopy). A high-quality reconstruction is required for monitoring interventional surgery. It also provides an efficient way to reduce noise levels in real-time. Computed tomography (CT), imaging is used to monitor surgery in real time. CT imaging reconstructs medical images using multiple X-ray projections of the patient in multiple orientations.

“As we have discussed, CT image acquisition and other xray imaging modalities require that images are acquired for CT or other x-ray imaging modities. This is because there is a balance between radiation dose and signal-to noise ratio. The acquired images can be denoised if low-dose radiation has been used. Different image reconstruction and denoising methods may produce clearer, more understandable images during interventional surgery. Simple averaging filters can be used to process real-time data, but blurred edges or other details are common. Advanced algorithms can also be used to reduce signal dependent noise (e.g. block-matching, block-matching, and 3D filtering clipped BM3Dc), among others. Independent additive noise (e.g. adaptive variational denoising, block-matching, and 3D filtering BM3D), dictionary learn (K-SVD), etc .).”

“The BM3D algorithm, for example, achieves excellent image denoising results. The BM3D method is based upon providing a sparse representation of images in a transform domain. This sparse representation can be enhanced by grouping like 2D image fragments (e.g. image blocks) into 3D dataarrays that are filtered with collaborative filtering. The collaborative filtering produces a 3D estimate that contains filtered image blocks. The filtered blocks are repositioned and averaged over any overlapped blocks. The BM3D algorithm can be extended and improved further, including the BM4D algorithm that uses the same approach to 3D image data.

“Pure data-driven deep learning has been used for CT denoising and other imaging modalities. However, pure data-driven approaches suffer from a lack in flexibility due to their learned dependency to acquisition parameters (e.g. the noise level). Deep learning can also become too computationally costly when applied to large 3D volumes in close real-time.

“The present embodiments concern denoising medical pictures. The following embodiments include methods and apparatuses for machine learning sparse images with deep unfolding, and deploying the machine-learnt network to denoise the medical images. Iterative thresholding can be performed by a deep neural network. Each layer is trained as an iterative shrinkage algorithm. Randomly initialize the deep neural network and train it independently using a patch-based method to learn sparse representations of image data for denoising. The layers of the deep neural networks are rolled into a feed forward network that is trained from end to end. Machine learning sparse image representations using deep unfolding may reduce computational cost, allowing denoising images to be done in real-time.

“In a first embodiment, a method for denoising medical images in a computer-tomography (CT), system is disclosed. This method involves scanning a patient using a CT system to create CT image data. The CT image data is then denoised with an image processor. A deep-learnt multiscale filter network is applied to the CT data to decompose the CT data into sparse representations at different scales. Deep-learnt multiscale network filters, including a number of trained sparse decoding autoencoders. The CT image data is recursively downloaded to generate CT image data. After that, the denoised image data at each scale are resampled to full resolution. Finally, the CT data set with the final denoised image CT data sets is summed. This method also displays an image created from the final denoised CT data set.

“In a second aspect, there is a method for training a deep learning based network for denoising medical images using multiscale sparse representations. The method involves receiving a plurality training image data set at a first scale, and downsampling them into a plurality training image data set at a second. The image processor creates a first deep neural networks with the plurality training image sets at the first and second scales, respectively, to collectively denoize medical images using sparse representations at multiscales. The method also involves upsampling denoised data from the second-scale back to the first and applying a learned linear filter to denoize image data at the first and second scales. To obtain the final denoised image data, a summation is done on the denoised data. Randomly initializing the weights for deep neural networks during training is done. The method then compares the final denoised images with target data to update the weights for the first and second deep networks by backpropagation. Deep-learning-based networks are saved as the trained deep neural network.

“In a third aspect, the system is used to denoise medical images. The system comprises a scanner that can scan an image of a patient, and an image processor that denoises the image using machine-learnt multiscale sparse representations. Multiscale sparse representations of image data include layers of sparse autoencoders that have been trained with image data at various resolutions in an unfolded independent feed forward network. A display is also included to display the denoised patient image.

“The following claims define the invention. Nothing in this section should be construed as limiting those claims. Additional aspects and benefits of the invention will be discussed below, in conjunction with the preferred embodiments. They may be claimed later independently or together.

“Embodiments can be used to denoise image data by machine learning sparse representations of the data using a deep-learning and unfolding approach. Iterative thresholding can be achieved using a deep neural net. This is done by dividing a shrinkage algorithm into layers within the deep neural networks. Each layer of the network corresponds to an iteration in the iterative shrinkage algorithm. Each layer has its own parameters that are initialized and trained separately from the others. Each layer can be used as a multiscale denoising autoencoder, which is a coder that works on different resolutions and scales of image data. Each layer takes the image data and decomposes it into a sparse representation. The threshold coefficients of this representation remove noise from the image data. Finally, the reconstruction is done back into the original denoised representation. Each layer or iteration is then rolled into a feed forward network of multiscale autoencoders. Each layer or scale is independently initialized using image patches.

Iterative thresholding is the basis of deep learning image denoising techniques, such as the multiscale autoencoders discussed above. Iterative thresholding assumes that the learned transform domain (i.e. a dictionary D) contains essential image information. This image information is represented by a small amount of high-magnitude coordinators and that noise is distributed uniformly over the transform domain in a large number low-magnitude coordinators. A non-linearity shrinkage function that applies element-wise to the transform domain is used to denoise the image data. The low-magnitude coefficients are set to zero. The process is repeated iteratively in order to create a clean proximal map of the image data using a set sparse representations.

Traditional deep learning approaches are slow to convergence making them inapplicable for real time applications (e.g., surgical interventions and the like). The Dictionary D parameters are used to initialize the datasets. This makes it difficult for the trained network to adapt to the specific datasets. These embodiments offer a method of deep learning that overcomes some of the limitations of traditional deep learning methods. They do this by dividing the thresholding iterations into independent, trainable and randomly initialized layers. This could be used to create networks multiscale denoising automaticencoders or other deep learning networks. To reduce the computational cost of denoising image data, a multiscale patch based sparse representation is learned. Image processor speed is improved by reducing the computational cost, which allows for real-time scanning or denoising. Furthermore, images can be displayed and reconstructed with greater accuracy. This allows for more precise diagnosis and treatment. Patients are safer because radiation doses are lower.

The following description is given with reference to 3D CT volumes for denoising, but the same principles can be applied to any 2D or 3D imaging modalities. Machine learning techniques can be applied to 2D imaging modalities, such as processing image data slide-by-slide using 2D decomposition filter. 3D decomposition filter is used to learn the sparse representations of 3D volumes. The 3D decomposition filter can provide better results than using volumetric information, but it is computationally more expensive. The machine-learnt network’s denoising capabilities are described in relation to medical images (e.g. X-Ray fluoroscopy images and 2D CT slices, 3D volumes of CT volumes, and the like). However, these embodiments can be applied to all types of image data.

“Referring back to the example of 3D CT volume. The denoising problem can be expressed as the estimation of hidden image x in function of noisy scan y:ny=x (1)\nwhere ? (1)nwhere? represents the noise introduced to the scan. The noise could be, for example, The noise is not just white noise, as a low-pass kernel is used during CT reconstruction to transform the noise into a texture. It is also difficult to get a statistical description of noise in the image domain because it is not Gaussian in the raw measurement domain.

Deep-learning-based networks can be used to remove noise from 3D CT images and solve denoising problems. The network learns a sparse representation base (i.e., Dictionary D with image decomposition filters) by mapping corrupted input data to the corresponding optimal features for detecting denoising in a transform domain. To further strengthen network learning, adaptively learned threshold function values are used to denoising in the transform domain. The network is trained from real high-dose CT scans. Synthetic noise is used to simulate low-dose scans. Multiscale or layered approaches are used to capture important features on different scales and process large CT volumes quickly. Recursively, the CT volume data are downsampled, used with a denoising operation of a constant size per layer, and then trained at each scale independently.

“In different embodiments, each layer of a network is trained with a demoising autoencoder to determine the layer’s scale. A denoising autorcoder is generally a neural network (N), which has been trained using image pairs (y,x) through supervised learning. The denoising self-encoder is trained in order to convert noisy input y into transform domains and reconstruct input as close as possible to ground truth image (x) by removing noise?. The denoising self-encoder extracts the relevant features from noisy input y to reconstruct ground truth image (x). The mapping can be expressed as follows: ncircumflex (x) (y),?x? (2)nA denoising autorecoder is an example supervised learning. This allows the network to learn how to reconstruct noisy inputs using the ground truth images (i.e. a clean image). The network algorithm uses supervised learning to learn the actual statistics of noise, rather than using an approximate model. If the ground-truth image data is from a high-dose clinical scan, there will be noise in the ground data. This allows the network algorithm learn to denoise an output while keeping the noise texture of ground truth data. The preservation of the noise texture allows for reconstruction of natural-looking images that have a higher perceived quality.

Traditional methods of learning noise models have the disadvantage of being tied to a particular dose or scanner setting, making it difficult to deploy a trained network in a clinical environment where doses are adjusted routinely (e.g., to adjust the dose for the patient's mass, etc.). The present network adapts to different noise levels by denoising sparse image representations and scaling the threshold values applied to the transform-domain coefficients. A transform-domain denoising autoencoder may be described as:

x̂ = W′ h(W y)   (3)

where W is a trainable convolutional decomposition operator, W′ is a trainable reconstruction operator, and h is a sparsity-inducing activation (thresholding) operator. By requiring that the reconstruction operator be the transpose of the decomposition operator, W′ = Wᵀ, the number of parameters is decreased, which drives W toward a tight frame. By denoising sparse representations of the images, the trained network can operate on any scan setting and dose, so one network can be used for many scans and different patients.
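For illustration, here is a minimal sketch of a tied-weight transform-domain denoising autoencoder of the form x̂ = Wᵀh(Wy), written in PyTorch. The framework, filter count, kernel size, and the use of a soft threshold in place of the non-negative garrote (sketched further below) are our assumptions, not specifics of the disclosure:

```python
import torch
import torch.nn.functional as F

class TiedDenoisingAutoencoder(torch.nn.Module):
    """Sketch: x_hat = W^T h(W y), with h as a simple soft threshold."""
    def __init__(self, n_filters=25, kernel_size=5):
        super().__init__()
        # Dictionary D: randomly initialized zero-mean Gaussian filter weights.
        self.W = torch.nn.Parameter(
            0.01 * torch.randn(n_filters, 1, kernel_size, kernel_size))
        # Trainable per-filter threshold, initialized small to avoid flat-gradient regions.
        self.k = torch.nn.Parameter(torch.full((n_filters,), 1e-3))

    def forward(self, y):
        z = F.conv2d(y, self.W, padding=2)                     # decomposition: W y
        k = self.k.view(1, -1, 1, 1)
        z_hat = torch.sign(z) * torch.relu(z.abs() - k)        # thresholding h(.)
        return F.conv_transpose2d(z_hat, self.W, padding=2)    # reconstruction: W^T z_hat
```

Using the same weight tensor for `conv2d` and `conv_transpose2d` realizes the tied-weight constraint W′ = Wᵀ, so only the dictionary and thresholds are trainable.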

“FIG. 1 illustrates an example of a sparse denoising autoencoder for denoising sparse image representations.

In FIG. 1, input 101 is a noisy image In, and output 103 is the denoised image Ir. To train the denoising autoencoder 100, the input 101 is corrupted, for example by adding noise modeling a low-dose CT scan or a noise distribution that mimics real-world noise. Ground truth data representing the uncorrupted input forms the other half of each training image pair. Decomposition block 105 maps the corrupted input 101 to a sparse representation of the image using a set of trainable weights. To remove noise, the sparse representation is thresholded by the trainable thresholding filter of thresholding block 107. A reconstruction block 109 then maps the denoised sparse representation to output 103. The reconstruction block 109 has the same shape as the decomposition block 105, so the reconstructed output is in the same format as input 101, resulting in the denoised image Ir. To better adapt the parameters to the input data, the dictionary D of decomposition block 105 and the thresholding functions of thresholding block 107 are randomly initialized prior to training.

“The decomposition block 105 acts as an edge filter to generate sparse image representations in the transform domain. The initial filter weights of the edge filter are drawn from a random zero-mean Gaussian distribution; they have no distinct structure and adapt to the training datasets (i.e., the noisy images In) as the loss decreases over training. Nearly all of the filter coefficients are trained and adjusted, with clear edge structures becoming visible in the transform domain.

“The thresholding block 107 is used to remove noise. As discussed below, the thresholding functions can be provided as shrinkage functions, such as the non-negative garrote function. The shrinkage function takes the sparse representation of each input and shrinks the filter coefficients according to the noise level (e.g., each input may have a different noise level, as measured by the standard deviation from the ground truth). Accordingly, the denoising network adapts to different noise levels. In the transform domain, for example, a small number of strong coefficients corresponds to edge structures, and a larger number of weak coefficients corresponds to noise. The thresholding function reduces the noise by setting the smaller coefficients to zero.

“Referring to FIG. 1, the denoising autoencoder 100 is trained on patches of the noisy image input 101. The decomposition, thresholding, and reconstruction blocks 105, 107, and 109 are trained patch by patch, repeating the training for each patch over multiple training images in a training dataset. Training with image patches is a compromise: sharp edges are better reconstructed with larger patches, and although the denoising autoencoder 100 reduces noise better when trained on larger patches (e.g., the entire image), processing large patches requires more computation and may not be suitable for real-time applications. To increase processing speed, a small patch size may be used with the denoising autoencoder 100, such as 5×5 in 2D or 5×5×5 in 3D.
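As a rough illustration of preparing such patch-based training data (the patch size and stride here are our assumptions):

```python
import numpy as np

def extract_patches(image, patch=5, stride=5):
    """Split a 2D image into (patch x patch) training patches; a minimal sketch."""
    h, w = image.shape
    patches = [image[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, stride)
               for j in range(0, w - patch + 1, stride)]
    return np.stack(patches)

# Training pairs would be (noisy patch, ground-truth patch) taken from the
# same locations in the corrupted and uncorrupted images.
```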

A multiscale transform allows small patches to be used while still reducing noise effectively, which lowers the computational cost. An architecture with multiscale decomposition is created to speed up 3D processing and to adapt to different noise levels.

“FIGS. 2A-2B are examples of networks of multiscale denoising autoencoders. Referring to FIG. 2A, a multiscale architecture with multiple denoising autoencoders 217A is shown, referred to as network 200. Each denoising autoencoder 217A includes a decomposition, thresholding, and reconstruction block as described above for FIG. 1, and each is trained independently on image patches at a different scale. The first layer of the network 200 includes, for example, decomposition block 205, thresholding block 207, and reconstruction block 209. Recursive block 217B contains one or more further levels of denoising autoencoders and can represent any number of denoising autoencoders, each level having its own separately trained decomposition, thresholding, and reconstruction blocks.

“Each level in the network 200 downsamples the image data using a low-pass filter (LPF), such as LPF 213, by a factor of two as shown in block 215, although other factors can be used. The same patch size is used at each level of subsampling, so the downsampled denoising levels cover a larger area of the original image. The thresholding blocks at each level of the multiscale architecture are also provided with the input noise level (i.e., the noise standard deviation) as an input. This allows the network to adapt to different noise levels by incorporating the additional information into its learning process.

“Due to the downsampling, the network is trained on effectively large image patches while still using a simple sparse representation for computational efficiency. The downsampling also means that image regions with low gradient energy may show stronger edges at lower scales, which makes the filters trained at the different scales very different. While sharing filter weights among scales would reduce computational complexity and the number of trainable parameters, it is better not to share filter parameters among scales: separate layers can adapt to each scale individually, resulting in more accurate denoising and image reconstruction. The thresholding functions for each scale are likewise trained separately.

“To generate the reconstructed output 203 (i.e., the denoised image Ir), summation block 223 combines the outputs from each scale. The outputs of the downsampled scales are upsampled to reverse the downsampling operations. The network 200 can train the summation block 223 as a weighted sum. The outputs of each scale also pass through additional high-pass and low-pass filters HPF 211 and LPF 221: HPF 211 passes the high spatial frequency image data denoised at the original scale, while LPF 221 passes the low spatial frequency image data denoised at the downsampled scales after upsampling at block 219 by the same factor.

Referring to FIG. 2B, the recursive block 217B is expanded to show a three-level network 200 with three denoising autoencoders trained at different scales. Recursive block 217B is replaced with two further denoising autoencoders and low-pass filtering. After the input 201 is downsampled at LPF 213, it is fed into an intermediate-scale autoencoder, which includes a separately trainable decomposition block 223, thresholding block 225, and reconstruction block 227. The downsampled input is then downsampled again at LPF 231 and fed into the lowest-scale autoencoder, which includes another separately trainable decomposition block 233, thresholding block 235, and reconstruction block 237. The intermediate output is reconstructed by adding the low and intermediate outputs at summation block 241, following the same procedure as above; the network 200 can train the summation block 241 as a weighted sum. HPF 229 passes the intermediate spatial frequency image data denoised at the first downsampled scale, while LPFT 239 passes the low spatial frequency data denoised at the lowest scale after upsampling by the same factor as LPF 231.”

“Specifically, low-pass filtering is performed by low-pass wavelet decomposition via convolution at LPF 213, followed by further downsampling at LPF 231. Wavelet reconstruction is achieved by successive upsampling with transposed convolutions LPFT 239 and LPFT 221. The two lower scales are summed using the trainable weighted sum 241, followed by a summation with the highest scale at the trainable weighted sum 223. The sum of LPFT 221, the HPF reconstruction 211, and the thresholding function realizes near-perfect reconstruction.
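A minimal sketch of this decompose/denoise/recombine flow follows. The averaging kernel, the factor-two resampling, the fixed 0.5/0.5 recombination weights, and the omission of the explicit HPF/LPFT filter pair are all our simplifying assumptions; in the embodiments these weights and filters are trainable:

```python
import numpy as np
from scipy.ndimage import convolve, zoom

LPF = np.ones((2, 2)) / 4.0  # stand-in low-pass kernel (our assumption)

def multiscale_denoise(y, denoisers):
    """Recursive sketch: denoise high frequencies here, low frequencies below."""
    if len(denoisers) == 1:
        return denoisers[0](y)
    low = zoom(convolve(y, LPF), 0.5)                 # low-pass filter + downsample by 2
    low_denoised = multiscale_denoise(low, denoisers[1:])
    up = zoom(low_denoised, 2.0)                      # upsample back (assumes even dims)
    # Fixed 0.5/0.5 weights stand in for the trainable weighted sum.
    return 0.5 * denoisers[0](y) + 0.5 * up
```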

Referring again to the 3D CT scan example, traditional CT denoising can be compared with learning multiscale sparse image representations. Traditional 2D CT denoising uses filters measuring 17 by 17 pixels. Such filters are too computationally costly to apply to large 3D CT volumes (e.g., up to 512³ voxels), because extending the traditional 2D filter size to a 3D CT scan would require a much larger filter (17 by 17 by 17 voxels). Denoising autoencoders with multiscale decompositions, as shown in FIGS. 2A-2B, use recursive downsampling rather than a larger filter size. Furthermore, each scale is trained independently with different filter parameters, allowing for greater accuracy than traditional CT denoising.

“FIG. 2C shows the effective patch sizes at three scales. The multiscale sparse coding network codes at the three scale levels shown in FIG. 2C; downsampling allows the same size patch to cover a greater portion of the input 201, as illustrated by scales 1-3. In the 3D CT example, the multiscale sparse coding network uses three levels of decomposition, such as blocks 205, 223, and 233. Each sparse denoising autoencoder maps input data to hidden representations using a convolutional layer operating on patches (5 by 5 by 5 voxel patches in 3D). A 2D example uses 25 filter kernels of 5 by 5 pixels; these 25 filters correspond to the elements of the decomposition operator W to be learned. After the 25 feature maps (i.e., the representation coefficients) are obtained, they are thresholded using the non-negative garrote function h_garrote. The input is then reconstructed to the same shape using a transposed convolutional layer with the same dictionary filter elements (e.g., corresponding to convolution with 25 filters of 5 by 5). Each scale level has its own operator with the same filter kernel size, so W1 . . . Wn are learned, where n is the number of scale levels. With a scaling factor of 2, filter kernels of size 5 by 5 correspond to processing the original network input with filters of 10 by 10 at scale level 2 and 20 by 20 at scale level 3.
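The effective receptive field thus grows geometrically with the downsampling factor; a one-line check using the values from the example above:

```python
# Effective filter size at each scale: kernel * factor**(level - 1)
kernel, factor = 5, 2
print([kernel * factor ** level for level in range(3)])  # [5, 10, 20]
```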

“The networks in the above-described embodiments are provided with only one thresholding block per branch. The network may be made deeper by integrating interscale or interfeature correlations, which improves performance while keeping the computational cost low. For example, interscale denoising can be applied after the output of the multiscale autoencoder network 200.

“FIG. 3 shows the network 200 extended by an additional autoencoder 300, which includes an additional decomposition block, thresholding block, and reconstruction block. Operating on the summed output of the network 200, the additional autoencoder 300 applies an extra thresholding between the different scales (i.e., interscale thresholding). The additional thresholding functions denoise again based on the input noise level and learn an appropriate interscale representation.

“Additionally, the networks described above in FIGS. 1-3 can be trained as deep networks using deep unfolding, which stacks multiple iterations of the network in a chain. Unfolding is the process of replacing a fixed number of iterations, or cycles, with a finite feed-forward structure. FIG. 4 shows an example of learning sparse image representations using deep unfolding: three iterations, or cycles, of a network of multiscale autoencoders are connected into a feed-forward network. The feed-forward network is composed of three layers of autoencoders connected by weighted averaging between them. The architecture of each iteration 400 to 402 corresponds to a layer of a deep neural network, such as the multiscale denoising network 200 of FIG. 2B. The weighted sum, or recombination, after each layer is calculated between the weighted input of the previous layer, the weighted output of the current layer, and the weighted noisy input to the entire network.

“Image denoising can be greatly improved by sparse-representation-based denoising algorithms formulated as an unfolded feed-forward neural network. The network learns more denoising parameters by using a randomly initialized dictionary and thresholding functions, as discussed above. In the 3D CT data example, the denoising iterations are unrolled into a feed-forward network so as to run as few iterations as possible (e.g., realizing convergence after n=3 iterations instead of an unbounded number). The n iterations are unrolled into a feed-forward network as shown in FIG. 4, and the feed-forward network then learns optimized weights and thresholds for each iteration. As a trained convolutional neural network (CNN), the feed-forward network can then be applied to new CT scans.
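A minimal sketch of unfolding n = 3 denoising iterations into a feed-forward chain with learned per-layer recombination weights. The scalar weight triples and the generic `layers` list (standing in for the trained multiscale autoencoders) are our assumptions:

```python
import torch

class UnfoldedDenoiser(torch.nn.Module):
    """Sketch: n unfolded iterations with weighted recombination after each layer."""
    def __init__(self, layers):
        super().__init__()
        self.layers = torch.nn.ModuleList(layers)  # e.g., 3 multiscale autoencoders
        # One (previous output, current output, noisy input) weight triple per layer.
        self.w = torch.nn.Parameter(torch.tensor([[0.0, 1.0, 0.0]] * len(layers)))

    def forward(self, y):
        x = y
        for layer, (a, b, c) in zip(self.layers, self.w):
            x = a * x + b * layer(x) + c * y   # recombination after each layer
        return x
```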

“FIG. 5 illustrates an example garrote thresholding function. The thresholding function used in the embodiments described herein acts as an empirical Wiener filter, i.e., a Wiener-like filter that does not know the target variance. FIG. 5 plots the non-negative garrote function used to threshold the sparse image representation coefficients, together with the soft and hard thresholding functions, which can be used in other embodiments. The thresholding function is applied based on the observation that noise is spread across the small coefficients of the image data's sparse representation, while the large coefficients capture the main image features, such as edges and borders. The thresholding function therefore denoises the image data by setting the small coefficients to zero. In the feed-forward neural network of FIG. 4, a separately trained thresholding function sets the small transform coefficients to zero for each architecture iteration 402, 406 and keeps the large transform coefficients, yielding a denoised estimate of input X0.

As shown in FIG. 5, the non-negative garrote function overcomes the weaknesses of both the hard and soft thresholding functions: soft thresholding has a larger bias because it shrinks the large coefficients, while hard thresholding is discontinuous and therefore incompatible with backpropagation training. The non-negative garrote function also takes the noise level into account. A thresholding level k is used to force sparsity on each representation transform coefficient z_j by:

“ẑ_j = h_garrote(z_j) = (z_j² − kσ²)₊ / z_j   (4)

to obtain the thresholded coefficients ẑ_j. The positive part function (·)₊ is defined as:

(x)₊ = max(x, 0)   (5)

The input noise variance σ² allows training and testing at multiple dose settings. The thresholding value k is a trainable parameter; it should be initialized very low to avoid training in the flat region around 0, where backpropagation fails to produce gradients. The non-negative garrote function is used to train the neural networks, allowing reconstruction and display of clean image data from the denoised estimates. Because the learnable threshold is proportional to the input noise level, the network can adapt to various scan doses.
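A minimal sketch of the non-negative garrote of equation (4) as a differentiable thresholding operator; the small epsilon guard against division by zero is our addition:

```python
import torch

def garrote(z, k, sigma):
    """Non-negative garrote: z_hat = (z^2 - k * sigma^2)_+ / z  (eq. 4)."""
    eps = 1e-12                                     # our addition: avoids 0/0 at z = 0
    shrunk = torch.relu(z ** 2 - k * sigma ** 2)    # positive part (eq. 5)
    return shrunk * z / (z ** 2 + eps)              # equals shrunk / z for z != 0, 0 at z = 0
```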

“FIG. 6 shows example results of applying a trained network. Specifically, FIG. 6 illustrates that networks trained to denoise CT datasets using multiscale sparse image representations do not produce the texture artifacts of previous methods (e.g., the prior art BM3D technique discussed in the background). Rows A, B, and C show different images captured with a CT scanner. Column 1 contains the ground truth data used for training and evaluation; the ground truth is captured with high-dose scans (e.g., 100% dose), resulting in a high signal-to-noise ratio (SNR). Column 2 shows a lower dose scan (e.g., 30% dose), resulting in noisy input with a lower SNR. The low-dose simulation is performed by taking the raw high-dose scan and introducing noise based on the imaging modality (e.g., Poisson noise or the like). The multiscale networks can thus learn sparse image mappings from artificially corrupted high-dose imaging data without prior information about the noise model.
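A rough sketch of simulating a low-dose scan from high-dose data with Poisson noise. The photon-count scaling and image-domain injection are our simplifying assumptions; realistic CT simulation would inject the noise in the raw projection domain:

```python
import numpy as np

def simulate_low_dose(high_dose, dose_fraction=0.3, photons_full=1e5):
    """Scale expected photon counts by the dose fraction, resample Poisson noise."""
    rng = np.random.default_rng(0)
    counts = np.clip(high_dose * photons_full * dose_fraction, 0, None)
    noisy_counts = rng.poisson(counts)                  # Poisson measurement noise
    return noisy_counts / (photons_full * dose_fraction)  # back to image intensity
```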

Effective denoising algorithms are designed to reconstruct sharp edges of organ borders, medical instruments, and other objects, since reconstructed edges often provide valuable information to physicians about the patient. Another goal of efficient denoising algorithms is to avoid introducing artifacts into the image data, including ringing effects along edges and splotchy or other artificial structures. Image artifacts are problematic because they may hinder a physician's ability to make decisions based on the reconstructed images. FIG. 6 shows the results of applying both 2D and 3D multiscale sparse coding networks to denoise synthetically corrupted CT slices/volumes. Columns 3, 4, and 5 show the results of a 2D multiscale network (e.g., FIG. 4), a 3D multiscale network (e.g., FIG. 4), and the prior art BM3D approach, respectively. While the results of the 2D/3D multiscale networks and the prior art BM3D approach are comparable, the multiscale networks produce sharper edges and fewer artifacts in the reconstructed images. The arrows in row C, column 4 indicate more clearly reconstructed boundaries/edges produced by the 3D multiscale network. Row B, column 5 also shows texture artifacts created by the prior art BM3D approach (i.e., vertical streak artifacts); these artifacts are not present in the images reconstructed by the 2D/3D multiscale networks (columns 3 and 4).

“In further embodiments based on the disclosure, the networks can be trained using an additional noise model for the imaging modality. FIG. 7A shows a network of autoencoders trained with a noise model: three autoencoders 700, 702, and 704 in a feed-forward network, each connected by weighted averaging; other numbers of iterations are possible. As shown in FIG. 7A, the noise model is provided as an input to each thresholding block of the network of autoencoders. Because dose settings can change from scan to scan after the network is trained, the noise model is included in training so that the autoencoders are noise-aware and learn denoising parameters based on the scan doses of the training datasets. The noise model thus serves as an additional input for deeper learning. Based on real-world physics, the noise model can be provided as Poisson noise; Gaussian noise, the input scan, and the dose settings can also be used.

“In further embodiments, the networks can be trained using scan-specific metadata for the input image data. FIG. 7B shows a deep learning network trained with image acquisition metadata, including scan settings such as dose and X-ray source voltage. In FIG. 7B, each layer of the multiscale deep neural network is trained using scan metadata together with the input image data, providing deeper learning that allows the network to adapt to different scan parameters. The deeper learning adds fully connected controller networks 707, 709, and 711 to each autoencoder (e.g., of FIG. 2B). As shown in FIG. 7B, the controller networks output the thresholds for each filter, for example 75 trained thresholds for 25 filters at each of three scales. The inputs of fully connected controller network 705 include three image reconstruction kernel parameters (e.g., rho50, rho10, and rho2) and the dose level. Other inputs may include scanner geometry, through-plane resolution, slice size and spacing, and other metadata. Additional or different inputs can be provided for deeper learning and improved adaptability to a wider range of imaging parameters, such as the region of interest (e.g., affecting image contrast), reconstruction resolution (e.g., affecting noise texture), and tube voltage.
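A minimal sketch of such a fully connected controller mapping scan metadata to per-filter thresholds. The layer sizes, the four-element metadata vector (three kernel parameters plus dose), and the Softplus positivity constraint are our assumptions:

```python
import torch

class ThresholdController(torch.nn.Module):
    """Maps scan metadata (e.g., rho50, rho10, rho2, dose) to filter thresholds."""
    def __init__(self, n_meta=4, n_thresholds=75):  # 25 filters x 3 scales
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_meta, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, n_thresholds),
            torch.nn.Softplus(),   # keep thresholds positive
        )

    def forward(self, meta):
        return self.net(meta)      # one threshold per filter per scale
```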

“In embodiments trained with or without the additional noise model and scan-specific metadata discussed above with regard to FIGS. 7A-7B, deeper learning can also be achieved. FIG. 8 shows an example of deeper learning for the fully connected controller networks, in which a deeper network is created by learning additional features in the transform domain before reconstructing denoised image data at each scale. The left side of FIG. 8 shows an autoencoder that transforms noisy input 1 into output 2 with only one layer in the transform domain. The right side of FIG. 8 shows a deeper network that transforms noisy input 1 into output 3, with an additional layer 3 added in the transform domain; further layers may be added. Adding layers in the transform domain enhances the learning ability of the network. In the example of FIG. 8, the controller learns thresholds at a second level; since the network learns 25 features per layer, the second layer doubles its learning capacity. Adding layers for deeper learning improves the network's ability to adapt to different scans.

“Each of these embodiments can instead be provided as a network trained on residual corrections, as shown in FIG. 9. The network is trained to correct the image data rather than to reproduce it directly: instead of learning to extract the clean image from the input, the network learns to predict the noise to be removed from the image data. If there is little noise in a dataset, the network then does not need to learn to reconstruct the image data, so training on corrections may converge faster.
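A brief sketch of the residual formulation, where `net` stands in for any of the denoising networks above (the name and training-loss line are our assumptions):

```python
import torch

def residual_denoise(net, y):
    """Residual correction: the network predicts the noise, which is subtracted."""
    return y - net(y)   # net(y) is trained to approximate the noise (y - x)

# Training loss sketch, for a noisy/clean pair (y, x):
#   loss = torch.mean((net(y) - (y - x)) ** 2)
```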

“FIG. 10 illustrates a method for denoising medical images captured with a computed tomography (CT) scanner. The method is implemented by the system of FIG. 12 (discussed below) and/or another system. Additional, different, or fewer acts may be provided.

“At act 1002, a CT scanner scans a patient to capture CT image data. CT scans are performed using radiation, which results in noise in the CT data. The CT scan can be captured in two or three dimensions, such as during an interventional surgery, and may be captured using various doses and scan settings.

“At act 1004, the CT image data are decomposed by an image processor. The image processor applies a deep-learnt multiscale network of filters to decompose the CT image data into sparse representations at multiple scales. The deep-learnt multiscale filter network is provided as a trained network of sparse denoising autoencoders or another deep learning network. Lower levels of the deep-learnt network apply learned filters to image data recursively downsampled from the CT image data received at the previous levels of the deep-learnt multiscale network.

“The filters were independently trained and initialized randomly prior to training. Additionally, the threshold values of the sparse denoising autoencoders may be learned separately at each scale. For example, the filter weights were initialized using a random zero-mean Gaussian distribution, which imposes no structure and allows the filter weights to better adapt to the training datasets. Independent learning enables the parameters to be optimally adjusted to the training data at each scale. The filters were further trained using patches at each scale.

“The learned denoising filters map patches of the patient's CT image data to sparse representations, which are thresholded using a thresholding function to remove noise. Each autoencoder has a separately trained decomposition filter to transform the input into a sparse representation at its scale, a separately trained thresholding function to remove the noise from the sparse representation, and a corresponding reconstruction block to reconstruct a clean version of the input. As inputs, the autoencoders may receive a concatenation of CT pixel data and metadata.

“At act 1006, an image rendered from the denoised CT data is displayed to a user. Acts 1002, 1004, and 1006 can be repeated, for example, to update an image displayed during surgery.

“FIG. 11 illustrates a method for training a deep-learning-based network for denoising medical images using multiscale sparse representations. The method is implemented by the system of FIG. 12 (discussed below) and/or another system. Additional, different, or fewer acts may be provided.

“At act 1102, a plurality of training image data sets at a first scale are received by an image processor. The data are captured with various scan settings and doses. The training image data sets include a database of high-dose CT and MR scans. A copy of the data sets is modified with synthetic noise to simulate low-dose scans, while the unmodified copy, not subjected to any added noise, serves as the ground truth.

“At act 1104, the plurality of training image data sets with noise is downsampled by the image processor into a plurality of training image data sets at a second scale and a plurality of training image data sets at a third scale. Downsampling, for example, reduces the scale by a factor of two: a first scale of 512 becomes a second scale of 256 and a third scale of 128. Other downsampling factors or scales may also be used.

“At act 1106, the image processor trains a first deep neural network with the plurality of training image data sets at the first scale, a second deep neural network with the plurality of training image data sets at the second scale, and a third deep neural network with the plurality of training image data sets at the third scale, the networks collectively denoising an image using sparse representations at multiple scales. The parameters of the deep neural networks are randomly initialized prior to training. One embodiment trains the lowest scale first, then the intermediate and first scales, working upward from the lower scales; in another embodiment, all three scales are trained simultaneously. In various embodiments, the deep neural networks include multiple layers for deeper learning, such as multiple layers of feature decomposition. In an embodiment, the deep neural networks are trained to learn noise corrections to the training image data sets rather than the denoised images directly.
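One way to realize the lowest-scale-first training order described above, sketched under the assumption that each scale's autoencoder and its paired noisy/clean patch pyramids are already available as Python objects:

```python
import torch

def train_scales(autoencoders, noisy_pyramids, clean_pyramids, epochs=10):
    """Train the coarsest scale first, then progressively finer scales."""
    for level in reversed(range(len(autoencoders))):   # lowest scale first
        opt = torch.optim.Adam(autoencoders[level].parameters(), lr=1e-3)
        for _ in range(epochs):
            for y, x in zip(noisy_pyramids[level], clean_pyramids[level]):
                opt.zero_grad()
                loss = torch.mean((autoencoders[level](y) - x) ** 2)
                loss.backward()
                opt.step()
```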

“At act 1108, the image processor upsamples the denoised data obtained at the second and third scales, for example back to the first scale.

“At act 1110, the image processor applies learned linear filters to the denoised data at the first scale and to the denoised data upsampled from the second and third scales. For example, a high-pass filter may be applied to the image data at the first and second scales, while a low-pass filter is applied to the image data at the third scale. Other filters may also be used, such as a low-pass filter on the image data at the second scale.

“At act 1112, the image processor sums the filtered data to generate the final denoised image. The summation is a learned weighted sum, although other summations can be used, for example, combining the denoised data from the third and second scales before adding the first.

“At act 1114, the image processor compares the final denoised images with the ground truth data to update the weights of the deep neural networks. Backpropagation is used to update the weights; all weights in the neural networks can be updated based on the comparison.

“At act 1116, the trained deep neural networks are stored in a memory by the image processor.”

“FIG. 12 illustrates a block diagram of an example system for denoising medical images. The system comprises a medical scanner 1202, an image processor 1204, and a display 1206. The image processor 1204 and the display 1206 may be part of the medical scanner 1202. Alternatively, the image processor 1204 and the display 1206 can be part of a server, workstation, or computer connected to the medical scanner 1202.

The medical scanner 1202 is configured to scan a patient. The scanner 1202 includes controllers, pulse generators, a radio frequency system, and coils for CT, MR, or other scanning modalities, and can generate a 2D or 3D scan of the patient.

The image processor 1204 is a general processor, digital signal processor, graphics processing unit, application specific integrated circuit, field programmable gate array, digital circuit, analog circuit, or other now known or later developed device for image reconstruction and denoising. The image processor 1204 can be a single device or a group of devices. For implementations using more than one device, processing may be divided in parallel or sequentially, and different devices may perform different functions (e.g., one device denoises while another reconstructs). In one embodiment, the image processor 1204 is a control processor or other processor of the medical scanner 1202. The image processor 1204 operates pursuant to stored instructions to perform the various acts described herein and may be configured by software, hardware, and/or firmware.

“The image processor 1204 is configured to denoise an image of the patient using machine-learnt multiscale sparse representations. The multiscale sparse representations of the image are generated using parameters trained as layers of sparse denoising autoencoders on image data at different resolutions. The layers of sparse denoising autoencoders may have been unfolded into a feed-forward network and trained separately. For 3D scans, the sparse denoising autoencoders have three-dimensional decomposition filters. The image processor 1204 may also have been trained with deeper learning, such as using the scan's received noise level or other scan metadata as inputs during training.

The display 1206 is a CRT, LCD, plasma, projector, printer, or other display device. The display 1206 displays the denoised images of the patient after reconstruction and denoising.

“The various improvements described herein may be used together or separately. While the accompanying drawings have been used to illustrate the invention, it should be understood that one skilled in the art may make other modifications or changes to these embodiments without departing from its scope or spirit.

Click here to view the patent on Google Patents.

What is a software medical device?

The FDA refers to qualifying software functions as “Software as a Medical Device” (SaMD), as distinguished from “Software in a Medical Device” (SiMD), which is software integral to (embedded in) a medical device.

Section 201(h), 21 U.S.C. § 321(h)(1), defines a medical device as “an instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including a component part or accessory” that is intended for the diagnosis or treatment of disease or other conditions in humans or animals, or is intended to affect the structure or function of the body of humans or animals. To be considered a medical device, and thus subject to FDA regulation, the software must meet at least one of these criteria:

  • It is intended for use in diagnosing or treating patients; or
  • It is intended to affect the structure or function of the body.

If your software is designed to be used by healthcare professionals to diagnose, treat, or manage patient information in hospitals, the FDA will likely consider such software to be a medical device subject to regulatory review.

Is Your Software a Medical Device?

FDA’s current oversight, which puts more emphasis on the functionality of the software than on the platform it runs on, focuses regulation on software whose functionality could be dangerous to patient safety if it failed. Examples of device software and mobile medical apps the FDA focuses on include:

  • Software functions that aid patients with diagnosed mental disorders (e.g., depression, anxiety, and post-traumatic stress disorder (PTSD), etc.) by providing “Skill of the Day”, a behavioral technique, or audio messages, that the user can access when they are experiencing anxiety.
  • Software functions that offer periodic reminders, motivational guidance, and educational information to patients who are recovering from addiction or smokers trying to quit;
  • Software functions that use GPS location data to alert asthmatics of potential environmental conditions that could cause symptoms, or to alert substance abusers when they are near high-risk locations.
  • Software that uses video and games to encourage patients to exercise at home.
  • Software functions that prompt users to enter which herb or drug they wish to take concurrently, provide information about interactions, and give a summary of the type of interaction reported.
  • Software functions that take into account patient characteristics, such as gender, age, and risk factors, to offer patient-specific counseling, screening, and prevention recommendations from established and well-respected authorities.
  • Software functions that use a list of common symptoms and signs to give advice about when to see a doctor and what to do next.
  • Software functions that help users to navigate through a questionnaire about symptoms and to make a recommendation on the best type of healthcare facility for them.
  • Mobile apps that allow users to make pre-specified nurse calls or emergency calls using broadband or cellular phone technology.
  • Apps that allow patients or caregivers to send emergency notifications to first responders via mobile phones
  • Software that tracks medications and provides user-configured reminders to improve medication adherence.
  • Software functions that give patients access to their health information. This includes historical trending and comparisons of vital signs (e.g. body temperature, heart rate or blood pressure).
  • Software functions that display trends in personal healthcare incidents (e.g. hospitalization rates or alert notification rate)
  • Software functions that allow users to electronically or manually enter blood pressure data, and to share it via e-mail, track and trend it, and upload it to an electronic or personal health record.
  • Apps that provide tools for tracking and reminders about oral health, or tools to track users suffering from gum disease.
  • Apps that offer mobile guidance and tools for prediabetes patients;
  • Apps that allow users to display images and other messages on their mobile devices, which can be used by substance abusers who want to quit addictive behaviors.
  • Software functions that provide drug interaction and safety information (side effects, drug interactions, active ingredients) in a report based upon demographic data (age and gender), current diagnosis, current medications, and clinical information.
  • Software functions that allow the surgeon to determine the best intraocular lens power for the patient and the axis of implantation, based on the surgeon's inputs (e.g., expected surgically induced astigmatism, the patient's axial length, preoperative corneal astigmatism, etc.).
  • Software, usually mobile apps, that converts a mobile platform into a regulated medical device.
  • Software that connects with a mobile platform via a sensor or lead to measure and display electrical signals from the heart (electrocardiograph; ECG).
  • Software that attaches a sensor or other tools to the mobile platform to view, record and analyze eye movements to diagnose balance disorders
  • Software that collects information about potential donors and transmits it to a blood collection facility, determining whether a donor is eligible to donate blood or other components.
  • Software that connects to an existing device type in order to control its operation, function, or energy source.
  • Software that alters or disables the functions of an infusion pump
  • Software that controls the inflation or deflation of a blood pressure cuff
  • Software that calibrates hearing aids and assesses sound intensity characteristics and electroacoustic frequency of hearing aids.

What does it mean if your software/SaaS is classified as a medical device?

SaaS founders need to be aware of the compliance risks that medical devices pose. Data breaches are one of the biggest risks. Medical devices often contain sensitive patient data, which is why they are subject to strict regulations. This data could lead to devastating consequences if it were to become unprotected. SaaS companies who develop medical devices need to take extra precautions to ensure their products are safe.

So who needs to apply for FDA clearance? The FDA defines a “mobile medical app manufacturer” as any person or entity that initiates specifications, designs, labels, or creates a software system or application for a regulated medical device in whole or from multiple software components. This term does not include persons who exclusively distribute mobile medical apps without engaging in manufacturing functions; examples of such distributors may include the app stores.

Software As Medical Device Patenting Considerations

The good news is that investors like medical device companies which have double exclusivity obtained through FDA and US Patent and Trademark Office (USPTO) approvals. As such, the exit point for many medical device companies is an acquisition by cash rich medical public companies. This approach enables medical devices to skip the large and risky go-to-market (GTM) spend and work required to put products in the hands of consumers.

Now that we have discussed the FDA review process, we will discuss IP issues for software medical device companies. Typically, IP includes Patents, Trademarks, Copyrights, and Trade secrets. All of these topics matter and should be considered carefully. However, we will concentrate on patents to demonstrate how careless drafting and lack of planning can lead to problems, namely unplanned disclosures of your design that can then be used as prior art against your patent application.

In general, you should file patent application(s) as soon as practicable to get the earliest priority dates. This will help you when you talk to investors, FDA consultants, prototyping firms, and government agencies, among others. Compliance or other documents filed with any government agency may be considered disclosure to third parties and could make the document public. In general, disclosures to third parties or public availability of an invention trigger a one year statutory bar during which you must file your patent application. Failure to file your application within the required time frame could result in you losing your right to protect your invention.

The information from your FDA application may find its way into FDA databases, including DeNovo, PMA and 510k databases and FDA summaries of orders, decisions, and other documents on products and devices currently being evaluated by the FDA. Your detailed information may be gleaned from Freedom of Information Act requests on your application. This risk mandates that you patent your invention quickly.

When you patent your medical device invention, keep the overall FDA regulatory framework in mind as you draft your patent application. Be mindful of whether your software/SaaS application discusses diagnosing and treating patients or affecting the structure or function of the body, and add language indicating that such description relates to only one embodiment and not to others. That way, if you wish to avoid classification of your software/SaaS as a medical device, and the FDA registration and oversight that comes with it, you retain the flexibility to do so in subsequent discussions with the FDA.

An experienced attorney can assist you in navigating the regulatory landscape and ensure that you comply with all applicable laws. This area of law is complex and constantly changing. It is important that you seek legal advice if you have any questions about whether or not your software should be registered with FDA.

Patent PC is an intellectual property and business law firm built to move at startup speed. We have internally developed AI tools to assist our patent workflow and to guide us in navigating government agencies. Our business and patent lawyers are experienced in software, SaaS, and medical device technology. We offer legal services to startups and businesses for a flat fee covering patents and other intellectual property matters. Our lawyers do not have to track time, as there is no hourly billing and no charge for calls or emails; we just focus on getting you the best legal work for your needs.

Our expertise ranges from advising established businesses on regulatory and intellectual property issues to helping startups in their early years. Our lawyers are experienced in helping entrepreneurs and fast-moving companies that need legal advice regarding company formation, liability, equity issuance, venture financing, IP asset protection, infringement resolution, and litigation. For a confidential consultation, contact us at 800-234-3032 or make an appointment here.