2024-07-11
This article comes from Huawei's Ottawa Wireless Advanced Systems Competence Centre and Wireless Technology Laboratory, and one of its authors is the well-known Tong Wen.
What I found most illuminating in the article is its summary of the main problems faced by an end-to-end transceiver built on the autoencoder architecture:
Problem 1: The autoencoder is trained by back-propagation based on stochastic gradient descent, which requires one or more differentiable channel-model layers to connect the transmitter's deep neural layers with the receiver's. Because a real channel contains many nonlinear components (such as digital or analog pre-distortion and conversion) and involves non-differentiable stages such as upsampling and downsampling, the transceiver's deep neural layers are trained on a constructed channel rather than the real one. In a real-channel scenario, a model obtained this way may suffer performance loss at the inference stage.
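To make Problem 1 concrete, here is a minimal sketch (my illustration, not the paper's model) of why back-propagation forces a differentiable channel layer: with a simple additive-noise surrogate, gradients flow from the receiver back into the transmitter, whereas a real channel's non-differentiable stages would break this chain. All dimensions and layer sizes are assumptions.

```python
# Minimal end-to-end autoencoder transceiver trained through a
# *differentiable* surrogate channel (plain AWGN). Illustrative only.
import torch
import torch.nn as nn

M = 16      # message alphabet size (assumed)
N_CH = 8    # channel uses per message (assumed)

tx = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, N_CH))
rx = nn.Sequential(nn.Linear(N_CH, 32), nn.ReLU(), nn.Linear(32, M))

def awgn(x, snr_db=10.0):
    # Differentiable AWGN layer: the noise is additive, so gradients
    # flow from the receiver loss back into the transmitter weights.
    # A real channel with quantization, resampling, or PA nonlinearity
    # would break this backward path.
    p_sig = x.pow(2).mean()
    p_noise = p_sig / (10 ** (snr_db / 10))
    return x + torch.sqrt(p_noise) * torch.randn_like(x)

opt = torch.optim.Adam(list(tx.parameters()) + list(rx.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    msgs = torch.randint(0, M, (256,))
    x = torch.nn.functional.one_hot(msgs, M).float()
    s = tx(x)
    s = s / s.pow(2).mean().sqrt()   # average transmit power constraint
    logits = rx(awgn(s))             # backprop crosses the channel layer
    loss = loss_fn(logits, msgs)
    opt.zero_grad(); loss.backward(); opt.step()
```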
Problem 2: All hidden (intermediate) layers are trained on the posterior probability of their input signal. In the autoencoder-based global transceiver, the first layer of the receiver's deep neural network is such an intermediate layer, and its input signal is directly exposed to the current channel distortion. That distortion inevitably propagates through all of the receiver's deep neural layers; if the channel changes enough to exceed what training anticipated, the receiver fails at the inference stage.
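Continuing the sketch above, the snippet below illustrates Problem 2 (again my own illustration): evaluating the trained receiver under a distortion it never saw in training, where a hypothetical hard-clipping stage stands in for unmodeled real-world effects, shows how the shift entering the receiver's first layer degrades accuracy.

```python
# Inference under a channel that deviates from the training surrogate.
# Hard clipping here is an assumed stand-in for unmodeled distortion.
with torch.no_grad():
    msgs = torch.randint(0, M, (4096,))
    s = tx(torch.nn.functional.one_hot(msgs, M).float())
    s = s / s.pow(2).mean().sqrt()
    y = awgn(s.clamp(-0.5, 0.5))     # distortion never seen in training
    acc = (rx(y).argmax(-1) == msgs).float().mean()
    print(f"accuracy under mismatched channel: {acc:.3f}")
```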
Problem 3: Lack of interpretability between neural layers; it is impossible to know which neurons, and which connections between layers, actually drive the final learning accuracy. Goodfellow et al. gave an example with a deep neural network classifier: although well trained on clean images, it can still misclassify a noise-perturbed panda image as a gibbon. The example shows that the classifier's final decision depends heavily on certain "critical paths" (certain pixels in the panda image, also called "local features"). If the critical path stays intact, the classification is correct; if the critical path is disturbed, the classification is wrong. Under purely additive random noise such misclassification is only occasional, which shows that deep neural networks rely on the assumption that the "critical path" survives the noisy channel intact. This susceptibility to additive random noise is almost a fatal blow to applying deep neural networks in wireless transceiver design.
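Goodfellow et al.'s panda-to-gibbon example uses the fast gradient sign method (FGSM); a minimal sketch of it follows, with `model` as a hypothetical trained classifier.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, eps=0.01):
    # `model` is any trained image classifier (placeholder here).
    image = image.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step along the sign of the input gradient: unlike random noise,
    # this deliberately disturbs the "critical path" the classifier
    # relies on, flipping e.g. "panda" to "gibbon".
    return (image + eps * image.grad.sign()).detach()
```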
The essence of these three problems can be traced to one core issue: the generalization performance of deep neural networks is too poor in the face of random wireless-channel variations. No model (even a very good channel model) can capture every possible radio-propagation scenario, so handling out-of-distribution (OOD) samples, i.e. outliers, is a practical problem the autoencoder must always face.
Worse, existing solutions to these problems face many obstacles, because any proposed solution must satisfy the practical requirements of wireless devices and infrastructure: low energy consumption, low latency, and low overhead. On the one hand, in a dynamic environment the cost for an autoencoder transceiver to keep accumulating data, augmenting, and retraining itself is too high; on the other hand, that whole accumulate-augment-retrain loop violates the "Once-for-All" strategy of deep neural networks (learn once, stay effective for a long time) and therefore cannot meet the practical and energy-consumption requirements well.
In wireless scenarios, outliers are usually caused by random channel variations. In the inference phase, if the channel changes and drifts away from the channel model used in training, the outlier problem becomes especially prominent: as inference proceeds, more outliers appear and reshape the distribution of the received signal. Bengio attributed the poor generalization of deep learning to exactly this. Remedies exist, such as additional training via transfer learning, attention-based recurrent networks, or reinforcement learning. However, against the low-energy, low-latency, low-control-overhead requirements of future wireless communications, these remedies become impractical and lack feasibility.
The article also lays out the reasoning behind its proposed MPA method; the focus is on the bolded passages in the quotation below:
“First, the channel model needs to be simplified to achieve differentiability, but this simplification hurts the performance of the autoencoder transceiver. The hit occurs because the channel model used to train the autoencoder is a simplified model, not the true one: there is an offset between the simplified channel model used in the training phase and the true channel encountered in the inference phase, and this offset causes performance loss. If the offset grows beyond the tolerable level, the entire autoencoder transceiver fails. There are two remedies to mitigate this degradation. The first is to use reinforcement learning to continuously record the channel state and continuously train the policy DNN and/or evaluation DNN. However, in terms of dimensional complexity, reinforcement learning is too complex for wireless systems, because the dimensionality it must handle is actually far larger than AlphaGo's; an adjustment mechanism based on reinforcement learning is therefore not feasible. The second is to use a generative adversarial network (GAN) to learn as many channel scenarios as possible into one large deep neural network model. However, this is an empirical method, and it cannot be proven to cover all channel scenarios.
In view of these issues, the autoencoder with MPA takes a different technical path. In the inference phase, MPA adjusts the coefficients of the dimension-reduction layer as a function of the current channel measurement for each data transmission; adaptive inference is thus paired with a coarse channel model in the training phase, which we call "coarse learning". If coarse learning simulates the same or a similar channel model for both training and inference, its advantage is hard to demonstrate, but the advantage does show up in actual field tests.
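The article does not spell out the MPA coefficient update, so the following is only a hedged stand-in that captures the shape of "adaptive inference": the trained DNNs stay frozen while a linear dimension-reduction (DR) layer is recomputed per transmission from the current channel estimate, here via an SVD that I chose purely for illustration.

```python
# Hedged sketch: refresh the DR-layer weights from the latest channel
# estimate while the trained DNNs remain frozen ("coarse learning" at
# train time, adaptive inference at run time).
import numpy as np

def adapt_dr_layer(h_est: np.ndarray, n_reduced: int) -> np.ndarray:
    """Recompute DR-layer weights from the current channel estimate.

    h_est: (n_rx, n_tx) channel estimate for this transmission.
    Returns a (n_tx, n_reduced) precoding matrix.
    """
    # Align the reduced dimensions with the channel's strongest modes
    # via the dominant right singular vectors (an illustrative choice,
    # not the paper's MPA update).
    _, _, vh = np.linalg.svd(h_est)
    return vh[:n_reduced].conj().T

# Per-transmission use: DNNs trained once, DR layer refreshed from
# each new channel measurement.
h_now = (np.random.randn(4, 8) + 1j * np.random.randn(4, 8)) / np.sqrt(2)
W_dr = adapt_dr_layer(h_now, n_reduced=2)   # shape (8, 2)
```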
Second, the autoencoder with MPA can work jointly with a GAN-based channel model. Empirically, the actual condition of most channels depends on user location and environment topology, such as high-rise buildings, hills, and roads. The references propose using conditional generative adversarial networks to model unknown channels and report good performance; we can use this approach to build a channel model that supports the training phase well.
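As an illustration of the cited conditional-GAN channel-modelling idea (architecture and conditioning variables are my assumptions, not the referenced design), a generator conditioned on a location/topology vector might look like the following; the discriminator and training loop are omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelGenerator(nn.Module):
    def __init__(self, noise_dim=16, cond_dim=3, n_taps=8):
        super().__init__()
        # cond = e.g. (x, y, scenario-id): location and topology info
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2 * n_taps),   # real + imaginary tap parts
        )

    def forward(self, z, cond):
        out = self.net(torch.cat([z, cond], dim=-1))
        re, im = out.chunk(2, dim=-1)
        return torch.complex(re, im)     # synthetic channel taps

gen = ChannelGenerator()
z = torch.randn(32, 16)
cond = torch.randn(32, 3)                # placeholder conditions
h_fake = gen(z, cond)                    # (32, 8) complex taps
```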
During the inference phase, we propose relying on channel estimation from pilots, channel-measurement feedback, or channel reciprocity to obtain the latest channel conditions. As is well known, MPA also benefits from sparsity and tolerates bias and offset well (this is why LDPC decoders work effectively). From this perspective it is not necessary to measure the full channel dimension, only some dimensions, and the scheme remains robust in overall performance even with a certain estimation error. In addition, the residual error can be absorbed by the receive deep neural layers, which have high error tolerance. Since the DR layer is adjusted in both the inference and training phases, it can serve as a precoder for the entire transmit chain, so there is no need to retrain the receive deep neural layers. This not only saves energy but also offers a huge advantage in extending the battery life of user devices.”
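One simple way to obtain "the latest channel conditions" from pilots is a least-squares estimate; a minimal sketch follows (the article does not specify the estimator, so this is an assumed baseline).

```python
# Least-squares pilot-based channel estimate, H ≈ Y X^+, assuming the
# linear model Y = H X over the pilot block. Per the article, only
# some dimensions need measuring; MPA tolerates the residual error.
import numpy as np

def ls_channel_estimate(y_pilot: np.ndarray, x_pilot: np.ndarray) -> np.ndarray:
    """x_pilot: (n_tx, n_pilots) transmitted pilots.
    y_pilot: (n_rx, n_pilots) received pilots.
    Returns an (n_rx, n_tx) channel estimate."""
    return y_pilot @ np.linalg.pinv(x_pilot)
```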
In fact, I personally remain skeptical of the method proposed in the article. Let's take a quick look at it.
This paper proposes an autoencoder transceiver based on the Message Passing Algorithm (MPA) to address the poor generalization of conventional autoencoders under random channel variations. By introducing MPA into the autoencoder, the authors obtain a flexible transceiver with good generalization across usage scenarios: coarse learning in the training phase and adaptive inference in the inference phase.
Through these techniques, the authors aim to improve the performance and generalization ability of autoencoder transceivers under random channel variations.
You can get a general idea of the article's MPA method from Figures 16 and 17.
The MPA layer is added mainly to perform a dimensional transformation between the transmit vector and the channel. During training, the MPA layer is first frozen; after the end-to-end transmit/receive training is completed, the MPA layer is trained iteratively. The MPA layer can be regarded as a precoding map for transmission, whose specific dimensions are obtained by measuring the channel; the channel still follows the common multipath assumption. Training of the MPA layer relies on the attention between the received signal and the transmit vector (an attention deep neural network is an effective way to measure the similarity between two features of different dimensions). Note that the number of attention units is smaller than the number of received signals, i.e., L
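Piecing together the two-stage schedule described above, a hedged sketch might look like the following; the modules, dimensions, and attention scoring are illustrative stand-ins, since the article's exact construction is not reproduced here.

```python
# Two-stage schedule: freeze the MPA/DR layer while the TX/RX DNNs
# train end-to-end, then freeze those and iterate on the MPA layer,
# scored by an attention module between the received signal and the
# transmit vector. All modules and dimensions are assumptions.
import torch
import torch.nn as nn

d_tx, d_rx, d_att = 8, 4, 16   # transmit, receive, attention dims (assumed)

tx_dnn = nn.Linear(16, d_tx)
mpa_dr = nn.Linear(d_tx, d_rx, bias=False)   # DR / precoding layer
rx_dnn = nn.Linear(d_rx, 16)

# Project both features into a shared space so similarity can be
# measured across *different* dimensions (d_rx vs d_tx).
q_proj, k_proj = nn.Linear(d_rx, d_att), nn.Linear(d_tx, d_att)

def attention_score(y, s):
    q, k = q_proj(y), k_proj(s)
    return (q * k).sum(-1) / d_att ** 0.5    # one score per sample

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)

# Stage 1: MPA layer frozen, TX/RX trained end-to-end.
set_trainable(mpa_dr, False)
# ... usual autoencoder training loop here ...

# Stage 2: TX/RX frozen, MPA layer iterated using the attention score.
set_trainable(mpa_dr, True)
set_trainable(tx_dnn, False); set_trainable(rx_dnn, False)
```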