Replies: 4 comments, 2 replies
-
I see that paddings after the first pulse will be replaced with previous inputs. Am I right?
-
Maybe you have some pseudocode for this operation, including paddings, strides, kernel size, etc.?
-
Have you tried dumping your example network to ONNX and running the pulsification through the tract command line? It will show you the pulsified network.
-
Very good questions. The pulsification semantics are pretty clear in my head, but I never took the time to write about them somewhere public. Let's try to discuss them a bit. The gist of it: pulsification transforms a stateless "causal" network into a stateful network that performs the same computation with some delay. For instance, if your training network takes a 1D input of length 10 and performs a convolution with a kernel size of 3, your output will be of length 8.
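Concretely (sketch notation, with y_i standing for the convolution of frames x_i..x_i+2):

```
input : [x0 x1 x2 x3 x4 x5 x6 x7 x8 x9]    (length 10)
output: [y0 y1 y2 y3 y4 y5 y6 y7]          (length 8)
```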
Now let's say you want to pulse this network with a pulse size of 4. During the first turn, we only know the value of 4 input frames, so we can only compute the first two output frames. One option would be to output fewer frames in the first pulse:
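```
pulse 1: in [x0 x1 x2 x3]  ->  out [y0 y1]         (only 2 frames computable)
pulse 2: in [x4 x5 x6 x7]  ->  out [y2 y3 y4 y5]
```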
But that is NOT what tract is doing. The implementation sticks to one principle: fixed-size tensors. So instead, the first output pulse will be partially invalid:
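```
pulse 1: in [x0 x1 x2 x3]  ->  out [?? ?? y0 y1]   (first 2 frames invalid)
pulse 2: in [x4 x5 x6 x7]  ->  out [y2 y3 y4 y5]
```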
Then it's up to the caller to skip these two invalid frames. When tract pulsifies the network, it computes the overall delay (2 in our example) and stores it as a property of the model. Note that the delay is intrinsic to the network: if you think of convolutions, it is the global receptive field minus 1. The pulse size, on the other hand, is a value picked by the model integrator. So how is it implemented? The nice thing is, it is more or less composable: if your network is pulsifiable, you can pulsify it operator by operator. Let's have a look at the valid convolution case. It turns out that pulsifying the convolution can be done without altering the convolution code itself:
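The pulsified graph looks roughly like this (sketch):

```
Source ──▶ Delay(overlap=2) ──▶ Conv(kernel=3, valid) ──▶ Output
```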
In this case, a delay operator with an overlap of 2 is inserted before the convolution: it stores the last two frames of each pulse and prepends them to the next pulse, over and over. And that's about it: the convolution operator here is the same as the one in the original network. tract will "tag" the output network with a "delay" of 2, as it can tell that the conversion introduced two invalid frames. (There is a bit of a difficulty in giving semantics to the "delay" and "length" of the tensors between the Delay and the Convolution, but we pretend not to be aware of it: we think of this Delay+Conv more or less as an atomic thing.)

Delay has two main parameters, overlap and delay (a lot of things are called delay, right?). The delay parameter is necessary in some circumstances to offset the output without overlapping, but it is not super frequent. With a delay of 2, you get:
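```
pulse 1: in [x0 x1 x2 x3]  ->  out [?? ?? x0 x1]
pulse 2: in [x4 x5 x6 x7]  ->  out [x2 x3 x4 x5]
```

To make the Delay+Conv mechanics concrete, here is a minimal NumPy sketch of the overlap case (hypothetical names, not tract's actual Rust implementation), assuming 1D frames, a pulse size of 4, and a valid convolution with a kernel size of 3:

```python
import numpy as np

KERNEL = np.array([1.0, 2.0, 3.0])  # any kernel of size 3

def valid_conv(x, k=KERNEL):
    # Unchanged "valid" convolution: output length = len(x) - len(k) + 1.
    return np.correlate(x, k, mode="valid")

class Delay:
    """Caches the last `overlap` frames of each pulse and prepends them to
    the next one, so the downstream valid convolution always sees its
    kernel_size - 1 frames of left context."""
    def __init__(self, overlap):
        self.buffer = np.zeros(overlap)  # initial content is garbage: the 2 invalid frames

    def __call__(self, pulse):
        out = np.concatenate([self.buffer, pulse])  # fixed size: overlap + pulse
        self.buffer = pulse[len(pulse) - len(self.buffer):]
        return out

# Stateless reference: length-10 input, length-8 output.
x = np.arange(10.0)
reference = valid_conv(x)

# Pulsed run: each output pulse has the same fixed size as the input pulse.
delay_op = Delay(overlap=2)
pulses = [valid_conv(delay_op(x[i:i + 4])) for i in (0, 4)]
assert np.allclose(pulses[0][2:], reference[0:2])  # frames 0-1 of pulse 1 are invalid
assert np.allclose(pulses[1], reference[2:6])      # network delay = 2
```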
If we have more convolution layers, it just composes nicely, and tensor delays are simply additive. Things get a bit more complicated with padded convolutions: we need to convert them to valid convolutions first, prepending a padding operator, then pulsify the padding separately (actually it is a bit more complicated, because this approach can sometimes lead to unnecessary delays). Recurrent operators are super easy to pulsify: they just need to learn how to skip the invalid frames coming to their input (so the Scan operator has a "skip" property). While they have not yet seen "skip" frames, they operate as usual but keep their initial state unchanged. (It is frequent to have RNN ops following CNN ops, so the convolution part will introduce a delay that the RNN must know about.)
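To make the "skip" idea concrete, here is a minimal sketch of a pulsified recurrent op (again hypothetical names, not tract's actual Scan implementation): it consumes every frame of the pulse, but leaves its state untouched until the upstream delay has been flushed:

```python
import numpy as np

class PulsedScan:
    """Wraps a recurrent cell for pulsed execution. The first `skip` frames
    it sees are invalid (upstream delay), so it emits placeholder frames for
    them and keeps its initial state unchanged."""
    def __init__(self, cell, initial_state, skip):
        self.cell = cell            # fn(state, frame) -> (new_state, out_frame)
        self.state = initial_state
        self.skip = skip            # delay introduced upstream (e.g. by convolutions)

    def __call__(self, pulse):
        out = []
        for frame in pulse:
            if self.skip > 0:
                self.skip -= 1                    # invalid frame: don't touch the state,
                out.append(np.zeros_like(frame))  # and this output frame is invalid anyway
            else:
                self.state, y = self.cell(self.state, frame)
                out.append(y)
        return np.stack(out)
```

I hope this helps :)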
-
Hello! I'm interested in PulsedModel, which saves information about previous inputs, and I want to understand it and reimplement it as a torch model.
So, for example, on every step we have an input tensor of shape [batch_size=1, n_channels=1, time=1, inp_dim].
And, like in PulsedModel, we want to get the output tensor after convolving the incoming tensor, taking into account information from previous steps.
Can you explain what PulsedModel is doing, for example for this code?
Taken from https://github.com/Rikorose/DeepFilterNet/blob/12fe14af0790b4dfa537aa6011b082a0bfe609a2/DeepFilterNet/df/modules.py#L18