Broadcast performance #617
Explain "too many clients"? If you have a large number of connections there will be some slowdown. In that case the developer should look into running multiple ws servers and limiting the number of clients per instance. You could also look into a more distributed style (ws servers connecting to each other for broadcasting), which would add a little lag to some clients for broadcasts but would increase overall performance. |
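A rough sketch of the peer-relay idea mentioned above, purely illustrative: the peer URLs and the message envelope are assumptions, a real deployment would also distinguish peer connections from ordinary clients (this sketch does not), and many setups would use a message broker as the backplane instead.

const WebSocket = require('ws');

// Each instance serves its own clients and also dials its peer instances.
const wss = new WebSocket.Server({ port: 8080 });
const peers = ['ws://peer-1:8080', 'ws://peer-2:8080'] // assumed peer addresses
  .map((url) => new WebSocket(url));

// Send data to the clients connected to this instance only.
function broadcastLocal(data) {
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(data);
  }
}

// Broadcast locally and forward the message once to every peer instance.
function broadcast(data) {
  broadcastLocal(data);
  for (const peer of peers) {
    if (peer.readyState === WebSocket.OPEN) {
      peer.send(JSON.stringify({ relay: true, data }));
    }
  }
}

// A relayed message is only delivered to local clients and never forwarded again,
// so it reaches every instance with at most one extra hop of latency.
wss.on('connection', (ws) => {
  ws.on('message', (raw) => {
    let msg;
    try { msg = JSON.parse(raw); } catch { return; }
    if (msg && msg.relay) broadcastLocal(msg.data);
  });
});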
Socket.io also does something similar: https://github.com/socketio/socket.io-adapter/blob/master/index.js#L127 |
Broadcast performance could be optimized quite a bit, by framing the message only once and then sending that frame to all connected clients. The only problem is that there isn't an API to do this right now. |
Such an API would be great. What I would really like to see is an API along those lines, or maybe just the ability to prepare a message, plus an alternative to send called sendPrepared. |
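A hypothetical sketch of what that could look like; prepareMessage and sendPrepared are made-up names, not part of ws, and the sketch leans on the Sender.frame() helper that comes up later in this thread (assuming permessage-deflate is disabled).

const WebSocket = require('ws');

// Hypothetical helper: frame a text message once so it can be reused.
function prepareMessage(data) {
  return WebSocket.Sender.frame(Buffer.from(data), {
    readOnly: false,
    mask: false,  // server-to-client frames are not masked
    rsv1: false,  // no permessage-deflate
    opcode: 1,    // 1 = text frame
    fin: true
  });
}

// Hypothetical counterpart to send(): write an already-framed message.
function sendPrepared(ws, preparedFrame) {
  if (ws.readyState === WebSocket.OPEN) {
    for (const buf of preparedFrame) ws._socket.write(buf);
  }
}

// Usage: frame once, send to every client.
// const prepared = prepareMessage('tick');
// for (const client of wss.clients) sendPrepared(client, prepared);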
@mafrost yeah, the first step is to extract the framing logic into a standalone helper. Once this is done a very basic "optimized" broadcast could be as simple as this:

// frameMessage is the hypothetical extracted framing helper.
const frame = frameMessage(1, 'message', true, true, true, false);

for (const client of wss.clients) client._socket.write(frame); |
Loving it! |
The WebSocket.Sender.frame() method allows implementing a slightly more efficient broadcast:

const data = Buffer.from('message');
const list = WebSocket.Sender.frame(data, {
  readOnly: false,
  mask: false,
  rsv1: false,
  opcode: 1,
  fin: true
});

wss.clients.forEach((ws) => {
  if (ws.readyState === WebSocket.OPEN) {
    list.forEach((buf) => ws._socket.write(buf));
  }
});

The example assumes that permessage-deflate is disabled. When permessage-deflate is enabled, data can be queued, so it is not safe to write to the socket directly, as this can change the order of messages or even put a spurious frame in the middle of a fragmented message. Use this only if you know what you are doing. |
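A minimal way to make that assumption explicit is to disable the extension when the server is created; perMessageDeflate is a documented constructor option, though whether it is on by default depends on the ws version.

const WebSocket = require('ws');

// Explicitly disable permessage-deflate so pre-built frames can be written
// straight to the socket without interfering with the compression pipeline.
const wss = new WebSocket.Server({ port: 8080, perMessageDeflate: false });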
I know that this is an old thread, but I just wanted to give my 2 cents. @lpinca, I believe that what @diegoaguilar was trying to ask is that, in case of a large number of clients, the broadcast loop blocks the event loop. Consider an app with a lot of clients, all listening to the messages coming from the server and only a few of them pushing changes to the server. With a blocking broadcast, those incoming messages cannot be processed until the loop finishes. It would also be nice to be able to define priorities for sending and receiving, since in many cases I would be interested in getting the message from the client as fast as possible and broadcasting the message to all registered clients with lower priority. |
@ssljivic this shouldn't be an issue. A loop with 500k clients on a single server is already unrealistic. |
@lpinca I agree that a single server has its capacity and can handle at most N operations per second. That is not the issue. The issue is that broadcasting with a for loop blocks Node until the loop is done. In a better implementation the broadcast iteration over clients would be async, allowing other ops to be processed by Node in between. The overall server capacity would still be the same, but this would allow some other ops to be executed sooner rather than later. |
@ssljivic assume that you have 100k clients. It will take ~2 ms to iterate through them:

const arr = new Array(100000).fill();
const time = process.hrtime();

for (let i = 0; i < arr.length; i++) {
  arr[i] = 0;
}

const diff = process.hrtime(time);
console.log('%d ns', diff[0] * 1e9 + diff[1]);
// => 2440435 ns

My point is that blocking the event loop for such a small amount of time is not an issue. The library does not force you to use a blocking loop. You can implement an async loop and use it if it's better for your use case. |
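For reference, a minimal sketch of such an async broadcast, using setImmediate() to yield back to the event loop between chunks; the chunk size of 1000 is an arbitrary choice, not something prescribed by ws.

const WebSocket = require('ws');

// Broadcast in chunks so other I/O can be processed between chunks.
function broadcastAsync(wss, data, chunkSize = 1000) {
  const clients = [...wss.clients]; // wss.clients is a Set
  let i = 0;

  function sendChunk() {
    const end = Math.min(i + chunkSize, clients.length);
    for (; i < end; i++) {
      if (clients[i].readyState === WebSocket.OPEN) clients[i].send(data);
    }
    // Yield to the event loop before sending the next chunk.
    if (i < clients.length) setImmediate(sendChunk);
  }

  sendChunk();
}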
The original topic of this issue is great. I wish I had found it in February before doing a pretty deep investigation to reach the same conclusions: the frame does not change and can be pre-computed. Below is the code that I am using to power a realtime financial market data feed with many tens of thousands of clients for Yahoo Finance. I will try to submit it as a PR when I have a chance, but I'm sharing it in case anybody else wants to use it or make the PR + tests...
I did benchmark (in microseconds) Promise vs. straight callback and did not find that the Promise construction created any significant overhead relative to socket writing. |
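The code referred to above is not reproduced in this copy of the thread. Purely as a sketch of the general approach being described, not the actual implementation, a pre-framed, Promise-based broadcast might look like this (again assuming permessage-deflate is disabled and a Buffer payload):

const WebSocket = require('ws');

// Frame the payload once; reuse the resulting buffer list for every client.
function broadcastPreframed(wss, payload) {
  const buffers = WebSocket.Sender.frame(payload, {
    readOnly: false,
    mask: false,
    rsv1: false,
    opcode: 2, // 2 = binary frame
    fin: true
  });

  const writes = [];
  for (const ws of wss.clients) {
    if (ws.readyState !== WebSocket.OPEN) continue;
    writes.push(
      new Promise((resolve) => {
        // Write the frame buffers directly to the underlying socket and
        // resolve once the last chunk has been handed off.
        for (let i = 0; i < buffers.length - 1; i++) ws._socket.write(buffers[i]);
        ws._socket.write(buffers[buffers.length - 1], resolve);
      })
    );
  }
  return Promise.all(writes);
}

// Usage: await broadcastPreframed(wss, Buffer.from(encodedMessage));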
Non-portable Linux notes for over-optimizers with scaling problems like myself... Separately, I have a proof-of-concept implementation of a "copy-once" broadcast. It is similar to the code that I shared above, except that it avoids copying that same byte array (the RFC 6455 frame) to the kernel send buffers for each client. If you're on Linux, the idea is to copy the data to the kernel buffer one time, and then instruct the kernel to send that kernel data to each client. This is how Apache sends large files to clients, and how Kafka achieves high I/O. Basically, you use sendfile(2) instead of write(2) to transmit to a socket. The flow is, roughly: frame the message once, write it to a temporary file, sendfile(2) that file to each client socket, and delete the file.
Of course, all of these steps are chained together via callbacks/promises, and the file is never really written to disk because it's deleted too quickly; it just lives in the OS page cache for a few milliseconds. I ran this in production for a week and it worked fine. It is implemented as a small add-on in C++. I was not able to do a proper benchmark while it was live, though it appeared ~20% faster. It's still on my TODO list to benchmark it. Let me know if anybody is interested in the code... |
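The add-on itself is not shown in this thread. Purely to illustrate the flow described above, here is a sketch in Node terms, where sendfileToSocket() stands in for a hypothetical native binding around sendfile(2) (not an existing package), and socket._handle.fd is an undocumented internal:

const fs = require('fs');
const os = require('os');
const path = require('path');
const WebSocket = require('ws');
// Hypothetical native add-on wrapping sendfile(2); not a real module.
const { sendfileToSocket } = require('./build/Release/sendfile_addon');

function copyOnceBroadcast(wss, frame) {
  // 1. Write the pre-built RFC 6455 frame to a temp file; it lands in the page cache.
  const file = path.join(os.tmpdir(), `broadcast-${process.pid}-${Date.now()}`);
  fs.writeFileSync(file, frame);
  const fileFd = fs.openSync(file, 'r');

  // 2. Ask the kernel to copy that cached data to each client socket,
  //    avoiding a per-client copy from userspace.
  for (const ws of wss.clients) {
    if (ws.readyState === WebSocket.OPEN && ws._socket && ws._socket._handle) {
      sendfileToSocket(ws._socket._handle.fd, fileFd, 0, frame.length);
    }
  }

  // 3. Delete the file immediately; it never needs to hit the disk.
  fs.closeSync(fileFd);
  fs.unlinkSync(file);
}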
@adamkaplan nice. There is no API for an optimised broadcast because it really depends on how the data should be broadcasted (#617 (comment)).
Thank you for sharing your experience and code. I think the idea behind #617 (comment) is great and an easy-to-use module to do that (even if it's not cross-platform) would be useful to many people. |
Point taken. If it is not destined to become an API feature, then at least a doc/wiki entry explaining this option would help. I understand the concern about handing people a loaded gun to shoot themselves with. Using it requires that a very specific set of requirements be met. In my case, the idea is certainly that the message is identical per client (i.e. the price of Apple on Nasdaq is the same for everyone). The data is not changed at all. Your point about permessage-deflate is very good – and I'm going to check this ASAP. I think it's set to off. |
Agreed, |
Well that was a painful thread. Yes, per-message deflate is off. My messages, being market data, are extremely compact protobufs anyway (60-100 B). My scaling issues are of the "firehose" sort... compression would just slow it down. If 50,000 clients connect, probably 40,000 of them want boring stuff like Apple and S&P500 updates (out of the 100k+ available securities). That's why I'm so interested in broadcast functionality. |
@ggazzo I want to caution that, after running it in limited production, I don't broadcast with this method anymore. The example code above (#617 (comment)) has worked well enough since then, with some minor tweaks. It is also a lot less complex and scary. I am running a fairly massive realtime market data deployment here with over 200,000 active connections on 175 cores per data center, at around 35% CPU utilization. If you still think that the C++ add-on is worth exploring given the scale above, let me know. It will take some effort/time to decouple the add-on code from the larger proprietary system, which cannot be shared (even though the add-on isn't being used, it is integrated). |
@adamkaplan I arrived at similar code by myself before finding this thread, so I was wondering how much more performant the sendfile-based approach is. Sorry, but I'm not sure I got what you mean by 175 cores per data center. |
The system is running in the cloud, so capacity is measured in CPUs (and RAM). So: 175 instances of 1 Docker container with 1 CPU core each. |
@ggazzo Since you just want to compare, here is the sample code without any of the glue and configuration to make it compile and integrate. There is 1 version using only C++ and another version using V8 directly. https://gist.github.com/adamkaplan/8a6a35a5a914fdf1890ec29865c5ccb1 |
thanks @adamkaplan |
I just got curious about how broadcast would work. I see a forEach loop over .clients. Is this really iterating over an array of clients? What if there are too many clients?
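For reference, the loop being asked about is presumably the broadcast pattern shown in the ws README, which iterates over the wss.clients set (a Set, not an array) and re-frames and sends the message for each open client:

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

// Send the same message to every connected, open client.
function broadcast(data) {
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(data);
    }
  });
}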