Idea: publish a message to workers when a queue moves from empty to non-empty #34
Comments
The reason this doesn't exist is because of scheduled / recurring jobs, and the inefficiency is very manageable. As part of a recent investigation I ran some benchmarks against a snapshot of one of your production redis servers. Longer polling intervals also don't hurt as much as you might think: all they affect is the amount of time jobs wait in queues, but generally not throughput. In the specific cases of the ruby and python clients, if a worker successfully pops a job, it will try popping another after it completes that job. It's only when a queue has no jobs available that the sleep interval is invoked, so for busy queues the penalty for polling is very small. Fortunately, most pipelines are robust against additional latency, especially for the difference between, say, 10 seconds and 60 seconds, or 5 minutes for that matter. If you have 100 worker processes and a 5-minute polling interval, on average a worker will pop every 3 seconds. If you then dump a lot of jobs into those queues, the workers will all be busy within 5 minutes and stay busy until all the jobs are done. So all that interval will have changed is that the whole giant job batch finishes a couple of minutes later than it normally would have, with 1/30th the number of pop requests.
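A minimal sketch of the pop-then-sleep pattern described above (not the actual qless worker code): `queue` is assumed to behave roughly like a qless queue, with `pop` returning a job or nil, and jobs responding to `perform`.

```ruby
INTERVAL = 60 # seconds; only matters when the queue turns up empty

loop do
  if (job = queue.pop)
    job.perform        # busy queue: loop straight back around and pop again
  else
    sleep INTERVAL     # idle queue: this is the only place the interval is paid
  end
end
```

The arithmetic behind the numbers above: with 100 independent workers each sleeping 300 seconds between empty pops, some worker pops roughly every 300 / 100 = 3 seconds on average.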
Can we break out of the current worker model? The sync mode fits qless, but we have to fork a lot of workers for high-volume jobs, so the scalability is not good. My question is: can we use an async mode with qless?
(Copied from a mailing list thread): Hi Yan, there are a couple of options here. The first is that you can avoid using one of the built-in qless workers. Then you'd write some code that essentially called

That said, this sounds more like a new worker model. Where the ruby client has a

As one data point, many of our crawlers use the python qless client, often with about 8 processes and 50 greenlets per process. Depending on how lightweight your requests are, there's no reason you couldn't use hundreds or thousands of greenlets in a single process. Though I'm less familiar with the Fiber-compatible HTTP-request libraries of ruby, I imagine that similar behavior is attainable.

Ultimately, my suggestion is to make a new worker which inherits from the forking worker, but where each child process uses either a thread- or fiber-pool to run each job. That way you still get the same semantics of pop-run-complete for each job, but without the overhead you were hoping to avoid. Such a worker would be welcome in the qless repo :-)
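A rough sketch of the "forking worker whose children each run a thread pool" idea. `ThreadPoolChild` and `reserve_job` are hypothetical names, not the real qless worker API; the point is that every job still sees pop-run-complete semantics on a single thread.

```ruby
class ThreadPoolChild
  def initialize(pool_size: 8)
    @jobs = SizedQueue.new(pool_size)   # bounded, so we never pop far more jobs than we can run
    @pool = Array.new(pool_size) do
      Thread.new do
        while (job = @jobs.pop)
          job.perform                   # run (and complete) the job on this thread
        end
      end
    end
  end

  def run
    loop do
      if (job = reserve_job)            # hypothetical: pop from this worker's qless queues
        @jobs << job                    # blocks when the pool is saturated
      else
        sleep 5                         # fall back to polling when nothing is available
      end
    end
  end
end
```

Using a bounded `SizedQueue` is a deliberate choice in this sketch: it keeps a child from popping jobs it cannot start promptly, so locks and heartbeats stay close to when the work actually runs.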
Hi Dan, thank you for replying. Our project is using Java, so we have threads as workers; we do not fork new processes. The scalability of the first solution is very good. However, in this case a worker (thread) can pick up many jobs without finishing them, and the UI provided by qless would show a lot of jobs under one worker. Also, completing a job requires passing in the worker's name, and I am not sure whether that name is tied to the real application worker (thread, process, etc.) or not. For example, if the real application thread (worker) is stuck, qless needs to pass the job to another thread (worker). Based on your suggestion, we can have a thread (worker) which pulls the jobs and then dispatches all the HTTP requests from the jobs to an async HTTP library. After dispatching, it waits for the reply from the HTTP server (a callback from the async HTTP library) and updates the qless job status. This is better than sync mode but slightly worse than the first option in scalability.
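Sketched in Ruby rather than Java (to match the rest of this thread), here is roughly the pattern being described: one thread owns all the pops, requests go to an async client, and the job is completed or failed from the callback. `AsyncHttpClient` and the data layout are hypothetical stand-ins, not a real library API; the queue and job methods are assumed to be qless-like.

```ruby
popper = Thread.new do
  loop do
    job = queue.pop                     # assumed qless-like queue
    if job.nil?
      sleep 5                           # nothing to do; back off briefly
      next
    end
    # Hypothetical async call; the block runs when the response arrives.
    AsyncHttpClient.get(job.data["url"]) do |response|
      if response.success?
        job.complete
      else
        job.fail("http-error", "status #{response.status}")
      end
    end
  end
end
popper.join
```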
Given that you are tied to using callbacks, it seems reasonable to me to have one thread
Thanks a lot!
In our workers the general pattern is to do the following in a loop: pop from the queue, run the job if one was returned, and otherwise sleep for a few seconds before trying again.
This is pretty inefficient and results in steady redis traffic even when there's no work to do.
I'd like to propose an alternate model: when a queue moves from empty to non-empty, qless would publish a message to a channel that each worker subscribes to. A worker that finds no jobs would use `sleep` (with no arg) to sleep indefinitely, until its subscriber gets a `jobs_available` message, at which time it would use `Thread#run` to wake up the worker. This would result in much less redis traffic and would allow workers to get jobs immediately when they are put on the queue, rather than waiting through the 5 second (or whatever) sleep we currently use.
One "gotcha" with this, though: with scheduled/recurring jobs, jobs are put on the queue when a worker calls
pop
orpeek
, causing qless to make the state of that stuff consistent. so if nothing callspop
orpeek
, it'll never move the scheduled job to the waiting state and never notify workers. Thus, we may want to do something like use a long sleep (rather than an indefinite one) or consider having the parent process callpeek
periodically.The text was updated successfully, but these errors were encountered:
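A sketch of that workaround: a supervisor thread peeks each queue on a long interval so scheduled and recurring jobs still get promoted (and could then trigger the notification). `queues` is an assumed collection of qless-like queue objects.

```ruby
Thread.new do
  loop do
    queues.each(&:peek)   # peeking nudges qless to move due scheduled jobs to waiting
    sleep 300             # long interval; this is only a safety net, not the main wake-up path
  end
end
```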