Ignore any warnings; there is no need to restart the session.
Run this in a cell:
import apache_beam as beam
from apache_beam.runners.portability.prism_runner import PrismRunner
with beam.Pipeline(runner=PrismRunner()) as p:
    N = 5
    p | "Create Elements" >> beam.Create(range(N)) | "Squares" >> beam.Map(lambda x: x**2) | "Print" >> beam.Map(print)
The first run works, but running the cell a second time raises the error below:
OSError: [Errno 26] Text file busy: '/root/.apache_beam/cache/prism/bin/apache_beam-v2.61.0-prism-linux-amd64'
Check the running process:
/root/.apache_beam/cache/prism/bin/apache_beam-v2.61.0-prism-linux-amd64 is still running. If you kill this process, step 5 will run fine.
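One way to confirm from the same notebook that the binary is still executing is to scan /proc for processes whose executable is that file. This is only a minimal sketch for Linux: `pids_running` is a hypothetical helper, not part of Beam, and the demo checks the current Python interpreter rather than the prism binary so that it runs anywhere.

```python
import os


def pids_running(binary_path):
    """Return PIDs whose executable is binary_path (Linux only, hypothetical helper)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            exe = os.readlink(f"/proc/{entry}/exe")
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        if exe == binary_path:
            pids.append(int(entry))
    return pids


# Demo: the current interpreter is, by definition, still running.
own_exe = os.readlink("/proc/self/exe")
print(os.getpid() in pids_running(own_exe))
```

Passing the cached prism path from the error message instead of `own_exe` would list the leftover prism processes, assuming the notebook user is allowed to inspect them.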
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner
Thank you for filing a clear issue. When we spoke offline, it wasn't clear that the problem occurs on the second invocation of a pipeline run.
I don't see any harm in having the prism process persist. It's set to auto close when Python exits, which is probably a long lived interpreter in this context.
But it is a problem that we aren't taking advantage of the existing caching mechanism properly.
I think all we need to do is set a class-level handle that holds the server instance from the first startup of prism and retain it for subsequent invocations, which avoids trying to start a new one.
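The class-level handle idea could look roughly like the following. This is a sketch under stated assumptions, not the actual PrismRunner code: `FakeServer` and `CachedRunner` are hypothetical names standing in for the prism job server and the runner.

```python
import threading


class FakeServer:
    """Hypothetical stand-in for a started prism job server."""
    starts = 0  # counts how many servers were launched

    def __init__(self):
        FakeServer.starts += 1


class CachedRunner:
    """Hypothetical runner that keeps one server per Python interpreter."""
    _server = None            # class-level handle shared by every instance
    _lock = threading.Lock()  # guards first-time startup

    def get_server(self):
        with CachedRunner._lock:
            if CachedRunner._server is None:
                # First invocation: start the server and remember it.
                CachedRunner._server = FakeServer()
            # Later invocations reuse the same instance.
            return CachedRunner._server


first = CachedRunner().get_server()
second = CachedRunner().get_server()
print(first is second, FakeServer.starts)
```

Because the handle lives on the class rather than on the instance, constructing a fresh runner in each notebook cell would still reuse the server started by the first run.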
The "Text file busy" error happens when something attempts to modify a binary while that binary is still running.
This better explains the symptoms:
1. The process is already executing because Python hasn't exited yet.
2. The re-run tries to download and start another instance.
3. Because of 1, the writing in 2 fails.
So, the solution there is fixing the caching logic to correctly use the cache. That's one bug.
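A cache-aware fetch that avoids rewriting an existing binary might be sketched as follows. `ensure_binary` and `fake_fetch` are hypothetical names for illustration, not Beam APIs, and the temporary directory stands in for the real cache location.

```python
import os
import tempfile


def ensure_binary(path, fetch):
    """Return path, calling fetch() only when nothing is cached there yet."""
    if os.path.exists(path):
        return path  # cache hit: never write to a possibly running binary
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temporary file first, then rename it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(fetch())
    os.chmod(tmp, 0o755)
    os.replace(tmp, path)
    return path


calls = 0


def fake_fetch():
    """Hypothetical downloader; counts how often it is invoked."""
    global calls
    calls += 1
    return b"binary-bytes"


cache_dir = tempfile.mkdtemp()
target = os.path.join(cache_dir, "bin", "prism")
ensure_binary(target, fake_fetch)  # first run: downloads
ensure_binary(target, fake_fetch)  # second run: uses the cache
print(calls)
```

The second call returns before opening the file for writing, which is the step that fails in the sequence above.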
Then we'd need to use the idle_timeout flag, like we do for Java, to avoid leaving a long list of prism processes sitting around the Colab instance forever.