[Question] Maze Dense Reward #175
Comments
Somewhat related: the description of the maze environments says:
Gymnasium-Robotics/gymnasium_robotics/envs/maze/maze_v4.py Lines 374 to 381 in 8606192
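For reference, the reward computation at those lines looks roughly like the following. This is a paraphrase for readability, not an exact copy of that revision: the method signature is simplified to a free function, and the 0.45 success threshold is the value recalled from the v4 source.

import numpy as np

def compute_reward(achieved_goal, desired_goal, reward_type="dense"):
    # Euclidean distance between the ball and the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    if reward_type == "dense":
        # Exponential of the negative distance: always in (0, 1], largest at the goal.
        return np.exp(-distance)
    # Sparse: 1.0 once the ball is within the success threshold, 0.0 otherwise.
    return (distance <= 0.45).astype(np.float64)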
You are correct. Can you make a PR to fix it?
I would be interested in picking this issue up if no one is working on it.
Hey @kuds, currently no one is working on it.
Quick update: I have forked the Gymnasium Robotics repository (my repo) and implemented a new version of the Point Maze (now version 4) with an updated reward system. I will verify the implementation with SAC and then try the Ant Maze. Here is a link to my Google Colab Notebook with my experimental results. Let me know if you would like to see anything else or have any questions!
Attachment: best_model_point_maze_ppo-step-0-to-step-10000.1.mp4
Could you try both the linear and the exponential distance (…)?
Sure, give me a week or so to get those coded and tested. I will report back once I have those numbers and charts ready. |
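For concreteness, the two dense-reward variants under discussion could be sketched as follows (illustrative only, not code from the fork):

import numpy as np

def dense_reward_exponential(achieved_goal, desired_goal):
    # Current behaviour: bounded in (0, 1] and positive at every step,
    # so lingering near the goal keeps accumulating reward.
    return np.exp(-np.linalg.norm(desired_goal - achieved_goal))

def dense_reward_linear(achieved_goal, desired_goal):
    # Negative distance: non-positive everywhere, with its maximum (0.0)
    # reached only exactly at the goal, so every extra step has a cost.
    return -np.linalg.norm(desired_goal - achieved_goal)

Running the same algorithm and seeds with each variant would make the comparison in the notebook straightforward to read.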
Question
Looking at the dense reward function for Maze Env:
return np.exp(-np.linalg.norm(desired_goal - achieved_goal))
After optimisation, the agent seems to prefer keeping the ball as close as possible to the goal without actually touching it.
This makes sense given that there is no bonus for reaching the goal and the reward is positive at every time step.
Why is the dense reward formulated this way?
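A rough back-of-the-envelope comparison of discounted returns illustrates the incentive, assuming (hypothetically) that the episode terminates as soon as the goal is reached; all numbers below are illustrative, not measured:

import numpy as np

gamma = 0.99
horizon = 300  # steps remaining if the agent never finishes the episode

def hover_return(d):
    # Park the ball at distance d from the goal and collect exp(-d) every step.
    return sum(gamma**t * np.exp(-d) for t in range(horizon))

def finish_return(steps_to_goal):
    # Close the gap over a few steps; the episode then ends and rewards stop.
    distances = np.linspace(1.0, 0.0, steps_to_goal)
    return sum(gamma**t * np.exp(-d) for t, d in enumerate(distances))

print(hover_return(0.05))  # roughly 90: hover just outside the goal forever
print(finish_return(10))   # roughly 6: reach the goal quickly, then nothing

Under this reward the hovering policy dominates unless reaching the goal carries an explicit bonus or the per-step reward is made non-positive.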