[Question] Maze Dense Reward #175

Open
llewynS opened this issue Sep 11, 2023 · 8 comments
Labels
good first issue Good for newcomers

Comments

@llewynS

llewynS commented Sep 11, 2023

Question

Looking at the dense reward function for Maze Env:

return np.exp(-np.linalg.norm(desired_goal - achieved_goal))

After optimisation, the agent seems to prefer keeping the ball as close as possible to the goal without actually touching it.

This makes sense given that there is no bonus for reaching the goal and the reward is positive at every time step.
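To make that incentive concrete, here is a minimal sketch comparing two strategies under the quoted dense reward. It assumes the episode terminates once the ball is within 0.45 of the goal (the threshold is borrowed from the sparse reward quoted later in this thread; whether it also triggers termination depends on the environment settings). The discount factor and remaining horizon are hypothetical values chosen only for illustration.

```python
import numpy as np

GOAL_RADIUS = 0.45  # sparse-reward threshold from this thread; using it for termination is an assumption
GAMMA = 0.99        # hypothetical discount factor
HORIZON = 300       # hypothetical number of steps left before truncation


def dense_reward(distance: float) -> float:
    """Dense reward quoted in the issue: exp(-distance)."""
    return float(np.exp(-distance))


def discounted_return(rewards, gamma: float = GAMMA) -> float:
    """Sum of gamma**t * r_t over a sequence of per-step rewards."""
    rewards = np.asarray(rewards, dtype=np.float64)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))


# Strategy A: enter the goal region immediately; the episode ends after one reward.
reach_and_terminate = discounted_return([dense_reward(0.0)])

# Strategy B: hover just outside the goal region until the episode is truncated.
hover_distance = GOAL_RADIUS + 0.05
hover = discounted_return([dense_reward(hover_distance)] * HORIZON)

print(f"reach and terminate: {reach_and_terminate:.2f}")
print(f"hover near the goal: {hover:.2f}")
```

Under these assumptions the hovering strategy collects a much larger return, because every extra step outside the goal region still earns a positive reward.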

Why is the dense reward formulated this way?

llewynS changed the title from "[Question] Question title" to "[Question] Maze Dense Reward" on Sep 11, 2023
@Kallinteris-Andreas
Collaborator

Kallinteris-Andreas commented Dec 30, 2023

  1. Are you using continuing_task=True (which is the default)?
  2. Are you resetting the environment when termination=True?
  3. Have you experimented with other reward functions?

@onnoeberhard

Somewhat related: the description of the maze environments says the returned reward is the negative Euclidean distance between the achieved goal position and the desired goal. This is wrong (it is the exponential of the negative distance).

@Kallinteris-Andreas
Collaborator

@onnoeberhard

def compute_reward(
    self, achieved_goal: np.ndarray, desired_goal: np.ndarray, info
) -> float:
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    if self.reward_type == "dense":
        return np.exp(-distance)
    elif self.reward_type == "sparse":
        return (distance <= 0.45).astype(np.float64)

You are correct. Can you make a PR to fix it?
You can use the Gymnasium MuJoCo documentation as a reference: https://gymnasium.farama.org/main/environments/mujoco/ant/#rewards
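For experimenting outside the environment, the method quoted above can be reproduced as a standalone function. This is only a sketch: the `reward_type` and `goal_radius` parameters and the `ValueError` branch are additions for illustration and are not part of the library's method signature.

```python
import numpy as np


def compute_reward(achieved_goal, desired_goal, reward_type="dense", goal_radius=0.45):
    """Standalone copy of the quoted reward logic (parameters are illustrative)."""
    achieved_goal = np.asarray(achieved_goal, dtype=np.float64)
    desired_goal = np.asarray(desired_goal, dtype=np.float64)
    # axis=-1 lets the same code handle one goal or a batch of goals.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    if reward_type == "dense":
        # Exponential of the negative distance: always in (0, 1].
        return np.exp(-distance)
    if reward_type == "sparse":
        # 1.0 inside the goal radius, 0.0 elsewhere.
        return (distance <= goal_radius).astype(np.float64)
    raise ValueError(f"unknown reward_type: {reward_type!r}")


goal = [1.0, 1.0]
positions = [[1.0, 1.0], [1.0, 1.3], [3.0, 1.0]]

dense = compute_reward(positions, goal, reward_type="dense")
sparse = compute_reward(positions, goal, reward_type="sparse")
print("dense: ", dense)
print("sparse:", sparse)
```

The dense values are strictly positive for every position, which is the mismatch with the "negative Euclidean distance" wording in the environment description.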

@kuds

kuds commented Dec 22, 2024

@Kallinteris-Andreas

I would be interested in picking this issue up if no one is working on it.

@Kallinteris-Andreas
Collaborator

Hey @kuds, currently no one is working on it.
The plan is to include maze v6 in the new release to fix various bugs.
If you want to start working on it, go for it.

@kuds

kuds commented Jan 16, 2025

@Kallinteris-Andreas

Quick update: I have forked the Gymnasium Robotics repository (my repo) and have implemented a new version of the Point Maze (now version 4) with an updated rewards system. I will verify the implementation with SAC and try the Ant Maze.

Here is a link to my Google Colab Notebook with my experimental results: Google Colab Notebook

Let me know if you would like to see anything else or have any questions!

best_model_point_maze_ppo-step-0-to-step-10000.1.mp4

@Kallinteris-Andreas
Collaborator

Could you try both the linear and the exponential distance rewards (-distance and -np.exp(distance) + 1) to see if there is a qualitative difference?
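As a rough illustration of how these candidates differ from the current reward, the sketch below evaluates all three shapes at a few distances. The function names and the sample distances are hypothetical; only the three formulas come from this thread.

```python
import numpy as np


def current_dense(distance):
    """Current dense reward: exp(-distance), positive everywhere."""
    return np.exp(-distance)


def linear(distance):
    """Proposed linear variant: -distance."""
    return -distance


def shifted_exponential(distance):
    """Proposed exponential variant: -exp(distance) + 1."""
    return -np.exp(distance) + 1.0


# Hypothetical sample distances, including the 0.45 sparse-reward threshold.
distances = np.array([0.0, 0.45, 1.0, 2.0])

for name, reward_fn in [
    ("exp(-d)", current_dense),
    ("-d", linear),
    ("-exp(d) + 1", shifted_exponential),
]:
    print(f"{name:>12}: {np.round(reward_fn(distances), 3)}")
```

Both proposed variants are zero at the goal and negative everywhere else, so each step spent away from the goal lowers the return instead of adding to it. The shifted exponential also penalises large distances far more steeply than the linear form.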

@kuds

kuds commented Jan 20, 2025

Sure, give me a week or so to get those coded and tested. I will report back once I have those numbers and charts ready.
