td3_implementation analysis #10
Labels: bug
[TEST 1]
[TEST 2]
[TEST 3]
[TEST 4]
[TEST 5]
[TEST 6]
👀 I think my implementation of
[TEST 10]
Before:

```python
a = agent.make_action(obs, t)
action = np.argmax(a) if is_discrete else a
# do step on gym at t-step
new_obs, reward, done, info = env.step(action)
# store the results to buffer
agent.memorize(obs, a, reward, done, new_obs)
# should've memorized the action w/ noise!!
```

After:

```python
a = agent.make_action(obs, t)
action = np.argmax(a) if is_discrete else a
# do step on gym at t-step
new_obs, reward, done, info = env.step(action)
# store the results to buffer
agent.memorize(obs, action, reward, done, new_obs)
```

but, consequently, it doesn't work..
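The point being debated above is what the replay buffer should hold. A minimal sketch of that storage rule, with hypothetical stand-ins for the agent and environment (none of these names come from the repo): the transition stored off-policy must contain the action that was actually executed in the environment, exploration noise included, not the clean policy output.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_action(obs, noise_scale=0.1):
    """Deterministic policy output plus Gaussian exploration noise."""
    mu = np.tanh(obs)                      # stand-in for the actor network
    noise = noise_scale * rng.standard_normal(mu.shape)
    return np.clip(mu + noise, -1.0, 1.0)  # noisy action actually executed

buffer = []

def memorize(obs, action, reward, done, new_obs):
    """Store one transition; `action` must be the executed (noisy) action."""
    buffer.append((obs, action, reward, done, new_obs))

obs = np.zeros(3)
a = make_action(obs)                          # noisy action
new_obs, reward, done = obs + a, 0.0, False   # fake env step for the sketch
memorize(obs, a, reward, done, new_obs)       # store the *executed* action
```

If the buffer instead stored the noiseless `np.tanh(obs)`, the critic would be trained on (state, action) pairs the environment never saw, which is one way a DDPG/TD3 port silently breaks.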
[TEST 12]
What was the final result? I'm curious ..!
The first TD3 implementation does not work well..
so.. I have to analyze each part of the differences from DDPG
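For analyzing the parts that differ from DDPG, a hedged sketch of TD3's three standard changes, with toy numpy stand-ins for the actor and critics (all names here are illustrative, not taken from the repo): (1) twin critics with a min over their targets (clipped double-Q), (2) target-policy smoothing noise, and (3) delayed policy updates.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA, POLICY_DELAY = 0.99, 2

def target_actor(next_obs):
    return np.tanh(next_obs)                 # stand-in target policy

def q1(obs, act): return float(np.sum(obs * act))        # toy critic 1
def q2(obs, act): return float(np.sum(obs * act)) + 0.1  # toy critic 2

def td3_target(next_obs, reward, done, noise_std=0.2, noise_clip=0.5):
    # (2) target policy smoothing: clipped noise on the target action
    noise = np.clip(noise_std * rng.standard_normal(next_obs.shape),
                    -noise_clip, noise_clip)
    next_act = np.clip(target_actor(next_obs) + noise, -1.0, 1.0)
    # (1) clipped double-Q: take the smaller of the two target critics
    q_min = min(q1(next_obs, next_act), q2(next_obs, next_act))
    return reward + GAMMA * (1.0 - done) * q_min

def should_update_actor(step):
    # (3) delayed policy updates: actor trained every POLICY_DELAY steps
    return step % POLICY_DELAY == 0

y = td3_target(np.ones(3), reward=1.0, done=0.0)
```

DDPG, by contrast, uses a single critic, no target-action noise, and updates the actor on every step, so checking each of these three spots separately is a reasonable way to narrow down where an implementation diverges.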