Tutorial: NumPy deep reinforcement learning with Pong from pixels #35
Conversation
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@melissawm Note that... For context, here's how... |
Don't worry, @8bitmp3, this is because we should add gym as a dependency for the repo, and the CI will pick that up. For now it doesn't matter; we'll do that later if it's alright with you. |
Hi @8bitmp3! Just to let you know - I'm reviewing this, but I feel like it's going to require some reworking to get right. Because it's a complicated subject, I'm trying to figure out ways to make it simpler and maybe reorganize some things. I'll let you know as soon as I have reached a nice balance :) |
Hello, @8bitmp3! I am finally back; sorry it took me so long. I took my time also because I wanted to read this carefully, and I have to say it is a really cool project :) Here's a slightly modified version of the tutorial. Again, these are suggestions; see if they make sense to you. The code works - I've tested it against your original and get very similar results. A few points:
Last thing: it would be very helpful to note some expected values for each episode, because the "-21" values (Pong's minimum score - the agent lost every point) are not very encouraging and don't look like the right answer if you're not paying attention to the expected results :) I hope this all makes sense. We'll certainly need to do a couple more passes to get it right, but this is a really interesting tutorial, so thanks again! Please reach out if you have any questions. |
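For illustration, here is a hypothetical logging sketch (not code from the tutorial) showing how a smoothed running mean keeps the learning trend visible even while individual episodes still score near -21:

```python
# Hypothetical early-training episode scores (made up for illustration);
# Pong episode rewards fall in [-21, 21], and an untrained agent sits near -21.
episode_rewards = [-21.0, -20.0, -21.0, -19.0, -20.0, -18.0]

running_reward = None
for episode, reward in enumerate(episode_rewards, start=1):
    # Exponentially weighted running mean, so one bad episode doesn't hide progress.
    running_reward = (
        reward if running_reward is None
        else 0.99 * running_reward + 0.01 * reward
    )
    print(f"Episode {episode}: reward {reward:+.0f}, running mean {running_reward:+.2f}")
```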
Thank you again @melissawm for the awesome feedback! 🥳
DONE:
@melissawm PTAL thanks! 👍 |
Hi @8bitmp3, thanks for the explanations! Yes, they do make sense. I'll do a thorough re-read now. |
Overall I think this is great. I like the subject and content, and it feels interesting. My only remaining concern is the length. I don't think having extra content at the bottom of the document should be a problem, though - to me it makes sense, and I'm the sort of person who would like to read that extra bit of info :) Others may disagree, though. Last comment: it would be nice to follow @bjnath's template at least for the first part of the document (What you'll learn, What you'll do...) because it makes the content we have here in the repo cohesive and users know what to expect. Thanks again! |
Thanks for all the awesome feedback @melissawm 😃 Really appreciate it. "...teaching RL is hard, and there are so many ways for teaching deep RL to go wrong" - from the foreword of the Grokking Deep RL book (which uses PyTorch). This tutorial attempts to explain the ins and outs of the "vanilla" policy gradient method in depth using mostly NumPy (a minimal sketch of that update appears after this comment). And, given all the background literature—including research papers and books—that I 🔍 scanned through in preparation, I think I've minimized the need for extra googling by readers (I hope, at least). This tutorial is also something I wish I'd come across earlier when researching (googling) this topic. On top of it all, we aren't using a library/framework like TensorFlow or PyTorch, which RL researchers typically rely on and which makes a bunch of steps much easier to write. This is NumPy from scratch 👍 If you want to learn something in depth, teach it and/or do it in NumPy 🤗 (I think those are the words of @iamtrask)
Ok! I think the first paragraph and the table of contents cover most of this—I tried following Ben's structure like in my other tutorial:
I'll try to think of some ways to enhance the intro! 👍 |
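For anyone skimming this thread, here is a minimal, hypothetical sketch of the "vanilla" policy gradient (REINFORCE) update the tutorial builds up; the names and shapes are illustrative, not the tutorial's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 8                                    # toy observation/hidden sizes (illustrative)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)  # hidden-layer weights
W2 = rng.standard_normal(H) / np.sqrt(H)       # output weights

def policy_forward(x):
    """Return P(action = "up") and the hidden activations for observation x."""
    h = np.maximum(0.0, W1 @ x)          # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))  # sigmoid output
    return p, h

def discount_rewards(r, gamma=0.99):
    """Discounted returns; in Pong the running sum resets at each nonzero reward."""
    out, running = np.zeros_like(r), 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running = 0.0
        running = gamma * running + r[t]
        out[t] = running
    return out

# One fake "episode" just to show the shape of the update:
xs = rng.standard_normal((5, D))
rewards = np.array([0.0, 0.0, 1.0, 0.0, -1.0])
dlogps, hs = [], []
for x in xs:
    p, h = policy_forward(x)
    action = 1 if rng.random() < p else 0
    dlogps.append(action - p)  # grad of log-prob w.r.t. the logit (Bernoulli policy)
    hs.append(h)

adv = discount_rewards(rewards)
adv = (adv - adv.mean()) / (adv.std() + 1e-8)  # standardize returns as advantages
dlogits = np.array(dlogps) * adv               # policy gradient: modulate by advantage
dW2 = np.array(hs).T @ dlogits                 # gradient for the output weights
# Backpropagating dlogits through the ReLU gives dW1; a gradient *ascent* step
# on W1 and W2 then makes actions that preceded wins more likely.
```

The key idea is that each action's log-probability gradient is scaled by its standardized discounted return, so actions followed by rewards are reinforced.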
Thanks, @8bitmp3 ! I think we're getting there! I don't see any further issues right now. When we feel it's ready I'll merge and convert to the new repo format. Cheers! 🎉 |
@8bitmp3: I just did a commit to this PR updating the file to match the .md format. I also added a note about reducing the number of training steps because of our CI. Please let me know if this makes sense. If you want to open the .md file as a notebook, you just need to install jupytext as a Python package and use either Jupyter classic or JupyterLab to open the markdown file "as a notebook" (there are different options depending on the interface you are using, but it's the same idea). I'll also need to add
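In case it helps anyone following along, here is a minimal sketch of that jupytext round-trip via its Python API (this assumes jupytext is installed, and the output filename is arbitrary):

```python
# Convert the MyST markdown tutorial into a .ipynb with jupytext's Python API
# (assumes: pip install jupytext). The output filename is a made-up example.
import jupytext

nb = jupytext.read("content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md")
jupytext.write(nb, "tutorial-pong-from-pixels.ipynb")
```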
Looks good @melissawm Thank you. I also found a repetition ("First, First") and updated the diagram (one of the arrows should point to the outer layer). |
Great! I think all that is left is to fix the README so this document is listed there, and fix |
@melissawm I've updated the YAML file and README but I'm getting merge conflicts which I can't resolve 🤔 |
Hmm yeah the fact that the author for all of the commits has been re-done is surprising to me too (though at least @8bitmp3 is preserved as the actual author, so it's not wrong). Maybe the author was modified during a rebase? Either way this looks fine to me :) |
Thank you. As long as some people find this tutorial useful -> 👍 @rossbar @melissawm |
Of that I have no doubt! 🎉 |
So just to clarify, I checked out this PR using |
No, that's definitely what I do in this situation as well; I guess I've just never paid attention to what that does to the committer/author bubble icons on GitHub afterwards. |
There are also a couple of things that crept in during the re-formatting (I think). @8bitmp3 would you be willing to fix them? I'll mark them and you can include this typo fix in the same commit. Thanks! |
After that, you also need to add your new document to site/index.md, and Sphinx should be happy.
Co-authored-by: Melissa Weber Mendonça <[email protected]>
Anything to keep Sphinx happy. @melissawm Am I doing this right? 😃 Here's the diff:

```diff
 ---
 maxdepth: 1
 ---
 content/cs231_tutorial
 content/tutorial-svd
 content/mooreslaw-tutorial
 content/save-load-arrays
 content/tutorial-deep-learning-on-mnist
+content/tutorial-deep-reinforcement-learning-with-pong-from-pixels
 content/tutorial-x-ray-image-processing
```
|
Ah! I got it - the MyST parser apparently doesn't like the dollar sign inside the code block. I tried escaping it, but it ends up throwing errors no matter what I try. I tried all the documented options, but something seems to go wrong and I don't know where. If you have ideas, that would be great! Also, I noticed that when building locally I was not seeing the images; it turns out you need to give a different path. So in both places where the |
@melissawm Updated 2x to
Let me know if this works. Note that the image may not render on github.com |
Thanks @melissawm |
Thank you, @8bitmp3 ! 🎉 |
Hi @melissawm @mattip @bjnath 👋
All feedback welcome. Thank you all!