
Name: Bilal Abbasi Questions

Bilal Abbasi has 10 questions regarding reinforcement learning techniques, including when rewards and penalties are received, how agents share experiences in A3C, why Q-learning is used in A3C actor-critic, how deep exploitation can be done safely across perimeter networks, how multiple worker threads function if training servers are interconnected, and the differences and accuracy of intelligence versus brute force exploitation modes.

Uploaded by

Emma Khan

Name: Bilal Abbasi

Questions

1. During reinforcement learning, at which level do we get a reward or
penalty?

2. During Deep Exploit, how would we know that the server we are
exploiting is accessed only through another server? And since A3C is
in a beta version, does it remain reliable when the server-to-server
access is large?

3. Why do we use the reinforcement learning technique, i.e., why not
supervised or unsupervised learning?

4. In A3C the experience of each agent is independent. If, during
reinforcement learning, the experience of one agent becomes relatively
higher than that of the others, can they still share information as
before?

5. Why do we use the Q-learning algorithm in A3C actor-critic, i.e., why not another?

6. Since perimeter networks act as a filter and trap security violations,
how is deep exploit done safely, given that different perimeter
networks act in different networks?

7. How does more than one worker thread work if the training servers are
interconnected?

8. What is the main difference between Intelligence mode and Brute
force mode, i.e., when is Intelligence mode used and when is the other?

9. To what extent is exploitation using A3C accurate?

10. How are supervised and unsupervised learning related to
reinforcement learning?
