Tuesday, May 21, 2019

Enjoy Reinforcement Learning in Tic-tac-Toe

This app won the "June 2019 MIT APP INVENTOR OF THE MONTH" award.
http://appinventor.mit.edu/explore/app-month-winners-2019

The purpose of this app is to familiarize you with Reinforcement Learning (a type of artificial intelligence). It was created with MIT App Inventor and, in particular, takes advantage of the recently released Generic Event blocks.

As an example, we use Tic-Tac-Toe, a well-known and easy game. The main feature of this app is that you can adjust the strength of the computer opponent by controlling its degree of reinforcement learning, and enjoy matches against it.

The computer player learns from the situations in which it loses, so it gradually becomes stronger as more matches are played. The app is published below, and its appearance is shown in Fig. 1.
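The core idea of "learning from the situations in which it loses" can be sketched in a few lines. The following Python snippet is only a hypothetical illustration of that MENACE-style rule (the actual app is built from App Inventor blocks, and the names `table`, `punish`, etc. are my own): each board state keeps a list of moves the computer is still willing to try, and after a lost game the moves it played are removed from those lists.

```python
import random

# Hypothetical MENACE-style sketch (the real app uses App Inventor
# blocks and TinyDB, not Python). Each board state maps to the list of
# moves still considered playable; moves that led to a loss are pruned.

table = {}  # state (string of 9 chars, "-" for empty) -> list of move indices

def legal_moves(state):
    return [i for i, c in enumerate(state) if c == "-"]

def choose_move(state):
    if state not in table:
        table[state] = legal_moves(state)   # first visit: all moves allowed
    return random.choice(table[state])

def punish(history):
    # history: (state, move) pairs played by the computer in a lost game
    for state, move in history:
        if move in table[state] and len(table[state]) > 1:
            table[state].remove(move)       # never repeat a losing move

# Example: the computer loses after opening in the corner (index 0)
empty = "---------"
table[empty] = legal_moves(empty)
punish([(empty, 0)])
print(0 in table[empty])  # -> False: that opening is never tried again
```

The `len(table[state]) > 1` guard keeps at least one move available per state, so the computer can never "run out" of moves.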

MIT App Inventor Gallery (source file .aia)
ai2.appinventor.mit.edu/?galleryId=5141416460812288


Three cases of computer vs. human matches

There are three cases of computer-vs-human play, of which the last, case-3, is the most attractive.

(case-1) Play against a weak computer player
-> The computer player has no knowledge, so if you don't make mistakes, you will win easily.
(case-2) Play against a very strong computer player
-> The computer player has enough learned knowledge, so it is difficult for a human to win.
(case-3) Adjust the strength of the computer player
-> The strength of the computer player can be adjusted by its degree of learning.

The match procedures for case-1 and case-2 are described in Fig. 2 and Fig. 3, respectively.



Adjust the strength of the computer player (case-3)

The procedure for enjoying a match in case-3 is shown in Fig. 4. To do this, you first need to let the computer perform Reinforcement Learning. For example, it takes about 30 minutes of learning to make the computer roughly 85% strong (that is, an 85% chance of not losing). Is that too long?

Don't worry: you can interrupt learning at any time by long-clicking the "Reinf. Learning" button. Even if the power is turned off, you can resume learning by turning the power back on and clicking the "Reinf. Learning" button again. Keep in mind, however, that clicking "from scratch" erases the computer's learning results.


The computer gets progressively stronger as learning progresses; the situation is shown in Fig. 5. You can use this figure as a guide to estimate the required learning time (or number of games). Enjoy the game!
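The relationship between the number of learning games and playing strength (the trend behind Fig. 5) can be reproduced with a small self-play experiment. The sketch below is again hypothetical Python, not the app's block code: the learner plays second against a random opponent, prunes losing moves as above, and we compare the "not losing" rate of a lightly trained table with a heavily trained one.

```python
import random

# Hypothetical experiment: strength as a function of the number of
# learning games. "O" is the learner (moves second); "X" plays randomly.

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal(board):
    return [i for i, c in enumerate(board) if c == "-"]

def play_game(table):
    board, history, turn = ["-"] * 9, [], "X"
    while legal(board) and not winner(board):
        state = "".join(board)
        if turn == "O":
            moves = table.setdefault(state, legal(board))
            move = random.choice(moves)
            history.append((state, move))     # remember the learner's choices
        else:
            move = random.choice(legal(board))
        board[move] = turn
        turn = "O" if turn == "X" else "X"
    return winner(board), history

def train(table, n_games):
    for _ in range(n_games):
        result, history = play_game(table)
        if result == "X":                     # the learner lost: prune its moves
            for state, move in history:
                if move in table[state] and len(table[state]) > 1:
                    table[state].remove(move)

def not_lose_rate(table, n=1000):
    return sum(play_game(table)[0] != "X" for _ in range(n)) / n

random.seed(0)
weak, strong = {}, {}
train(weak, 100)        # case-1-like: barely trained
train(strong, 20000)    # case-2-like: heavily trained
print(round(not_lose_rate(weak), 2), round(not_lose_rate(strong), 2))
```

Running this shows the heavily trained table losing far less often than the lightly trained one, which is exactly the knob case-3 exposes.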



This application includes a text file like the one in Fig. 6 as an asset. It contains the learning results obtained by playing about 37,000 games (please refer to Fig. 5 above). Fig. 6 illustrates what the learned content means.
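Loading such a learning-results file back into a lookup table is straightforward once a line format is fixed. The actual format is the one shown in Fig. 6; the sketch below assumes a hypothetical format of my own invention (one `state:moves` pair per line) purely to show the shape of the code.

```python
# Hypothetical loader for a learning-results text file. The real format
# is the one in Fig. 6; here we assume one "state:moves" pair per line,
# e.g. "X-O------:1,3,5" (board string, then the surviving move indices).

def load_table(lines):
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        state, moves = line.split(":")
        table[state] = [int(m) for m in moves.split(",")]
    return table

sample = ["---------:0,2,4,6,8", "X-O------:1,3,5"]
table = load_table(sample)
print(table["X-O------"])  # -> [1, 3, 5]
```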


In this application, we used TinyDB to store and retrieve the reinforcement learning results, and we used the newly provided generic event blocks to operate the buttons. Thanks to these, as shown in Fig. 7, the app could be created very efficiently.



This app was inspired by the following documents:
  • https://ecraft2learn.github.io/ai/AI-Teacher-Guide/chapter-6.html
  • Donald Michie, http://people.csail.mit.edu/brooks/idocs/matchbox.pdf
  • https://we-make-money-not-art.com/menace-2-an-artificial-intelligence-made-of-wooden-drawers-and-coloured-beads/
  • Oliver Child, http://chalkdustmagazine.com/features/menace-machine-educable-noughts-crosses-engine/

Comments on version upgrade

As soon as you use this app, you will probably want to add the following features:
  1. Choosing whether the computer or the human moves first. This is not difficult, but it was omitted this time to keep the program structure simple.
  2. Saving the results of Reinforcement Learning and making them available to other people. This is also relatively easy to achieve. Intermediate results of Reinforcement Learning are stored in TinyDB, so when necessary the contents of TinyDB can be written out as a text file and shared. To use it for further Reinforcement Learning, convert the text file back into TinyDB format. Generally speaking, however, this conversion requires some care, because the values stored in TinyDB elements (tag, value) come in various shapes: a single element, a list, or a list of lists.
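The "care" needed in point 2 comes from the mixed value shapes. One simple way to handle it, sketched below in Python as a hypothetical stand-in for the App Inventor export/import logic, is a structured serialization such as JSON, which preserves whether a value is a single element, a list, or a list of lists, where naive string joining would lose that distinction.

```python
import json

# Hypothetical export/import sketch for TinyDB-style (tag, value) pairs.
# JSON preserves the value's shape (scalar, list, or list of lists),
# so a round trip through a text file reconstructs the data exactly.

def export_db(db):                 # db: dict mapping tag -> value
    return "\n".join(json.dumps([tag, value]) for tag, value in db.items())

def import_db(text):
    db = {}
    for line in text.splitlines():
        tag, value = json.loads(line)
        db[tag] = value
    return db

db = {"score": 42, "moves": [1, 3, 5], "history": [[0, 4], [8, 2]]}
restored = import_db(export_db(db))
print(restored == db)  # -> True: all three value shapes survive the round trip
```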
