Matchbox Educable Noughts And Crosses Engine

The Matchbox Educable Noughts And Crosses Engine (sometimes incorrectly referred to as the Machine Educable Noughts and Crosses Engine), shortened to MENACE, was an analogue computer made up of 304 matchboxes, designed and built by Donald Michie in 1961. It was designed to play human opponents at noughts and crosses (tic-tac-toe) by returning a move for any given state of play, and to "learn", refining its strategy over time through reinforcement learning.[1]

Origin

Donald Michie had been on the team decrypting the German Tunny code during World War II. Fifteen years later, he wanted to demonstrate that a machine could learn from experience. Since computer equipment was not available for uses such as these[2], he chose to demonstrate the idea in a more tangible format, and constructed a working model out of matchboxes and beads.

MENACE was reportedly constructed as the result of a bet with a fellow computer scientist, who postulated that such a machine was impossible.[3]

Composition

MENACE itself was made up of 304 matchboxes glued together in an arrangement similar to a chest of drawers[4]. Each box had a code number keyed to a chart of drawings of tic-tac-toe grids with differing configurations of Xs, Os and empty squares, corresponding to the possible permutations a game could pass through as it progressed. After removing duplicate arrangements (ones that were simply rotations or mirror images of other configurations), 304 permutations remained, and MENACE had one matchbox for each.[5]
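
The figure of 304 can be checked by a short enumeration. The following Python sketch (an illustration, not Michie's tooling) walks the game tree, records every position the first player can face on moves 1, 3, 5 and 7 of a live game (the ninth move is forced, so it needs no box), and removes duplicates under the grid's eight rotations and reflections:

    # Enumerate MENACE's matchboxes: first-player positions up to symmetry.
    ROT = (6, 3, 0, 7, 4, 1, 8, 5, 2)    # 90-degree rotation, as an index map
    FLIP = (2, 1, 0, 5, 4, 3, 8, 7, 6)   # mirror image
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def transforms(b):
        """Yield all eight symmetric variants of a board (a 9-tuple)."""
        for _ in range(4):
            b = tuple(b[i] for i in ROT)
            yield b
            yield tuple(b[i] for i in FLIP)

    def canonical(b):
        """One representative for all boards equivalent under symmetry."""
        return min(transforms(b))

    def won(b):
        return any(b[i] != ' ' and b[i] == b[j] == b[k] for i, j, k in LINES)

    def count_boxes():
        seen = set()

        def play(b, to_move, depth):
            if won(b):                            # game over: no box needed
                return
            if to_move == 'O' and depth < 8:      # O's moves 1, 3, 5 and 7
                seen.add(canonical(b))
            nxt = 'X' if to_move == 'O' else 'O'
            for i in range(9):
                if b[i] == ' ':
                    play(b[:i] + (to_move,) + b[i + 1:], nxt, depth + 1)

        play((' ',) * 9, 'O', 0)
        return len(seen)

    print(count_boxes())   # expected: 304 (1 + 12 + 108 + 183 across moves 1, 3, 5, 7)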

Each matchbox tray contained a collection of colored beads.[6] Each color represented a move on a particular square of the game grid, so matchboxes for arrangements in which a position was already taken held no beads of that position's color. In addition, at the front of each tray were two extra pieces of card forming a "V" shape[4], with the point of the "V" aimed at the front of the matchbox. Michie and his artificial intelligence team called MENACE's algorithm "Boxes"[7], after the apparatus used for the machine. The first stage of "Boxes" operated in five phases, each setting a definition and a precedent for the rules of the algorithm in relation to the game.[8]

Operation

Playing a game

MENACE always played first[9][5], as O, since every matchbox represented a permutation of the grid that only the player moving first would face.

To retrieve MENACE's choice of move, the opponent or operator first found the matchbox matching the current game state, or a rotation or mirror image of it. At the start of a game, for example, this was the matchbox for an empty grid. The tray was removed and shaken lightly to move the beads around; the bead that rolled into the point of the "V" at the front of the tray was the move MENACE had chosen. Its color gave the square to play, and after accounting for any rotation or reflection between the chosen matchbox's configuration and the actual grid, the O was placed on that square. The opponent then made their move, the new state was looked up, a new move selected, and so on, until the game ended in a win, loss or draw for MENACE.
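
In software terms, the operator's procedure amounts to canonicalising the board under symmetry, drawing a random bead from the matching box, and mapping the chosen square back through that same symmetry. A minimal Python sketch (the names and the initial bead loading are illustrative assumptions, not Michie's specification):

    import random

    ROT = (6, 3, 0, 7, 4, 1, 8, 5, 2)    # 90-degree rotation, as an index map
    FLIP = (2, 1, 0, 5, 4, 3, 8, 7, 6)   # mirror image

    def compose(p, q):
        """Index map applying p first, then q."""
        return tuple(p[q[i]] for i in range(9))

    SYMMETRIES, _p = [], tuple(range(9))
    for _ in range(4):
        _p = compose(_p, ROT)
        SYMMETRIES += [_p, compose(_p, FLIP)]

    def apply_map(perm, board):
        return tuple(board[perm[i]] for i in range(9))

    def canonicalise(board):
        """The chart position for `board`, plus the symmetry that produced it."""
        return min((apply_map(p, board), p) for p in SYMMETRIES)

    def initial_beads(position):
        # A simplification: one bead per empty square; by most accounts the
        # starting bead counts varied with the stage of the game.
        return [i for i, cell in enumerate(position) if cell == ' ']

    def menace_move(board, boxes, rng=random):
        """Each bead is stored as the square it stands for on the chart position."""
        position, perm = canonicalise(board)
        if position not in boxes:
            boxes[position] = initial_beads(position)
        bead = rng.choice(boxes[position])   # shake the tray, read the "V"
        return perm[bead]                    # undo the rotation/reflection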

Finishing a game

When the game had finished, the human player/operator observed the game's outcome, which would be one of three possibilities:

  • The Human Player had won.
  • The Computer Player had won.
  • It was a draw.

As a game was played, every matchbox used for MENACE's turns had its tray returned slightly open, with the chosen bead set aside, so that MENACE's moves and the game states they belonged to were recorded. If MENACE won, it received a "reward": the set-aside beads, which traced the winning sequence of moves, were returned to their trays (easily identified by the open drawers) along with three bonus beads of the same color. In future games MENACE would therefore be more likely to repeat those moves, reinforcing a winning strategy. If it lost, the set-aside beads were not returned, "punishing" MENACE so that in future it would be less likely (or, if a color of bead disappeared entirely, unable) to repeat the moves that had caused the loss.
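
In software terms the end-of-game bookkeeping is a one-line update rule per move. A hedged Python sketch, assuming the reward values usually reported for Michie's scheme (three bonus beads for a win, one for a draw, confiscation for a loss):

    def update_boxes(boxes, history, outcome):
        """Apply MENACE's end-of-game reinforcement.

        `history` holds the (position, bead) pairs set aside during the game;
        `outcome` is 'win', 'draw' or 'loss' from MENACE's point of view.
        """
        # Beads put back per move: the original bead plus any bonus beads.
        returned = {'win': 1 + 3, 'draw': 1 + 1, 'loss': 0}[outcome]
        for position, bead in history:
            boxes[position].extend([bead] * returned)

If repeated losses empty a box entirely, MENACE has no bead to offer for that position; recreations commonly treat an emptied box as MENACE resigning.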

Results in practice

Optimal strategy

Noughts and crosses has a well-known optimal strategy.[10] It involves placing one's pieces so as to block the opponent's lines while completing a line of one's own. If both players follow the optimal strategy, the game always ends in a draw.[11] Consequently, if the human player knows the optimal strategy and MENACE can learn it, the games eventually all end in draws. Against imperfect play, however, the odds favor the player moving first: counting every possible distinct sequence of moves, the outcomes are distributed as follows:

Turns to win    Combinations    Probability
5 turns              1440           0.6%
6 turns              5328           2.1%
7 turns             47952          18.8%
8 turns             72576          28.4%
9 turns             81792          32.1%
Draw                46080          18.1%
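
These figures count games as distinct move sequences; there are 255,168 legal games in all, and the table can be reproduced by exhaustive enumeration, as in this Python sketch:

    from collections import Counter

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def won(b):
        return any(b[i] != ' ' and b[i] == b[j] == b[k] for i, j, k in LINES)

    def tally(counts, b=(' ',) * 9, player='O', depth=0):
        """Count complete games, as distinct move sequences, by length."""
        if won(b):
            counts[f'{depth} turns'] += 1        # game ended on move `depth`
        elif depth == 9:
            counts['draw'] += 1
        else:
            nxt = 'X' if player == 'O' else 'O'
            for i in range(9):
                if b[i] == ' ':
                    tally(counts, b[:i] + (player,) + b[i + 1:], nxt, depth + 1)
        return counts

    counts = tally(Counter())
    total = sum(counts.values())                 # 255168 legal games
    for key, n in sorted(counts.items()):
        print(f'{key:8} {n:6} {n / total:6.1%}')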

Against a player using the optimal strategy, however, the probability of a draw approaches 100% as MENACE learns. In Donald Michie's tournament against MENACE, he played the optimal strategy, and he and the computer began to draw consistently after twenty games. From Michie's tournament[12], we observe the following milestones:

  • Michie began by consistently opening with "Variant 0".
  • At 15 games, MENACE abandoned all non-corner openings.
  • At just over 20 games, Michie switched to consistently using "Variant 1".
  • At 60, he returned to Variant 0.
  • In the late 70s, he moved to "Variant 2".
  • At 110, he switched to "Variant 3".
  • At 135, he switched to "Variant 4".
  • At 190, he returned to Variant 1.
  • At 210, he returned to Variant 0.

The trend in changes of beads in the "2" boxes runs:

Variant      Match number    Bead change in "2" box
Variant 0          0                  0
Variant 1         20                 -5
Variant 0         60                  5
Variant 2         70                 10
Variant 3        110                 20
Variant 4        135                 25
Variant 1        190                100
Variant 0        210                120

Correlation

Depending on the strategy employed by the human player, MENACE produces a different trend in scatter graphs of its wins.[13] A human player making random moves yields an almost perfect positive trend, while playing the optimal strategy produces a markedly slower increase.
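
The trend can be reproduced in simulation. The following self-contained Python sketch (the parameter choices are illustrative assumptions: four starting beads per colour, the reward scheme described above, resignation when a box empties, 500 games) plays a software MENACE as first player against a uniformly random opponent and prints its cumulative score, counting a win as +1, a draw as 0 and a loss as -1; the score climbs steadily, echoing the near-linear positive trend:

    import random

    ROT = (6, 3, 0, 7, 4, 1, 8, 5, 2)
    FLIP = (2, 1, 0, 5, 4, 3, 8, 7, 6)
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def compose(p, q):
        return tuple(p[q[i]] for i in range(9))

    SYMMETRIES, _p = [], tuple(range(9))
    for _ in range(4):
        _p = compose(_p, ROT)
        SYMMETRIES += [_p, compose(_p, FLIP)]

    def canonicalise(b):
        return min((tuple(b[p[i]] for i in range(9)), p) for p in SYMMETRIES)

    def won(b):
        return any(b[i] != ' ' and b[i] == b[j] == b[k] for i, j, k in LINES)

    def play_game(boxes, rng):
        """One game: MENACE (O, moving first) against a random opponent."""
        b, history = (' ',) * 9, []
        for depth in range(9):
            if depth % 2 == 0:                   # MENACE's move
                pos, perm = canonicalise(b)
                if pos not in boxes:             # four beads per colour to start
                    boxes[pos] = [i for i in range(9) if pos[i] == ' '] * 4
                if not boxes[pos]:               # emptied box: MENACE resigns
                    return 'loss', history
                bead = rng.choice(boxes[pos])
                boxes[pos].remove(bead)          # set the bead aside
                history.append((pos, bead))
                move, mark = perm[bead], 'O'
            else:                                # opponent plays at random
                move, mark = rng.choice([i for i in range(9) if b[i] == ' ']), 'X'
            b = b[:move] + (mark,) + b[move + 1:]
            if won(b):
                return ('win' if mark == 'O' else 'loss'), history
        return 'draw', history

    def update(boxes, history, outcome):
        returned = {'win': 4, 'draw': 2, 'loss': 0}[outcome]   # bead back + bonus
        for pos, bead in history:
            boxes[pos].extend([bead] * returned)

    rng, boxes, score = random.Random(0), {}, 0
    for game in range(1, 501):
        outcome, history = play_game(boxes, rng)
        update(boxes, history, outcome)
        score += {'win': 1, 'draw': 0, 'loss': -1}[outcome]
        if game % 50 == 0:
            print(f'after {game:3d} games: cumulative score {score:+d}')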

Variations

As described above, MENACE runs on a reward-and-punishment reinforcement algorithm[14], removing beads that contributed to a losing game and duplicating those that helped it to draw and win. However, a couple of proposed variations of MENACE employ other systems.

Punish draws

In the default system, the computer is rewarded for draws as well as wins. In this variation, draws are punished by removing beads, training the machine toward outright wins only. The result is similar to the default, but the training is more ruthless.

Increase starting beads

Michie suggested increasing the number of beads each box starts with as another way of varying MENACE's capacity to learn. Because each reward or punishment then shifts the bead proportions by less, learning proceeds more gradually, and this variation produces a graph very similar to the default settings.

Legacy

Donald Michie's MENACE proved that a computer could "learn" from failure and success in order to become good at a task[15]. It also used what would become core principles of machine learning before they had been properly theorised. For example, the way MENACE starts with equal numbers of each type of bead in a matchbox, from which one is then selected at random, creates learning behavior similar to weight initialization in modern artificial neural networks. Its self-reinforcing algorithm has since been replicated countless times and changed the face of computational thinking.

Computer simulation

After MENACE's resounding reception, Michie was invited to the US Office of Naval Research, which commissioned him to build a "Boxes"-running program for an IBM computer for use at Stanford University.[16] Michie went on to create a simulation of MENACE on a Pegasus 2 computer with the aid of D. Martin.[17]

Modern recreations

There have been multiple recreations of MENACE in more recent years, both in its original physical form and as computer programs[5][18]. MENACE has also been used as a teaching aid[19][20][21][22] in machine learning and neural network courses, including a well-publicised demonstration by Cambridge researcher Matthew Scroggs.[23]

References

  1. "Menace: the Machine Educable Noughts And Crosses Engine". Chalkdust. 2016-03-13. Retrieved 2020-05-17.
  2. http://www.cdpa.co.uk/UoP/HoC/Lectures/HoC_07b.PDF
  3. "Daily Telegraph obituary for Donald Michie". Daily Telegraph. 2007-07-09.{{cite news}}: CS1 maint: url-status (link)
  4. 4.0 4.1 The Science Book, Second Edition, Dorling Kindersley Ltd., 2015, pg. 288
  5. 5.0 5.1 5.2 https://warwick.ac.uk/fac/sci/dcs/research/em/publications/web-em/10/cs405_-_1362452_-_menaceem.pdf
  6. core.ac.uk - The Machine Learning Revolution in AI by Luc De Raedt https://core.ac.uk/download/pdf/80808274.pdf
  7. Muggleton, Stephen (2007-07-10). "Obituary for Donald Michie, an article in The Guardian from 2007". theguardian.com.{{cite web}}: CS1 maint: url-status (link)
  8. Russel, David (2012). Springer Professional - Extract from "The BOXES Methodology",. https://www.springerprofessional.de/en/introduction-to-boxes-control/1967928: Springer London. ISBN 9781849965279. {{cite book}}: External link in |location= (help)CS1 maint: location (link)
  9. https://we-make-money-not-art.com/menace-2-an-artificial-intelligence-madegetgetting ridting rid-of-wooden-drawers-and-coloured-beads/
  10. "How to Win at Tic Tac Toe". wikiHow. Retrieved 2020-05-17.
  11. "Tic-Tac-Toe Strategy". Stephen Ostermiller. 2004-06-15. Retrieved 2020-05-17.
  12. Michie, Donald (1961). "Trial and Error". Penguin Science Survey, Vol. 2.
  13. Science, ODSC-Open Data (2018-10-23). "How 300 Matchboxes Learned to Play Tic-Tac-Toe Using MENACE". Medium. Retrieved 2020-05-17.
  14. Meyer, Jeremy (2009-11-27). "Tic-Tac-Toe Training" ("..the human player would 'punish' the machine by throwing away the last bean that was taken out of a matchbox."). artima.com.
  15. Dumas, Jacques-Pierre (Jp). "IoT and machine learning are driving network transformation". itbrief.com.au. Retrieved 2020-06-12.
  16. "Professor Donald Michie". 2007-07-08. ISSN 0307-1235. Retrieved 2020-06-11.
  17. "Experiments on the mechanization of Game Learning Part 1. Characterization of the model and its parameters" (PDF). Retrieved 2020-06-01.
  18. Scaruffi, Piero (2016). Intelligence is not Artificial: Why the Singularity is not Coming any Time Soon and Other Meditations on the Post-Human Condition and the Future of Intelligence. p. 30. ISBN 978-0-9765531-9-9. https://www.scaruffi.com/singular/download.pdf
  19. Zhao, Yibo (1 December 2013). "Machine Educable Engine on Noughts And Crosses in Modelling Study". University of Warwick.
  20. "AI Topics: Tic-Tac-Toe strategy in Computational Thinking, Introduction, MENACE".
  21. Schmid, Ute. "Interactive Learning with Mutual Explanations: How Humans and Machine Learning Systems can Profit From Each Other". University of Bamberg, Germany. https://www.universiteitleiden.nl/binaries/content/assets/science/dso/ute-schmid_mutualexplleiden-1.pdf
  22. "Inspiring the Next Generation of Computer Scientists | King's Worcester". King's Worcester. 2019-11-11. Retrieved 2020-06-12.
  23. Matthew Scroggs, lecture on MENACE. https://www.youtube.com/watch?v=hK25eXRaBdc
