This project is a Self-Reinforcement Learning Agent through AI Feedback: a model built to compete in the AI competition for CSCI 561, the graduate Artificial Intelligence course.
The task was to build an AI agent that combines minimax search, alpha-beta pruning, and a well-designed evaluation function to beat classmates' agents at an enlarged version of Othello. My model placed in the top 10% of the class. The innovation in this model was using ChatGPT both as a game critic and as a stronger "master" AI for my model to learn from. This led to much faster convergence toward the optimal set of evaluation weights. By the end of training, the model beat GPT-4 Turbo in 85% of games.
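The search described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the board encoding, the board-size parameter `n` (the enlarged board's real dimensions are not stated here), and the three evaluation features with their weights (`disc`, `mobility`, `corner`) are hypothetical placeholders. A weight dictionary like this is the kind of parameter set a training loop such as the one described above would tune.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical board encoding: "." empty, "X" black, "O" white.
EMPTY, BLACK, WHITE = ".", "X", "O"
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
Board = List[List[str]]
Move = Tuple[int, int]


def opponent(player: str) -> str:
    return WHITE if player == BLACK else BLACK


def initial_board(n: int) -> Board:
    """Empty n x n board (n even) with the four standard starting discs."""
    board = [[EMPTY] * n for _ in range(n)]
    m = n // 2
    board[m - 1][m - 1] = board[m][m] = WHITE
    board[m - 1][m] = board[m][m - 1] = BLACK
    return board


def flips(board: Board, r: int, c: int, player: str) -> List[Move]:
    """Opponent discs flipped if `player` plays at (r, c); empty list means illegal."""
    n = len(board)
    if board[r][c] != EMPTY:
        return []
    captured: List[Move] = []
    for dr, dc in DIRECTIONS:
        run: List[Move] = []
        rr, cc = r + dr, c + dc
        while 0 <= rr < n and 0 <= cc < n and board[rr][cc] == opponent(player):
            run.append((rr, cc))
            rr, cc = rr + dr, cc + dc
        # A run of opponent discs only counts if it is closed by the player's own disc.
        if run and 0 <= rr < n and 0 <= cc < n and board[rr][cc] == player:
            captured.extend(run)
    return captured


def legal_moves(board: Board, player: str) -> List[Move]:
    n = len(board)
    return [(r, c) for r in range(n) for c in range(n) if flips(board, r, c, player)]


def apply_move(board: Board, move: Move, player: str) -> Board:
    """Return a new board with the move played and captured discs flipped."""
    r, c = move
    new = [row[:] for row in board]
    for rr, cc in flips(board, r, c, player):
        new[rr][cc] = player
    new[r][c] = player
    return new


def evaluate(board: Board, me: str, w: Dict[str, float]) -> float:
    """Hypothetical weighted evaluation from `me`'s point of view."""
    opp, n = opponent(me), len(board)
    discs = sum(row.count(me) - row.count(opp) for row in board)
    mobility = len(legal_moves(board, me)) - len(legal_moves(board, opp))
    corners = [board[0][0], board[0][n - 1], board[n - 1][0], board[n - 1][n - 1]]
    corner = corners.count(me) - corners.count(opp)
    return w["disc"] * discs + w["mobility"] * mobility + w["corner"] * corner


def alphabeta(board: Board, depth: int, alpha: float, beta: float,
              to_move: str, me: str, w: Dict[str, float]) -> Tuple[float, Optional[Move]]:
    """Depth-limited minimax with alpha-beta pruning; returns (score, best move)."""
    moves = legal_moves(board, to_move)
    if depth == 0 or (not moves and not legal_moves(board, opponent(to_move))):
        return evaluate(board, me, w), None  # depth limit reached or game over
    if not moves:  # no legal move: the turn passes to the opponent
        return alphabeta(board, depth - 1, alpha, beta, opponent(to_move), me, w)[0], None
    maximizing = to_move == me
    best_score = float("-inf") if maximizing else float("inf")
    best_move: Optional[Move] = None
    for move in moves:
        score, _ = alphabeta(apply_move(board, move, to_move), depth - 1,
                             alpha, beta, opponent(to_move), me, w)
        if maximizing and score > best_score:
            best_score, best_move = score, move
            alpha = max(alpha, score)
        elif not maximizing and score < best_score:
            best_score, best_move = score, move
            beta = min(beta, score)
        if alpha >= beta:
            break  # prune: the other player will never allow this line
    return best_score, best_move
```

For example, `alphabeta(initial_board(8), 3, float("-inf"), float("inf"), BLACK, BLACK, {"disc": 1, "mobility": 5, "corner": 25})` returns a `(score, move)` pair for Black's opening move; an enlarged board only changes the `n` passed to `initial_board`. Called with the full `(-inf, inf)` window at the root, alpha-beta returns the same root score as plain minimax while skipping branches that cannot change the result, which is what makes deeper searches affordable on a larger board.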