I made a simple Python game. A screenshot is below: Basically, a paddle moves left and right catching particles. Some particles make you lose points, while others make you gain points.
This is my first deep Q-learning project, so I probably messed something up, but here is what I have:

    model = Sequential()
    model.add(Dense(200, input_shape=(4,), activation='relu'))
    model.add(Dense(200, activation='relu'))
    model.add(Dense(3, activation='linear'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')

The four inputs are the X position of the player, the X and Y position of a particle (one particle at a time), and the type of that particle. The three outputs correspond to the actions left, right, and don't move.
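To make the input and output layout concrete, here is a minimal sketch of how one state could be packed for the network. The helper name `encode_state`, the `ACTIONS` ordering, and the example numbers are assumptions for illustration and are not taken from the game code; the only thing the model definition fixes is a batch of shape (1, 4) going in and three values coming out.

```python
import numpy as np

# Hypothetical action ordering; the real game may map indices differently.
ACTIONS = ("left", "right", "stay")

def encode_state(player_x, particle_x, particle_y, particle_type):
    """Pack the four inputs into a (1, 4) float batch for model.predict."""
    return np.array(
        [[player_x, particle_x, particle_y, particle_type]],
        dtype=np.float32,
    )

# Made-up example values, not real game coordinates.
state = encode_state(player_x=120, particle_x=80, particle_y=40, particle_type=1)
print(state.shape)  # (1, 4)
```

The inputs here are raw values. Whether the game already scales positions to a small range before feeding them to the network is not stated above.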
Here is the learning algorithm:

    def learning(num_episodes=500):
        y = 0.8
        eps = 0.5
        decay_factor = 0.9999
        for i in range(num_episodes):
            state = GAME.reset()
            GAME.done = False
            eps *= decay_factor
            done = False
            while not done:
                if np.random.random() < eps:  # exploration
                    a = np.random.randint(0, 2)
                else:
                    a = np.argmax(model.predict(state))
                new_state, reward, done = GAME.step(a)  # does that step
                # reward can be -20, -5, 1, and 5
                target = reward + y * np.max(model.predict(new_state))
                target_vec = model.predict(state)[0]
                target_vec[a] = target
                model.fit(state, target_vec.reshape(-1, 3), epochs=1, verbose=0)
                state = new_state

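For reference, the update in this loop is the one-step Q-learning target, reward plus `y` times the largest predicted value for the next state. The sketch below reproduces only that arithmetic with NumPy, using a made-up array in place of `model.predict`. The function `q_target` and its `done` handling are assumptions for illustration and are not part of the code above. The sketch also evaluates two numbers that follow directly from the listing: what `np.random.randint(0, 2)` can return (the upper bound is exclusive), and the value of `eps` after 500 per-episode decays.

```python
import numpy as np

def q_target(reward, next_q_values, gamma=0.8, done=False):
    """One-step Q-learning target: r + gamma * max over next-state Q-values.

    Dropping the bootstrap term on terminal steps is a common convention,
    assumed here; the loop above does not do this.
    """
    if done:
        return float(reward)
    return float(reward + gamma * np.max(next_q_values))

# Made-up Q-values standing in for model.predict(new_state)[0].
next_q = np.array([0.5, 2.0, -1.0])
target = q_target(reward=1, next_q_values=next_q)  # 1 + 0.8 * 2.0

# Exploration draw as written in the loop: only indices 0 and 1 can occur.
draws = np.random.randint(0, 2, size=1000)
seen_actions = set(draws.tolist())

# Epsilon after 500 decays of 0.9999, starting from 0.5.
eps_final = 0.5 * 0.9999 ** 500

print(target, seen_actions, round(eps_final, 4))
```

Under these settings `eps` ends near 0.476, so roughly half of all actions are still random at the end of training, and the random branch never selects action index 2.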
After training, this usually results in the paddle just going to one side and staying there. I am not sure whether the NN architecture (number of units and hidden layers) is appropriate for the given complexity. Also, is it possible that this is failing because the rewards are very delayed? It can take 100+ frames to get to the food, so maybe this isn't registering well with the neural network.
I only started learning about reinforcement learning yesterday, so I would appreciate any advice!
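The concern about delayed rewards can be put in numbers. With the discount `y = 0.8` from the code, a reward that arrives n frames later contributes to the current state's value with weight 0.8^n. The sketch below only evaluates that weight; the 0.99 value is an assumed comparison point and is not something from the code.

```python
def discounted_weight(gamma, steps):
    """Weight gamma**steps applied to a reward arriving `steps` frames later."""
    return gamma ** steps

for gamma in (0.8, 0.99):
    w = discounted_weight(gamma, 100)
    print(f"gamma={gamma}: weight after 100 frames = {w:.3e}, "
          f"a +5 reward is worth {5 * w:.3e}")
```

With 0.8 the weight after 100 frames is on the order of 1e-10, while 0.99 leaves roughly 0.37, so the choice of discount strongly affects how much a reward 100 frames away can influence the early moves.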