Johannes Karl Arnold

Künstliche Intelligenz I (CS188)

Notes based on the course Künstliche Intelligenz I, which itself is based on CS 188 at UC Berkeley.


(Passive) Reinforcement Learning

Reinforcement learning resembles a Markov Decision Process with states \(s \in S\) and actions \(a \in A\), except that the transition model \(T(s, a, s^\prime)\) and the reward function \(R(s, a, s^\prime)\) are unknown!

Passive Reinforcement Learning

The agent follows a fixed policy and learns from the experiences it collects along the way.

Model-Based

Step 1: Learn empirical MDP model

Step 2: Solve the learned MDP

Use value iteration as with normal MDPs. Repeat steps one and two as needed.
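Step 1 can be sketched as counting observed transitions and normalizing them into probabilities. This is a minimal illustration; the episode format (a list of \((s, a, s^\prime, r)\) tuples) and the function name are assumptions, not from the course.

```python
from collections import defaultdict

def learn_empirical_mdp(experience):
    """Estimate T(s, a, s') and R(s, a, s') from observed transitions.

    `experience` is a list of (s, a, s_next, r) tuples -- an assumed
    format for the agent's recorded experiences.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    R = {}                                          # (s, a, s') -> observed reward

    for s, a, s_next, r in experience:
        counts[(s, a)][s_next] += 1
        R[(s, a, s_next)] = r

    # Normalize counts into empirical transition probabilities.
    T = {}
    for (s, a), outcomes in counts.items():
        total = sum(outcomes.values())
        for s_next, c in outcomes.items():
            T[(s, a, s_next)] = c / total
    return T, R

experience = [("A", "go", "B", 1.0), ("A", "go", "B", 1.0), ("A", "go", "C", 0.0)]
T, R = learn_empirical_mdp(experience)
# T[("A", "go", "B")] == 2/3, T[("A", "go", "C")] == 1/3
```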

Markov Decision Problems

Markov Decision Processes consist of:

a set of states \(s \in S\)

a set of actions \(a \in A\)

a transition function \(T(s, a, s^\prime)\): the probability of reaching \(s^\prime\) when taking action \(a\) in state \(s\)

a reward function \(R(s, a, s^\prime)\)

a start state (and possibly a terminal state)

MDP quantities are

a policy \(\pi(s)\): a choice of action for each state

utility: the sum of (discounted) rewards
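Solving an MDP (step 2 above) can be done with value iteration, which repeatedly applies the Bellman update \(V_{k+1}(s) = \max_a \sum_{s^\prime} T(s, a, s^\prime)\,[R(s, a, s^\prime) + \gamma V_k(s^\prime)]\). A minimal sketch, assuming \(T\) and \(R\) are dictionaries keyed by \((s, a, s^\prime)\) with missing keys meaning probability (or reward) zero:

```python
def value_iteration(states, actions, T, R, gamma=0.9, iters=100):
    """Compute state values via the Bellman update.

    T and R map (s, a, s') tuples to probabilities / rewards; this
    keying convention is an assumption for the sketch.
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: max(
                sum(
                    T.get((s, a, s2), 0.0) * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2 in states
                )
                for a in actions
            )
            for s in states
        }
    return V

# Toy example: from A, "stay" leads to B with reward 1; B loops on itself.
states, actions = ["A", "B"], ["stay"]
T = {("A", "stay", "B"): 1.0, ("B", "stay", "B"): 1.0}
R = {("A", "stay", "B"): 1.0}
V = value_iteration(states, actions, T, R, gamma=0.5)
```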

Search

Heuristics of \(A^\ast\)

Admissibility: the heuristic cost is less than or equal to the actual cost to the goal.

\begin{equation} h(A) \leq \text{actual cost from A to G} \end{equation}

Consistency: heuristic “arc” cost is less than or equal to the actual cost of each arc.

\begin{equation} h(A) - h(C) \leq \text{cost from}~A~\text{to}~C \end{equation}
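Both conditions can be checked mechanically over a heuristic table. A small sketch with assumed data structures (a dict of heuristic values, a dict of true remaining costs, and a list of \((a, c, \text{cost})\) arcs):

```python
def is_admissible(h, true_cost):
    """h(s) <= actual cost from s to the goal, for every state."""
    return all(h[s] <= true_cost[s] for s in h)

def is_consistent(h, arcs):
    """h(a) - h(c) <= cost of the arc from a to c, for every arc."""
    return all(h[a] - h[c] <= cost for a, c, cost in arcs)

# Toy graph: A --3--> C --3--> G (goal), with made-up heuristic values.
h = {"A": 4, "C": 2, "G": 0}
true_cost = {"A": 6, "C": 3, "G": 0}
arcs = [("A", "C", 3), ("C", "G", 3)]
```

Note that consistency implies admissibility (by summing the per-arc inequalities along any path to the goal), but not the other way around.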

\(\alpha / \beta\) Pruning

An extension of the Minimax algorithm. It is a depth-first search that saves time by “pruning” branches that cannot influence the final decision, given the minimum and maximum values already established along the current path.
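The pruning rule can be sketched as follows: \(\alpha\) tracks the best value MAX can guarantee so far, \(\beta\) the best value MIN can guarantee, and a branch is cut as soon as \(\alpha \geq \beta\). The callbacks `children` and `value` are assumed helpers for navigating the game tree, not part of the algorithm itself.

```python
def alphabeta(state, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning.

    `children(state)` returns the successor states (empty for leaves);
    `value(state)` evaluates a leaf. Both are assumed callbacks.
    """
    kids = children(state)
    if depth == 0 or not kids:
        return value(state)
    if maximizing:
        best = float("-inf")
        for c in kids:
            best = max(best, alphabeta(c, depth - 1, alpha, beta, False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # prune: MIN above will never let play reach this branch
        return best
    else:
        best = float("inf")
        for c in kids:
            best = min(best, alphabeta(c, depth - 1, alpha, beta, True, children, value))
            beta = min(beta, best)
            if beta <= alpha:
                break  # prune: MAX above already has a better option
        return best

# Toy game tree: MAX at the root, MIN below, leaf values at depth 2.
tree = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
leaves = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}
best = alphabeta("root", 2, float("-inf"), float("inf"), True,
                 lambda s: tree.get(s, []), lambda s: leaves[s])
# best == 3: MIN holds L to 3 and R to 2, so MAX picks L.
```

In this example the branch `R2` is pruned: once MIN finds `R1 = 2`, the value of `R` can never exceed 2, which MAX already beats with `L = 3`.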