Künstliche Intelligenz I (CS188)
Notes based on the course Künstliche Intelligenz I, which itself is based on CS 188 at UC Berkeley.
(Passive) Reinforcement Learning
Similar to a Markov Decision Process with states \(s \in S\) and actions \(a \in A\), except that the transition function \(T(s, a, s^\prime)\) and the reward function \(R(s, a, s^\prime)\) are unknown!
- States and actions must be tried out
Passive Reinforcement Learning
The agent follows a fixed policy and learns from the experiences it observes along the way.
Model-Based
Step 1: Learn empirical MDP model
- Count outcomes \(s^\prime\) for each \(s, a\)
- Normalize to give an estimate \(\hat{T}(s, a, s^\prime)\)
- Discover each \(\hat{R}(s, a, s^\prime)\) when we experience \((s, a, s^\prime)\)
Step 2: Solve the learned MDP
Use value iteration as with normal MDPs. Repeat steps one and two as needed.
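A minimal sketch of Step 1, assuming experience arrives as \((s, a, s^\prime, r)\) transitions; the function name and data layout are illustrative, not from the course. The resulting estimates can then be fed into value iteration for Step 2 (a value-iteration sketch appears after the MDP definitions below).

```python
from collections import defaultdict

def estimate_model(episodes):
    """Step 1: build empirical estimates of T(s, a, s') and R(s, a, s')
    from observed (s, a, s', r) transitions (assumed input format)."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] = times this outcome was seen
    rewards = {}                                    # rewards[(s, a, s')] = r, discovered when experienced

    for s, a, s_next, r in episodes:
        counts[(s, a)][s_next] += 1
        rewards[(s, a, s_next)] = r

    # Normalize the outcome counts to get transition probability estimates T-hat(s, a, s')
    T_hat = {
        (s, a): {s_next: n / sum(outcomes.values()) for s_next, n in outcomes.items()}
        for (s, a), outcomes in counts.items()
    }
    return T_hat, rewards
```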
Markov Decision Processes
Markov Decision Processes consist of:
- Set of states \(s_n \in S\)
- Start state \(s_0\)
- Set of actions \(a_n \in A\)
- Transition functions \(T(s,a,s^\prime)\)
- Gives the probability of transitioning from state \(s\) to \(s^\prime\) when taking action \(a\)
- Reward functions \(R(s,a,s^\prime)\)
- Typically, a positive value is “good,” a negative value is a “punishment”
- Discount factor \(\gamma\)
- Coefficient \(0 \leq \gamma \leq 1\) that determines how much future rewards are weighted relative to immediate rewards
- Helps the algorithms converge
MDP quantities are:
- Policy \(\pi(s)\): a choice of action for each state
- Utility: the sum of (discounted) rewards
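As a rough illustration of how these components fit together, here is a value-iteration sketch; representing \(T\) and \(R\) as callables and the other names are assumptions made for the example, not the course's implementation.

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Repeatedly apply the Bellman update
    V(s) <- max_a sum_{s'} T(s, a, s') * (R(s, a, s') + gamma * V(s'))."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V = {
            s: max(
                (sum(T(s, a, s2) * (R(s, a, s2) + gamma * V[s2]) for s2 in states)
                 for a in actions),
                default=0.0,  # states with no applicable action keep value 0
            )
            for s in states
        }
    return V
```

Keeping \(\gamma < 1\) is what makes the repeated updates converge, as noted above.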
Search
Heuristics of \(A^\ast\)
Admissibility: heuristic cost is less than or equal to the actual cost to the goal.
\begin{equation} h(A) \leq \text{actual cost from } A \text{ to } G \end{equation}
Consistency: heuristic “arc” cost is less than or equal to the actual cost of each arc.
\begin{equation}
h(A) - h(C) \leq \text{cost from } A \text{ to } C
\end{equation}
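For context, a compact \(A^\ast\) search sketch that such a heuristic would plug into; `neighbors` and `h` are assumed to be caller-supplied callables, not part of the course material.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, h):
    """A* search: expand nodes in order of f(n) = g(n) + h(n),
    where h is an admissible heuristic estimate of the cost to `goal`."""
    tie = count()  # tie-breaker so the heap never has to compare states directly
    frontier = [(h(start), 0, next(tie), start, [start])]  # (f, g, tie, node, path)
    best_g = {start: 0}

    while frontier:
        f, g, _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for succ, step_cost in neighbors(node):  # neighbors(n) yields (successor, step_cost) pairs
            g_new = g + step_cost
            if g_new < best_g.get(succ, float("inf")):
                best_g[succ] = g_new
                heapq.heappush(frontier, (g_new + h(succ), g_new, next(tie), succ, path + [succ]))
    return None, float("inf")
```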
\(\alpha / \beta\) Pruning
An extension of the Minimax algorithm. It is a depth-first search that saves time and space by “pruning” branches that cannot affect the final decision, given the minimum and maximum values already established along the current path.
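A compact sketch of minimax with \(\alpha/\beta\) pruning; `successors` and `evaluate` are assumed helper functions for the game at hand, not from the course.

```python
def alpha_beta(state, depth, alpha, beta, maximizing, successors, evaluate):
    """Minimax with alpha-beta pruning: alpha is the best value the maximizer
    can already guarantee, beta the best value the minimizer can guarantee."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)

    if maximizing:
        value = float("-inf")
        for child in children:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False, successors, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # prune: the minimizer will never let play reach this branch
                break
        return value
    else:
        value = float("inf")
        for child in children:
            value = min(value, alpha_beta(child, depth - 1, alpha, beta, True, successors, evaluate))
            beta = min(beta, value)
            if beta <= alpha:  # prune: the maximizer already has a better option elsewhere
                break
        return value
```

A typical top-level call starts with `alpha_beta(root, depth, float("-inf"), float("inf"), True, successors, evaluate)`.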