Künstliche Intelligenz I (CS188)
Notes based on the course Künstliche Intelligenz I, which itself is based on CS 188 at UC Berkeley.
(Passive) Reinforcement Learning
Similar to a Markov Decision Process with states \(s \in S\) and actions \(a \in A\), except that the transition function \(T(s, a, s^\prime)\) and the reward function \(R(s, a, s^\prime)\) are unknown!
- States and actions must be tried out
Passive Reinforcement Learning
The agent follows a fixed policy and learns from the experiences it observes along the way.
Model-Based
Step 1: Learn empirical MDP model
- Count outcomes \(s^\prime\) for each \(s, a\)
- Normalize to give an estimate \(\hat{T}(s, a, s^\prime)\)
- Discover each \(\hat{R}(s, a, s^\prime)\) when we experience \((s, a, s^\prime)\)
Step 2: Solve the learned MDP
Use value iteration as with normal MDPs. Repeat steps one and two as needed.
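A minimal sketch of Step 1, assuming experience arrives as \((s, a, s^\prime, r)\) transitions; the function name and data layout are illustrative, not from the course. The resulting estimates can then be fed into value iteration for Step 2 (a value-iteration sketch appears after the MDP definitions below).

```python
from collections import defaultdict

def estimate_model(episodes):
    """Step 1: build empirical estimates of T(s, a, s') and R(s, a, s')
    from observed (s, a, s', r) transitions (assumed input format)."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] = times this outcome was seen
    rewards = {}                                    # rewards[(s, a, s')] = r, discovered when experienced

    for s, a, s_next, r in episodes:
        counts[(s, a)][s_next] += 1
        rewards[(s, a, s_next)] = r

    # Normalize the outcome counts to get transition probability estimates T-hat(s, a, s')
    T_hat = {
        (s, a): {s_next: n / sum(outcomes.values()) for s_next, n in outcomes.items()}
        for (s, a), outcomes in counts.items()
    }
    return T_hat, rewards
```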
Markov Decision Processes
Markov Decision Processes consist of:
- Set of states \(s_n \in S\)
- Start state \(s_0\)
- Set of actions \(a_n \in A\)
- Transition functions \(T(s,a,s^\prime)\)
- Gives the probability of transitioning from state \(s\) to \(s^\prime\) when taking action \(a\)
- Reward functions \(R(s,a,s^\prime)\)
- Typically, a positive value is “good,” a negative value is a “punishment”
- Discount factor \(\gamma\)
- Coefficient \(0 \leq \gamma \leq 1\) that determines how much future rewards are weighted relative to immediate rewards
- Helps the algorithms converge
MDP quantities are:
- Policy \(\pi(s)\): a choice of action for each state
- Utility: the sum of (discounted) rewards
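As a rough illustration of how these components fit together, here is a value-iteration sketch; representing \(T\) and \(R\) as callables and the other names are assumptions made for the example, not the course's implementation.

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Repeatedly apply the Bellman update
    V(s) <- max_a sum_{s'} T(s, a, s') * (R(s, a, s') + gamma * V(s'))."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V = {
            s: max(
                (sum(T(s, a, s2) * (R(s, a, s2) + gamma * V[s2]) for s2 in states)
                 for a in actions),
                default=0.0,  # states with no applicable action keep value 0
            )
            for s in states
        }
    return V
```

Keeping \(\gamma < 1\) is what makes the repeated updates converge, as noted above.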
Search
Heuristics of \(A^\ast\)
Admissibility: heuristic cost is less than or equal to the actual cost to the goal.
\begin{equation} h(A) \leq \text{actual cost from } A \text{ to } G \end{equation}
Consistency: heuristic “arc” cost is less than or equal to the actual cost of each arc.
\begin{equation}
h(A) - h(C) \leq \text{cost from } A \text{ to } C
\end{equation}
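For context, a compact \(A^\ast\) search sketch that such a heuristic would plug into; `neighbors` and `h` are assumed to be caller-supplied callables, not part of the course material.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, h):
    """A* search: expand nodes in order of f(n) = g(n) + h(n),
    where h is an admissible heuristic estimate of the cost to `goal`."""
    tie = count()  # tie-breaker so the heap never has to compare states directly
    frontier = [(h(start), 0, next(tie), start, [start])]  # (f, g, tie, node, path)
    best_g = {start: 0}

    while frontier:
        f, g, _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for succ, step_cost in neighbors(node):  # neighbors(n) yields (successor, step_cost) pairs
            g_new = g + step_cost
            if g_new < best_g.get(succ, float("inf")):
                best_g[succ] = g_new
                heapq.heappush(frontier, (g_new + h(succ), g_new, next(tie), succ, path + [succ]))
    return None, float("inf")
```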
\(\alpha / \beta\) Pruning
An extension of the Minimax algorithm. It is a depth-first search that saves time and space by “pruning” branches that cannot affect the final decision, given the minimum and maximum values already established along the current path.
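A compact sketch of minimax with \(\alpha/\beta\) pruning; `successors` and `evaluate` are assumed helper functions for the game at hand, not from the course.

```python
def alpha_beta(state, depth, alpha, beta, maximizing, successors, evaluate):
    """Minimax with alpha-beta pruning: alpha is the best value the maximizer
    can already guarantee, beta the best value the minimizer can guarantee."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)

    if maximizing:
        value = float("-inf")
        for child in children:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False, successors, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # prune: the minimizer will never let play reach this branch
                break
        return value
    else:
        value = float("inf")
        for child in children:
            value = min(value, alpha_beta(child, depth - 1, alpha, beta, True, successors, evaluate))
            beta = min(beta, value)
            if beta <= alpha:  # prune: the maximizer already has a better option elsewhere
                break
        return value
```

A typical top-level call starts with `alpha_beta(root, depth, float("-inf"), float("inf"), True, successors, evaluate)`.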