Description
By the end of the course, students should be able to formulate a stochastic optimization problem, characterize its solutions, study their complexity, and, where exact solutions are out of reach, find suboptimal ones.
The course is illustrated by hands-on sessions in a computer room, using the Scilab software.
- the first 18 hours make up the ENSTA part of the course,
- the following 12 hours are complementary material delivered within the Master's programme,
- the common exam takes place during the last session.
The course is taught in English.
- Exam: 3
- Tutorials in the computer room: 2
- Communication internship: 29
Minimum / maximum enrollment: 10 / 35
Degree(s) concerned
Grading format
Numerical, out of 20
Letter grade / European grade
For students of the Diplôme d'Ingénieur de l'Ecole Nationale Supérieure de Techniques Avancées
How the grade is acquired:
- the resit exam is mandatory if:
- initial grade < 6
- the resit exam may be requested by the student if:
- 6 ≤ initial grade < 10
- ECTS credits acquired: 1.5
- Scientific credits acquired: 1.5
The coefficient of the course unit (UE) is 1.5.
The grade obtained counts toward your GPA.
The course unit is evaluated by the students.
Detailed program
1. Dynamic programming equation for deterministic control problems.
Markov chains: Fokker-Planck and Kolmogorov equations, graph and communicating classes. Problems with additive or discounted criteria, with constant or non-constant discount factors, and with finite horizon.
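For concreteness, here is a minimal sketch (in Python for illustration; the course's practicals use Scilab) of the deterministic dynamic programming recursion V_t(x) = min_u { c(x,u) + V_{t+1}(f(x,u)) } with terminal condition V_T = 0, solved by backward induction. The dynamics, costs, and horizon are invented for the example, not taken from the course:

    # Toy deterministic control on states 0..4 with controls u in {-1, 0, +1}.
    # Dynamics f(x, u) = clip(x + u, 0, 4); stage cost c(x, u) = x**2 + abs(u).
    states, controls, T = range(5), (-1, 0, 1), 5

    V = {x: 0.0 for x in states}              # terminal condition V_T = 0
    policy = {}
    for t in reversed(range(T)):              # backward induction
        V_new, pi_t = {}, {}
        for x in states:
            cost, u_best = min((x**2 + abs(u) + V[min(max(x + u, 0), 4)], u)
                               for u in controls)
            V_new[x], pi_t[x] = cost, u_best
        V, policy[t] = V_new, pi_t

    print(V[4], policy[0][4])                 # optimal cost and first control from x = 4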
2. Markov chains: stopping time, strong Markov property. Problems with additive or discounted criteria, and stopping time or infinite horizon. Equivalence between criteria with discount factors and criteria with stopping time.
Homogeneous Markov chains: forward Kolmogorov equation, reduction to a sequence of i.i.d. random variables.
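As an illustrative numerical check (invented, not course material) of the equivalence above: discounting by a factor gamma is equivalent in expectation to stopping the chain at an independent geometric time with killing probability 1 - gamma per step:

    import numpy as np

    rng = np.random.default_rng(0)
    gamma, c = 0.9, 1.0                            # constant per-step cost c

    discounted = c / (1 - gamma)                   # sum_{t>=0} gamma**t * c
    tau = rng.geometric(1 - gamma, size=200_000)   # P(tau = k) = gamma**(k-1) * (1 - gamma)
    stopped = (c * tau).mean()                     # E[ sum_{t < tau} c ] = c * E[tau]

    print(discounted, stopped)                     # both close to 10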
3. Introduction to Markov decision processes (MDP) with complete state observation.
Problems with additive or discounted criteria and finite horizon, with or without an exit time. Markovian strategies, feedback policies given by the dynamic programming equation.
Properties of the dynamic programming operator (monotonicity, homogeneity, sup-norm nonexpansivity, convexity).
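A minimal sketch, on toy random data invented for the example, of the finite-MDP dynamic programming operator T V = min_a (c_a + gamma P_a V) and a numerical check of some of the listed properties:

    import numpy as np

    rng = np.random.default_rng(1)
    nS, nA, gamma = 4, 3, 0.95
    c = rng.random((nA, nS))                      # costs c[a, x]
    P = rng.random((nA, nS, nS))
    P /= P.sum(axis=2, keepdims=True)             # transitions P[a, x, y]

    def T(V):
        # (T V)(x) = min_a { c(x, a) + gamma * sum_y P(y | x, a) V(y) }
        return (c + gamma * P @ V).min(axis=0)

    V, W = rng.random(nS), rng.random(nS)
    assert np.all(T(V + 1.0) >= T(V))             # monotonicity
    assert np.allclose(T(V + 1.0), T(V) + gamma)  # additive homogeneity
    assert np.abs(T(V) - T(W)).max() <= gamma * np.abs(V - W).max() + 1e-12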
4. MDP with infinite horizon. Contraction of the dynamic programming operator. Application to the dynamic programming equation and to the convergence of value iteration and policy iteration algorithms.
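A self-contained sketch of value iteration on the same kind of invented data; since T is a gamma-contraction in sup-norm, the iterates V <- T V converge geometrically to the unique fixed point, from which a greedy policy is read off:

    import numpy as np

    rng = np.random.default_rng(2)
    nS, nA, gamma = 4, 3, 0.9
    c = rng.random((nA, nS))
    P = rng.random((nA, nS, nS))
    P /= P.sum(axis=2, keepdims=True)

    V = np.zeros(nS)
    for k in range(1000):                          # value iteration V <- T V
        V_next = (c + gamma * P @ V).min(axis=0)
        if np.abs(V_next - V).max() < 1e-10:
            break
        V = V_next

    policy = (c + gamma * P @ V).argmin(axis=0)    # greedy policy at the fixed point
    print(k, V, policy)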
5. Long-term behaviour of Markov chains: recurrence, invariant measure, ergodic theorem. Evaluation of mean-payoff/ergodic criteria or long-term risk-sensitive criteria.
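A small sketch, on an invented three-state chain, of computing an invariant measure and the long-run average cost given by the ergodic theorem:

    import numpy as np

    # Toy irreducible chain; the invariant measure solves pi P = pi, sum(pi) = 1.
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])
    n = len(P)

    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]     # stationary distribution

    c = np.array([1.0, 0.0, 2.0])                 # per-state cost
    print(pi, pi @ c)                             # long-run average cost = pi . c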
6. Practical work on PageRank optimization. This session informally uses results that will be developed in subsequent lectures.
Training session on dynamic programming with finite horizon: a dam management problem.
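The computation underlying PageRank is a power iteration on a damped random-surfer chain; a minimal sketch on a made-up 4-page graph (the damping factor 0.85 is the customary choice, not a course requirement):

    import numpy as np

    A = np.array([[0, 1, 1, 0],                   # made-up link matrix:
                  [0, 0, 1, 0],                   # A[i, j] = 1 iff page i links to j
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    P = A / A.sum(axis=1, keepdims=True)          # random-surfer transition matrix
    d, n = 0.85, len(A)
    G = d * P + (1 - d) / n                       # damped ("Google") matrix

    r = np.full(n, 1 / n)
    for _ in range(100):                          # power iteration: r <- r G
        r = r @ G
    print(r)                                      # PageRank scores (sum to 1)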
7. Modeling and solution of several problems chosen among stock management, machine maintenance, portfolio selection, yield management, transportation, and PageRank optimization.
The aim of this lecture is to illustrate the techniques of the previous lectures and to motivate the introduction of new ones, which are partly studied in the following lectures.
End of the ENSTA program.
8. MDP with mean-payoff/ergodic criteria or long-term risk-sensitive criteria: ergodic dynamic programming equation, vanishing discount approach.
The irreducible or unichain case: existence and uniqueness of a solution, value and policy iterations.
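A sketch of one standard scheme in the unichain case, relative value iteration for the ergodic equation rho + h = min_a (c_a + P_a h). The data are random and invented; since all transition probabilities are positive here, every policy yields an irreducible aperiodic chain, under which the iteration converges:

    import numpy as np

    rng = np.random.default_rng(3)
    nS, nA = 4, 3
    c = rng.random((nA, nS))
    P = rng.random((nA, nS, nS))
    P /= P.sum(axis=2, keepdims=True)

    h = np.zeros(nS)
    for _ in range(5000):                 # relative value iteration
        Th = (c + P @ h).min(axis=0)
        rho, h = Th[0], Th - Th[0]        # pin h(0) = 0; rho -> optimal mean cost

    print(rho, h)                         # mean cost per step and bias (up to a constant)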
9. Constrained MDP. Formulation in terms of occupation measures and linear programming. Relaxed strategies.
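A minimal sketch of the occupation-measure linear program for a discounted constrained MDP, written with scipy.optimize.linprog; all data are invented. The inequality bounds the expected discounted amount of a second cost d, and the optimal occupation measure induces a relaxed (randomized) strategy:

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(4)
    nS, nA, gamma = 3, 2, 0.9
    c = rng.random((nS, nA))                      # cost to minimize
    d = np.zeros((nS, nA)); d[:, 1] = 1.0         # constrained cost: use of action 1
    P = rng.random((nS, nA, nS))
    P /= P.sum(axis=2, keepdims=True)
    alpha = np.full(nS, 1 / nS)                   # initial distribution

    # Flow constraints on mu(x, a) >= 0 (flattened to a vector):
    # sum_a mu(y, a) - gamma * sum_{x, a} P(y | x, a) mu(x, a) = alpha(y).
    A_eq = np.zeros((nS, nS * nA))
    for y in range(nS):
        for x in range(nS):
            for a in range(nA):
                A_eq[y, x * nA + a] = float(x == y) - gamma * P[x, a, y]

    res = linprog(c.ravel(), A_ub=[d.ravel()], b_ub=[3.0],   # E[sum gamma^t d] <= 3
                  A_eq=A_eq, b_eq=alpha, bounds=(0, None))
    mu = res.x.reshape(nS, nA)
    print(res.fun, mu / mu.sum(axis=1, keepdims=True))       # value, relaxed strategy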
10. Introduction to multichain mean-payoff MDP and/or to zero-sum games associated with risk-sensitive criteria, and/or max-plus algebra for deterministic control problems.
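For the max-plus part, one illustrative fact (a toy example invented here) is that the finite-horizon deterministic DP recursion is linear over the (max, +) semiring: it is a matrix-vector product in which addition plays the role of multiplication and max the role of addition:

    import numpy as np

    NEG = -np.inf                                 # the max-plus "zero"
    # Edge rewards of a small graph: A[x, y] = reward of the move x -> y.
    A = np.array([[NEG, 2.0, NEG],
                  [NEG, NEG, 3.0],
                  [1.0, NEG, NEG]])

    def maxplus(A, v):
        # (A ⊗ v)(x) = max_y (A[x, y] + v[y]) -- one DP step
        return (A + v[None, :]).max(axis=1)

    v = np.zeros(3)                               # terminal reward
    for _ in range(4):
        v = maxplus(A, v)                         # V_t = A ⊗ V_{t+1}
    print(v)                                      # best 4-step path reward from each node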
11. Problems with partial observation: reduction to an MDP with an infinite state space. The dynamic programming equation.
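The reduction replaces the hidden state by its conditional law (the belief), which evolves by a Bayes filter; a minimal sketch with invented transition and observation matrices:

    import numpy as np

    P = np.array([[0.7, 0.3],                     # hidden-state transitions
                  [0.4, 0.6]])
    O = np.array([[0.9, 0.1],                     # O[state, observation]
                  [0.2, 0.8]])

    def update(b, obs):
        # New belief b'(y) proportional to O[y, obs] * sum_x b(x) P[x, y]:
        # the "state" of the equivalent MDP on the simplex of probability vectors.
        b = O[:, obs] * (b @ P)
        return b / b.sum()

    b = np.array([0.5, 0.5])                      # initial belief
    for obs in (0, 0, 1):
        b = update(b, obs)
    print(b)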
12. Gittins index for multi-armed bandits.
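One way to compute the index numerically (a sketch under a toy model invented here, not necessarily the course's algorithm): calibrate an arm state against a standard arm paying a constant lam per step, and bisect on the lam at which retiring and continuing are indifferent:

    import numpy as np

    gamma = 0.9
    P = np.array([[0.8, 0.2],                     # toy arm: state transitions
                  [0.3, 0.7]])
    r = np.array([1.0, 0.2])                      # state-dependent rewards

    def value(lam, iters=2000):
        # Optimal stopping: keep playing the arm, or retire on lam forever.
        V = np.zeros(2)
        for _ in range(iters):
            V = np.maximum(lam / (1 - gamma), r + gamma * P @ V)
        return V

    def gittins(x):
        lo, hi = r.min(), r.max()                 # the index lies in [min r, max r]
        for _ in range(60):                       # bisection on the retirement rate
            lam = (lo + hi) / 2
            if value(lam)[x] > lam / (1 - gamma) + 1e-9:
                lo = lam                          # still worth playing: index > lam
            else:
                hi = lam
        return lam

    print(gittins(0), gittins(1))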
13. Assessment: written exam.