MLHEP 2016 lecture slides
This year my team at Yandex organized the MLHEP (Machine Learning in High Energy Physics) summer school in Lund, Sweden.
There were two tracks, basic and advanced, each lasting three days, plus two days on neural networks for both tracks together.
The school was accompanied by two Kaggle challenges: one for both tracks and one for the advanced track. This is the most productive way to try techniques and learn them in practice.
Just as a year ago, I gave the lectures for the basic track. The previous materials were enriched with new topics and more explanations.
Also, I’ve added many visualizations and animations compared to the previous year.
This 3-day course is the shortest course on machine learning, and it
still gives a nice introduction to some advanced topics!
Day 1
Introduction to machine learning terminology. Applications within High Energy Physics and outside HEP.
- Basic problems: classification and regression.
- Nearest neighbours approach and spatial indices
- Overfitting (intro)
- Curse of dimensionality
- ROC curve, ROC AUC
- Bayes optimal classifier
- Density estimation: KDE and histograms
- Parametric density estimation
- Mixtures for density estimation and EM algorithm
- Generative approach vs discriminative approach
- Linear models:
  - Linear decision rule, intro to logistic regression
  - Linear regression
Day 2
- Linear models: logistic regression
- Polynomial decision rule and polynomial regression
- SVM (Support Vector Machine) and kernel trick
- Overfitting: two definitions
- Model selection
- Regularization: L1, L2, elastic net
- Decision trees
- Splitting criteria for classification and regression
- Overfitting in trees: pre-stopping and post-pruning
- Instability of trees
- Feature importance
- Ensembling
- RSM, subsampling, bagging.
- Random Forest
Day 3
- Ensembles
- AdaBoost
- Gradient Boosting for regression
- Gradient Boosting for classification
- Second-order information
- Losses: regression, classification, ranking
- Multiclass classification:
  - ensembling
  - softmax modifications
- Feature engineering and output engineering
- Feature selection
- Dimensionality reduction:
  - PCA
  - LDA, CSP
  - LLE
  - Isomap
- Hyperparameter optimization
- ML-based approach
- Gaussian processes
Day 4, part 1
Slides by Tatiana Likhomanenko on non-trivial applications of boosting in High Energy Physics.
Links
- All materials from the school are available at the MLHEP 2016 repository
- Official page at Indico
- Kaggle competitions for the school: exotic Higgs and triggers