
TMVATutorial

Progress on Multivariate Data Analysis with TMVA

E. v. Törne

 

February, 2016




TMVA tutorial:
This tutorial is part of the 2012 Helmholtz workshop in Bonn. I gave earlier versions of this tutorial at the TMVA workshop and at the Terascale Statistics Tools School.

TMVA tutorial installation:
We will use the TMVA version provided by ROOT 5.34 that you find on your computer. First we create a working directory that is equipped with a few useful ROOT macros. Please execute:
cd $HOME; cp -r $ROOTSYS/tmva/test/ tmvatutorial; cd tmvatutorial; source setup.sh

If you only have a limited ROOT installation, this directory may not be available. In that case, download tmvatutorial.tar, unpack it with tar xvf tmvatutorial.tar, and then run cd tmvatutorial; source setup.sh

In either case, ignore any output that the setup script produces. You are now ready for Exercise 1.


Exercise 1
Run your first job using the macro TMVAClassification.C. Train the classifiers LD and BDT on the test data by running:
root -l TMVAClassification.C\(\"LD,BDT\"\)
To use the TMVA collection of macros, type:
root -l TMVAGui.C
Use macro button number 10 to display individual decision trees.
Inspect the output of the training and evaluation, which is stored in the file TMVA.root. Next, run the reader application using root -l TMVAClassificationApplication.C\(\"LD,BDT\"\)

Exercise 2, classification task
The data for this exercise: testData.root
The example macro: TMVAExample.C
Preparation for TMVA exercise:
Copy testData.root and TMVAExample.C to the tmvatutorial directory.
cd to this directory.
Inspect the signal and background trees in the ROOT file.
Edit TMVAExample.C and run it.
We will split into different groups.
Group A: Likelihood
Group B: Boosted Decision trees (BDT)

The task: Try to find the optimal classifier, as measured by the ROC curve integral (a larger integral means better separation of signal and background).
Please note, to use the TMVA collection of macros, type:
root -l TMVAGui.C
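
The ROC curve integral used as the figure of merit in Exercise 2 can be illustrated without ROOT. The following is a minimal Python sketch, not TMVA's own implementation: it assumes the integral is the area under the curve of background rejection versus signal efficiency, which equals the probability that a randomly chosen signal event receives a higher classifier output than a randomly chosen background event. The function name roc_integral and the score lists are made up for illustration; the value shown in the TMVA GUI is computed from binned histograms and may differ slightly.

```python
def roc_integral(signal_scores, background_scores):
    """Area under the ROC curve (background rejection vs. signal efficiency).

    Computed as the fraction of (signal, background) pairs in which the
    signal event has the higher classifier output; ties count one half.
    """
    if not signal_scores or not background_scores:
        raise ValueError("need at least one signal and one background score")
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Hypothetical classifier outputs, invented for this illustration only.
signal = [0.9, 0.8, 0.35, 0.7]
background = [0.1, 0.4, 0.3, 0.2]

# 15 of the 16 pairs are ordered correctly, so this prints 0.9375.
print(roc_integral(signal, background))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is why the exercise asks for the largest integral.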


Exercise 3, regression task
Regression analysis provides an estimate of one (or several) continuous observables based on input variables. In this exercise the data represent measurements in a toy calorimeter. The observable to be estimated is the energy of the calorimeter cluster. All energies are given in GeV.
The calorimeter is segmented into five thin layers followed by eight thicker layers. It is imperfect in many ways, which makes the energy measurement more challenging: there are indications of leakage at the end of the calorimeter, dead regions, and non-compensation. The data represent an ensemble of measurements from jets and from single particles. There is always just one cluster present in each event.
The energy measurements of the layers are labelled e0 through e12. The sum over all layers is called esum. The true energy deposition in the training tree is called etruth. The quantity etruth is in principle our target variable. In practice it is better to target the correction factor for esum, namely the ratio etruth/esum.
Also available is the cluster center of gravity in eta and phi (variables eta and phi).
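
The relation between the layer energies, esum, and the suggested target etruth/esum can be sketched in a few lines. This is a plain Python illustration under stated assumptions, not code from TMVAExampleReg.C: the function names and the numerical values are hypothetical, and only the variable names e0 through e12, esum, and etruth come from the text above.

```python
def cluster_sum(layers):
    """Return esum, the sum of the 13 layer energies e0..e12 in GeV."""
    if len(layers) != 13:
        raise ValueError("expected 13 layer energies (e0 through e12)")
    return sum(layers)


def regression_target(etruth, esum):
    """Return the correction factor etruth/esum used as regression target."""
    if esum <= 0.0:
        raise ValueError("esum must be positive to form the ratio")
    return etruth / esum


def corrected_energy(predicted_factor, esum):
    """Turn a predicted correction factor back into an energy in GeV."""
    return predicted_factor * esum


# Hypothetical cluster: five thin layers followed by eight thicker ones.
layers = [0.5, 1.0, 1.5, 2.0, 1.0, 4.0, 8.0, 6.0, 3.0, 1.5, 1.0, 0.25, 0.25]
etruth = 37.5  # made-up true energy in GeV

esum = cluster_sum(layers)                # 30.0 GeV measured
target = regression_target(etruth, esum)  # correction factor 1.25
print(esum, target, corrected_energy(target, esum))
```

Targeting the ratio keeps the regression output close to one over the whole energy range, and the energy estimate is recovered afterwards by multiplying the predicted factor by esum.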
The data for this exercise: testDataReg.root
The example macro: TMVAExampleReg.C
Preparation for TMVA exercise:
Copy testDataReg.root and TMVAExampleReg.C to the directory tmvatest.
Try to find a regression method that provides the smallest standard deviation of the estimated value with respect to the target. The macro TMVARegGui.C is the collection of macros for regression. Use this macro to display the average standard deviation.
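
The figure of merit for Exercise 3 can be checked by hand on a small sample. The sketch below is an assumption about what "standard deviation of target vs estimated value" means here: it takes the per-event deviation (estimate minus target) and reports its mean and its standard deviation. The function name deviation_summary and the numbers are hypothetical, and the quantities displayed by TMVARegGui.C may be defined differently (for example with truncation of outliers).

```python
import statistics


def deviation_summary(estimates, targets):
    """Return (mean, standard deviation) of the deviations estimate - target."""
    if len(estimates) != len(targets) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    deviations = [e - t for e, t in zip(estimates, targets)]
    # pstdev treats the list as the full population (divides by N).
    return statistics.mean(deviations), statistics.pstdev(deviations)


# Hypothetical targets and regression outputs, invented for illustration.
targets = [10.0, 20.0, 30.0, 40.0]
estimates = [11.0, 19.0, 31.0, 39.0]

bias, spread = deviation_summary(estimates, targets)
print(bias, spread)
```

The mean shows a systematic offset of the estimate, while the standard deviation measures the resolution that the exercise asks you to minimize.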


Documentation:
TMVA web site: http://tmva.sourceforge.net
TMVA manual: TMVA - Toolkit for Multivariate Data Analysis, A. Hoecker et al., arXiv:physics/0703039v5 [physics.data-an] (revised 2009)
TMVA workshop link 
My tips and tricks-talk at the TMVA workshop link


Responsible: Dr. Eckhard von Toerne
Last modified: 14/12/15
