Ciao, I have pushed the version of our nanoaod Analyzer reading on multiple nanoaod files hosted in eospublic. The notebook (Zpeak_nano_multipledataset_v2.ipynb) equipped with:
a.) The Hadoop-Xrootd connector, declared in spark context (EOS.jar) to enable reading nanoaods files hosts in eospublic.
b.) The code structure is almost identical to Melo's z-mumu-cr.ipynb. By aggregating the derived quantities such as the kinematics of the reconstructed Z boson in a column and save the state variable (pass/fail muon selection).
c.) It is a work in progress, we have not figure how to implement the plotting end due to the known issues we have discussed.
a.) The weights of each processes does not save at first evaluation, looking for a better implementation.
b.) Plot a histogram with weight from a collection which facilitated with histogrammar features.
c.) If the plotting issue resolved, we will scale up the operation by topping up more nanoaod MC and datasetsa.
d.) Re-optimize the code and conduct an assessment on how much we gain from spark in term of time and efficiencies.
If the Zpeak exercise is a success, we can move forward to port other analysis into spark based analytical framework.
- Python 2.7
- numpy version 1.13.1
- jupyter 4.3+
- ipython version 5.7+
- histbook
- vega version 1.1
- uproot
pip install numpy==1.13
pip install jupyter --user
pip install ipython
pip install histbook --user
pip install vega==1.1 --user
pip install uproot --user
Note: you might need to upgrade pip to version 18 to get Jupyter 4.3 or above (pip install --upgrade pip
)
Once you have installed the prerequisites, set up the striped client
git clone http://cdcvs.fnal.gov/projects/nosql-ldrd striped
cd striped
python setup.py install --user
Now on a different directory, clone Coffea repository
git clone [email protected]:LPC-DM/CoffeaMaker.git
Launch Jypyter Notebook
cd CoffeaMaker/
jupyter notebook
It should open a new page in your default browser. If not , you can follow the link displayed in the terminal. At the page you will find the directories and files from the location you ran jupyter notebook. select one of the samples to run it.