



# Current and Future YARR in a Nutshell

ITk DAQ Workshop - 27.05.19

Timon Heim - LBNL










- Motivation and Goals
- Conceptual overview and design philosophy
  - Hardware controller
  - Front-End chip
  - Scans
- Data processing: decoding, histogramming, & analysis
- Next development steps



## Motivation



- Example IBL: lab testing → USBPix, stave testing → RCE, operation → ROD/BOC
  - Each system had a different code base (sw + fw)
  - SW entangled with fw, not easy to simply migrate
  - Expert knowledge and maturity of sw lost or not available at next stage
  - Had a huge impact on DAQ for operation and how many developers were needed to get it running
- We want to maintain mature sw and let experts apply their knowledge over a broad spectrum of test scales





- For ITk we should strive for a software base which can be used for small-scale lab tests as well as full detector operation
- SW should be common to Pixels and Strips
- This results in certain requirements:
  - Agnostic to hardware/firmware
  - Somewhat agnostic to Front-End chip type
  - Scalable in both senses (small and large)



## Where does YARR come from?

- YARR was originally designed as a readout system using PCIe FPGA cards for FE-I4 (IBL)
- It tried to perform as much processing as possible in SW
- This resulted in minimal entanglement with FW; the PCIe card simply acted as a FIFO
- After some abstraction of the hardware and chip interface, YARR seemed usable as a basis to be expanded and used as ITk SW
- However, there are some remnants from the old days:
  - Certain things are general but still in the FE-I4 class
  - The hardware interface has certain functions which are driven by the features of the YARR-FW for PCIe (primarily naming)

# Conceptual Overview



#### YARR SW core:


- Common sw core used in small lab systems up to full detector readout
- Improvements to core transfer to all DAQ systems
- Well defined interfaces required



# Design Philosophies



- Simple firmware, smart software: the more we do in software, the less we are bound to specific features in hw/fw → hardware agnostic
- Keep it modular: the more often code and structures can be re-used, the better → Front-End chips are more alike than you might think
- Simplicity can be key: carefully balance performance and simplicity. We are bad at writing documentation and the code has to live for the next 10+ years, but we still don't want it to be slow
- Pipeline it: wherever possible data should only travel in one direction, avoid process interdependency → eases scaling

# Hardware Controller



#### Assumptions:

- Interaction with chip can be broken down into sending and receiving of data
  - Represented by TxCore and RxCore in the interface
- Sending of commands:
  - Small to medium bandwidth
  - Primarily configuration data
  - Broadcast wherever possible
  - Some support from firmware, but not necessary for everything
- Receiving of data:
  - Max. bandwidth
  - One data stream per front-end object
  - Will enter processing chain
  - Only rudimentary decoding done in hardware
- Sending either to one or all chips\*
- Receiving from all chips (demultiplexing in sw)\*

#### \*see future section





#### FIFO-style sending of commands

```
virtual void writeFifo(uint32_t) = 0;
virtual void releaseFifo() = 0;
virtual void setCmdEnable(uint32_t) = 0;
virtual uint32_t getCmdEnable() = 0;
virtual bool isCmdEmpty() = 0;
```

#### Trigger interface

```
virtual void setTrigEnable(uint32_t value) = 0;
virtual uint32_t getTrigEnable() = 0;
virtual void maskTrigEnable(uint32_t value, uint32_t mask) = 0;
virtual bool isTrigDone() = 0;
virtual void setTrigConfig(enum TRIG_CONF_VALUE cfg) = 0;
virtual void setTrigFreq(double freq) = 0; // in Hz
virtual void setTrigCnt(uint32_t count) = 0;
virtual void setTrigTime(double time) = 0; // in s
virtual void setTrigWordLength(uint32_t length) = 0; // From Msb
virtual void setTrigWord(uint32_t *word, uint32_t length) = 0; // 4 words, start at Msb
virtual void toggleTrigAbort() = 0;
```

Interface to a buffer which can be sent with a programmed frequency. This can be done from either sw or fw (with better timing via fw).
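As a rough illustration of the software fallback, the sketch below sends a programmed trigger word `count` times at a given frequency. `SwTrigger` and the `sendWord` callback are hypothetical stand-ins for the real Tx path; only the meaning of the `setTrigFreq()` (Hz) and `setTrigCnt()` parameters is taken from the interface above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical software trigger emulation (not YARR code): sends the
// programmed trigger word `count` times, spaced by 1/freq seconds.
struct SwTrigger {
    double freq = 1000.0;             // Hz, as passed to setTrigFreq()
    uint32_t count = 0;               // as passed to setTrigCnt()
    uint32_t word[4] = {0, 0, 0, 0};  // trigger word, MSB first

    // Returns the number of trigger words handed to the Tx path.
    uint32_t run(const std::function<void(const uint32_t*)>& sendWord) const {
        assert(freq > 0.0);
        const auto period = std::chrono::duration_cast<std::chrono::steady_clock::duration>(
            std::chrono::duration<double>(1.0 / freq));
        auto next = std::chrono::steady_clock::now();
        uint32_t sent = 0;
        for (uint32_t i = 0; i < count; ++i) {
            sendWord(word);  // push the trigger word to the (hypothetical) Tx path
            ++sent;
            next += period;
            // Software timing: jitter depends on the OS scheduler.
            std::this_thread::sleep_until(next);
        }
        return sent;
    }
};
```

Sleeping in software gives scheduler-dependent jitter, which is the reason firmware-driven timing is preferred when available.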

writeFifo() calls can be buffered in SW to increase the package size until releaseFifo() is called.

https://gitlab.cern.ch/YARR/YARR/blob/master/src/libYarr/include/TxCore.h
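A minimal sketch of the buffering behaviour, using a hypothetical `MockTxCore` rather than the real TxCore: `writeFifo()` only appends to a software buffer and `releaseFifo()` ships everything as one package. Treating the command enable as one bit per Tx channel (so that one chip or all chips can be addressed) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mock of the FIFO-style Tx interface (not the real TxCore).
class MockTxCore {
public:
    // Assumption: one bit per Tx channel.
    void setCmdEnable(uint32_t mask) { m_enable = mask; }
    uint32_t getCmdEnable() const { return m_enable; }

    // Only buffers the word in software; nothing is sent yet.
    void writeFifo(uint32_t word) { m_buffer.push_back(word); }

    // Ships the whole buffer as one larger package, then clears it.
    void releaseFifo() {
        if (m_buffer.empty()) return;
        m_packages.push_back({m_enable, m_buffer});
        m_buffer.clear();
    }

    // In this mock: true when nothing is pending in the software buffer.
    bool isCmdEmpty() const { return m_buffer.empty(); }

    struct Package {
        uint32_t enableMask;
        std::vector<uint32_t> words;
    };
    const std::vector<Package>& packages() const { return m_packages; }

private:
    uint32_t m_enable = 0;
    std::vector<uint32_t> m_buffer;
    std::vector<Package> m_packages;
};
```

Batching like this trades a little latency for fewer, larger transfers, which suits configuration traffic.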





#### Reading data FIFO-style

```
virtual void setRxEnable(uint32_t val) = 0;
virtual void maskRxEnable(uint32_t val, uint32_t mask) = 0;

virtual RawData* readData() = 0;
virtual void flushBuffer() {}

virtual uint32_t getDataRate() = 0;
virtual uint32_t getCurCount() = 0;
virtual bool isBridgeEmpty() = 0;
```

#### SW will read until the FIFO is empty

#### Raw data object

```
class RawData {
    public:
        RawData(uint32_t arg_adr, uint32_t *arg_buf, unsigned arg_words);
        ~RawData();

        uint32_t adr;
        uint32_t *buf;
        unsigned words;
        LoopStatus stat;
};
```

Data passed in as a pointer are not copied.

https://gitlab.cern.ch/YARR/YARR/blob/master/src/libYarr/include/RxCore.h

https://gitlab.cern.ch/YARR/YARR/blob/master/src/libYarr/include/RawData.h
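The read loop can be sketched as follows, assuming that `readData()` signals an empty FIFO by returning a null pointer. `RawChunk`, `MockRxCore` and `drain()` are hypothetical simplifications; unlike the real `RawData`, the chunk owns its words through a `std::vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for RawData (LoopStatus omitted).
struct RawChunk {
    uint32_t adr;
    std::vector<uint32_t> buf;
};

// Hypothetical mock RxCore: readData() returns nullptr once the FIFO is empty.
class MockRxCore {
public:
    void push(RawChunk c) { m_fifo.push_back(std::move(c)); }
    std::unique_ptr<RawChunk> readData() {
        if (m_fifo.empty()) return nullptr;
        auto c = std::make_unique<RawChunk>(std::move(m_fifo.front()));
        m_fifo.pop_front();
        return c;
    }
private:
    std::deque<RawChunk> m_fifo;
};

// Read until the FIFO is empty and hand everything to the processing chain.
std::vector<std::unique_ptr<RawChunk>> drain(MockRxCore& rx) {
    std::vector<std::unique_ptr<RawChunk>> out;
    while (auto data = rx.readData()) {
        out.push_back(std::move(data));
    }
    return out;
}
```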




# Front-End Chip Implementation



- A chip only needs to implement a config file interface and basic configuration routines
- Advanced functions are determined by scan needs and are not generic
- There will be one object for each chip
- A virtual copy of the chip config is saved within the object
- Wherever possible, registers should be referred to by object and not by string (this can't be avoided fully)

https://gitlab.cern.ch/YARR/YARR/blob/master/src/libYarr/include/FrontEnd.h
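To illustrate referring to registers by object rather than by string, here is a sketch of a hypothetical front-end keeping a virtual copy of its registers. The register names come from the RD53A scan config shown later; the addresses, offsets and widths are invented for the example and do not match the real chip.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical register field: a bit range inside a 16-bit global register.
struct Field {
    unsigned addr;    // global register address
    unsigned offset;  // first bit of the field
    unsigned width;   // number of bits
    uint16_t mask() const { return static_cast<uint16_t>(((1u << width) - 1u) << offset); }
};

// Registers as objects: a typo is a compile error, not a runtime lookup failure.
// Illustrative values only.
namespace reg {
constexpr Field LatencyConfig{6, 0, 9};
constexpr Field InjEnDig{7, 3, 1};
}

// Hypothetical front-end holding a virtual copy of the chip config.
class ToyFrontEnd {
public:
    void write(const Field& f, uint16_t value) {
        assert(static_cast<unsigned>(value) < (1u << f.width));
        uint16_t& word = m_regs[f.addr];
        word = static_cast<uint16_t>((word & ~f.mask()) | (value << f.offset));
    }
    uint16_t read(const Field& f) const {
        auto it = m_regs.find(f.addr);
        uint16_t word = (it == m_regs.end()) ? 0 : it->second;
        return static_cast<uint16_t>((word & f.mask()) >> f.offset);
    }
    // String lookup kept only where unavoidable, e.g. reading config files.
    const Field* byName(const std::string& name) const {
        static const std::map<std::string, Field> table = {
            {"LatencyConfig", reg::LatencyConfig}, {"InjEnDig", reg::InjEnDig}};
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second;
    }
private:
    std::map<unsigned, uint16_t> m_regs;  // virtual copy of global registers
};
```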





- Scans typically do the following:
  - Configure all activated chips
  - Run loop actions as a nested structure:
    - Loop over a parameter
      - Activate a portion of the pixels
        - Inject & trigger O(100) times
        - Read data
- This is facilitated by the scan engine
- Most loop actions are chip-specific, though some are more general
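The nested structure can be sketched as a chain of loop actions where each step of one loop runs the complete inner chain. `LoopAction`, `CountingLoop` and `runNested()` are simplified, hypothetical names rather than YARR's actual classes; the `innermost` callback stands in for injecting, triggering and reading data.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical loop action: a number of steps, each applying some setting.
class LoopAction {
public:
    virtual ~LoopAction() = default;
    virtual int nSteps() const = 0;
    virtual void step(int i) = 0;  // e.g. set a parameter or a mask stage
};

// Example action that just records which step it executed.
class CountingLoop : public LoopAction {
public:
    CountingLoop(std::string name, int n, std::vector<std::string>& trace)
        : m_name(std::move(name)), m_n(n), m_trace(trace) {}
    int nSteps() const override { return m_n; }
    void step(int i) override { m_trace.push_back(m_name + ":" + std::to_string(i)); }
private:
    std::string m_name;
    int m_n;
    std::vector<std::string>& m_trace;
};

// Run loops[level..] as a nested structure; `innermost` is called once per
// innermost iteration (inject, trigger, read and package the data).
void runNested(std::vector<std::unique_ptr<LoopAction>>& loops, std::size_t level,
               const std::function<void()>& innermost) {
    if (level == loops.size()) { innermost(); return; }
    for (int i = 0; i < loops[level]->nSteps(); ++i) {
        loops[level]->step(i);
        runNested(loops, level + 1, innermost);
    }
}
```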





- All data from one innermost loop iteration is packaged and meta data describing the current loop state is added
- These data packages are then run through the processing chain
- Most scans are fully described a priori, except tunings:
  - Tunings require a parameter change which depends on the analysis outcome
  - FeedbackLoops provide the interface to the analysis and allow the analysis to change parameters
  - They usually use a "hot or cold" scheme, where the analysis only determines the direction and the LoopAction applies the correct parameter change (the LoopAction is in charge of the tuning algorithm)
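A sketch of the "hot or cold" idea, assuming a binary search with a halving step as the tuning algorithm (one possible choice, not necessarily what YARR's loop actions implement). The analysis callback only reports a direction; all step-size logic stays in the hypothetical `tuneHotCold()`.

```cpp
#include <cassert>
#include <functional>

// Direction reported by the (hypothetical) analysis after one iteration.
enum class Feedback { TooLow, TooHigh, Ok };

// The loop action owns the algorithm: move by `step` in the reported
// direction and halve the step each iteration (binary search).
int tuneHotCold(int start, int initialStep, int maxIter,
                const std::function<Feedback(int)>& analyse) {
    assert(initialStep >= 0);
    int value = start;
    int step = initialStep;
    for (int i = 0; i < maxIter && step > 0; ++i) {
        Feedback fb = analyse(value);  // outcome of one scan iteration
        if (fb == Feedback::Ok) break;
        value += (fb == Feedback::TooLow) ? step : -step;
        step /= 2;
    }
    return value;
}
```

Keeping the step logic in the loop action means the analysis stays generic: it never needs to know which register is being tuned or by how much.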



## An Example Scan






### Data Processing







### Data Pipeline










#### Important command line arguments:

- -c : the connectivity config tells scanConsole which chip is connected where and also points to the right chip config
- -r : the controller config tells scanConsole which controller to use and how to configure it
- -s : the scan config contains all necessary information to construct the scan
- -k : report known items (scans, hardware, etc.)



# Configs

`example_rd53a_setup.json`:

```
{
    "chipType" : "RD53A",
    "chips" : [
        {
            "config" : "configs/rd53a_test.json",
            "tx" : 0,
            "rx" : 0,
            "enable" : 1,
            "locked" : 0
        },
        {
            "config" : "configs/rd53a_test_1.json",
            "tx" : 1,
            "rx" : 1,
            "enable" : 0,
            "locked" : 0
        }
    ]
}
```
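To show how the connectivity file maps onto software objects, the sketch mirrors each `chips` entry in a struct and builds an rx-channel lookup for software demultiplexing. That only entries with `enable` set take part in a scan is our reading of the field name, not documented behaviour; `ChipEntry` and `activeByRx()` are hypothetical, and JSON parsing is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Mirrors one entry of the "chips" array in the connectivity file.
struct ChipEntry {
    std::string config;  // path to the chip config
    uint32_t tx;
    uint32_t rx;
    bool enable;
    bool locked;
};

// Assumption: only enabled chips take part in the scan. Returns a map from
// rx channel to chip config path, as needed to demultiplex incoming data.
std::map<uint32_t, std::string> activeByRx(const std::vector<ChipEntry>& chips) {
    std::map<uint32_t, std::string> out;
    for (const auto& c : chips) {
        if (!c.enable) continue;
        out.emplace(c.rx, c.config);
    }
    return out;
}
```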

`specCfg.json`:

```
{
    "ctrlCfg" : {
        "type" : "spec",
        "cfg" : {
            "specNum" : 0,
            "spiConfig" : 541200,
            "autoZero" : {
                "word" : 1549575846,
                "interval" : 500
            },
            "cmdPeriod" : 6.25e-9
        }
    }
}
```

```
.....
            {
                "config": {
                    "max": 64,
                    "min": 0,
                    "step": 1
                },
                "loopAction": "Rd53aMaskLoop"
            },
            {
                "config": {
                    "max": 50,
                    "min": 0,
                    "step": 1,
                    "nSteps": 25
                },
                "loopAction": "Rd53aCoreColLoop"
            },
            {
                "config": {
                    "count": 100,
                    "delay": 56,
                    "extTrigger": false,
                    "frequency": 18000,
                    "noInject": false,
                    "time": 0,
                    "edgeMode": true
                },
                "loopAction": "Rd53aTriggerLoop"
            },
            {
                "loopAction": "StdDataLoop"
            }
        ],
        "name": "DigitalScan",
        "prescan": {
            "InjEnDig": 1,
            "InjAnaMode": 0,
            "LatencyConfig": 58,
            "GlobalPulseRt": 16384,
            "SyncVth": 500,
            "LinVth": 500,
            "DiffVth1": 500
        }
    }
}
```

# Configs II

```
.....
        "algorithm": "TotMap",
        "config": {}
    },
    "1": {
        .....
    },
    "2": {
        .....
```





# Work In Progress



# Scaling It Up



#### How to scale this up?

- Break pipeline into pieces distributed over multiple machines
- Have multiple scan engines delivering data to a central or multiple central data processing servers
- Requires:
  - Orchestration of scan engines and data processors, distribution of configuration to all sub-processors (RPC)
  - Serialisation of data in between processes (IPC)

https://indico.cern.ch/event/609081/contributions/2636091/attachments/1483038/2300644/ItkWeek\_SW\_20170626.pdf
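Serialisation between processes can be illustrated with a raw-data chunk: a fixed header (address and word count) followed by the payload. This wire format is a hypothetical example, not the one YARR uses; it keeps native byte order, which is only safe between machines with the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical wire format: [adr:u32][words:u32][payload: words x u32],
// native byte order.
std::vector<uint8_t> serialise(uint32_t adr, const std::vector<uint32_t>& buf) {
    uint32_t header[2] = {adr, static_cast<uint32_t>(buf.size())};
    std::vector<uint8_t> out(sizeof(header) + buf.size() * sizeof(uint32_t));
    std::memcpy(out.data(), header, sizeof(header));
    if (!buf.empty())
        std::memcpy(out.data() + sizeof(header), buf.data(), buf.size() * sizeof(uint32_t));
    return out;
}

// Inverse of serialise(); throws if the message is truncated or inconsistent.
void deserialise(const std::vector<uint8_t>& in, uint32_t& adr, std::vector<uint32_t>& buf) {
    uint32_t header[2];
    if (in.size() < sizeof(header)) throw std::runtime_error("truncated header");
    std::memcpy(header, in.data(), sizeof(header));
    adr = header[0];
    const std::size_t bytes = static_cast<std::size_t>(header[1]) * sizeof(uint32_t);
    if (in.size() != sizeof(header) + bytes) throw std::runtime_error("size mismatch");
    buf.resize(header[1]);
    if (bytes) std::memcpy(buf.data(), in.data() + sizeof(header), bytes);
}
```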

### The scan operation model currently assumed

(INFN)

- To perform a scan, each hardware board has a software process which exclusively governs the control of configuration and trigger (the one equipped with TxCore and RxCore). This part of the software is referred to as the "Scan Engine".
- Each scan engine is agnostic to the presence of the other hardware boards.
- All hardware boards work in parallel between the start and the end of the scan, performing the same scan task (expected to be applicable).
- Multiple scan engines are organised by synchronising their configuration and states (in terms of slow control) via high-level messaging.
- The data processing (histogramming and analysis) may be performed locally, or delegated to a specialised computing farm further downstream (the arrangement should remain arbitrary).



these work independently (asynchronously)
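The asynchronous model can be sketched with threads in one process: several hypothetical scan engines push data packages into a queue that only flows downstream, and one processor consumes them in whatever order they arrive. In the distributed case the queue would be replaced by IPC; `ClosableQueue` and `runScan()` are illustrative names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal thread-safe queue; data only travels one way (engines -> processor).
template <typename T>
class ClosableQueue {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_mtx); m_q.push(std::move(v)); }
        m_cv.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_mtx); m_closed = true; }
        m_cv.notify_all();
    }
    // Blocks until an item is available; returns false once closed and empty.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lk(m_mtx);
        m_cv.wait(lk, [this] { return !m_q.empty() || m_closed; });
        if (m_q.empty()) return false;
        out = std::move(m_q.front());
        m_q.pop();
        return true;
    }
private:
    std::mutex m_mtx;
    std::condition_variable m_cv;
    std::queue<T> m_q;
    bool m_closed = false;
};

struct Package { int engine; int iteration; };

// Each engine runs the same scan independently; returns packages processed.
int runScan(int nEngines, int nIterations) {
    ClosableQueue<Package> q;
    std::vector<std::thread> engines;
    for (int e = 0; e < nEngines; ++e)
        engines.emplace_back([&q, e, nIterations] {
            for (int i = 0; i < nIterations; ++i) q.push({e, i});
        });
    int processed = 0;
    std::thread processor([&] { Package p; while (q.pop(p)) ++processed; });
    for (auto& t : engines) t.join();
    q.close();  // all engines done: let the processor drain and exit
    processor.join();
    return processed;
}
```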

### Data flow design



- Wish to have flexibility in grouping the function modules within a process
- Object data need to support serialization.



# Generic Data Processor



- Already heavily rely on nlohmann::json, convenient format also for serialisation
- Use <u>msgpack</u> to serialise json objects
- Where json is too costly in terms of memory or bandwidth, we usually already have the handy RawData format
- With some optimisation, nlohmann::json could even be used as a histogram container (see recent work from Matthias)
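For a feel of what msgpack does to a json object, the sketch hand-encodes a flat map of short string keys to unsigned integers using the MessagePack fixmap, fixstr and integer formats. In practice nlohmann::json provides `json::to_msgpack()` for this; the hand-rolled `packMap()` is hypothetical and only makes the byte layout visible.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Encode an unsigned integer with the smallest MessagePack format.
void packUint(std::vector<uint8_t>& out, uint32_t v) {
    if (v <= 0x7f) { out.push_back(static_cast<uint8_t>(v)); return; }  // positive fixint
    if (v <= 0xff) { out.push_back(0xcc); out.push_back(static_cast<uint8_t>(v)); return; }  // uint 8
    if (v <= 0xffff) {  // uint 16, big-endian
        out.push_back(0xcd);
        out.push_back(static_cast<uint8_t>(v >> 8));
        out.push_back(static_cast<uint8_t>(v));
        return;
    }
    out.push_back(0xce);  // uint 32, big-endian
    for (int shift = 24; shift >= 0; shift -= 8) out.push_back(static_cast<uint8_t>(v >> shift));
}

// Encode a small map {string: uint} as fixmap + fixstr keys.
std::vector<uint8_t> packMap(const std::vector<std::pair<std::string, uint32_t>>& kv) {
    assert(kv.size() <= 15);  // fixmap limit
    std::vector<uint8_t> out;
    out.push_back(static_cast<uint8_t>(0x80 | kv.size()));
    for (const auto& [key, value] : kv) {
        assert(key.size() <= 31);  // fixstr limit
        out.push_back(static_cast<uint8_t>(0xa0 | key.size()));
        out.insert(out.end(), key.begin(), key.end());
        packUint(out, value);
    }
    return out;
}
```

For example, `{"adr": 3}` becomes six bytes, compared with eleven characters of json text.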







- FELIX is in many ways similar to YARR-PCIe as it is trying to stay agnostic and just shuffle data from and to the chip
- However, the primary difference is that one does not interact with FELIX directly, but rather through NetIO
- NetIO is an IPC package and enables subscription to single data channels
- NetIO has been successfully implemented as a hardware controller; however, the interface is not yet well optimised for it



[Figure: e-link0, e-link1 and e-link2 feeding a single RxCore]

### NetIO optimisation



[Figure: one Histogrammer per e-link]

- If data is already available demultiplexed, RxCore should just pass it on
- Otherwise a hardware-specific RxCore takes care of the demultiplexing

Currently we assume that data comes through a single interface, hence the demultiplexing happens in the DataProcessor. For NetIO, whose channels already arrive separated, we have to aggregate the data again because of this.

[Figure: e-link0 and e-link1 → RxCore → DataProcessor → Histogrammer]
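The demultiplexing currently done in the DataProcessor can be sketched as splitting one aggregated stream by channel. Using the `adr` field of the raw data as the channel identifier is an assumption here; `Chunk` and `demux()` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Simplified raw-data chunk: channel address plus data words.
struct Chunk {
    uint32_t adr;
    std::vector<uint32_t> words;
};

// Split one aggregated stream into per-channel streams (assumes adr = channel).
std::map<uint32_t, std::vector<uint32_t>> demux(const std::vector<Chunk>& stream) {
    std::map<uint32_t, std::vector<uint32_t>> perChannel;
    for (const auto& c : stream) {
        auto& dst = perChannel[c.adr];
        dst.insert(dst.end(), c.words.begin(), c.words.end());
    }
    return perChannel;
}
```

With NetIO the per-channel streams already exist, so merging them into one stream only to split them again in this way is wasted work; passing them straight through avoids it.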



#### local DB:

- Possibly gitDB based (a prototype exists but needs some scrutiny)
- Git features can be used to sync configs over multiple machines or even to other institutes (aka remotes)
- All stored files are json based, so plain file editing is still possible

#### **Production DB:**

- Should only store good and interpreted QC data (result based on input from multiple scans)
- Can retrieve configs from last step (or before) to local DB
- QC analysis can also take other sources of data into account (e.g. pictures)



# Further Outlook



- Target supporting larger system tests:
  - Distributed processing
  - O(100) chip operation
  - FELIX
- Develop and document routines for QC
  - Interaction with database
  - Also interesting for Strips as we will run surface tests with FELIX and have to compare to previous QC
- Test and benchmark detector-level operation of the code
  - Pulling/Pushing configurations from DB
  - Crashing sub-processes
- Develop and implement SW ROD





# Backup







#### Gitlab: <u>https://gitlab.cern.ch/YARR/YARR</u>





## Loop Actions



