#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_montecito.h>

int pfm_mont_is_ear(unsigned int i);
int pfm_mont_is_dear(unsigned int i);
int pfm_mont_is_dear_tlb(unsigned int i);
int pfm_mont_is_dear_cache(unsigned int i);
int pfm_mont_is_dear_alat(unsigned int i);
int pfm_mont_is_iear(unsigned int i);
int pfm_mont_is_iear_tlb(unsigned int i);
int pfm_mont_is_iear_cache(unsigned int i);
int pfm_mont_is_etb(unsigned int i);
int pfm_mont_support_opcm(unsigned int i);
int pfm_mont_support_iarr(unsigned int i);
int pfm_mont_support_darr(unsigned int i);
int pfm_mont_get_event_maxincr(unsigned int i, unsigned int *maxincr);
int pfm_mont_get_event_umask(unsigned int i, unsigned long *umask);
int pfm_mont_get_event_group(unsigned int i, int *grp);
int pfm_mont_get_event_set(unsigned int i, int *set);
int pfm_mont_get_event_type(unsigned int i, int *type);
int pfm_mont_get_ear_mode(unsigned int i, pfmlib_mont_ear_mode_t *mode);
int pfm_mont_irange_is_fine(pfmlib_output_param_t *outp, pfmlib_mont_output_param_t *mod_out);
The Itanium 2 9000 (Montecito) processor-specific functions presented here are mostly used to retrieve the characteristics of an event. Given an opaque event descriptor, obtained by pfm_find_event() or one of its derivatives, most of them return a boolean value indicating whether the event supports a particular feature or is of a particular kind.
The pfm_mont_is_ear() function returns 1 if the event designated by i corresponds to an EAR event, i.e., an Event Address Register type of event. Otherwise 0 is returned. It can be a data or an instruction EAR event. For instance, DATA_EAR_CACHE_LAT4 is an EAR event, but CPU_OP_CYCLES_ALL is not.
The pfm_mont_is_dear() function returns 1 if the event designated by i corresponds to a Data EAR event. Otherwise 0 is returned. It can be a cache or TLB EAR event.
The pfm_mont_is_dear_tlb() function returns 1 if the event designated by i corresponds to a Data EAR TLB event. Otherwise 0 is returned.
The pfm_mont_is_dear_cache() function returns 1 if the event designated by i corresponds to a Data EAR cache event. Otherwise 0 is returned.
The pfm_mont_is_dear_alat() function returns 1 if the event designated by i corresponds to a Data EAR ALAT event. Otherwise 0 is returned.
The pfm_mont_is_iear() function returns 1 if the event designated by i corresponds to an instruction EAR event. Otherwise 0 is returned. It can be a cache or TLB instruction EAR event.
The pfm_mont_is_iear_tlb() function returns 1 if the event designated by i corresponds to an instruction EAR TLB event. Otherwise 0 is returned.
The pfm_mont_is_iear_cache() function returns 1 if the event designated by i corresponds to an instruction EAR cache event. Otherwise 0 is returned.
The pfm_mont_support_opcm() function returns 1 if the event designated by i supports opcode matching, i.e., whether this event can be measured accurately when opcode matching via PMC32/PMC34 is active. Otherwise 0 is returned. Not all events support this feature.
The pfm_mont_support_iarr() function returns 1 if the event designated by i supports code address range restrictions, i.e., whether this event can be measured accurately when code range restriction is active. Otherwise 0 is returned. Not all events support this feature.
The pfm_mont_support_darr() function returns 1 if the event designated by i supports data address range restrictions, i.e., whether this event can be measured accurately when data range restriction is active. Otherwise 0 is returned. Not all events support this feature.
The pfm_mont_get_event_maxincr() function returns in maxincr the maximum number of occurrences per cycle for the event designated by i. Certain Itanium 2 9000 (Montecito) events can occur more than once per cycle. When an event occurs more than once per cycle, the PMD counter is incremented accordingly. It is possible to restrict the measurement to cycles in which the event occurs more than a given number of times. For instance, NOPS_RETIRED can happen up to 6 times per cycle, which means that the threshold can be set between 0 and 5, where 5 means that the PMD counter is incremented by 1 only when the nop instruction is executed more than 5 times in a cycle. The value returned in maxincr is therefore the non-inclusive upper bound for the threshold to program in the PMC register.
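Because maxincr is a non-inclusive upper bound, the valid thresholds for an event are 0 through maxincr - 1. The sketch below checks a threshold against that bound. mont_threshold_is_valid() is a hypothetical helper, not a libpfm function; in real code the maxincr argument would come from pfm_mont_get_event_maxincr().

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of libpfm): returns true if thres can be
 * programmed for an event whose maximum number of occurrences per cycle
 * is maxincr. maxincr is a non-inclusive upper bound, so the valid
 * thresholds are 0 .. maxincr - 1.
 */
bool mont_threshold_is_valid(unsigned int thres, unsigned int maxincr)
{
	/*
	 * The event never occurs more than maxincr times per cycle, so a
	 * threshold of maxincr or more could never be exceeded.
	 */
	return thres < maxincr;
}
```

With the NOPS_RETIRED figure given above (maxincr of 6), thresholds 0 through 5 are accepted and 6 is rejected.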
The pfm_mont_get_event_umask() function returns in umask the umask for the event designated by i.
The pfm_mont_get_event_group() function returns in grp the group to which the event designated by i belongs. The notion of group is used for L1D and L2D cache events only. For all other events, the group is irrelevant and can be ignored. If the event is an L2D cache event, the value of grp is PFMLIB_MONT_EVT_L2D_CACHE_GRP. Similarly, if the event is an L1D cache event, the value of grp is PFMLIB_MONT_EVT_L1D_CACHE_GRP. In all other cases, the value of grp is PFMLIB_MONT_EVT_NO_GRP.
The pfm_mont_get_event_set() function returns in set the set to which the event designated by i belongs. A set is a subdivision of a group and is therefore only relevant for L1D and L2D cache events. An event can belong to only one group and one set. This partitioning of the cache events is due to hardware limitations that restrict which events can be combined: for a given group, events from different sets cannot be measured at the same time. If the event does not belong to a group, the value of set is PFMLIB_MONT_EVT_NO_SET.
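The group/set restriction can be checked before dispatching a set of events. The following is a minimal sketch: it uses hypothetical stand-in values (DEMO_NO_GRP and friends) in place of the PFMLIB_MONT_EVT_* constants, whose numeric values are not assumed here, and it models only the rule stated above, namely that two events of the same group must also share the same set.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the values returned by
 * pfm_mont_get_event_group(); the real constants come from
 * <perfmon/pfmlib_montecito.h>.
 */
enum demo_grp { DEMO_NO_GRP, DEMO_L1D_CACHE_GRP, DEMO_L2D_CACHE_GRP };

struct demo_event {
	enum demo_grp grp; /* as returned by pfm_mont_get_event_group() */
	int set;           /* as returned by pfm_mont_get_event_set() */
};

/*
 * Returns true if no two events belong to the same group but to
 * different sets, i.e. if the list respects the set restriction.
 */
bool demo_sets_compatible(const struct demo_event *ev, size_t n)
{
	for (size_t a = 0; a < n; a++) {
		if (ev[a].grp == DEMO_NO_GRP)
			continue; /* ungrouped events impose no set constraint */
		for (size_t b = a + 1; b < n; b++) {
			if (ev[b].grp == ev[a].grp && ev[b].set != ev[a].set)
				return false;
		}
	}
	return true;
}
```

Events from different groups (for example one L1D and one L2D event) never conflict under this rule; only same-group, different-set pairs are rejected.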
The pfm_mont_get_event_type() function returns in type the type of the event designated by i. Itanium 2 9000 (Montecito) events can have any one of the following types:
The pfm_mont_irange_is_fine() function returns 1 if the configuration described by outp (the generic output parameters) and mod_out (the Itanium 2 9000 (Montecito) specific output parameters) uses code range restriction in fine mode. Otherwise the function returns 0. This function can only be called after a successful call to pfm_dispatch_events() that used the data structures pointed to by outp and mod_out as output parameters.
The pfm_mont_get_ear_mode() function returns in mode the EAR mode of the event designated by i. If the event is not an EAR event, PFMLIB_ERR_INVAL is returned and mode is not updated. Otherwise mode is set to one of the pfmlib_mont_ear_mode_t values: PFMLIB_MONT_EAR_CACHE_MODE, PFMLIB_MONT_EAR_TLB_MODE, or PFMLIB_MONT_EAR_ALAT_MODE.
When Itanium 2 9000 (Montecito) specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to pfm_dispatch_events(). The Itanium 2 9000 (Montecito) specific input arguments are described in the pfmlib_mont_input_param_t structure and the output parameters in pfmlib_mont_output_param_t. They are defined as follows:
typedef struct {
	unsigned int flags;
	unsigned int thres;
} pfmlib_mont_counter_t;

typedef struct {
	unsigned char opcm_used;
	unsigned char opcm_m;
	unsigned char opcm_i;
	unsigned char opcm_f;
	unsigned char opcm_b;
	unsigned long opcm_match;
	unsigned long opcm_mask;
} pfmlib_mont_opcm_t;

typedef struct {
	unsigned char etb_used;
	unsigned int  etb_plm;
	unsigned char etb_ds;
	unsigned char etb_tm;
	unsigned char etb_ptm;
	unsigned char etb_ppm;
	unsigned char etb_brt;
} pfmlib_mont_etb_t;

typedef struct {
	unsigned char  ipear_used;
	unsigned int   ipear_plm;
	unsigned short ipear_delay;
} pfmlib_mont_ipear_t;

typedef enum {
	PFMLIB_MONT_EAR_CACHE_MODE = 0,
	PFMLIB_MONT_EAR_TLB_MODE   = 1,
	PFMLIB_MONT_EAR_ALAT_MODE  = 2
} pfmlib_mont_ear_mode_t;

typedef struct {
	unsigned char          ear_used;
	pfmlib_mont_ear_mode_t ear_mode;
	unsigned int           ear_plm;
	unsigned long          ear_umask;
} pfmlib_mont_ear_t;

typedef struct {
	unsigned int  rr_plm;
	unsigned long rr_start;
	unsigned long rr_end;
} pfmlib_mont_input_rr_desc_t;

typedef struct {
	unsigned long rr_soff;
	unsigned long rr_eoff;
} pfmlib_mont_output_rr_desc_t;

typedef struct {
	unsigned int                rr_flags;
	pfmlib_mont_input_rr_desc_t rr_limits[4];
	unsigned char               rr_used;
} pfmlib_mont_input_rr_t;

typedef struct {
	unsigned int                 rr_nbr_used;
	pfmlib_mont_output_rr_desc_t rr_infos[4];
	pfmlib_reg_t                 rr_br[8];
} pfmlib_mont_output_rr_t;

typedef struct {
	pfmlib_mont_counter_t  pfp_mont_counters[PMU_MONT_NUM_COUNTERS];
	unsigned long          pfp_mont_flags;
	pfmlib_mont_opcm_t     pfp_mont_opcm1;
	pfmlib_mont_opcm_t     pfp_mont_opcm2;
	pfmlib_mont_ear_t      pfp_mont_iear;
	pfmlib_mont_ear_t      pfp_mont_dear;
	pfmlib_mont_ipear_t    pfp_mont_ipear;
	pfmlib_mont_etb_t      pfp_mont_etb;
	pfmlib_mont_input_rr_t pfp_mont_drange;
	pfmlib_mont_input_rr_t pfp_mont_irange;
} pfmlib_mont_input_param_t;

typedef struct {
	pfmlib_mont_output_rr_t pfp_mont_drange;
	pfmlib_mont_output_rr_t pfp_mont_irange;
} pfmlib_mont_output_param_t;
The Itanium 2 9000 (Montecito) processor provides one per-event feature for counters: thresholding. It can be set using the pfp_mont_counters data structure for each event.
The thres field indicates the threshold for the event. A threshold of n means that the counter is incremented by one only when the event occurs more than n times per cycle.
The flags field contains event-specific flags. The currently defined flags are:
The pfp_mont_opcm1 and pfp_mont_opcm2 fields, of type pfmlib_mont_opcm_t, describe how to program the opcode matchers. The Itanium 2 9000 (Montecito) processor supports opcode matching via PMC32 and PMC34. When this feature is used, the opcm_used field must be set to 1; otherwise the settings are ignored by the library. The Itanium 2 9000 (Montecito) processor implements two full 41-bit opcode matchers, so every instruction can be matched individually. It is possible to match a single instruction or an instruction pattern based on opcode or slot type. The slots are specified in:
Any combination of slot settings is supported. To match all slot types, simply set all fields to 1.
The 41-bit opcode is specified in opcm_match and a 41-bit mask is passed in opcm_mask. When a bit is set in opcm_mask the corresponding bit is ignored in opcm_match.
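The match/mask rule can be modeled in a few lines. This is a simplified software model, not what libpfm or the PMU actually execute: it assumes that an instruction is counted only when its slot type is enabled and every 41-bit opcode bit not ignored by opcm_mask equals the corresponding bit of opcm_match. struct demo_opcm is a hypothetical local mirror of the slot and match fields of pfmlib_mont_opcm_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Itanium instructions are 41 bits wide. */
#define DEMO_OPC_BITS ((1ULL << 41) - 1)

enum demo_slot { DEMO_SLOT_M, DEMO_SLOT_I, DEMO_SLOT_F, DEMO_SLOT_B };

/* Hypothetical mirror of the slot and match fields of pfmlib_mont_opcm_t. */
struct demo_opcm {
	unsigned char opcm_m, opcm_i, opcm_f, opcm_b; /* slot types to watch */
	uint64_t opcm_match; /* 41-bit opcode pattern */
	uint64_t opcm_mask;  /* a set bit means "ignore this bit of opcm_match" */
};

/* True if every bit not masked out is identical in insn and match. */
bool demo_opcode_matches(uint64_t insn, uint64_t match, uint64_t mask)
{
	uint64_t care = ~mask & DEMO_OPC_BITS; /* bits that must match */
	return ((insn ^ match) & care) == 0;
}

/* Simplified model: the slot type must be enabled and the opcode must match. */
bool demo_opcm_hit(const struct demo_opcm *o, enum demo_slot slot, uint64_t insn)
{
	bool slot_ok;

	switch (slot) {
	case DEMO_SLOT_M: slot_ok = o->opcm_m; break;
	case DEMO_SLOT_I: slot_ok = o->opcm_i; break;
	case DEMO_SLOT_F: slot_ok = o->opcm_f; break;
	case DEMO_SLOT_B: slot_ok = o->opcm_b; break;
	default:          slot_ok = false;     break;
	}
	return slot_ok && demo_opcode_matches(insn, o->opcm_match, o->opcm_mask);
}
```

Setting all four slot fields to 1 and all 41 mask bits to 1 therefore matches every instruction, while a zero mask matches exactly one opcode.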
The pfp_mont_iear field, of type pfmlib_mont_ear_t, describes what to do with the instruction Event Address Registers (I-EARs). Again, if this feature is used, the ear_used field must be set to 1; otherwise it is ignored by the library. The ear_mode field must be set to either PFMLIB_MONT_EAR_TLB_MODE or PFMLIB_MONT_EAR_CACHE_MODE to indicate the type of EAR to program. The umask to store into PMC10 must be in ear_umask. The privilege level mask at which the I-EAR is monitored must be set in ear_plm, which can be any combination of PFM_PLM0, PFM_PLM1, PFM_PLM2, and PFM_PLM3. If ear_plm is 0, the default privilege level mask in pfp_dfl_plm is used.
The pfp_mont_dear field, of type pfmlib_mont_ear_t, describes what to do with the data Event Address Registers (D-EARs). The description is identical to that of the I-EARs, except that it applies to PMC11 and that an ear_mode of PFMLIB_MONT_EAR_ALAT_MODE is also possible.
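The two EAR rules above (allowed modes per EAR kind, and the fallback to pfp_dfl_plm when ear_plm is 0) can be expressed as a small pre-check. This is a hedged sketch: struct demo_ear and the DEMO_* names are hypothetical local mirrors of pfmlib_mont_ear_t, the mode values are copied from the pfmlib_mont_ear_mode_t enum shown earlier, and the DEMO_PLM* bit values are assumptions standing in for PFM_PLM0..PFM_PLM3.

```c
#include <stdbool.h>

/* Mirrors pfmlib_mont_ear_mode_t (same numeric values). */
typedef enum {
	DEMO_EAR_CACHE_MODE = 0,
	DEMO_EAR_TLB_MODE   = 1,
	DEMO_EAR_ALAT_MODE  = 2
} demo_ear_mode_t;

/* Assumed privilege-level bits standing in for PFM_PLM0 and PFM_PLM3. */
#define DEMO_PLM0 0x1u
#define DEMO_PLM3 0x8u

/* Hypothetical mirror of pfmlib_mont_ear_t. */
struct demo_ear {
	unsigned char   ear_used;
	demo_ear_mode_t ear_mode;
	unsigned int    ear_plm;
	unsigned long   ear_umask;
};

/* I-EAR accepts cache or TLB mode; D-EAR additionally accepts ALAT mode. */
bool demo_ear_mode_ok(const struct demo_ear *e, bool is_data_ear)
{
	if (!e->ear_used)
		return true; /* unused EAR settings are ignored by the library */
	switch (e->ear_mode) {
	case DEMO_EAR_CACHE_MODE:
	case DEMO_EAR_TLB_MODE:
		return true;
	case DEMO_EAR_ALAT_MODE:
		return is_data_ear;
	default:
		return false;
	}
}

/* An ear_plm of 0 means "use the default mask (pfp_dfl_plm)". */
unsigned int demo_ear_effective_plm(const struct demo_ear *e, unsigned int dfl_plm)
{
	return e->ear_plm ? e->ear_plm : dfl_plm;
}
```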
In general, there are four different methods to program the EAR (data or instruction):
There are 4 methods to program the ETB and they are as follows:
Range restriction is implemented using the debug registers. There is a limited number of debug registers, and they are used in pairs. With 8 data debug registers, a maximum of 4 distinct ranges can be specified. The same applies to code range restrictions. Moreover, there are some severe constraints on the alignment and size of the ranges. Because the size of a range is specified using a bitmask, the actual range can be larger than the requested range.

For code ranges, the Itanium 2 9000 (Montecito) processor can use what is called fine mode, where a range is designated using two pairs of code debug registers. In this mode, the bitmask is not used; the start and end addresses are specified directly. Not all code ranges qualify for fine mode: the size of the range must be 64KB or less, and the range cannot cross a 64KB page boundary. The library makes a best effort to choose the right mode for each range. For code ranges, it tries fine mode first and falls back to bitmask mode otherwise. Fine mode applies to all code debug registers or none, i.e., you cannot have one range using fine mode and another using the bitmask.

The Itanium 2 9000 (Montecito) processor somewhat limits the use of multiple pairs to accurately cover a code range. This can only be done for IA64_INST_RETIRED, and even then, several events are needed to collect the counts. For all other events, only one pair can be used, which leads to more inaccuracy due to approximation. Data ranges can use multiple debug register pairs to gain more accuracy. The library never covers less than what is requested. The algorithm uses more than one pair of debug registers whenever possible to get a more precise range. Hence, up to 4 pairs can be used to describe a single range.
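The two range constraints described above can be illustrated with a short sketch. demo_fine_ok() applies the stated fine-mode conditions (64KB or less, no 64KB boundary crossed), and demo_all_fine() applies the all-or-none rule. demo_cover_pow2() is a simplified model of bitmask mode, assuming a single address/mask pair matches one naturally aligned power-of-two block; it shows why the covered range can exceed the requested one, but it is not the library's actual multi-pair algorithm. Ranges are assumed to be half-open, [start, end), and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_64KB 0x10000ULL

/* Assumption: half-open range [start, end), with start < end. */
struct demo_range { uint64_t start, end; };

/* Fine mode: size of 64KB or less and no 64KB boundary crossed. */
bool demo_fine_ok(const struct demo_range *r)
{
	if (r->end <= r->start)
		return false;
	return r->end - r->start <= DEMO_64KB &&
	       r->start / DEMO_64KB == (r->end - 1) / DEMO_64KB;
}

/* Fine mode is all-or-none across the code debug registers. */
bool demo_all_fine(const struct demo_range *r, size_t n)
{
	for (size_t k = 0; k < n; k++)
		if (!demo_fine_ok(&r[k]))
			return false;
	return true;
}

/*
 * Bitmask-mode model: the smallest naturally aligned power-of-two block
 * containing both start and end - 1. Its size is 2^(h+1), where h is the
 * highest bit in which start and end - 1 differ. Assumes addresses below
 * 2^63 so the size does not overflow.
 */
struct demo_range demo_cover_pow2(struct demo_range r)
{
	uint64_t diff = r.start ^ (r.end - 1);
	uint64_t size = 1;

	while (diff) { /* one doubling per significant bit of diff */
		diff >>= 1;
		size <<= 1;
	}
	uint64_t base = r.start & ~(size - 1);
	struct demo_range out = { base, base + size };
	return out;
}
```

For example, a request for [0x1f00, 0x2100) is only 512 bytes long, but because it straddles 0x2000 the single-pair model has to cover [0x0, 0x4000); that same request does qualify for fine mode, since it stays inside one 64KB page.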
If range restriction is to be used, the rr_used field must be set to 1; otherwise the settings are ignored. The ranges are described by the pfmlib_mont_input_rr_t structure. Up to 4 ranges can be defined, each described by an entry in rr_limits. Flags that apply to all ranges can be set in rr_flags. The currently defined flags are:
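The following sketch shows one way to populate the input range description: copy up to 4 requested ranges into rr_limits and set rr_used. struct demo_rr_desc and struct demo_input_rr are hypothetical local mirrors of pfmlib_mont_input_rr_desc_t and pfmlib_mont_input_rr_t; zeroing the unused entries and rr_flags, and returning -1 on a bad count, are choices of this sketch rather than documented library requirements.

```c
#include <stddef.h>

#define DEMO_MAX_RANGES 4 /* "Up to 4 ranges can be defined" */

/* Hypothetical mirror of pfmlib_mont_input_rr_desc_t. */
struct demo_rr_desc {
	unsigned int  rr_plm;
	unsigned long rr_start;
	unsigned long rr_end;
};

/* Hypothetical mirror of pfmlib_mont_input_rr_t. */
struct demo_input_rr {
	unsigned int        rr_flags;
	struct demo_rr_desc rr_limits[DEMO_MAX_RANGES];
	unsigned char       rr_used;
};

/*
 * Copies n requested ranges into rr. Returns 0 on success, or -1 if no
 * range or more than DEMO_MAX_RANGES ranges are requested.
 */
int demo_fill_ranges(struct demo_input_rr *rr,
		     const struct demo_rr_desc *req, size_t n)
{
	if (n == 0 || n > DEMO_MAX_RANGES)
		return -1;
	*rr = (struct demo_input_rr){0}; /* no flags, unused entries zeroed */
	for (size_t k = 0; k < n; k++)
		rr->rr_limits[k] = req[k];
	rr->rr_used = 1; /* without this, the range settings are ignored */
	return 0;
}
```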
The pfmlib_mont_input_rr_desc_t structure is defined as follows:
The library will provide the values for the debug registers as well as some information about the actual ranges in the output parameters and more precisely in the pfmlib_mont_output_rr_t structure for each range. The structure is defined as follows: