PERF-MEM(1)
NAME
perf-mem - Profile memory accesses
SYNOPSIS
perf mem [<options>] (record [<command>] | report)
DESCRIPTION
"perf mem record" runs a command and gathers memory operation data from it into perf.data. perf record options are accepted and passed through.
"perf mem report" displays the result. It invokes perf report with the right set of options to display a memory access profile. By default, both loads and stores are sampled; use the -t option to limit sampling to loads or to stores.
Note that on Intel systems the memory latency reported is the use-latency, not the pure load (or store) latency. Use-latency includes any pipeline queuing delays in addition to the memory subsystem latency.
On Arm64, this uses SPE to sample load and store operations, so hardware and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide. Due to the statistical nature of SPE sampling, not every memory operation will be sampled.
On AMD, this uses the IBS Op PMU to sample load and store operations.
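The SYNOPSIS places common options such as -t before the record or report subcommand. The following Python sketch only assembles argument lists in that order and does not run perf; the helper names mem_record_argv and mem_report_argv are hypothetical and exist only for illustration.

```python
from typing import Optional, Sequence


def mem_record_argv(command: Sequence[str], mem_type: Optional[str] = None) -> list[str]:
    """Hypothetical helper: argv for 'perf mem [<options>] record <command>'."""
    argv = ["perf", "mem"]
    if mem_type is not None:
        # Common option, placed before the subcommand as in the SYNOPSIS.
        # "load" or "store" limits sampling to one operation type.
        argv += ["-t", mem_type]
    argv.append("record")
    # The profiled command and its arguments come last.
    argv += list(command)
    return argv


def mem_report_argv(input_file: str = "perf.data") -> list[str]:
    """Hypothetical helper: argv for 'perf mem report -i <file>'."""
    return ["perf", "mem", "report", "-i", input_file]


if __name__ == "__main__":
    print(mem_record_argv(["./my_app", "--iterations", "10"], mem_type="load"))
    print(mem_report_argv())
```

To execute the commands, the lists could be passed to subprocess.run(..., check=True); that assumes perf is installed and the platform support described in this section is available.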
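COMMON OPTIONS
-f, --force
    Don't do ownership validation.
-t, --type=<type>
    Select the memory operation type: load or store (default: load,store).
-v, --verbose
    Be more verbose (show counter open errors, etc.).
-p, --phys-data
    Record/report sample physical addresses.
--data-page-size
    Record/report sample data address page size.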
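RECORD OPTIONS
<command>...
    Any command you can specify in a shell.
-e, --event <event>
    Event selector. Use 'perf mem record -e list' to list available events.
-K, --all-kernel
    Configure all used events to run in kernel space.
-U, --all-user
    Configure all used events to run in user space.
--ldlat <n>
    Specify the desired latency threshold for the loads event.
    On supported AMD processors:
    - the /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1';
    - supported latency values are 128 to 2048 (both inclusive);
    - a latency value that is a multiple of 128 incurs slightly less profiling overhead than other values;
    - load latency filtering is disabled by default.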
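The AMD constraints for --ldlat can be checked before recording. This Python sketch mirrors only the documented rules (capability file containing '1', inclusive range 128 to 2048, multiples of 128 being slightly cheaper); the helper names are hypothetical, and perf itself does not run this code.

```python
from pathlib import Path

# Capability file named in the --ldlat description for AMD processors.
AMD_LDLAT_CAP = Path("/sys/bus/event_source/devices/ibs_op/caps/ldlat")

# Documented inclusive range of latency values on supported AMD processors.
AMD_LDLAT_MIN = 128
AMD_LDLAT_MAX = 2048


def amd_ldlat_supported(cap_file: Path = AMD_LDLAT_CAP) -> bool:
    """Hypothetical helper: True if the capability file contains '1'."""
    try:
        return cap_file.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file, e.g. no IBS Op PMU on this system.
        return False


def check_amd_ldlat(value: int) -> str:
    """Hypothetical helper: classify a --ldlat value against the AMD rules."""
    if not AMD_LDLAT_MIN <= value <= AMD_LDLAT_MAX:
        raise ValueError(
            f"ldlat {value} is outside the supported range "
            f"{AMD_LDLAT_MIN}-{AMD_LDLAT_MAX}"
        )
    if value % 128 == 0:
        return "ok (multiple of 128, slightly lower overhead)"
    return "ok"


if __name__ == "__main__":
    print("ldlat supported:", amd_ldlat_supported())
    for n in (256, 300):
        print(n, check_amd_ldlat(n))
```

The size of the overhead difference for multiples of 128 is not quantified here; the sketch only flags such values.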
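REPORT OPTIONS
-i, --input=<file>
    Input file name.
-C, --cpu=<cpu>
    Only consider samples from the specified CPU(s).
-D, --dump-raw-samples
    Dump the raw decoded samples on the screen in a format that is easy to parse, with one sample per line.
-s, --sort=<key>
    Group results by the given key(s); multiple keys can be specified in CSV format. See linkperf:perf-report[1] for the available sort keys.
    For perf mem report, the default sort keys are changed to local_weight, mem, sym, dso, symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.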
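-F, --fields=
    Specify output fields; multiple fields can be specified in CSV format. See linkperf:perf-report[1] for details.
    In addition to the default fields, 'perf mem report' provides the following fields to break down sample periods:
    - op: operation of the sampled instruction (load, store, prefetch, ...)
    - cache: location in the CPU cache (L1, L2, ...) where the sample hit
    - mem: location in memory or other places where the sample hit
    - dtlb: location in the data TLB (L1, L2) where the sample hit
    - snoop: snoop result for the sampled data access
    See the OUTPUT FIELD SELECTION section for caveats.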
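-T, --type-profile
    Show the data-type profile instead of code symbols. This requires debug information and changes the default sort keys to mem, snoop, tlb, type.
-U, --hide-unresolved
    Only display entries resolved to a symbol.
-x, --field-separator=<separator>
    Specify the field separator used when dumping raw samples (-D option). By default, the separator is the space character.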
In addition, all perf report options are valid for report, and all perf record options are valid for record.
OVERHEAD CALCULATION
Unlike linkperf:perf-report[1], which calculates overhead from the actual sample period, perf-mem calculates overhead from the sample weight. For example, suppose perf.data contains two samples with the same sample period, one with weight 180 and the other with weight 20:
    $ perf script -F period,data_src,weight,ip,sym
    100000    629080842 |OP LOAD|LVL L3 hit|...     20     7e69b93ca524 strcmp
    100000   1a29081042 |OP LOAD|LVL RAM hit|...   180 ffffffff82429168 memcpy

    $ perf report -F overhead,symbol
    50%   [.] strcmp
    50%   [k] memcpy

    $ perf mem report -F overhead,symbol
    90%   [k] memcpy
    10%   [.] strcmp
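The arithmetic behind the two-sample example can be reproduced in a few lines of Python. This is a simplified model, not perf's implementation: each symbol's overhead is its summed period (perf report) or summed weight (perf mem report) divided by the total over all samples. The Sample class and overhead function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    symbol: str
    period: int
    weight: int


def overhead(samples: list[Sample], key: str) -> dict[str, float]:
    """Percentage of the total per symbol, using the 'period' or 'weight' field."""
    total = sum(getattr(s, key) for s in samples)
    shares: dict[str, float] = {}
    for s in samples:
        shares[s.symbol] = shares.get(s.symbol, 0.0) + 100.0 * getattr(s, key) / total
    return shares


# The two samples from the perf script output above.
samples = [
    Sample("strcmp", period=100000, weight=20),
    Sample("memcpy", period=100000, weight=180),
]

print(overhead(samples, "period"))  # {'strcmp': 50.0, 'memcpy': 50.0}
print(overhead(samples, "weight"))  # {'strcmp': 10.0, 'memcpy': 90.0}
```

With equal periods, period-based overhead cannot distinguish the two samples, while the weight (180 vs. 20) shifts 90% of the overhead to memcpy.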
OUTPUT FIELD SELECTION
"perf mem report" adds a number of new output fields specific to the data source information in the sample. Some of them share names with existing sort keys ("mem" and "snoop"), so, unlike other fields and sort keys, they behave differently depending on whether they are used with -F/--fields or -s/--sort.
Used as output fields, those two aggregate all samples together and show a breakdown.
    $ perf mem report -F mem,snoop
    ...
    #  ------ Memory -------  --- Snoop ----
    #   RAM  Uncach   Other    HitM   Other
    # .....................  ..............
    #
       3.5%    0.0%   96.5%   25.1%   74.9%
Used as sort keys, however, the same names aggregate samples for each type separately.
    $ perf mem report -s mem,snoop
    ...
    # Overhead       Samples  Memory access                            Snoop
    # ........  ............  .......................................  ............
    #
        47.99%          1509  L2 hit                                   N/A
        25.08%           338  core, same node Any cache hit            HitM
        10.24%         54374  N/A                                      N/A
         6.77%         35938  L1 hit                                   N/A
         6.39%           101  core, same node Any cache hit            N/A
         3.50%            69  RAM hit                                  N/A
         0.03%           158  LFB/MAB hit                              N/A
         0.00%             2  Uncached hit                             N/A
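The difference between the two modes can be modelled in Python. This is a simplified sketch, not perf's code: the samples, their coarse categories and their values are made up, and each sample contributes one number named weight here, following the OVERHEAD CALCULATION section. Field mode yields one breakdown per field over all samples; sort-key mode yields one row per distinct (mem, snoop) combination.

```python
from collections import defaultdict

# Hypothetical decoded samples: (memory level, snoop result, weight).
SAMPLES = [
    ("RAM", "Other", 30),
    ("Other", "HitM", 50),
    ("Other", "Other", 20),
]


def field_breakdown(samples, index):
    """-F style: all samples aggregated, split by the categories of one field."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[index]] += sample[2]
    grand = sum(totals.values())
    return {cat: 100.0 * w / grand for cat, w in totals.items()}


def sort_key_groups(samples):
    """-s style: one group (output row) per distinct (mem, snoop) pair."""
    totals = defaultdict(int)
    for mem, snoop, weight in samples:
        totals[(mem, snoop)] += weight
    grand = sum(totals.values())
    return {key: 100.0 * w / grand for key, w in totals.items()}


print(field_breakdown(SAMPLES, 0))  # {'RAM': 30.0, 'Other': 70.0}
print(field_breakdown(SAMPLES, 1))  # {'Other': 50.0, 'HitM': 50.0}
print(sort_key_groups(SAMPLES))
```

Each field breakdown sums to 100% on its own, which is why the -F mem,snoop output above shows both the Memory and the Snoop columns adding up to 100%.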
SEE ALSO
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]