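Memory Management: Memory Allocators - SLUB Debugging
The slub allocator is currently the most commonly used allocator in the Linux kernel. It was derived from the slab allocator: it keeps slab's design ideas, most of its data structures, and the complete API, while dropping slab's cache-coloring mechanism and streamlining the data structures used to manage slabs. It is simpler and more efficient than slab and has replaced it as the kernel's default allocator.
Most dynamic memory allocations in the kernel come from the slub allocator, so analyzing memory-corruption problems at the kernel level almost inevitably involves slub's debugging features.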
SOURCE CODE TREE
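/kernel/mm/slab.h : private header shared by the three allocators
/kernel/mm/slab_common.c : source code shared by the three allocators
/kernel/mm/slab.c : slab allocator source; the earliest allocator and the basis of the other two
/kernel/mm/slub.c : slub allocator source; the current mainstream allocator of choice
/kernel/mm/slob.c : slob allocator source; mainly for embedded devices with limited memory
/kernel/include/linux/slub_def.h : public header of the slub allocator
/kernel/include/linux/slab_def.h : public header of the slab allocator
/kernel/include/linux/slab.h : public header shared by the three allocators; provides the external interface
/kernel/include/linux/poison.h : defines the memory fill values used by slab debugging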
MECHANISM
Algorithm
Data Structures
① Padding region:
    if (unlikely(s->flags & SLAB_POISON))
        memset(start, POISON_INUSE, PAGE_SIZE << order);
This region is initialized with the fill value directly when the slab page is allocated.
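The fill described above can be imitated in user space. The sketch below is only an illustration, not kernel code: the byte 0x5a matches POISON_INUSE in include/linux/poison.h, while SIM_PAGE_SIZE (assumed to be 4096), sim_poison_slab and sim_is_fully_poisoned are hypothetical names introduced here.

```c
#include <stddef.h>
#include <string.h>

#define SIM_POISON_INUSE 0x5a   /* same value as POISON_INUSE in poison.h */
#define SIM_PAGE_SIZE    4096u  /* assumed page size, for illustration only */

/* Fill a simulated slab of 2^order pages with the in-use poison byte. */
static void sim_poison_slab(unsigned char *start, unsigned int order)
{
    memset(start, SIM_POISON_INUSE, (size_t)SIM_PAGE_SIZE << order);
}

/* Return 1 if every byte in [start, start + len) still holds the poison byte. */
static int sim_is_fully_poisoned(const unsigned char *start, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (start[i] != SIM_POISON_INUSE)
            return 0;
    return 1;
}
```

Calling sim_poison_slab(buf, 1) on an 8192-byte buffer fills it entirely with 0x5a; any later write into an unused part of the buffer makes sim_is_fully_poisoned return 0, which is the basic idea behind the padding check.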
② Redzone region:
③ Object region:
API
References
RUNNING DEBUG
Debug Feature
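[dynamic/static] - slub can turn this feature on or off at run time without rebuilding the kernel (in that case slub's debug features are normally off by default), or it can be enabled statically; by contrast, enabling debugging for slab requires rebuilding the kernel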
Tips:
1. All options can be set on a slab that does not contain objects.
2. If the slab already contains objects then sanity checks and tracing may only be enabled. The other options may cause the realignment of objects.
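3. In practice, the points above can be confirmed by trying to change each node listed under "Slab Cache Debug Node" with echo; a node that cannot be modified cannot be toggled at run time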
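[decoupled] - slub can enable or disable debugging for a single selected slab cache without affecting how other slab caches run
[on runtime] - can be enabled dynamically under /sys/kernel/slab/<slab name>
- see "Slab Cache Debug Node" below for details
- echo a value (0, 1, or another value) into the corresponding node
[on compile] - can be enabled at build time through CONFIG_SLUB_DEBUG_ON
- see the "CONFIG_SLUB_DEBUG_ON" section below for details
[on boot] - can also be enabled by passing a string parameter on the kernel command line
- "slub_debug" : enable all debug options for all slab caches
- "slub_debug=<Debug-Options>" : enable the given options for all slub caches
- "slub_debug=<Debug-Options>,<slab name>" : enable the given options for the selected slub cache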
| Debug-Options | Description |
|---|---|
| F | Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS; sorry, SLAB legacy issues) |
| Z | Red zoning |
| P | Poisoning (object and padding) |
| U | User tracking (free and alloc) |
| T | Trace (please only use on single slabs) |
| A | Toggle failslab filter mark for the cache |
| O | Switch debugging off for caches that would have caused higher minimum slab orders |
| - | Switch all debugging off (useful if the kernel is configured with CONFIG_SLUB_DEBUG_ON) |
For example:
slub_debug=FZ : to boot just with sanity checks and red zoning
slub_debug=,dentry : to find an issue in the dentry cache
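To make the "<Debug-Options>,<slab name>" syntax concrete, here is a minimal user-space sketch of how such a string could be split into option flags and a cache name. It is not the kernel's parser: the DBG_* bits and parse_slub_debug are hypothetical names, and treating an empty option list as F, Z, P and U is an assumption based on the "(=FZUP)" note in the slabinfo help text.

```c
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the kernel's SLAB_* values differ. */
enum {
    DBG_SANITY        = 1u << 0,  /* F */
    DBG_REDZONE       = 1u << 1,  /* Z */
    DBG_POISON        = 1u << 2,  /* P */
    DBG_USER          = 1u << 3,  /* U */
    DBG_TRACE         = 1u << 4,  /* T */
    DBG_FAILSLAB      = 1u << 5,  /* A */
    DBG_NO_HIGH_ORDER = 1u << 6,  /* O */
    /* Assumed default set when no letters are given. */
    DBG_DEFAULT = DBG_SANITY | DBG_REDZONE | DBG_POISON | DBG_USER
};

/*
 * Parse "<options>[,<cache name>]", i.e. the text after "slub_debug=".
 * Returns 0 on success, -1 on an unknown option letter.
 * *cache points at the cache name inside arg, or is NULL when the
 * options apply to every cache.
 */
static int parse_slub_debug(const char *arg, unsigned int *flags,
                            const char **cache)
{
    const char *p = arg;
    unsigned int f = 0;

    *cache = NULL;
    for (; *p != '\0' && *p != ','; p++) {
        switch (*p) {
        case 'F': case 'f': f |= DBG_SANITY; break;
        case 'Z': case 'z': f |= DBG_REDZONE; break;
        case 'P': case 'p': f |= DBG_POISON; break;
        case 'U': case 'u': f |= DBG_USER; break;
        case 'T': case 't': f |= DBG_TRACE; break;
        case 'A': case 'a': f |= DBG_FAILSLAB; break;
        case 'O': case 'o': f |= DBG_NO_HIGH_ORDER; break;
        case '-':           /* switch everything off */
            *flags = 0;
            return 0;
        default:
            return -1;
        }
    }
    if (p == arg)           /* no letters given: use the assumed default set */
        f = DBG_DEFAULT;
    if (*p == ',' && p[1] != '\0')
        *cache = p + 1;
    *flags = f;
    return 0;
}
```

With this sketch, "FZ" yields the sanity-check and red-zone bits with no cache name, and ",dentry" yields the assumed default set restricted to the cache named dentry.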
Debug Tools
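[build] - the tool is not built by default when compiling the kernel; build it yourself if needed
- source path: /kernel/tools/vm/slabinfo.c
- build: $(CROSS_COMPILE)gcc -Wall -Wextra -I../lib/ -o slabinfo slabinfo.c ../lib/api/libapi.a
Tips: alternatively, cd into the source directory, set the CROSS_COMPILE variable, and run make slabinfo
[usage] - retrieves statistics for slub caches and operates on the debug nodes of a slub cache
Tips: run slabinfo -h to see the exact usage
- by default the command only returns information for slub caches that contain data
- some commands require slub debug to be enabled on the kernel command line; for example, stack tracking information is not available when slub debug is off
For example: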
ace.an@tjandroid02:~/sprdroid8.1_trunk/kernel/tools/vm$ ./slabinfo -h
slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.
slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]
-a|--aliases Show aliases
-A|--activity Most active slabs first
-d<options>|--debug=<options> Set/Clear Debug options
-D|--display-active Switch line format to activity
-e|--empty Show empty slabs
-f|--first-alias Show first alias
-h|--help Show usage information
-i|--inverted Inverted list
-l|--slabs Show slabs
-n|--numa Show NUMA information
-o|--ops Show kmem_cache_ops
-s|--shrink Shrink slabs
-r|--report Detailed report on single slabs
-S|--Size Sort by size
-t|--tracking Show alloc/free information
-T|--Totals Show summary information
-v|--validate Validate slabs
-z|--zero Include empty slabs
-1|--1ref Single reference
-N|--lines=K Show the first K slabs
-L|--Loss Sort by loss
-X|--Xtotals Show extended summary information
-B|--Bytes Show size in bytes
Valid debug options (FZPUT may be combined)
a / A Switch on all debug options (=FZUP)
- Switch off all debug options
f / F Sanity Checks (SLAB_DEBUG_FREE)
z / Z Redzoning
p / P Poisoning
u / U Tracking
t / T Tracing
Slab Cache Debug Node
- The /sys/kernel/slab directory contains a snapshot of the internal state of the SLUB allocator for each cache.
- Certain files may be modified to change the behavior of the cache (and any cache it aliases, if any).
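Tips: the presence of this directory indicates that the slub allocator is enabled; the subdirectories under it are independent, parallel slub caches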
- The aliases file is read-only and specifies how many caches have merged into this cache.
- The align file is read-only and specifies the cache's object alignment in bytes.
- The alloc_calls file is read-only and lists the kernel code locations from which allocations for this cache were performed.
- The alloc_calls file only contains information if debugging is enabled for that cache (see Documentation/vm/slub.txt).
- The alloc_fastpath file shows how many objects have been allocated using the fast path.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The alloc_from_partial file shows how many times a cpu slab has been full and it has been refilled by using a slab from the list of partially used slabs.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The alloc_refill file shows how many times the per-cpu freelist was empty but there were objects available as the result of remote cpu frees.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The alloc_slab file shows how many times a new slab had to be allocated from the page allocator.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The alloc_slowpath file shows how many objects have been allocated using the slow path because of a refill or allocation from a partial or new slab.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The cache_dma file is read-only and specifies whether objects are from ZONE_DMA.
- Available when CONFIG_ZONE_DMA is enabled.
- The cpu_slabs file is read-only and displays how many cpu slabs are active and their NUMA locality.
- The file cpuslab_flush shows how many times a cache's cpu slabs have been flushed as the result of destroying or shrinking a cache, a cpu going offline, or as the result of forcing an allocation from a certain node.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The ctor file is read-only and specifies the cache's object constructor function, which is invoked for each object when a new slab is allocated.
- The deactivate_empty file shows how many times an empty cpu slab was deactivated.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The deactivate_full file shows how many times a full cpu slab was deactivated.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The deactivate_remote_frees file shows how many times a cpu slab has been deactivated and contained free objects that were freed remotely.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The deactivate_to_head file shows how many times a partial cpu slab was deactivated and added to the head of its node's partial list.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The deactivate_to_tail file shows how many times a partial cpu slab was deactivated and added to the tail of its node's partial list.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The destroy_by_rcu file is read-only and specifies whether slabs (not objects) are freed by rcu.
- The free_add_partial file shows how many times an object has been freed in a full slab so that it had to be added to its node's partial list.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The free_calls file is read-only and lists the locations of object frees if slab debugging is enabled (see Documentation/vm/slub.txt).
- The free_fastpath file shows how many objects have been freed using the fast path because it was an object from the cpu slab.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The free_frozen file shows how many objects have been freed to a frozen slab (i.e. a remote cpu slab).
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The free_remove_partial file shows how many times an object has been freed to a now-empty slab so that it had to be removed from its node's partial list.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The free_slab file shows how many times an empty slab has been freed back to the page allocator.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The free_slowpath file shows how many objects have been freed using the slow path (i.e. to a full or partial slab).
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The hwcache_align file is read-only and specifies whether objects are aligned on cachelines.
- The min_partial file specifies how many empty slabs shall remain on a node's partial list to avoid the overhead of allocating new slabs.
- Such slabs may be reclaimed by utilizing the shrink file.
- The object_size file is read-only and specifies the cache's object size.
- The objects file is read-only and displays how many objects are active and which nodes they are from.
- The objects_partial file is read-only and displays how many objects are on partial slabs and which nodes they are from.
- The file objs_per_slab is read-only and specifies how many objects may be allocated from a single slab of the order specified in /sys/kernel/slab/cache/order.
- The order file specifies the page order at which new slabs are allocated.
- It is writable and can be changed to increase the number of objects per slab.
- If a slab cannot be allocated because of fragmentation, SLUB will retry with the minimum order possible depending on its characteristics.
- When debug_guardpage_minorder=N (N > 0) parameter is specified (see Documentation/admin-guide/kernel-parameters.rst), the minimum possible order is used and this sysfs entry can not be used to change the order at run time.
- The order_fallback file shows how many times an allocation of a new slab has not been possible at the cache's order and instead fallen back to its minimum possible order.
- It can be written to clear the current count.
- Available when CONFIG_SLUB_STATS is enabled.
- The partial file is read-only and displays how many partial slabs there are and how long each node's list is.
- The poison file specifies whether objects should be poisoned when a new slab is allocated.
- The reclaim_account file specifies whether the cache's objects are reclaimable (and grouped by their mobility).
- The red_zone file specifies whether the cache's objects are red zoned.
- The file remote_node_defrag_ratio specifies the percentage of times SLUB will attempt to refill the cpu slab with a partial slab from a remote node as opposed to allocating a new slab on the local node.
- This reduces the amount of wasted memory over the entire system but can be expensive.
- Available when CONFIG_NUMA is enabled.
- The sanity_checks file specifies whether expensive checks should be performed on free and, at minimum, enables double free checks.
- Caches that enable sanity_checks cannot be merged with caches that do not.
- The shrink file is written when memory should be reclaimed from a cache.
- Empty partial slabs are freed and the partial list is sorted so the slabs with the fewest available objects are used first.
- The slab_size file is read-only and specifies the object size with metadata (debugging information and alignment) in bytes.
- The slabs file is read-only and displays how many slabs there are (both cpu and partial) and which nodes they are from.
- The store_user file specifies whether the location of allocation or free should be tracked for a cache.
- The total_objects file is read-only and displays how many total objects a cache has and which nodes they are from.
- The trace file specifies whether object allocations and frees should be traced.
- Writing to the validate file causes SLUB to traverse all of its cache's objects and check the validity of metadata.
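The nodes listed above are ordinary files, so they can also be read and written from a small program instead of cat and echo. The following is a hedged sketch: slab_node_path, slab_node_write and slab_node_read are hypothetical helper names, the base directory is a parameter so the helpers can be tried against a normal directory, and whether a given node is actually writable depends on the kernel version and configuration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<base>/<cache>/<node>" into buf. base would normally be
 * "/sys/kernel/slab". Returns 0 on success, -1 if the path does not fit.
 */
static int slab_node_path(char *buf, size_t len, const char *base,
                          const char *cache, const char *node)
{
    int n = snprintf(buf, len, "%s/%s/%s", base, cache, node);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write value to the node, roughly what `echo value > node` does. */
static int slab_node_write(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");
    int rc;

    if (fp == NULL)
        return -1;
    rc = (fputs(value, fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)    /* write errors may only surface on close */
        rc = -1;
    return rc;
}

/* Read the first line of the node into out, stripping the newline. */
static int slab_node_read(const char *path, char *out, size_t len)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(out, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

For instance, building the path for cache "dentry" and node "red_zone" under "/sys/kernel/slab" gives "/sys/kernel/slab/dentry/red_zone", which can then be passed to slab_node_read, or to slab_node_write with "1" on kernels where that node is writable.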
Debug Output
===============================================
BUG <slab cache affected>: <What went wrong>
-----------------------------------------------
INFO: <corruption start>-<corruption_end> <more info>
INFO: Slab <address> <slab information>
INFO: Object <address> <object information>
INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by cpu> pid=<pid of the process>
INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu> pid=<pid of the process>
Bytes b4 <address> : <bytes>
Object <address> : <bytes>
Redzone <address> : <bytes>
Padding <address> : <bytes>
stackdump
FIX <slab cache affected>: <corrective action taken>
Tips:
1. Object allocation / free information is only available if SLAB_STORE_USER is set for the slab. slub_debug sets that option.
2. Redzone information is only available if SLAB_RED_ZONE is set.
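When scanning many reports it can help to pull the fields out of the "Allocated in" / "Freed in" lines of the template. The sketch below shows one possible way with sscanf; struct track_info and parse_track_line are hypothetical names, and it assumes that any log prefix (such as a timestamp) has already been stripped from the line.

```c
#include <stdio.h>

struct track_info {
    char func[128];      /* symbol text as printed in the log */
    unsigned long age;   /* jiffies since the alloc or free */
    unsigned int cpu;    /* cpu that performed the operation */
    int pid;             /* pid of the process */
    int is_free;         /* 0 = "Allocated in", 1 = "Freed in" */
};

/*
 * Parse one "INFO: Allocated in ..." or "INFO: Freed in ..." line.
 * Returns 0 on success, -1 if the line matches neither form.
 */
static int parse_track_line(const char *line, struct track_info *out)
{
    if (sscanf(line, "INFO: Allocated in %127s age=%lu cpu=%u pid=%d",
               out->func, &out->age, &out->cpu, &out->pid) == 4) {
        out->is_free = 0;
        return 0;
    }
    if (sscanf(line, "INFO: Freed in %127s age=%lu cpu=%u pid=%d",
               out->func, &out->age, &out->cpu, &out->pid) == 4) {
        out->is_free = 1;
        return 0;
    }
    return -1;
}
```

Feeding it a made-up line such as "INFO: Freed in my_free+0x10/0x40 age=7 cpu=0 pid=42" fills in the function text, age, cpu and pid and sets is_free to 1.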
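Following the output template above, the data written to the kernel log can be divided into 4 parts:
[head] - INFO area
- every line in this area starts with INFO; the fields are described clearly in the output template above
[middle] - memory area
- Bytes b4: a small piece of memory just before the Object area; current kernels have replaced it with a Redzone area
- Object: the area actually handed to the task; filled with 6b on free, with the last byte set to a5
- Redzone: guard area; filled with bb on free and with cc on allocation
- Padding: filler that aligns the next object; filled with 5a
[middle] - call stack area
- the function call stack at the time of the error
[tail] - automatic fix-up area
- to keep the system running, slub tries to repair the overwritten area and then continues normally. Note that overwritten memory cannot be repaired in every case.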
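The fill values listed for the memory area suggest how a check on a freed object can work. The sketch below is a simplified user-space model, not the kernel's implementation: it assumes a layout of object bytes followed directly by a single red zone, and sim_init_free_object, first_bad_byte and check_free_object are hypothetical names. The byte values are the ones defined in include/linux/poison.h.

```c
#include <stddef.h>
#include <string.h>

#define SIM_POISON_FREE  0x6b  /* body of a free object */
#define SIM_POISON_END   0xa5  /* last byte of a free object */
#define SIM_RED_INACTIVE 0xbb  /* red zone of a free object */

/* Lay out [object][red zone] the way a freed, poisoned object would look. */
static void sim_init_free_object(unsigned char *obj, size_t obj_size,
                                 size_t rz_size)
{
    memset(obj, SIM_POISON_FREE, obj_size - 1);
    obj[obj_size - 1] = SIM_POISON_END;
    memset(obj + obj_size, SIM_RED_INACTIVE, rz_size);
}

/* Index of the first byte that differs from expect, or -1 if all match. */
static long first_bad_byte(const unsigned char *p, size_t len,
                           unsigned char expect)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != expect)
            return (long)i;
    return -1;
}

/*
 * Check a free object (obj_size must be at least 1).
 * Returns 0 if intact, 1 if the object poison was overwritten
 * (a write after free), 2 if the red zone was overwritten (an overflow).
 */
static int check_free_object(const unsigned char *obj, size_t obj_size,
                             size_t rz_size)
{
    if (first_bad_byte(obj, obj_size - 1, SIM_POISON_FREE) >= 0)
        return 1;
    if (obj[obj_size - 1] != SIM_POISON_END)
        return 1;
    if (first_bad_byte(obj + obj_size, rz_size, SIM_RED_INACTIVE) >= 0)
        return 2;
    return 0;
}
```

In this model, overwriting any byte inside the object after sim_init_free_object makes check_free_object return 1, while overwriting a byte just past the object returns 2; these correspond to the kinds of corruption that the Object and Redzone dumps in the report are meant to expose.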
For example:
References
CONFIG_SLUB_DEBUG_ON
Statically enables all slub debug options at build time; takes effect for all slab caches
[depends on] - the following 2 config options
- CONFIG_SLUB
- CONFIG_SLUB_DEBUG
References
KASAN
TODO
References
KMEMCHECK
TODO
References
KMEMLEAK
TODO
References
OTHERS
TODO
References