Memory Management: Memory Allocators - SLUB Debugging


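The SLUB allocator is currently the most commonly used allocator in the Linux kernel. It was derived from the SLAB allocator: it keeps SLAB's design ideas, most of its data structures, and its complete API, while dropping SLAB's cache coloring and simplifying the data structures that manage slabs. It is simpler and more efficient than SLAB and has replaced it as the kernel's default allocator.
Most dynamic memory allocations in the kernel come from the SLUB allocator, so SLUB's debugging features are almost unavoidable when analyzing kernel-level memory corruption.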


This article focuses on the native debugging facilities of the SLUB allocator.


SOURCE CODE TREE

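/kernel/mm/slab.h : internal header shared by the three allocators
/kernel/mm/slab_common.c : source code shared by the three allocators
/kernel/mm/slab.c : SLAB allocator source; the earliest allocator and the basis of the other two
/kernel/mm/slub.c : SLUB allocator source; the current mainstream, preferred allocator
/kernel/mm/slob.c : SLOB allocator source; mainly for embedded devices with limited memory
/kernel/include/linux/slub_def.h : public header of the SLUB allocator
/kernel/include/linux/slab_def.h : public header of the SLAB allocator
/kernel/include/linux/slab.h : public header shared by the three allocators; provides the external API
/kernel/include/linux/poison.h : defines the memory fill values used by slab debugging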


MECHANISM

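Algorithm

Principle

Performance

Drawbacks

Data Structures

Key data structure source code

Data structure relationship diagram

Fill values of key fields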
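① Padding region:

/* with poisoning enabled, fill the whole new slab with the padding pattern */
if (unlikely(s->flags & SLAB_POISON))
    memset(start, POISON_INUSE, PAGE_SIZE << order);

This region is filled at initialization, at the moment the slab page is allocated.
② Redzone region:

③ Object region: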
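
The fill patterns of the three regions can be modelled in userspace. The following is only a sketch, not kernel code: the byte values are the ones I believe include/linux/poison.h defines (they match the 5a/6b/bb/cc values discussed under "Debug Output"), while the function names `model_poison_slab` and `model_init_object` and the simplified layout (payload immediately followed by its red zone) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Fill values, as I recall them from include/linux/poison.h. */
#define POISON_INUSE      0x5a  /* padding / freshly allocated slab page */
#define POISON_FREE       0x6b  /* payload of a free object */
#define POISON_END        0xa5  /* last payload byte of a free object */
#define SLUB_RED_INACTIVE 0xbb  /* red zone of a free object */
#define SLUB_RED_ACTIVE   0xcc  /* red zone of an allocated object */

/* Hypothetical model: poison a whole slab when it is created. */
static void model_poison_slab(unsigned char *slab, size_t slab_bytes)
{
    memset(slab, POISON_INUSE, slab_bytes);
}

/*
 * Hypothetical model of one object slot: object_size payload bytes
 * followed by redzone_size red zone bytes.
 * allocated == 0: poison the payload and mark the red zone inactive.
 * allocated != 0: leave the payload alone and mark the red zone active.
 */
static void model_init_object(unsigned char *obj, size_t object_size,
                              size_t redzone_size, int allocated)
{
    if (!allocated && object_size > 0) {
        memset(obj, POISON_FREE, object_size - 1);
        obj[object_size - 1] = POISON_END;
    }
    memset(obj + object_size,
           allocated ? SLUB_RED_ACTIVE : SLUB_RED_INACTIVE,
           redzone_size);
}
```

Any byte that later differs from the expected pattern in a region nobody should have written to is what the debug checks report as corruption.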

API

TODO

References

kernel vm slub


RUNNING DEBUG

Debug Feature

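Feature characteristics
[Dynamic/static] - SLUB debugging can be switched on and off at runtime without rebuilding the kernel (in that case the debug features are usually off by default), or it can be enabled statically at build time. By contrast, enabling debugging for SLAB requires rebuilding the kernel.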

Tips:
1. All options can be set on a slab that does not contain objects.
2. If the slab already contains objects then sanity checks and tracing may only be enabled. The other options may cause the realignment of objects.
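3. The points above can largely be verified by using echo to change the value of each node listed in the "Slab Cache Debug Node" section; a node that cannot be modified certainly cannot be toggled at runtime.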

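[Decoupled] - SLUB debugging can be switched on or off for a single selected slab cache without affecting the other slab caches
Enabling the feature
[on runtime] - can be enabled dynamically under /sys/kernel/slab/<slab name>

  • See the "Slab Cache Debug Node" section below for details
  • echo a value (0, 1, or other) into the corresponding node

[on compile] - can be enabled at build time with CONFIG_SLUB_DEBUG_ON

  • See the "CONFIG_SLUB_DEBUG_ON" section below for details

[on boot] - can also be enabled with a string parameter on the kernel command line

  • "slub_debug" : enable all debug options for all slab caches
  • "slub_debug=<Debug-Options>" : enable the given options for all slab caches
  • "slub_debug=<Debug-Options>,<slab name>" : enable the given options for the selected slab cache only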
Debug-Options   Description
F               Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS; sorry, SLAB legacy issues)
Z               Red zoning
P               Poisoning (object and padding)
U               User tracking (free and alloc)
T               Trace (please only use on single slabs)
A               Toggle failslab filter mark for the cache
O               Switch debugging off for caches that would have caused higher minimum slab orders
-               Switch all debugging off (useful if the kernel is configured with CONFIG_SLUB_DEBUG_ON)
For example:
slub_debug=FZ : to boot just with sanity checks and red zoning
slub_debug=,dentry : to find an issue in the dentry cache
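
To make the option syntax concrete, here is a small userspace sketch that splits the value of a slub_debug= parameter into a set of option bits and an optional slab name. It is not the kernel's parser: the `DBG_*` names and `parse_slub_debug` are hypothetical, "all options" is taken to mean FZPU as in the slabinfo help text, and accepting lower-case letters is an assumption of the sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit names for this sketch; the kernel uses its own SLAB_* flags. */
enum {
    DBG_SANITY        = 1 << 0,  /* F */
    DBG_REDZONE       = 1 << 1,  /* Z */
    DBG_POISON        = 1 << 2,  /* P */
    DBG_USER          = 1 << 3,  /* U */
    DBG_TRACE         = 1 << 4,  /* T */
    DBG_FAILSLAB      = 1 << 5,  /* A */
    DBG_NO_HIGH_ORDER = 1 << 6,  /* O */
    /* "all options" assumed to be FZPU */
    DBG_ALL = DBG_SANITY | DBG_REDZONE | DBG_POISON | DBG_USER
};

/*
 * Parse the value part of "slub_debug=<Debug-Options>[,<slab name>]".
 * Returns 0 on success, -1 on an unknown option letter.
 * *cache points at the slab name inside value, or is NULL if none was given.
 */
static int parse_slub_debug(const char *value, unsigned *flags,
                            const char **cache)
{
    const char *comma = strchr(value, ',');
    const char *end = comma ? comma : value + strlen(value);
    const char *p;

    *cache = comma ? comma + 1 : NULL;
    /* No option letters at all means "everything on". */
    *flags = (end == value) ? (unsigned)DBG_ALL : 0u;

    for (p = value; p < end; p++) {
        switch (toupper((unsigned char)*p)) {
        case 'F': *flags |= DBG_SANITY; break;
        case 'Z': *flags |= DBG_REDZONE; break;
        case 'P': *flags |= DBG_POISON; break;
        case 'U': *flags |= DBG_USER; break;
        case 'T': *flags |= DBG_TRACE; break;
        case 'A': *flags |= DBG_FAILSLAB; break;
        case 'O': *flags |= DBG_NO_HIGH_ORDER; break;
        case '-': *flags = 0; break;          /* switch everything off */
        default:  return -1;
        }
    }
    return 0;
}
```

With this sketch, "FZ" yields the sanity-check and red-zone bits for every cache, and ",dentry" yields all options restricted to the cache named dentry.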

Debug Tools

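Using slabinfo
[Build] - this tool is not built by default when building the kernel; build it yourself if needed

  • Source path: /kernel/tools/vm/slabinfo.c
  • Build: $(CROSS_COMPILE)gcc -Wall -Wextra -I../lib/ -o slabinfo slabinfo.c ../lib/api/libapi.a

Tips: alternatively, cd into the source directory, set the CROSS_COMPILE variable, and run make slabinfo

[Usage] - reports statistics for SLUB caches and operates on their debug nodes
Tips: run slabinfo -h for detailed usage

  • By default the command only reports SLUB caches that contain data
  • Some commands require slub debug to be enabled on the kernel command line. For example, stack tracking information is not available unless slub debug is on
For example: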
ace.an@tjandroid02:~/sprdroid8.1_trunk/kernel/tools/vm$ ./slabinfo -h
slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.

slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]
-a|--aliases           Show aliases
-A|--activity          Most active slabs first
-d<options>|--debug=<options> Set/Clear Debug options
-D|--display-active    Switch line format to activity
-e|--empty             Show empty slabs
-f|--first-alias       Show first alias
-h|--help              Show usage information
-i|--inverted          Inverted list
-l|--slabs             Show slabs
-n|--numa              Show NUMA information
-o|--ops                Show kmem_cache_ops
-s|--shrink            Shrink slabs
-r|--report             Detailed report on single slabs
-S|--Size              Sort by size
-t|--tracking          Show alloc/free information
-T|--Totals            Show summary information
-v|--validate          Validate slabs
-z|--zero              Include empty slabs
-1|--1ref              Single reference
-N|--lines=K           Show the first K slabs
-L|--Loss              Sort by loss
-X|--Xtotals           Show extended summary information
-B|--Bytes             Show size in bytes

Valid debug options (FZPUT may be combined)
a / A          Switch on all debug options (=FZUP)
-              Switch off all debug options
f / F          Sanity Checks (SLAB_DEBUG_FREE)
z / Z          Redzoning
p / P          Poisoning
u / U          Tracking
t / T          Tracing

Slab Cache Debug Node

/sys/kernel/slab

  • The /sys/kernel/slab directory contains a snapshot of the internal state of the SLUB allocator for each cache.
  • Certain files may be modified to change the behavior of the cache (and any cache it aliases, if any).

Tips: the presence of this directory means the SLUB allocator is in use; each subdirectory in it corresponds to one slab cache.

/sys/kernel/slab/<slab name>/aliases

  • The aliases file is read-only and specifies how many caches have merged into this cache.

/sys/kernel/slab/<slab name>/align

  • The align file is read-only and specifies the cache's object alignment in bytes.

/sys/kernel/slab/<slab name>/alloc_calls

  • The alloc_calls file is read-only and lists the kernel code locations from which allocations for this cache were performed.
  • The alloc_calls file only contains information if debugging is enabled for that cache (see Documentation/vm/slub.txt).

/sys/kernel/slab/<slab name>/alloc_fastpath

  • The alloc_fastpath file shows how many objects have been allocated using the fast path.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.
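
The CONFIG_SLUB_STATS counters such as alloc_fastpath are plain text. As far as I recall from mm/slub.c, each file holds a total followed by optional per-CPU entries, for example "30 C0=10 C2=20"; treat that format as an assumption and check it on your kernel. Under that assumption, a hypothetical helper `parse_stat_line` could split one line like this:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse one line of an assumed "<total> C<cpu>=<count> ..." stat file.
 * per_cpu[] has max_cpus slots and is zeroed first.
 * Returns the number of per-CPU entries stored, or -1 if the line
 * does not start with a number.
 */
static int parse_stat_line(const char *line, unsigned long *total,
                           unsigned long *per_cpu, int max_cpus)
{
    char *pos;
    int stored = 0;
    int i;

    *total = strtoul(line, &pos, 10);
    if (pos == line)
        return -1;

    for (i = 0; i < max_cpus; i++)
        per_cpu[i] = 0;

    while (*pos != '\0') {
        int cpu = 0;
        int used = 0;
        unsigned long count = 0;

        /* %n records how many characters this entry consumed */
        if (sscanf(pos, " C%d=%lu%n", &cpu, &count, &used) != 2)
            break;
        if (cpu >= 0 && cpu < max_cpus) {
            per_cpu[cpu] = count;
            stored++;
        }
        pos += used;
    }
    return stored;
}
```

Comparing the totals of alloc_fastpath and alloc_slowpath for one cache gives a rough idea of how often allocations miss the per-CPU freelist.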

/sys/kernel/slab/<slab name>/alloc_from_partial

  • The alloc_from_partial file shows how many times a cpu slab has been full and it has been refilled by using a slab from the list of partially used slabs.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/alloc_refill

  • The alloc_refill file shows how many times the per-cpu freelist was empty but there were objects available as the result of remote cpu frees.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/alloc_slab

  • The alloc_slab file shows how many times a new slab had to be allocated from the page allocator.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/alloc_slowpath

  • The alloc_slowpath file shows how many objects have been allocated using the slow path because of a refill or allocation from a partial or new slab.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/cache_dma

  • The cache_dma file is read-only and specifies whether objects are from ZONE_DMA.
  • Available when CONFIG_ZONE_DMA is enabled.

/sys/kernel/slab/<slab name>/cpu_slabs

  • The cpu_slabs file is read-only and displays how many cpu slabs are active and their NUMA locality.

/sys/kernel/slab/<slab name>/cpuslab_flush

  • The file cpuslab_flush shows how many times a cache's cpu slabs have been flushed as the result of destroying or shrinking a cache, a cpu going offline, or as the result of forcing an allocation from a certain node.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/ctor

  • The ctor file is read-only and specifies the cache's object constructor function, which is invoked for each object when a new slab is allocated.

/sys/kernel/slab/<slab name>/deactivate_empty

  • The deactivate_empty file shows how many times an empty cpu slab was deactivated.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/deactivate_full

  • The deactivate_full file shows how many times a full cpu slab was deactivated.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/deactivate_remote_frees

  • The deactivate_remote_frees file shows how many times a cpu slab has been deactivated and contained free objects that were freed remotely.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/deactivate_to_head

  • The deactivate_to_head file shows how many times a partial cpu slab was deactivated and added to the head of its node's partial list.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/deactivate_to_tail

  • The deactivate_to_tail file shows how many times a partial cpu slab was deactivated and added to the tail of its node's partial list.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/destroy_by_rcu

  • The destroy_by_rcu file is read-only and specifies whether slabs (not objects) are freed by rcu.

/sys/kernel/slab/<slab name>/free_add_partial

  • The free_add_partial file shows how many times an object has been freed in a full slab so that it had to be added to its node's partial list.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/free_calls

  • The free_calls file is read-only and lists the locations of object frees if slab debugging is enabled (see Documentation/vm/slub.txt).

/sys/kernel/slab/<slab name>/free_fastpath

  • The free_fastpath file shows how many objects have been freed using the fast path because it was an object from the cpu slab.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/free_frozen

  • The free_frozen file shows how many objects have been freed to a frozen slab (i.e. a remote cpu slab).
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/free_remove_partial

  • The free_remove_partial file shows how many times an object has been freed to a now-empty slab so that it had to be removed from its node's partial list.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/free_slab

  • The free_slab file shows how many times an empty slab has been freed back to the page allocator.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/free_slowpath

  • The free_slowpath file shows how many objects have been freed using the slow path (i.e. to a full or partial slab).
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/hwcache_align

  • The hwcache_align file is read-only and specifies whether objects are aligned on cachelines.

/sys/kernel/slab/<slab name>/min_partial

  • The min_partial file specifies how many empty slabs shall remain on a node's partial list to avoid the overhead of allocating new slabs.
  • Such slabs may be reclaimed by utilizing the shrink file.

/sys/kernel/slab/<slab name>/object_size

  • The object_size file is read-only and specifies the cache's object size.

/sys/kernel/slab/<slab name>/objects

  • The objects file is read-only and displays how many objects are active and from which nodes they are from.

/sys/kernel/slab/<slab name>/objects_partial

  • The objects_partial file is read-only and displays how many objects are on partial slabs and from which nodes they are from.

/sys/kernel/slab/<slab name>/objs_per_slab

  • The file objs_per_slab is read-only and specifies how many objects may be allocated from a single slab of the order specified in /sys/kernel/slab/cache/order.

/sys/kernel/slab/<slab name>/order

  • The order file specifies the page order at which new slabs are allocated.
  • It is writable and can be changed to increase the number of objects per slab.
  • If a slab cannot be allocated because of fragmentation, SLUB will retry with the minimum order possible depending on its characteristics.
  • When debug_guardpage_minorder=N (N > 0) parameter is specified (see Documentation/admin-guide/kernel-parameters.rst), the minimum possible order is used and this sysfs entry can not be used to change the order at run time.
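
The relation between order, slab_size and objs_per_slab can be approximated with simple arithmetic. This is a simplified sketch, assuming that a slab of a given order spans PAGE_SIZE << order bytes and that every object occupies slab_size bytes; the kernel's own calculation considers more factors (for example wasted space), and both function names are hypothetical.

```c
#include <stddef.h>

/* Objects that fit in one slab of the given order (simplified model). */
static size_t objs_per_slab(size_t page_size, unsigned order, size_t slab_size)
{
    if (slab_size == 0)
        return 0;
    return (page_size << order) / slab_size;
}

/*
 * Smallest order whose slab holds at least min_objects objects,
 * or -1 if even max_order is not enough.
 */
static int min_order_for(size_t page_size, size_t slab_size,
                         size_t min_objects, int max_order)
{
    int order;

    for (order = 0; order <= max_order; order++) {
        if (objs_per_slab(page_size, (unsigned)order, slab_size) >= min_objects)
            return order;
    }
    return -1;
}
```

For instance, with 4096-byte pages a cache whose slab_size is 256 holds 16 objects per order-0 slab in this model. Enabling red zoning or user tracking increases slab_size and therefore lowers objs_per_slab for the same order.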

/sys/kernel/slab/<slab name>/order_fallback

  • The order_fallback file shows how many times an allocation of a new slab has not been possible at the cache's order and instead fallen back to its minimum possible order.
  • It can be written to clear the current count.
  • Available when CONFIG_SLUB_STATS is enabled.

/sys/kernel/slab/<slab name>/partial

  • The partial file is read-only and displays how many partial slabs there are and how long each node's list is.

/sys/kernel/slab/<slab name>/poison

  • The poison file specifies whether objects should be poisoned when a new slab is allocated.

/sys/kernel/slab/<slab name>/reclaim_account

  • The reclaim_account file specifies whether the cache's objects are reclaimable (and grouped by their mobility).

/sys/kernel/slab/<slab name>/red_zone

  • The red_zone file specifies whether the cache's objects are red zoned.

/sys/kernel/slab/<slab name>/remote_node_defrag_ratio

  • The file remote_node_defrag_ratio specifies the percentage of times SLUB will attempt to refill the cpu slab with a partial slab from a remote node as opposed to allocating a new slab on the local node.
  • This reduces the amount of wasted memory over the entire system but can be expensive.
  • Available when CONFIG_NUMA is enabled.

/sys/kernel/slab/<slab name>/sanity_checks

  • The sanity_checks file specifies whether expensive checks should be performed on free and, at minimum, enables double free checks.
  • Caches that enable sanity_checks cannot be merged with caches that do not.

/sys/kernel/slab/<slab name>/shrink

  • The shrink file is written when memory should be reclaimed from a cache.
  • Empty partial slabs are freed and the partial list is sorted so the slabs with the fewest available objects are used first.

/sys/kernel/slab/<slab name>/slab_size

  • The slab_size file is read-only and specifies the object size with metadata (debugging information and alignment) in bytes.

/sys/kernel/slab/<slab name>/slabs

  • The slabs file is read-only and displays how many slabs there are (both cpu and partial) and which nodes they are from.

/sys/kernel/slab/<slab name>/store_user

  • The store_user file specifies whether the location of allocation or free should be tracked for a cache.

/sys/kernel/slab/<slab name>/total_objects

  • The total_objects file is read-only and displays how many total objects a cache has and from which nodes they are from.

/sys/kernel/slab/<slab name>/trace

  • The trace file specifies whether object allocations and frees should be traced.

/sys/kernel/slab/<slab name>/validate

  • Writing to the validate file causes SLUB to traverse all of its cache's objects and check the validity of metadata.
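
Writing to these nodes from a program is the same as using echo from a shell. The sketch below builds the path /sys/kernel/slab/<slab name>/<node> and writes a value to it; the helper names `slab_node_path` and `slab_node_write` are hypothetical, and the root directory is passed in as a parameter so the code can be tried against an ordinary directory first. Writing to the real nodes needs root, and which nodes accept writes depends on the kernel.

```c
#include <stdio.h>
#include <string.h>

#define SLAB_SYSFS_ROOT "/sys/kernel/slab"

/* Build "<root>/<cache>/<node>" in buf. Returns 0, or -1 if buf is too small. */
static int slab_node_path(char *buf, size_t len, const char *root,
                          const char *cache, const char *node)
{
    int n = snprintf(buf, len, "%s/%s/%s", root, cache, node);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write value to the node at path. Returns 0 on success, -1 on any error. */
static int slab_node_write(const char *path, const char *value)
{
    size_t want = strlen(value);
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fwrite(value, 1, want, f) == want);
    /* sysfs reports a rejected write when the buffer is flushed */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A possible use is `slab_node_path(buf, sizeof(buf), SLAB_SYSFS_ROOT, "dentry", "validate")` followed by `slab_node_write(buf, "1")`, which asks SLUB to validate the dentry cache; any findings appear in the kernel log.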

Debug Output

Output template

===============================================
BUG <slab cache affected>: <What went wrong>
-----------------------------------------------

INFO: <corruption start>-<corruption_end> <more info>
INFO: Slab <address> <slab information>
INFO: Object <address> <object information>
INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by cpu> pid=<pid of the process>
INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu> pid=<pid of the process>

Bytes b4 <address> : <bytes>
Object <address> : <bytes>
Redzone <address> : <bytes>
Padding <address> : <bytes>

stackdump

FIX <slab cache affected>: <corrective action taken>

Tips:
1. Object allocation / free information is only available if SLAB_STORE_USER is set for the slab. slub_debug sets that option.
2. Redzone information is only available if SLAB_RED_ZONE is set.
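
When scanning a large kernel log, the first line of each report is the most useful one to extract. The following is a minimal sketch that follows the template literally ("BUG <slab cache affected>: <What went wrong>"); it assumes that any timestamp or log-level prefix has already been stripped, and that the cache name runs up to the first ": ". Real logs may carry extra text in that line, so treat `parse_bug_line` (a hypothetical name) as a starting point only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "BUG <cache>: <what>" into its two fields.
 * Returns 0 on success, -1 if the line does not match the template
 * or a field does not fit into its buffer.
 */
static int parse_bug_line(const char *line,
                          char *cache, size_t cache_len,
                          char *what, size_t what_len)
{
    const char *sep;
    size_t n;

    if (strncmp(line, "BUG ", 4) != 0)
        return -1;
    line += 4;

    sep = strstr(line, ": ");
    if (sep == NULL)
        return -1;

    n = (size_t)(sep - line);
    if (n == 0 || n >= cache_len)
        return -1;
    memcpy(cache, line, n);
    cache[n] = '\0';

    sep += 2;
    n = strcspn(sep, "\r\n");           /* drop the trailing newline */
    if (n >= what_len)
        return -1;
    memcpy(what, sep, n);
    what[n] = '\0';
    return 0;
}
```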

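Parsing the output
Following the template above, the data written to the kernel log falls into 4 parts:
[Head] - INFO area

  • Every line in this area starts with INFO; the template above already describes each field

[Middle] - memory area

  • Bytes b4: a small region just before the object; newer kernels use a Redzone region in its place
  • Object: the region actually handed to the task; filled with 6b on free
  • Redzone: guard region; filled with bb on free and cc on allocation
  • Padding: filler that aligns the next object; filled with 5a

[Middle] - call stack area

  • The function call stack at the time of the error

[Tail] - automatic repair area

  • To keep the system running, SLUB tries to repair the overwritten region and then continues normally. Note that not every kind of overwrite can be repaired.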
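
The "<corruption start>-<corruption_end>" range in the INFO line comes from comparing a region against the byte it is supposed to contain. A userspace sketch of that idea, under the assumption that the whole region should hold one fill value (for example cc for an active red zone); `find_corruption` is a hypothetical name, not a kernel function:

```c
#include <stddef.h>

/*
 * Scan len bytes that should all equal expected.
 * Returns 0 if the region is intact. Otherwise returns 1 and stores the
 * offsets of the first and last mismatching bytes in *first and *last.
 */
static int find_corruption(const unsigned char *region, size_t len,
                           unsigned char expected,
                           size_t *first, size_t *last)
{
    size_t i = 0;

    while (i < len && region[i] == expected)
        i++;
    if (i == len)
        return 0;
    *first = i;

    /* Walk back from the end; stops at *first at the latest. */
    i = len;
    while (region[i - 1] == expected)
        i--;
    *last = i - 1;
    return 1;
}
```

A repair step in the same spirit would simply write the expected byte back over the range from first to last, which is why damage confined to a red zone or padding can be restored while an overwritten object payload cannot be recovered.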

References

kernel vm slub
kernel ABI test


CONFIG_SLUB_DEBUG_ON

Enables all SLUB debug options statically at build time; it takes effect for every slab cache
[Depends on] - the following 2 config options

  • CONFIG_SLUB
  • CONFIG_SLUB_DEBUG

References

kernel vm slub


KASAN

TODO

feature

References

kernel vm slub


KMEMCHECK

TODO

feature

References

kernel vm slub


KMEMLEAK

TODO

feature

References

kernel vm slub


OTHERS

TODO

feature

References

kernel vm slub

@2018-01-29 15:55