在描述进程的结构体task_struct中有一个类型为css_set的成员cgroups,它描述了进程的所有cgroup信息,从前面的分析文章中我们已经知道通过task_struct->cgroups可以找到进程的所有不同cgroup控制器的信息。

当我们新创建一个进程时,新进程的task_struct->cgroups的值继承自其父进程。此后,如果我们将新创建的进程添加到一个新的cgroup中时,就需要重新给task_struct->cgroups赋值,这个值要么是一个已经存在的css_set结构的指针,要么是新创建的css_set的结构的指针。

所以,我们就需要通过进程的cgroup信息,快速查找其对应的css_set结构是否存在,如果不存在的话就去创建它。linux kernel 使用一个hash表来完成这个工作。本文主要分析该hash表的实现。

注意:本文基于3.10.0-862.el7.x86_64版本kernel进行分析。

数据结构

1
2
3
4
5
6
7
/*
 * hash table for cgroup groups. This improves the performance to find
 * an existing css_set. This hash doesn't (currently) take into
 * account cgroups in empty hierarchies.
 */
#define CSS_SET_HASH_BITS       7
static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS);

其中CSS_SET_HASH_BITS的值为7,所以定义的css_set_table这个hash数组由128个元素(1«7)。

1
2
crash> whatis css_set_table
struct hlist_head css_set_table[128];

css_set有成员hlist,用来将具有相同hash值的css_set链接到一起,如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
struct css_set {

	...
        /*
         * List running through all cgroup groups in the same hash
         * slot. Protected by css_set_lock
         */
        struct hlist_node hlist;
	...
};

hash 方法

此种场景下,如果进程的cgroup信息都相同的话,他们的css_set也应该相同,所以css_sethash方法如下:

使用各个cgroupcgroup_subsys_state的地址来计算hash值,计算方法如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
{
        int i;
        unsigned long key = 0UL;

        for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
                key += (unsigned long)css[i];
        key = (key >> 16) ^ key;

        return key;
}

注意:这里的返回值key肯定是一个大于127的数字,而我们的css_set_table数组只有128个,其实不用担心,再hash_add方法中会自动根据hash数组的大小进行调整的。

向css_set_table中添加css_set

当系统初始化时,需要将init_css_set添加到css_set_table中:

1
2
3
        /* Add init_css_set to the hash table */
        key = css_set_hash(init_css_set.subsys);
        hash_add(css_set_table, &init_css_set.hlist, key);  

find_css_set函数中,如果没有找到css_set,新建一个新的css_set后,需要将这个新的css_set添加到css_set_table

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
/*
 * find_css_set() takes an existing cgroup group and a
 * cgroup object, and returns a css_set object that's
 * equivalent to the old group, but with the given cgroup
 * substituted into the appropriate hierarchy. Must be called with
 * cgroup_mutex held
 */
static struct css_set *find_css_set(
        struct css_set *oldcg, struct cgroup *cgrp)
{
	....
        read_lock(&css_set_lock);
        res = find_existing_css_set(oldcg, cgrp, template);
        if (res)
                get_css_set(res);
        read_unlock(&css_set_lock);
	// 找到了直接返回
        if (res)
                return res;

	// 没有找到,新创建一个css_set
        res = kmalloc(sizeof(*res), GFP_KERNEL);
        if (!res)
                return NULL;

	...
	...
        css_set_count++;

        /* Add this cgroup group to the hash table */
        key = css_set_hash(res->subsys);
        hash_add(css_set_table, &res->hlist, key);

        write_unlock(&css_set_lock);

        return res;
}

cgroup_load_subsys函数中,由于添加了新的cgroup子系统,导致系统上所有css_set的hash值都发生了变化,都必须重新计算css_set的hash,并添加到适当的hash数组中。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
/**
 * cgroup_load_subsys: load and register a modular subsystem at runtime
 * @ss: the subsystem to load
 *
 * This function should be called in a modular subsystem's initcall. If the
 * subsystem is built as a module, it will be assigned a new subsys_id and set
 * up for use. If the subsystem is built-in anyway, work is delegated to the
 * simpler cgroup_init_subsys.
 */
int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
{
	...
        write_lock(&css_set_lock);
        hash_for_each_safe(css_set_table, i, tmp, cg, hlist) {
                /* skip entries that we already rehashed */
                if (cg->subsys[ss->subsys_id])
                        continue;
                /* remove existing entry */
                hash_del(&cg->hlist); // 删除旧的
                /* set new value */
                cg->subsys[ss->subsys_id] = css;
                /* recompute hash and restore entry */
                key = css_set_hash(cg->subsys); // 添加新的
                hash_add(css_set_table, &cg->hlist, key);
        }
        write_unlock(&css_set_lock);
	...
}

cgroup_unload_subsys的情况跟cgroup_load_subsys类似。

删除css_set_table中的css_set

当一个cgroup的最后一个进程退出时,该进程对应的css_set就没有用了,需要从css_set_table中删除,其实现位于函数__put_css_set:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
static void __put_css_set(struct css_set *cg, int taskexit)                                                                                    
{
        struct cg_cgroup_link *link;
        struct cg_cgroup_link *saved_link;
        /*
         * Ensure that the refcount doesn't hit zero while any readers
         * can see it. Similar to atomic_dec_and_lock(), but for an
         * rwlock
         */
        if (atomic_add_unless(&cg->refcount, -1, 1))
                return;
        write_lock(&css_set_lock);
        if (!atomic_dec_and_test(&cg->refcount)) {
                write_unlock(&css_set_lock);
                return;
        }

        /* This css_set is dead. unlink it and release cgroup refcounts */
        hash_del(&cg->hlist);
        css_set_count--;
	...
	...
}

在css_set_table查找一个css_set

find_existing_css_set函数用于查找一个css_se是否存在,这个函数的参考有点怪,我们仔细分析一下:

  • oldcg为该进程原来的css_set
  • cgrp为要将进程添加进这个cgroup
  • template为临时变量,用来传递进程最终的cgroup信息
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
/*
 * find_existing_css_set() is a helper for
 * find_css_set(), and checks to see whether an existing
 * css_set is suitable.
 *
 * oldcg: the cgroup group that we're using before the cgroup
 * transition
 *
 * cgrp: the cgroup that we're moving into
 *
 * template: location in which to build the desired set of subsystem
 * state objects for the new cgroup group
 */
static struct css_set *find_existing_css_set(
        struct css_set *oldcg,
        struct cgroup *cgrp,
        struct cgroup_subsys_state *template[])
{
        int i;
        struct cgroupfs_root *root = cgrp->root;
        struct css_set *cg;
        unsigned long key;

        /*
         * Build the set of subsystem state objects that we want to see in the
         * new css_set. while subsystems can change globally, the entries here
         * won't change, so no need for locking.
         */
        for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
                if (root->subsys_mask & (1UL << i)) {
                        /* Subsystem is in this hierarchy. So we want
                         * the subsystem state from the new
                         * cgroup */
                        template[i] = cgrp->subsys[i];
                } else {
                        /* Subsystem is not in this hierarchy, so we
                         * don't want to change the subsystem state */
                        template[i] = oldcg->subsys[i];
                }
        }

        key = css_set_hash(template);
        hash_for_each_possible(css_set_table, cg, hlist, key) {
                if (!compare_css_sets(cg, oldcg, cgrp, template))
                        continue;

                /* This css_set matches what we need */
                return cg;
        }

        /* No existing cgroup group matched */
        return NULL;
}

在查找过程中,如果判断两个css_set相等呢?这个重任就交给了函数compare_css_sets:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
/*
 * compare_css_sets - helper function for find_existing_css_set().
 * @cg: candidate css_set being tested
 * @old_cg: existing css_set for a task
 * @new_cgrp: cgroup that's being entered by the task
 * @template: desired set of css pointers in css_set (pre-calculated)
 *
 * Returns true if "cg" matches "old_cg" except for the hierarchy
 * which "new_cgrp" belongs to, for which it should match "new_cgrp".
 */
static bool compare_css_sets(struct css_set *cg,
                             struct css_set *old_cg,
                             struct cgroup *new_cgrp,
                             struct cgroup_subsys_state *template[])
{
        struct list_head *l1, *l2;
	// 判断cgroup_subsys_state是否完全相等
        if (memcmp(template, cg->subsys, sizeof(cg->subsys))) {
                /* Not all subsystems matched */
                return false;
        }

        /*
         * Compare cgroup pointers in order to distinguish between
         * different cgroups in heirarchies with no subsystems. We
         * could get by with just this check alone (and skip the
         * memcmp above) but on most setups the memcmp check will
         * avoid the need for this more expensive check on almost all
         * candidates.
         */
	// 进一步严格判断各个链表的长度和其指向的值是否一致
        l1 = &cg->cg_links;
        l2 = &old_cg->cg_links;
        while (1) {
                struct cg_cgroup_link *cgl1, *cgl2;
                struct cgroup *cg1, *cg2;

                l1 = l1->next;
                l2 = l2->next;
                /* See if we reached the end - both lists are equal length. */
                if (l1 == &cg->cg_links) {
                        BUG_ON(l2 != &old_cg->cg_links);
                        break;
                } else {
                        BUG_ON(l2 == &old_cg->cg_links);
                }
                /* Locate the cgroups associated with these links. */
                cgl1 = list_entry(l1, struct cg_cgroup_link, cg_link_list);
                cgl2 = list_entry(l2, struct cg_cgroup_link, cg_link_list);
                cg1 = cgl1->cgrp;
                cg2 = cgl2->cgrp;
                /* Hierarchies should be linked in the same order. */
                BUG_ON(cg1->root != cg2->root);

                /*
                 * If this hierarchy is the hierarchy of the cgroup
                 * that's changing, then we need to check that this
                 * css_set points to the new cgroup; if it's any other
                 * hierarchy, then this css_set should point to the
                 * same cgroup as the old css_set.
                 */
                if (cg1->root == new_cgrp->root) {
                        if (cg1 != new_cgrp)
                                return false;
                } else {
                        if (cg1 != cg2)
                                return false;
                }
        }
        return true;
}

通过crash可以查看css_set_table

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
crash> p css_set_count
css_set_count = $4 = 42
crash> p css_set_table
css_set_table = $5 = 
 { {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d43c90c8
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d6915248
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b52042bb48
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d1862f08
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d6ac2008
  }, {
    first = 0xffffa0b5d43c9a88
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d18629c8
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5205e7cc8
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b612593008
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d1862188
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5bf085f08
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d1862548
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d6ac26c8
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d83cd308
  }, {
    first = 0xffffa0b5d09e4848
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b535d28cc8
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d185c6c8
  }, {
    first = 0xffffa0b616f0d488
  }, {
    first = 0xffffa0b5205e70c8
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d43c9248
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5cd167a88
  }, {
    first = 0xffffa0b5db76ce48
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d185c488
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d43c9d88
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5db59d788
  }, {
    first = 0xffffa0b616922248
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d43c93c8
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d1862d88
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d4332248
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d1862b48
  }, {
    first = 0xffffa0b5d18626c8
  }, {
    first = 0xffffa0b5d5d46308
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5bcbb23c8
  }, {
    first = 0xffffa0b5d43c9548
  }, {
    first = 0x0
  }, {
    first = 0xffffa0b5d1862848
  }, {
    first = 0x0
  }, {
    first = 0x0
  } }
crash>