Discussion:
kernel BUG at fs/dcache.c:2105 (__d_rehash(): BUG_ON(!d_unhashed(entry)))
Banerjee, Debabrata
2012-09-17 22:53:16 UTC
Hello, we're seeing this bug quite often (50 per day over 1500 machines,
however the parent process executes in a few seconds every 4 hours). The
process should be merging a freshly untar'd directory with ~8500 files in
it via rename, where most of the files stay the same (but are clobbered by
rename), on ext2. I'm attempting to isolate the problem in a clean
environment.

[229978.861098] ------------[ cut here ]------------
[229978.862013] kernel BUG at fs/dcache.c:2105!
[229978.862013] invalid opcode: 0000 [#1] SMP
[229978.873082] CPU 1
[229978.873082]
[229978.873082] Pid: 11817, comm: xxxxxxxx. Not tainted 3.0.30-3.0.1-amd64 #1
[229978.873082] RIP: 0010:[<ffffffff81139ab2>] [<ffffffff81139ab2>] __d_rehash+0x52/0x60
[229978.873082] RSP: 0018:ffff8800f743dce8 EFLAGS: 00010286
[229978.873082] RAX: 018721dffc2c27d2 RBX: ffff8800f48ac6c0 RCX: 0000000000000013
[229978.873082] RDX: 000013f61bc3e0c4 RSI: ffffc900002fa850 RDI: ffff8800f48ac6c0
[229978.873082] RBP: ffff8800f743dce8 R08: ffffea0002438868 R09: ffffea0002438868
[229978.873082] R10: ffff8800e39f6fec R11: 0000000000000246 R12: ffff8800234e3500
[229978.873082] R13: ffff8800234e3590 R14: ffff8800f48ac750 R15: ffff880076557408
[229978.873082] FS: 00007fdd33ef96d0(0000) GS:ffff88007fd00000(0063) knlGS:00000000f75588c0
[229978.873082] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[229978.873082] CR2: 00000000f72ed000 CR3: 000000007b884000 CR4: 00000000000006e0
[229978.873082] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[229978.873082] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[229978.873082] Process xxxxxxxx. (pid: 11817, threadinfo ffff8800f743c000, task ffff8800f2bda8c0)
[229978.873082] Stack:
[229978.873082] ffff8800f743dd18 ffffffff8113aa6b ffff8800234e3500 ffff8800f48ac6c0
[229978.873082] ffff8800f48ac6c0 ffff8800641bda28 ffff8800f743dd38 ffffffff8113ac3a
[229978.873082] ffff8800234e3500 0000000000000000 ffff8800f743ddb8 ffffffff8113224b
[229978.873082] Call Trace:
[229978.873082] [<ffffffff8113aa6b>] __d_move+0xbb/0x250
[229978.873082] [<ffffffff8113ac3a>] d_move+0x3a/0x60
[229978.873082] [<ffffffff8113224b>] vfs_rename+0x3cb/0x3e0
[229978.873082] [<ffffffff81130119>] ? __lookup_hash+0xd9/0x160
[229978.873082] [<ffffffff811345a3>] sys_renameat+0x243/0x260
[229978.873082] [<ffffffff81169ceb>] ? compat_filldir64+0xab/0xe0
[229978.873082] [<ffffffff811a72b8>] ? ext2_readdir+0x228/0x2e0
[229978.873082] [<ffffffff81169c40>] ? compat_filldir+0x100/0x100
[229978.873082] [<ffffffff81169c40>] ? compat_filldir+0x100/0x100
[229978.873082] [<ffffffff81136f0a>] ? vfs_readdir+0x9a/0xd0
[229978.873082] [<ffffffff811345db>] sys_rename+0x1b/0x20
[229978.873082] [<ffffffff814bd42c>] cstar_dispatch+0x7/0x32
[229978.873082] Code: e0 fe 48 85 c0 48 89 47 08 74 04 48 89 50 08 48 89 72 08 48 83 ca 01 48 89 16 0f ba 36 00 c9 c3 f3 90 48 8b 06 a8 01 75 f7 eb be <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66
[229978.873082] RIP [<ffffffff81139ab2>] __d_rehash+0x52/0x60
[229978.873082] RSP <ffff8800f743dce8>
[229979.157156] ---[ end trace f2460e13ceb17f51 ]---
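
The workload described above (merging a freshly extracted directory into an existing one via rename, clobbering files that already exist) can be pictured with a minimal sketch. This is not the reporter's actual program, which is not shown in the thread; the helper name merge_by_rename and its error handling are assumptions made purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the report): rename every regular file in
 * src_dir into dst_dir, clobbering any existing file of the same name.
 * Returns the number of files renamed, or -1 on the first error.
 */
int merge_by_rename(const char *src_dir, const char *dst_dir)
{
    assert(src_dir != NULL && dst_dir != NULL);

    DIR *dir = opendir(src_dir);
    if (dir == NULL)
        return -1;

    int moved = 0;
    struct dirent *ent;
    /* Entries are renamed while the directory is still being read. */
    while ((ent = readdir(dir)) != NULL) {
        char from[4096], to[4096];
        struct stat st;

        int n1 = snprintf(from, sizeof from, "%s/%s", src_dir, ent->d_name);
        int n2 = snprintf(to, sizeof to, "%s/%s", dst_dir, ent->d_name);
        if (n1 < 0 || n2 < 0 ||
            (size_t)n1 >= sizeof from || (size_t)n2 >= sizeof to) {
            closedir(dir);
            return -1; /* path would not fit in the buffer */
        }

        /* Skip ".", ".." and anything that is not a regular file. */
        if (lstat(from, &st) != 0 || !S_ISREG(st.st_mode))
            continue;

        /* rename(2) replaces an existing destination file of the same name. */
        if (rename(from, to) != 0) {
            closedir(dir);
            return -1;
        }
        moved++;
    }
    closedir(dir);
    return moved;
}
```

Each successful rename() onto an existing name goes through the vfs_rename() and d_move() path visible in the call trace, so a loop of this shape over ~8500 files exercises that path thousands of times per run.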

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
J. Bruce Fields
2012-09-18 20:04:08 UTC
Post by Banerjee, Debabrata
[229978.873082] Pid: 11817, comm: xxxxxxxx. Not tainted 3.0.30-3.0.1-amd64
What's 3.0.30-3.0.1? Is this reproducible with an upstream kernel?

--b.
Banerjee, Debabrata
2012-09-18 20:58:24 UTC
Post by J. Bruce Fields
[229978.873082] Pid: 11817, comm: xxxxxxxx. Not tainted 3.0.30-3.0.1-amd64
What's 3.0.30-3.0.1? Is this reproducible with an upstream kernel?
--b.

An internal build number, it's 3.0.30. Also I already skimmed for relevant
patches to fs/ext2/* and fs/dcache* in stable and mainline and could find
none. My best guess right now is a race somewhere in ext2: the dentry must get
rehashed between __d_drop(dentry) and __d_rehash(dentry, ...) in __d_move().

-Debabrata
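
The suspected race can be illustrated with a small userspace model. It is a sketch, not kernel code: toy_dentry, toy_drop, toy_rehash and toy_move are hypothetical names, and the model assumes that "unhashed" is encoded as a NULL back-pointer, in the style of the kernel's hlist nodes. The racer_bucket parameter simulates some other path re-hashing the dentry between the drop and the rehash, which is the condition BUG_ON(!d_unhashed(entry)) would catch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: it only tracks hash-chain membership. */
struct toy_dentry {
    struct toy_dentry *next;
    struct toy_dentry **pprev; /* NULL means "not on any hash chain" */
};

bool toy_unhashed(const struct toy_dentry *d)
{
    return d->pprev == NULL;
}

/* Remove d from its chain, if it is on one (models __d_drop). */
void toy_drop(struct toy_dentry *d)
{
    if (toy_unhashed(d))
        return;
    assert(*d->pprev == d); /* chain sanity check */
    *d->pprev = d->next;
    if (d->next != NULL)
        d->next->pprev = d->pprev;
    d->next = NULL;
    d->pprev = NULL;
}

/*
 * Insert d at the head of bucket (models __d_rehash). Returns false where
 * the kernel would hit BUG_ON(!d_unhashed(entry)) instead of inserting.
 */
bool toy_rehash(struct toy_dentry *d, struct toy_dentry **bucket)
{
    if (!toy_unhashed(d))
        return false;
    d->next = *bucket;
    if (*bucket != NULL)
        (*bucket)->pprev = &d->next;
    *bucket = d;
    d->pprev = bucket;
    return true;
}

/*
 * Models the drop-then-rehash step of a move. If racer_bucket is non-NULL,
 * a hypothetical concurrent path rehashes d in the window between the two.
 */
bool toy_move(struct toy_dentry *d, struct toy_dentry **new_bucket,
              struct toy_dentry **racer_bucket)
{
    toy_drop(d);
    if (racer_bucket != NULL)
        toy_rehash(d, racer_bucket);
    return toy_rehash(d, new_bucket);
}
```

With racer_bucket set to NULL the move succeeds; with a racer in the window, the second rehash finds the dentry already hashed and fails, matching the shape of the reported BUG. The model says nothing about which real code path, if any, plays the racer.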

J. Bruce Fields
2012-09-18 21:23:17 UTC
Post by Banerjee, Debabrata
Post by J. Bruce Fields
[229978.873082] Pid: 11817, comm: xxxxxxxx. Not tainted 3.0.30-3.0.1-amd64
What's 3.0.30-3.0.1? Is this reproducible with an upstream kernel?
--b.
An internal build number, it's 3.0.30.
But it looks like line 2105 of 3.0.30 is a comment?:

$ git show v3.0.30:fs/dcache.c |nl|grep '2105'
2105 * This helper attempts to cope with remotely renamed directories

$ git grep -n 'BUG_ON(!d_unhashed(entry))' v3.0.30:fs/dcache.c
v3.0.30:fs/dcache.c:2077: BUG_ON(!d_unhashed(entry));

Just curious.

--b.
Banerjee, Debabrata
2012-09-18 22:17:33 UTC
Post by J. Bruce Fields
$ git show v3.0.30:fs/dcache.c |nl|grep '2105'
2105 * This helper attempts to cope with remotely renamed directories
$ git grep -n 'BUG_ON(!d_unhashed(entry))' v3.0.30:fs/dcache.c
v3.0.30:fs/dcache.c:2077: BUG_ON(!d_unhashed(entry));
Just curious.
--b.
It's just some instrumentation, doesn't have an effect on the outcome,
it's the same src line. In retrospect I probably should have grabbed the
stack from the vanilla dcache.c. Anyways I'll be getting more data but it
will take a few days, I can get a vanilla one if that will satisfy you.

-Debabrata

J. Bruce Fields
2012-09-19 14:07:34 UTC
Post by Banerjee, Debabrata
Post by J. Bruce Fields
$ git show v3.0.30:fs/dcache.c |nl|grep '2105'
2105 * This helper attempts to cope with remotely renamed directories
$ git grep -n 'BUG_ON(!d_unhashed(entry))' v3.0.30:fs/dcache.c
v3.0.30:fs/dcache.c:2077: BUG_ON(!d_unhashed(entry));
It's just some instrumentation, doesn't have an effect on the outcome,
it's the same src line. In retrospect I probably should have grabbed the
stack from the vanilla dcache.c. Anyways I'll be getting more data but it
will take a few days, I can get a vanilla one if that will satisfy you.
No big deal, just curious.

--b.