Discussion:
Intentionally corrupted vfat fs causing BUG
(too old to reply)
Sami Liedes
2014-10-10 20:57:06 UTC
Permalink
Hi!

I ran some fuzz tests on a vfat filesystem on 3.17 and found a
filesystem that differs from a pristine filesystem by one bit and
causes a kernel BUG. This seems to be an old bug; I can also replicate
it on a 3.3.4 kernel I happened to have lying around.

The set of operations I run for filesystems is this:

mount $TARGET_DEV /mnt -t vfat
cd /mnt
timeout 30 cp -r doc doc2 >&/dev/null
timeout 30 find -xdev >&/dev/null
timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null
timeout 30 mkdir tmp >&/dev/null
timeout 30 echo whoah >tmp/filu >&/dev/null
timeout 30 rm -rf /mnt/* >&/dev/null
cd /
umount /mnt

The backtrace seems to indicate that the BUG happens at the rm phase.

You can get the pristine filesystem from

http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2

The broken filesystem is at

http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2

The only difference is this one bit:

--- /dev/fd/63 2014-10-10 23:23:09.424422610 +0300
+++ /dev/fd/62 2014-10-10 23:23:09.424422610 +0300
@@ -1977,7 +1977,7 @@
0008d230 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......|
0008d240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
-0008da00 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |. .....|
+0008da00 2e 20 20 20 20 20 20 20 20 20 60 10 00 00 bc 0d |. `.....|
0008da10 08 39 08 39 00 00 bc 0d 08 39 0b 01 00 00 00 00 |.9.9.....9......|
0008da20 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |.. .....|
0008da30 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......|


Backtrace on 3.17:

[ 1.363073] ------------[ cut here ]------------
[ 1.363437] kernel BUG at fs/namei.c:2430!
[ 1.363749] invalid opcode: 0000 [#1] SMP
[ 1.364088] CPU: 0 PID: 889 Comm: rm Not tainted 3.17.0+ #32
[ 1.364517] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.365291] task: ffff880000066360 ti: ffff8800063b0000 task.ti: ffff8800063b0000
[ 1.365813] RIP: 0010:[<ffffffff8116c998>] [<ffffffff8116c998>] may_delete+0x128/0x140
[ 1.365813] RSP: 0018:ffff8800063b3e38 EFLAGS: 00010293
[ 1.365813] RAX: ffff8800065cf120 RBX: ffff8800065cf240 RCX: ffff8800000663b0
[ 1.365813] RDX: 0000000000000001 RSI: ffff8800065cf240 RDI: ffff880006631858
[ 1.365813] RBP: ffff8800063b3e58 R08: 0000000000000000 R09: 0000000000000001
[ 1.365813] R10: 0000000000000000 R11: 0000000000000044 R12: ffff8800066313b0
[ 1.365813] R13: ffff880006631858 R14: 0000000000000007 R15: 00000000fffffffe
[ 1.365813] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7609940
[ 1.365813] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 1.365813] CR2: 00000000ff967000 CR3: 000000000637f000 CR4: 00000000000006b0
[ 1.365813] Stack:
[ 1.365813] ffff8800065cf240 0000000000000000 ffff880006631858 0000000000000007
[ 1.365813] ffff8800063b3e80 ffffffff81173699 ffff880006334000 0000000000000000
[ 1.365813] 0000000008faf1e4 ffff8800063b3f68 ffffffff81173905 ffff8800065cf240
[ 1.365813] Call Trace:
[ 1.365813] [<ffffffff81173699>] vfs_rmdir+0x19/0xf0
[ 1.365813] [<ffffffff81173905>] do_rmdir+0x195/0x1d0
[ 1.365813] [<ffffffff810aa11d>] ? trace_hardirqs_on_caller+0x15d/0x200
[ 1.365813] [<ffffffff8165e9cb>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1.365813] [<ffffffff81173d95>] SyS_unlinkat+0x25/0x40
[ 1.365813] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a
[ 1.365813] Code: 41 5e 5d c3 0f 1f 80 00 00 00 00 b8 ff ff ff ff eb c4 90 0f 0b 66 0f 1f 44 00 00 48 39 5b 40 75 a2 b8 f0 ff ff ff eb ae 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 b8 fe ff ff ff eb 9c 66 0f 1f 84 00 00
[ 1.365813] RIP [<ffffffff8116c998>] may_delete+0x128/0x140
[ 1.365813] RSP <ffff8800063b3e38>
[ 1.378725] ---[ end trace 15817999647273ef ]---
[ 1.379086] Kernel panic - not syncing: Fatal exception
[ 1.379592] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1.380349] Rebooting in 1 seconds..

Sami
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-11 10:20:39 UTC
Permalink
Post by Sami Liedes
Hi!
I ran some fuzz tests on a vfat filesystem on 3.17 and found a
filesystem that differs from a pristine filesystem by one bit and
causes a kernel BUG. This seems to be an old bug; I can also replicate
it on a 3.3.4 kernel I happened to have lying around.
mount $TARGET_DEV /mnt -t vfat
cd /mnt
timeout 30 cp -r doc doc2 >&/dev/null
timeout 30 find -xdev >&/dev/null
timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null
timeout 30 mkdir tmp >&/dev/null
timeout 30 echo whoah >tmp/filu >&/dev/null
timeout 30 rm -rf /mnt/* >&/dev/null
cd /
umount /mnt
The backtrace seems to indicate that the BUG happens at the rm phase.
You can get the pristine filesystem from
http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2
The broken filesystem is at
http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2
--- /dev/fd/63 2014-10-10 23:23:09.424422610 +0300
+++ /dev/fd/62 2014-10-10 23:23:09.424422610 +0300
@@ -1977,7 +1977,7 @@
0008d230 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......|
0008d240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
-0008da00 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |. .....|
+0008da00 2e 20 20 20 20 20 20 20 20 20 60 10 00 00 bc 0d |. `.....|
The issue here is that this directory entry is faulty.
Usually each entry has a "." entry which points to itself. In your image
the ->name is different. Hence, there is no "." but a directory named
". `"
which points to itself.
IOW a directory loop.

The vFAT file system misses to check whether there is really a "." and
".." directory.
These directories are the only ones which are allowed to be hard links.
And it misses to detect loops which triggers the BUG_ON() in the VFS core.
For the latter issue I'm not unsure where to fix this.
Al?
Post by Sami Liedes
0008da10 08 39 08 39 00 00 bc 0d 08 39 0b 01 00 00 00 00 |.9.9.....9......|
0008da20 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |.. .....|
0008da30 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......|
[ 1.363073] ------------[ cut here ]------------
[ 1.363437] kernel BUG at fs/namei.c:2430!
[ 1.363749] invalid opcode: 0000 [#1] SMP
[ 1.364088] CPU: 0 PID: 889 Comm: rm Not tainted 3.17.0+ #32
[ 1.364517] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.365291] task: ffff880000066360 ti: ffff8800063b0000 task.ti: ffff8800063b0000
[ 1.365813] RIP: 0010:[<ffffffff8116c998>] [<ffffffff8116c998>] may_delete+0x128/0x140
[ 1.365813] RSP: 0018:ffff8800063b3e38 EFLAGS: 00010293
[ 1.365813] RAX: ffff8800065cf120 RBX: ffff8800065cf240 RCX: ffff8800000663b0
[ 1.365813] RDX: 0000000000000001 RSI: ffff8800065cf240 RDI: ffff880006631858
[ 1.365813] RBP: ffff8800063b3e58 R08: 0000000000000000 R09: 0000000000000001
[ 1.365813] R10: 0000000000000000 R11: 0000000000000044 R12: ffff8800066313b0
[ 1.365813] R13: ffff880006631858 R14: 0000000000000007 R15: 00000000fffffffe
[ 1.365813] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7609940
[ 1.365813] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 1.365813] CR2: 00000000ff967000 CR3: 000000000637f000 CR4: 00000000000006b0
[ 1.365813] ffff8800065cf240 0000000000000000 ffff880006631858 0000000000000007
[ 1.365813] ffff8800063b3e80 ffffffff81173699 ffff880006334000 0000000000000000
[ 1.365813] 0000000008faf1e4 ffff8800063b3f68 ffffffff81173905 ffff8800065cf240
[ 1.365813] [<ffffffff81173699>] vfs_rmdir+0x19/0xf0
[ 1.365813] [<ffffffff81173905>] do_rmdir+0x195/0x1d0
[ 1.365813] [<ffffffff810aa11d>] ? trace_hardirqs_on_caller+0x15d/0x200
[ 1.365813] [<ffffffff8165e9cb>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1.365813] [<ffffffff81173d95>] SyS_unlinkat+0x25/0x40
[ 1.365813] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a
[ 1.365813] Code: 41 5e 5d c3 0f 1f 80 00 00 00 00 b8 ff ff ff ff eb c4 90 0f 0b 66 0f 1f 44 00 00 48 39 5b 40 75 a2 b8 f0 ff ff ff eb ae 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 b8 fe ff ff ff eb 9c 66 0f 1f 84 00 00
[ 1.365813] RIP [<ffffffff8116c998>] may_delete+0x128/0x140
[ 1.365813] RSP <ffff8800063b3e38>
[ 1.378725] ---[ end trace 15817999647273ef ]---
[ 1.379086] Kernel panic - not syncing: Fatal exception
[ 1.379592] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1.380349] Rebooting in 1 seconds..
Sami
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-12 12:08:14 UTC
Permalink
Post by Sami Liedes
Hi!
Hi,
Post by Sami Liedes
I ran some fuzz tests on a vfat filesystem on 3.17 and found a
filesystem that differs from a pristine filesystem by one bit and
causes a kernel BUG. This seems to be an old bug; I can also replicate
it on a 3.3.4 kernel I happened to have lying around.
mount $TARGET_DEV /mnt -t vfat
cd /mnt
timeout 30 cp -r doc doc2 >&/dev/null
timeout 30 find -xdev >&/dev/null
timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null
timeout 30 mkdir tmp >&/dev/null
timeout 30 echo whoah >tmp/filu >&/dev/null
timeout 30 rm -rf /mnt/* >&/dev/null
cd /
umount /mnt
The backtrace seems to indicate that the BUG happens at the rm phase.
You can get the pristine filesystem from
http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2
The broken filesystem is at
http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2
Hm, "." entry in directly seems to be corrupted. Maybe, possible causes
are, memory error, memory corruption, fat bug, other bug?

Well, this is 1 bit flip. Can you check memory by memtest or such?

Do you know how this reproduce? testimg.vfat was empty, and
testimg.vfat.24.min was corrupted already.

We would need the way how make corrupted image like testimg.vfat.24.min,
to find the cause of this problem. Base image for reproducing this bug,
and way to do are very helpful.

Thanks.
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-12 19:04:19 UTC
Permalink
On Sun, Oct 12, 2014 at 2:08 PM, OGAWA Hirofumi
Post by OGAWA Hirofumi
Post by Sami Liedes
Hi!
Hi,
Post by Sami Liedes
I ran some fuzz tests on a vfat filesystem on 3.17 and found a
filesystem that differs from a pristine filesystem by one bit and
causes a kernel BUG. This seems to be an old bug; I can also replicate
it on a 3.3.4 kernel I happened to have lying around.
mount $TARGET_DEV /mnt -t vfat
cd /mnt
timeout 30 cp -r doc doc2 >&/dev/null
timeout 30 find -xdev >&/dev/null
timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null
timeout 30 mkdir tmp >&/dev/null
timeout 30 echo whoah >tmp/filu >&/dev/null
timeout 30 rm -rf /mnt/* >&/dev/null
cd /
umount /mnt
The backtrace seems to indicate that the BUG happens at the rm phase.
You can get the pristine filesystem from
http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2
The broken filesystem is at
http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2
Hm, "." entry in directly seems to be corrupted. Maybe, possible causes
are, memory error, memory corruption, fat bug, other bug?
Well, this is 1 bit flip. Can you check memory by memtest or such?
Do you know how this reproduce? testimg.vfat was empty, and
testimg.vfat.24.min was corrupted already.
We would need the way how make corrupted image like testimg.vfat.24.min,
to find the cause of this problem. Base image for reproducing this bug,
and way to do are very helpful.
You misunderstood Sami's issue. He corrupted the vfat fs intentionally
to find issues
in the vfat driver.
And as he reports he found an nasty issue.
Any user can trigger a BUG_ON() using a crafted vfat image.
Please note, if you mount exactly the same image using msdos fs the issue
does not occur.
--
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Sami Liedes
2014-10-12 20:40:45 UTC
Permalink
Post by Richard Weinberger
You misunderstood Sami's issue. He corrupted the vfat fs intentionally
to find issues
in the vfat driver.
And as he reports he found an nasty issue.
Any user can trigger a BUG_ON() using a crafted vfat image.
Please note, if you mount exactly the same image using msdos fs the issue
does not occur.
Yeah, you can think of it as either a security issue if you wish, or
just as a plain old robustness issue in the age of unreliable USB
sticks etc. in that it just would be more ideal to fail gracefully
instead of crashing.

Anyway, I'm not advocating for any measure of severity (I leave that
to others); I just find and report these in the hope that someone
finds the reports useful. I personally view these more as robustness
than security bugs at the moment, but certainly they can be seen as
either.

And if some of these get fixed, I will rerun the tests, so I might
produce a daunting stream of reports. I know it would be nicer to
report everything at once, but usually the issues found first are
prevalent enough to mask other issues.

By the way, I find it interesting that once I implemented a tool to
minimize the differences between a bad fs and a good fs, every bug I
have found in any filesystem implementation has minimized to a single
bit flip. That suggests to me that fuzz testing is probably not very
effective in finding bugs that require more than that.

Sami
OGAWA Hirofumi
2014-10-13 07:57:52 UTC
Permalink
Post by Richard Weinberger
Post by OGAWA Hirofumi
We would need the way how make corrupted image like testimg.vfat.24.min,
to find the cause of this problem. Base image for reproducing this bug,
and way to do are very helpful.
You misunderstood Sami's issue. He corrupted the vfat fs intentionally
to find issues
in the vfat driver.
And as he reports he found an nasty issue.
Any user can trigger a BUG_ON() using a crafted vfat image.
Please note, if you mount exactly the same image using msdos fs the issue
does not occur.
Ah.

BTW, msdos doesn't allow ".*" as filename, so not trigger this. But root
cause of this is same as double linked dir, "." should not
matter. I.e. this issue would be able to reproduce on all FSes if made
corrupted image intentionally.

If we want to fix intentional corruption like this seriously, I guess we
would need something like online-fsck to detect like double link of
dir. If we want to avoid only Oops, it might be enough to remove
BUG_ON().

I'm still not sure whether this is right direction or not though,
because mount operation is root only and untrusted image should run fsck
before. But, also, Oops is clearly unexpected. Hmmm...

Al?


[PATCH] Avoid Oops on corrupted dir in may_delete()

Signed-off-by: OGAWA Hirofumi <***@mail.parknet.co.jp>
---

fs/namei.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff -puN fs/namei.c~fix-oops-on-corrupted-fs fs/namei.c
--- linux-3.17/fs/namei.c~fix-oops-on-corrupted-fs 2014-10-13 16:34:28.352999516 +0900
+++ linux-3.17-hirofumi/fs/namei.c 2014-10-13 16:35:19.196803169 +0900
@@ -2427,7 +2427,10 @@ static int may_delete(struct inode *dir,
return -ENOENT;
BUG_ON(!inode);

- BUG_ON(victim->d_parent->d_inode != dir);
+ /* Easy check of corrupted dir. */
+ if (victim->d_parent->d_inode != dir)
+ return -EBUSY;
+
audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);

error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
_
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-13 08:22:31 UTC
Permalink
Post by OGAWA Hirofumi
Post by Richard Weinberger
Post by OGAWA Hirofumi
We would need the way how make corrupted image like testimg.vfat.24.min,
to find the cause of this problem. Base image for reproducing this bug,
and way to do are very helpful.
You misunderstood Sami's issue. He corrupted the vfat fs intentionally
to find issues
in the vfat driver.
And as he reports he found an nasty issue.
Any user can trigger a BUG_ON() using a crafted vfat image.
Please note, if you mount exactly the same image using msdos fs the issue
does not occur.
Ah.
BTW, msdos doesn't allow ".*" as filename, so not trigger this. But root
cause of this is same as double linked dir, "." should not
matter. I.e. this issue would be able to reproduce on all FSes if made
corrupted image intentionally.
If we want to fix intentional corruption like this seriously, I guess we
would need something like online-fsck to detect like double link of
dir. If we want to avoid only Oops, it might be enough to remove
BUG_ON().
I'm still not sure whether this is right direction or not though,
because mount operation is root only and untrusted image should run fsck
before. But, also, Oops is clearly unexpected. Hmmm...
This limitation is not true anymore. Plug in a USB stick into a recent Linux desktop,
it will automatically mount it...
Also think of user namespaces and FUSE.

Thanks,
//richard
Post by OGAWA Hirofumi
Al?
[PATCH] Avoid Oops on corrupted dir in may_delete()
---
fs/namei.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff -puN fs/namei.c~fix-oops-on-corrupted-fs fs/namei.c
--- linux-3.17/fs/namei.c~fix-oops-on-corrupted-fs 2014-10-13 16:34:28.352999516 +0900
+++ linux-3.17-hirofumi/fs/namei.c 2014-10-13 16:35:19.196803169 +0900
@@ -2427,7 +2427,10 @@ static int may_delete(struct inode *dir,
return -ENOENT;
BUG_ON(!inode);
- BUG_ON(victim->d_parent->d_inode != dir);
+ /* Easy check of corrupted dir. */
+ if (victim->d_parent->d_inode != dir)
+ return -EBUSY;
+
audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
_
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-13 08:35:46 UTC
Permalink
Post by Richard Weinberger
Post by OGAWA Hirofumi
I'm still not sure whether this is right direction or not though,
because mount operation is root only and untrusted image should run fsck
before. But, also, Oops is clearly unexpected. Hmmm...
This limitation is not true anymore. Plug in a USB stick into a recent
Linux desktop, it will automatically mount it... Also think of user
namespaces and FUSE.
Not really (well, true, some sort though). It is still controlled by root
or capability, right? I.e. still controlled by admin of system.

I read user namespaces last time, it doesn't allow to mount the block
device by namespace's root.

FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I
still didn't check it fully.
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-13 08:39:03 UTC
Permalink
Post by OGAWA Hirofumi
Post by Richard Weinberger
Post by OGAWA Hirofumi
I'm still not sure whether this is right direction or not though,
because mount operation is root only and untrusted image should run fsck
before. But, also, Oops is clearly unexpected. Hmmm...
This limitation is not true anymore. Plug in a USB stick into a recent
Linux desktop, it will automatically mount it... Also think of user
namespaces and FUSE.
Not really (well, true, some sort though). It is still controlled by root
or capability, right? I.e. still controlled by admin of system.
Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :)
Post by OGAWA Hirofumi
I read user namespaces last time, it doesn't allow to mount the block
device by namespace's root.
FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I
still didn't check it fully.
The question is how long these limits will stay...
User namespaces uncovered already a pile of issues wrt. to mounting.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-13 08:59:38 UTC
Permalink
Post by Richard Weinberger
Post by OGAWA Hirofumi
Post by Richard Weinberger
Post by OGAWA Hirofumi
I'm still not sure whether this is right direction or not though,
because mount operation is root only and untrusted image should run fsck
before. But, also, Oops is clearly unexpected. Hmmm...
This limitation is not true anymore. Plug in a USB stick into a recent
Linux desktop, it will automatically mount it... Also think of user
namespaces and FUSE.
Not really (well, true, some sort though). It is still controlled by root
or capability, right? I.e. still controlled by admin of system.
Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :)
Post by OGAWA Hirofumi
I read user namespaces last time, it doesn't allow to mount the block
device by namespace's root.
FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I
still didn't check it fully.
The question is how long these limits will stay...
User namespaces uncovered already a pile of issues wrt. to mounting.
Well, anyway, I don't object like that simple patch.

My worry is, I feel we need something like online-fsck finally if we
tackled fully to avoid issues (I still didn't analyze about this issue
seriously and fully), and measurable overheads.

And I myself have interest to online/runtime-fsck (i.e. detect and fix)
though, I don't have interest to make it generic operations, and I would
not have interest to tackle for all FSes...
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-13 14:36:10 UTC
Permalink
Post by OGAWA Hirofumi
Post by Richard Weinberger
The question is how long these limits will stay...
User namespaces uncovered already a pile of issues wrt. to mounting.
Well, anyway, I don't object like that simple patch.
My worry is, I feel we need something like online-fsck finally if we
tackled fully to avoid issues (I still didn't analyze about this issue
seriously and fully), and measurable overheads.
And I myself have interest to online/runtime-fsck (i.e. detect and fix)
though, I don't have interest to make it generic operations, and I would
not have interest to tackle for all FSes...
At the end of the day we have to treat every filesystem provided by the user as
user input and must not trust it.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-19 16:36:10 UTC
Permalink
Post by OGAWA Hirofumi
Post by Richard Weinberger
Post by OGAWA Hirofumi
Post by Richard Weinberger
Post by OGAWA Hirofumi
I'm still not sure whether this is right direction or not though,
because mount operation is root only and untrusted image should run fsck
before. But, also, Oops is clearly unexpected. Hmmm...
This limitation is not true anymore. Plug in a USB stick into a recent
Linux desktop, it will automatically mount it... Also think of user
namespaces and FUSE.
Not really (well, true, some sort though). It is still controlled by root
or capability, right? I.e. still controlled by admin of system.
Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :)
Post by OGAWA Hirofumi
I read user namespaces last time, it doesn't allow to mount the block
device by namespace's root.
FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I
still didn't check it fully.
The question is how long these limits will stay...
User namespaces uncovered already a pile of issues wrt. to mounting.
Well, anyway, I don't object like that simple patch.
My worry is, I feel we need something like online-fsck finally if we
tackled fully to avoid issues (I still didn't analyze about this issue
seriously and fully), and measurable overheads.
And I myself have interest to online/runtime-fsck (i.e. detect and fix)
though, I don't have interest to make it generic operations, and I would
not have interest to tackle for all FSes...
What about this one?

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 6df8d3d..60a28b7 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -736,7 +736,8 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry,
}

alias = d_find_alias(inode);
- if (alias && !vfat_d_anon_disconn(alias)) {
+ if (alias && !vfat_d_anon_disconn(alias) &&
+ alias->d_parent == dentry->d_parent) {
/*
* This inode has non anonymous-DCACHE_DISCONNECTED
* dentry. This means, the user did ->lookup() by an

VFAT suffers from the issue because it is using dentry aliases to have fast lookups
for short and long names. Without aliases the VFS will catch the loop just fine as on ext2 (I've
tested it).
Let's change vfat_lookup() to verify that both the alias and dentry have the same parent otherwise
we're facing a loop.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-23 15:28:58 UTC
Permalink
Post by Richard Weinberger
Post by OGAWA Hirofumi
Well, anyway, I don't object like that simple patch.
My worry is, I feel we need something like online-fsck finally if we
tackled fully to avoid issues (I still didn't analyze about this issue
seriously and fully), and measurable overheads.
And I myself have interest to online/runtime-fsck (i.e. detect and fix)
though, I don't have interest to make it generic operations, and I would
not have interest to tackle for all FSes...
What about this one?
Looks like strange. If we want to tackle this at per-FS. We should not
return double linked dir at first. Since double linked breaks dir
hierarchy, even if this one can avoid that Oops, double linked can be
easily the cause of another Oops, deadlock, etc.

Well, this patch is untested though. For example, somethings like
following. But, again, this fixes only one of cases in double linked.
(And to fix fully, my mind was already talked.)

Signed-off-by: OGAWA Hirofumi <***@mail.parknet.co.jp>
---

fs/fat/namei_msdos.c | 9 +++++++++
fs/fat/namei_vfat.c | 9 +++++++++
2 files changed, 18 insertions(+)

diff -puN fs/fat/namei_vfat.c~vfat-detect-dir-loop fs/fat/namei_vfat.c
--- linux-tux3/fs/fat/namei_vfat.c~vfat-detect-dir-loop 2014-10-20 17:44:17.874542711 +0900
+++ linux-tux3-hirofumi/fs/fat/namei_vfat.c 2014-10-20 17:42:24.494864616 +0900
@@ -735,6 +735,15 @@ static struct dentry *vfat_lookup(struct
goto error;
}

+ /* Simple sanity check to avoid Oops. */
+ if (inode == dentry->d_parent->d_inode) {
+ fat_fs_error(sb, "%s: detected directory loop (i_pos %lld)",
+ __func__, MSDOS_I(inode)->i_pos);
+ iput(inode);
+ err = -EIO;
+ goto error;
+ }
+
alias = d_find_alias(inode);
if (alias && !vfat_d_anon_disconn(alias)) {
/*
diff -puN fs/fat/namei_msdos.c~vfat-detect-dir-loop fs/fat/namei_msdos.c
--- linux-tux3/fs/fat/namei_msdos.c~vfat-detect-dir-loop 2014-10-20 17:44:25.858520043 +0900
+++ linux-tux3-hirofumi/fs/fat/namei_msdos.c 2014-10-20 17:45:42.654302012 +0900
@@ -215,6 +215,15 @@ static struct dentry *msdos_lookup(struc
case 0:
inode = fat_build_inode(sb, sinfo.de, sinfo.i_pos);
brelse(sinfo.bh);
+
+ /* Simple sanity check to avoid Oops. */
+ if (inode == dentry->d_parent->d_inode) {
+ fat_fs_error(sb,
+ "%s: detected directory loop (i_pos %lld)",
+ __func__, MSDOS_I(inode)->i_pos);
+ iput(inode);
+ inode = ERR_PTR(-EIO);
+ }
break;
default:
inode = ERR_PTR(err);
_
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro
2014-10-23 16:01:06 UTC
Permalink
Post by OGAWA Hirofumi
Post by Richard Weinberger
What about this one?
Looks like strange. If we want to tackle this at per-FS. We should not
return double linked dir at first. Since double linked breaks dir
hierarchy, even if this one can avoid that Oops, double linked can be
easily the cause of another Oops, deadlock, etc.
Well, this patch is untested though. For example, somethings like
following. But, again, this fixes only one of cases in double linked.
(And to fix fully, my mind was already talked.)
Hmm... Why hadn't d_splice_alias() caught that, though? Look: in that
case we see that inode is non-NULL, a directory and has an alias
(namely, dentry->d_parent). So we hit this:
new = __d_find_any_alias(inode);
if (new) {
if (!IS_ROOT(new)) {
spin_unlock(&inode->i_lock);
dput(new);
return ERR_PTR(-EIO);
}
if (d_ancestor(new, dentry)) {
spin_unlock(&inode->i_lock);
dput(new);
return ERR_PTR(-EIO);
}
and depending on whether that ->d_parent had been the filesystem root,
we hit either the former or the latter. IOW, we should've done
exactly that...

FWIW, there *is* a bug in that path - we ought to have done iput(inode)
on both failure exits in order to follow the calling conventions. But that
doesn't look like it would oops right there...

Could somebody repost the oops stack trace? The bug in d_splice_alias()
is real (and fairly old), but I'd like to understand if there's anything
else in the game...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro
2014-10-23 16:16:06 UTC
Permalink
Post by Al Viro
Hmm... Why hadn't d_splice_alias() caught that, though?
Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there
we don't call d_splice_alias() on the affected path...

OK, so your check isn't enough. What we need there is this:
if (alias && alias->d_parent == dentry->d_parent && ...)
Otherwise that d_move() isn't safe at all.

Moreover, for directories we don't want to bother with that codepath
at all - d_splice_alias() will do that d_move() just fine there.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-23 16:45:49 UTC
Permalink
Post by Al Viro
Post by Al Viro
Hmm... Why hadn't d_splice_alias() caught that, though?
Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there
we don't call d_splice_alias() on the affected path...
if (alias && alias->d_parent == dentry->d_parent && ...)
Otherwise that d_move() isn't safe at all.
Hm, sounds like I'm missing something. what case has different
->d_parent on alias if prevented by my check?
Post by Al Viro
Moreover, for directories we don't want to bother with that codepath
at all - d_splice_alias() will do that d_move() just fine there.
d_splice_alias() calls __d_find_alias() with want_discon==1, so
__d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use
d_move() path, right?
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-23 16:50:39 UTC
Permalink
Post by OGAWA Hirofumi
Post by Al Viro
Post by Al Viro
Hmm... Why hadn't d_splice_alias() caught that, though?
Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there
we don't call d_splice_alias() on the affected path...
if (alias && alias->d_parent == dentry->d_parent && ...)
Otherwise that d_move() isn't safe at all.
Hm, sounds like I'm missing something. what case has different
->d_parent on alias if prevented by my check?
You meant the case of more complex double linked loop?
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Richard Weinberger
2014-10-23 16:55:46 UTC
Permalink
Post by OGAWA Hirofumi
Post by OGAWA Hirofumi
Post by Al Viro
Post by Al Viro
Hmm... Why hadn't d_splice_alias() caught that, though?
Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there
we don't call d_splice_alias() on the affected path...
if (alias && alias->d_parent == dentry->d_parent && ...)
Otherwise that d_move() isn't safe at all.
Hm, sounds like I'm missing something. what case has different
->d_parent on alias if prevented by my check?
You meant the case of more complex double linked loop?
Yes. This is why I came up with the alias->d_parent == dentry->d_parent check.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro
2014-10-23 16:55:33 UTC
Permalink
Post by OGAWA Hirofumi
d_splice_alias() calls __d_find_alias() with want_discon==1, so
__d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use
d_move() path, right?
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro
2014-10-23 17:21:58 UTC
Permalink
Post by Al Viro
Post by OGAWA Hirofumi
d_splice_alias() calls __d_find_alias() with want_discon==1, so
__d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use
d_move() path, right?
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique()
merge happens, we'll be able to clean vfat_lookup() a bit more.

a) don't bother with ->d_time for positives - we only check it for negatives
anyway.
b) make sure to set it at unlink and rmdir time - at *that* point soon-to-be
negative dentry matches then-current directory contents
c) don't go into renaming of old alias in vfat_lookup() unless it has
the same parent (which it will, unless we are seeing corrupted image) *and*
is a non-directory
d) use (for now) d_materialise_unique() instead of d_splice_alias() - that one
will do renames of old directory aliases just fine (and pretty soon so will
d_splice_alias(), but this bug is -stable fodder)

Signed-off-by: Al Viro <***@zeniv.linux.org.uk>
---
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 6df8d3d..eed856f 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -736,17 +736,17 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry,
}

alias = d_find_alias(inode);
- if (alias && !vfat_d_anon_disconn(alias)) {
+ if (alias && alias->d_parent == dentry->d_parent &&
+ !S_ISDIR(inode->i_mode) && !vfat_d_anon_disconn(alias)) {
/*
- * This inode has non anonymous-DCACHE_DISCONNECTED
+ * This file has non anonymous-DCACHE_DISCONNECTED
* dentry. This means, the user did ->lookup() by an
* another name (longname vs 8.3 alias of it) in past.
*
* Switch to new one for reason of locality if possible.
*/
BUG_ON(d_unhashed(alias));
- if (!S_ISDIR(inode->i_mode))
- d_move(alias, dentry);
+ d_move(alias, dentry);
iput(inode);
mutex_unlock(&MSDOS_SB(sb)->s_lock);
return alias;
@@ -755,12 +755,9 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry,

out:
mutex_unlock(&MSDOS_SB(sb)->s_lock);
- dentry->d_time = dentry->d_parent->d_inode->i_version;
- dentry = d_splice_alias(inode, dentry);
- if (dentry)
- dentry->d_time = dentry->d_parent->d_inode->i_version;
- return dentry;
-
+ if (!inode)
+ dentry->d_time = dir->i_version;
+ return d_materialise_unique(dentry, inode);
error:
mutex_unlock(&MSDOS_SB(sb)->s_lock);
return ERR_PTR(err);
@@ -793,7 +790,6 @@ static int vfat_create(struct inode *dir, struct dentry *dentry, umode_t mode,
inode->i_mtime = inode->i_atime = inode->i_ctime = ts;
/* timestamp is already written, so mark_inode_dirty() is unneeded. */

- dentry->d_time = dentry->d_parent->d_inode->i_version;
d_instantiate(dentry, inode);
out:
mutex_unlock(&MSDOS_SB(sb)->s_lock);
@@ -824,6 +820,7 @@ static int vfat_rmdir(struct inode *dir, struct dentry *dentry)
clear_nlink(inode);
inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC;
fat_detach(inode);
+ dentry->d_time = dir->i_version;
out:
mutex_unlock(&MSDOS_SB(sb)->s_lock);

@@ -849,6 +846,7 @@ static int vfat_unlink(struct inode *dir, struct dentry *dentry)
clear_nlink(inode);
inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC;
fat_detach(inode);
+ dentry->d_time = dir->i_version;
out:
mutex_unlock(&MSDOS_SB(sb)->s_lock);

@@ -889,7 +887,6 @@ static int vfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
inode->i_mtime = inode->i_atime = inode->i_ctime = ts;
/* timestamp is already written, so mark_inode_dirty() is unneeded. */

- dentry->d_time = dentry->d_parent->d_inode->i_version;
d_instantiate(dentry, inode);

mutex_unlock(&MSDOS_SB(sb)->s_lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-23 17:58:05 UTC
Permalink
Post by Al Viro
Post by Al Viro
Post by OGAWA Hirofumi
d_splice_alias() calls __d_find_alias() with want_discon==1, so
__d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use
d_move() path, right?
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique()
merge happens, we'll be able to clean vfat_lookup() a bit more.
a) don't bother with ->d_time for positives - we only check it for negatives
anyway.
b) make sure to set it at unlink and rmdir time - at *that* point soon-to-be
negative dentry matches then-current directory contents
c) don't go into renaming of old alias in vfat_lookup() unless it has
the same parent (which it will, unless we are seeing corrupted image) *and*
is a non-directory
d) use (for now) d_materialise_unique() instead of d_splice_alias() - that one
will do renames of old directory aliases just fine (and pretty soon so will
d_splice_alias(), but this bug is -stable fodder)
Ah, this calls d_move() for directory?

I recalled why it didn't change alias if directory. Changing result of
getcwd() may become the surprise of apps... If doesn't surprise, looks
good.
Post by Al Viro
---
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 6df8d3d..eed856f 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -736,17 +736,17 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry,
}
alias = d_find_alias(inode);
- if (alias && !vfat_d_anon_disconn(alias)) {
+ if (alias && alias->d_parent == dentry->d_parent &&
+ !S_ISDIR(inode->i_mode) && !vfat_d_anon_disconn(alias)) {
/*
- * This inode has non anonymous-DCACHE_DISCONNECTED
+ * This file has non anonymous-DCACHE_DISCONNECTED
* dentry. This means, the user did ->lookup() by an
* another name (longname vs 8.3 alias of it) in past.
*
* Switch to new one for reason of locality if possible.
*/
BUG_ON(d_unhashed(alias));
- if (!S_ISDIR(inode->i_mode))
- d_move(alias, dentry);
+ d_move(alias, dentry);
iput(inode);
mutex_unlock(&MSDOS_SB(sb)->s_lock);
return alias;
@@ -755,12 +755,9 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry,
mutex_unlock(&MSDOS_SB(sb)->s_lock);
- dentry->d_time = dentry->d_parent->d_inode->i_version;
- dentry = d_splice_alias(inode, dentry);
- if (dentry)
- dentry->d_time = dentry->d_parent->d_inode->i_version;
- return dentry;
-
+ if (!inode)
+ dentry->d_time = dir->i_version;
+ return d_materialise_unique(dentry, inode);
mutex_unlock(&MSDOS_SB(sb)->s_lock);
return ERR_PTR(err);
@@ -793,7 +790,6 @@ static int vfat_create(struct inode *dir, struct dentry *dentry, umode_t mode,
inode->i_mtime = inode->i_atime = inode->i_ctime = ts;
/* timestamp is already written, so mark_inode_dirty() is unneeded. */
- dentry->d_time = dentry->d_parent->d_inode->i_version;
d_instantiate(dentry, inode);
mutex_unlock(&MSDOS_SB(sb)->s_lock);
@@ -824,6 +820,7 @@ static int vfat_rmdir(struct inode *dir, struct dentry *dentry)
clear_nlink(inode);
inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC;
fat_detach(inode);
+ dentry->d_time = dir->i_version;
mutex_unlock(&MSDOS_SB(sb)->s_lock);
@@ -849,6 +846,7 @@ static int vfat_unlink(struct inode *dir, struct dentry *dentry)
clear_nlink(inode);
inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC;
fat_detach(inode);
+ dentry->d_time = dir->i_version;
mutex_unlock(&MSDOS_SB(sb)->s_lock);
@@ -889,7 +887,6 @@ static int vfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
inode->i_mtime = inode->i_atime = inode->i_ctime = ts;
/* timestamp is already written, so mark_inode_dirty() is unneeded. */
- dentry->d_time = dentry->d_parent->d_inode->i_version;
d_instantiate(dentry, inode);
mutex_unlock(&MSDOS_SB(sb)->s_lock);
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi
2014-10-23 17:35:18 UTC
Permalink
Post by Al Viro
Post by OGAWA Hirofumi
d_splice_alias() calls __d_find_alias() with want_discon==1, so
__d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use
d_move() path, right?
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
Ah, I see. Checked latest linus tree, looks like added some checks (I
was still working on 3.6.15, sorry).

So, Richard, can you add comment such as

/* Check FS corruption, will handle by d_splice_alias() */

for that less understandable check, and resend patch with your
Signed-off-by: (better to Cc: akpm)?

Acked-by: OGAWA Hirofumi <***@mail.parknet.co.jp>

Thanks.
--
OGAWA Hirofumi <***@mail.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
J. Bruce Fields
2014-10-23 17:54:28 UTC
Permalink
Post by Al Viro
Post by OGAWA Hirofumi
d_splice_alias() calls __d_find_alias() with want_discon==1, so
__d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use
d_move() path, right?
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
What happened to the idea of failing with -EIO in that case?

In the case of a disk filesystem it seems like it can only happen due to
corruption and I'd assumed we're better off failing the first time we
notice it.

(That said, I haven't thought it through beyond that, so it's not
something I have that strong an opinion on.)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro
2014-10-23 18:05:11 UTC
Permalink
Post by J. Bruce Fields
Post by Al Viro
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
What happened to the idea of failing with -EIO in that case?
Quiet d_move() is definitely better in this case. Failing with EIO is what
d_splice_alias() does *now*, and that's why the patch I've posted uses
d_materialise_unique(). Patches in my local tree switch that case of
d_splice_alias() to what d_materialise_unique() does (and kill
d_materialise_unique() completely - compat #define is left, but all
in-tree users are switched to d_splice_alias())...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
J. Bruce Fields
2014-10-23 18:16:46 UTC
Permalink
Post by Al Viro
Post by J. Bruce Fields
Post by Al Viro
Hmm... Not in the current mainline (and not because of want_discon - that's
gone already). However, with the fixes I've got in the local tree it
will both find and move it - same as d_materialise_unique() would in the
current mainline.
What happened to the idea of failing with -EIO in that case?
Quiet d_move() is definitely better in this case.
Can you explain why? (Apologies if I'm missing the obvious.)

--b.
Post by Al Viro
Failing with EIO is what
d_splice_alias() does *now*, and that's why the patch I've posted uses
d_materialise_unique(). Patches in my local tree switch that case of
d_splice_alias() to what d_materialise_unique() does (and kill
d_materialise_unique() completely - compat #define is left, but all
in-tree users are switched to d_splice_alias())...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Al Viro
2014-10-23 16:56:29 UTC
Permalink
Post by OGAWA Hirofumi
Hm, sounds like I'm missing something. what case has different
->d_parent on alias if prevented by my check?
A cross-directory link.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...