Discussion:
[RFC PATCH] coredump: fix incomplete core file created when dump_skip was used last
(too old to reply)
Victor Kamensky
2014-10-21 22:57:07 UTC
Permalink
Hi,

During gdb testsuite testing in arm V7 LE rootfs running on
top of ARM V8 kernel bigcore.exp test exposed issue in kernel
user land core file writing logic. The issue is that for
certain memory layout of crashed process upper address memory
pages were not available so dump_skip with llseek was used
but there was no subsequent write. As result core file was
truncated. Proposed RFC patch follows this cover letter. In
the proposed patch code tracks whether last operation was llseek
and writes one last byte at the end of core file to force write
into the file of skipped pages. Below is more details on issue
analysis and test case to reproduce similar issue on x86 box.

Thanks,
Victor

Appendix 1 Original issue analysis
----------------------------------

During test huge core file was created and when loaded into
gdb, gdb complained about mismatch in core file size:

BFD: Warning: /var/volatile/tmp/./core is truncated: expected core file size >= 4293058560, found: 4293038080.

i.e (0xffe2e000 - 0xffe29000 = 0x5000) 5 pages were missing in
core file.

Last 'Program Headers' entry in core file:

LOAD 0xffe1f000 0xffff1000 0x00000000 0x0f000 0x0f000 RW 0x1000

Size of core file

***@genericarmv7a:/tmp# ls -al core
-rw------- 1 root root 4293038080 Oct 14 23:20 core

debug printk from elf_core_dump showing that for the
last 5 page get_dump_page returned 0 and as result
dump_skip was called last 5 times. In

addr = 0xffffa000, page = 0xffffffbeedb6a920
addr = 0xffffb000, page = 0x (null)
addr = 0xffffc000, page = 0x (null)
addr = 0xffffd000, page = 0x (null)
addr = 0xffffe000, page = 0x (null)
addr = 0xfffff000, page = 0x (null)

In dump_skip llseek was executed in the file, but because
there were not subsequent writes after that resulting file
is missing those 5 pages.

Appendix 2 Test case for x86
----------------------------

Here is test that illustrates the issue on x86_64 machine.
Test should be compiled in 32bit mode, because in 64bit
mode [vsyscall] is at highest address entry and it always
dumped into core so issue is not reproduced.

Test creates MAP_FIXED mapping at upper addresses above
stack (i.e 0xfff00000 address works on my FC20) with mapping
size more than one page. Only first page in the mapping is
touched. Remaining pages in mapping will not have backing
memory so when process crashes it will create truncated core,
since dump_skip will be used for those not backed pages.

In below output note gdb complains about truncated core
file.

[***@coreos-lnx2 bc]$ ls
brokencore.c
[***@coreos-lnx2 bc]$ cat brokencore.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

char *
create_mapping(void *addr, size_t size)
{
char *buffer = NULL;

buffer = mmap(addr, size,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,
0, 0);
if (buffer == MAP_FAILED) {
perror("mmap failed\n");
}
return buffer;
}

int
touch_memory (char *buffer, int size, int stride)
{
int retval = 0;
int i;
for (i = 0; i < size; i += stride) {
if (buffer[i] != 0) {
retval = 1;
}
buffer[i] = 1;
}
return retval;
}

int
main (int argc, char **argv)
{
size_t size;
void *addr;
int alloc_all = 0;

char *buffer;


if (argc > 2) {
addr = (void *) strtoul(argv[1], NULL, 16);
size = strtoul(argv[2], NULL, 10);
if (size <= 4096) {
printf("Size must be more than one page\n");
}
} else {
printf("Usage: %s hex_addr dec_size [commit_all]\n", argv[0]);
}

if (argc > 3) {
/*
* Will touch all memory pages in allocation,
* core file will be complete.
*/
alloc_all = 1;
}

buffer = create_mapping(addr, size);
if (buffer) {
/*
* Touch first page, so core file will have segment with
* entry for this mapping with FileSiz != 0
*/
touch_memory(buffer, 4096, 1024);
if (alloc_all) {
touch_memory(buffer, size, 1024);
}
} else {
printf("failed to do mmap\n");
}

/* crash */
*(char *)0 = 0;
return 0;
}
[***@coreos-lnx2 bc]$ gcc -m32 -g -o brokencore brokencore.c
[***@coreos-lnx2 bc]$ ulimit -c unlimited
[***@coreos-lnx2 bc]$ ./brokencore 0xfff00000 409600
Segmentation fault (core dumped)
[***@coreos-lnx2 bc]$ ls -l
total 92
-rwxrwxr-x. 1 kamensky kamensky 8936 Oct 21 15:02 brokencore
-rw-rw-r--. 1 kamensky kamensky 1655 Oct 21 15:02 brokencore.c
-rw-------. 1 kamensky kamensky 221184 Oct 21 15:03 core.12944
[***@coreos-lnx2 bc]$ gdb brokencore -core=./core.12944
GNU gdb (GDB) Fedora 7.7.1-19.fc20
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from brokencore...done.
BFD: Warning: /home/kamensky/tmp/bc/./core.12944 is truncated: expected core file size >= 626688, found: 221184.
[New LWP 12944]
Core was generated by `./brokencore 0xfff00000 409600'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0804868e in main (argc=3, argv=0xffbce1e4) at brokencore.c:77
77 *(char *)0 = 0;
(gdb)

Victor Kamensky (1):
coredump: fix incomplete core file created when dump_skip was used
last

fs/coredump.c | 25 +++++++++++++++++++++++++
include/linux/binfmts.h | 6 ++++++
2 files changed, 31 insertions(+)
--
1.8.1.4
Victor Kamensky
2014-10-21 22:57:08 UTC
Permalink
There are situation when dump_skip is used on last part
of core file with intent of creating zeros. However if it
is not followed by any write (through dump_emit) zeros
will not be written into the file. I.e llseek without
subsequent write does not create any content.

Such issue happened during bigcore.exp gdb testsuite run on
ARM V7 rootfs running on top of ARM V8 kenel (compat mode).
For last vma in elf_core_dump function last 5 pages could
not be mapped and as result dump_skip was called. And resulting
core file was missing 5 last pages and gdb complained about
that.

Solution is to introduce new field into coredump_params struct
that would track whether last helper used did real write or
llseek. If last operation was llseek, move position back to
1 byte and write 1 zero byte. As result all content intended
with last dump_skip will appear in core file as zeros.

Signed-off-by: Victor Kamensky <***@linaro.org>
---
fs/coredump.c | 26 ++++++++++++++++++++++++++
include/linux/binfmts.h | 6 ++++++
2 files changed, 32 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index a93f7e6..da0000b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -489,6 +489,27 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
return err;
}

+/*
+ * Write last byte into core file. It is called if last operation
+ * on core file was llseek through dump_skip, and we need to write
+ * something at the end of the file, so zeros intended to be added
+ * by dump_skip will go into the file.
+ */
+static int dump_write_last_byte(struct coredump_params *cprm)
+{
+ char lastbyte = 0;
+ struct file *file = cprm->file;
+
+ if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+ if (dump_interrupted() ||
+ file->f_op->llseek(file, -1, SEEK_CUR) < 0)
+ return 0;
+ if (!dump_emit(cprm, &lastbyte, 1))
+ return 0;
+ }
+ return 1;
+}
+
void do_coredump(const siginfo_t *siginfo)
{
struct core_state core_state;
@@ -514,6 +535,7 @@ void do_coredump(const siginfo_t *siginfo)
* by any locks.
*/
.mm_flags = mm->flags,
+ .last_op_status = COREDUMP_LAST_WRITE,
};

audit_core_dumps(siginfo->si_signo);
@@ -664,6 +686,8 @@ void do_coredump(const siginfo_t *siginfo)
if (!dump_interrupted()) {
file_start_write(cprm.file);
core_dumped = binfmt->core_dump(&cprm);
+ if (cprm.last_op_status == COREDUMP_LAST_LLSEEK)
+ dump_write_last_byte(&cprm);
file_end_write(cprm.file);
}
if (ispipe && core_pipe_limit)
@@ -706,6 +730,7 @@ int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
cprm->written += n;
nr -= n;
}
+ cprm->last_op_status = COREDUMP_LAST_WRITE;
return 1;
}
EXPORT_SYMBOL(dump_emit);
@@ -721,6 +746,7 @@ int dump_skip(struct coredump_params *cprm, size_t nr)
file->f_op->llseek(file, nr, SEEK_CUR) < 0)
return 0;
cprm->written += nr;
+ cprm->last_op_status = COREDUMP_LAST_LLSEEK;
return 1;
} else {
while (nr > PAGE_SIZE) {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 61f29e5..5f4ad91 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -53,6 +53,11 @@ struct linux_binprm {
#define BINPRM_FLAGS_EXECFD_BIT 1
#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

+/* last coredump operation status */
+
+#define COREDUMP_LAST_WRITE 1
+#define COREDUMP_LAST_LLSEEK 2
+
/* Function parameter for binfmt->coredump */
struct coredump_params {
const siginfo_t *siginfo;
@@ -61,6 +66,7 @@ struct coredump_params {
unsigned long limit;
unsigned long mm_flags;
loff_t written;
+ int last_op_status;
};

/*
--
1.8.1.4
Oleg Nesterov
2014-10-22 16:55:30 UTC
Permalink
Post by Victor Kamensky
+static int dump_write_last_byte(struct coredump_params *cprm)
+{
+ char lastbyte = 0;
+ struct file *file = cprm->file;
+
+ if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+ if (dump_interrupted() ||
+ file->f_op->llseek(file, -1, SEEK_CUR) < 0)
+ return 0;
+ if (!dump_emit(cprm, &lastbyte, 1))
+ return 0;
+ }
+ return 1;
+}
Perhaps do_truncate(cprm.file->f_path.dentry, ->f_pos) makes more sense?

and unless I missed something cprm->last_op_status can be avoided, we can
simply check f_pos != i_size_read() at the end?

Oleg.
Victor Kamensky
2014-10-22 23:21:49 UTC
Permalink
Post by Oleg Nesterov
Post by Victor Kamensky
+static int dump_write_last_byte(struct coredump_params *cprm)
+{
+ char lastbyte = 0;
+ struct file *file = cprm->file;
+
+ if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+ if (dump_interrupted() ||
+ file->f_op->llseek(file, -1, SEEK_CUR) < 0)
+ return 0;
+ if (!dump_emit(cprm, &lastbyte, 1))
+ return 0;
+ }
+ return 1;
+}
Perhaps do_truncate(cprm.file->f_path.dentry, ->f_pos) makes more sense?
and unless I missed something cprm->last_op_status can be avoided, we can
simply check f_pos != i_size_read() at the end?
Oleg, nice advise! Thanks.

So the whole fix becomes something like this:

diff --git a/fs/coredump.c b/fs/coredump.c
index a93f7e6..8c17f1d 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -664,6 +664,14 @@ void do_coredump(const siginfo_t *siginfo)
if (!dump_interrupted()) {
file_start_write(cprm.file);
core_dumped = binfmt->core_dump(&cprm);
+ /*
+ * If last operation was dump_skip with llseek, we need to
+ * truncate file up to f_pos to match expected size.
+ */
+ if (!ispipe &&
+ (cprm.file->f_pos > i_size_read(file_inode(cprm.file))))
+ do_truncate(cprm.file->f_path.dentry,
+ cprm.file->f_pos, 0, cprm.file);
file_end_write(cprm.file);
}
if (ispipe && core_pipe_limit)

May I use your name with Suggested-by tag?

Thanks,
Victor
Post by Oleg Nesterov
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Oleg Nesterov
2014-10-23 16:39:34 UTC
Permalink
Post by Victor Kamensky
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -664,6 +664,14 @@ void do_coredump(const siginfo_t *siginfo)
if (!dump_interrupted()) {
file_start_write(cprm.file);
core_dumped = binfmt->core_dump(&cprm);
+ /*
+ * If last operation was dump_skip with llseek, we need to
+ * truncate file up to f_pos to match expected size.
+ */
+ if (!ispipe &&
+ (cprm.file->f_pos > i_size_read(file_inode(cprm.file))))
+ do_truncate(cprm.file->f_path.dentry,
+ cprm.file->f_pos, 0, cprm.file);
file_end_write(cprm.file);
}
if (ispipe && core_pipe_limit)
May I use your name with Suggested-by tag?
Sure, thanks ;)

I'd suggest to add a simple helper, but I won't insist. And perhaps the
caller can also check "core_dumped" along with !ispipe, but this is
purely cosmetic.

Oleg.

Loading...