Today something funny happened to me: I was crawling the web for PDF files when I noticed that wget on my Ubuntu 16.04 box was segfaulting, which is pretty rare considering how well tested these tools are against all kinds of input.

Ubuntu 16.04 ships wget version 1.17.1:

GNU Wget 1.17.1 built on linux-gnu.

So I grabbed the sources from the wget website to investigate, or rather to give it a quick run with ASAN enabled, because I’m lazy…

The wget website currently offers version 1.18, which does not exhibit this problem, so I assume it was either fixed or the code changed.

Anyway, I’m just sharing this because I think it’s funny and rare to see!

The culprit URL is: http://ia600208.us.archive.org/23/items/sarabia_20160316_0705/%d9%84%db%95%20%d8%aa%db%86%d9%be%d8%ae%d8%a7%d9%86%db%95%d9%88%db%95%20%d8%a8%db%86%20%d8%b9%db%95%d8%b1%d8%b9%db%95%d8%b1.pdf

It has a very peculiar filename made of Arabic-script characters. In fact, I didn’t even open the file, and I can’t read the title; if anyone knows what it means, let me know. I was just downloading a bunch of stuff from archive.org.

ASAN trace

➜  wget-1.17.1 ./src/wget http://ia600208.us.archive.org/23/items/sarabia_20160316_0705/%d9%84%db%95%20%d8%aa%db%86%d9%be%d8%ae%d8%a7%d9%86%db%95%d9%88%db%95%20%d8%a8%db%86%20%d8%b9%db%95%d8%b1%d8%b9%db%95%d8%b1.pdf
--2016-07-09 23:38:04--  http://ia600208.us.archive.org/23/items/sarabia_20160316_0705/%d9%84%db%95%20%d8%aa%db%86%d9%be%d8%ae%d8%a7%d9%86%db%95%d9%88%db%95%20%d8%a8%db%86%20%d8%b9%db%95%d8%b1%d8%b9%db%95%d8%b1.pdf
Resolving ia600208.us.archive.org (ia600208.us.archive.org)... 207.241.227.228
Connecting to ia600208.us.archive.org (ia600208.us.archive.org)|207.241.227.228|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 444923 (434K) [application/pdf]
Saving to: ‘\331%84\333%95 ت\333%86پخا\331%86\333%95\331%88\333%95 ب\333%86 ع\333%95رع\333%95ر.pdf.2’

=================================================================
==42306==ERROR: AddressSanitizer: negative-size-param: (size=-4)
    #0 0x7fa89f7c8e72  (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x47e72)
    #1 0x4c986f in memset /usr/include/x86_64-linux-gnu/bits/string3.h:90
    #2 0x4c986f in create_image /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/progress.c:1167
    #3 0x4cbdb6 in bar_create /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/progress.c:602
    #4 0x4dd0ae in fd_read_body /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/retr.c:274
    #5 0x4826bc in read_response_body /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/http.c:1682
    #6 0x49be1d in gethttp /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/http.c:3753
    #7 0x4a1aaf in http_loop /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/http.c:3971
    #8 0x4df57a in retrieve_url /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/retr.c:817
    #9 0x40c142 in main /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/main.c:1868
    #10 0x7fa89e7b582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #11 0x40e948 in _start (/media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/src/wget+0x40e948)

0x61200000bb0f is located 207 bytes inside of 303-byte region [0x61200000ba40,0x61200000bb6f)
allocated by thread T0 here:
    #0 0x7fa89f847e18 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc6e18)
    #1 0x543650 in xmalloc /media/bob/e4109b52-3574-43a8-b95d-33b3494128de/misc/wget-1.17.1/lib/xmalloc.c:41

SUMMARY: AddressSanitizer: negative-size-param (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x47e72) 
==42306==ABORTING

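So, long story very short: a negative size (-4) is passed to memset. ASAN catches it, but the regular memset happily accepts it: its length parameter is a size_t, so -4 becomes a huge unsigned value and memset keeps writing until it hits unmapped memory and segfaults.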

gdb crash/stack trace

Program received signal SIGSEGV, Segmentation fault.
__memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
161	../sysdeps/x86_64/multiarch/memset-avx2.S: No such file or directory.
(gdb) bt
#0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
#1  0x0000555555582891 in ?? ()
#2  0x0000555555582e3e in ?? ()
#3  0x0000555555585f32 in ?? ()
#4  0x0000555555575e00 in ?? ()
#5  0x000055555557afac in ?? ()
#6  0x000055555557bd2a in ?? ()
#7  0x0000555555586b64 in ?? ()
#8  0x0000555555561d83 in ?? ()
#9  0x00007ffff6aa8830 in __libc_start_main (main=0x555555560700, argc=2, argv=0x7fffffffe128, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe118)
    at ../csu/libc-start.c:291
#10 0x0000555555562019 in ?? ()
(gdb) info registers
rax            0x20202020	538976288
rbx            0xfffffffffffffffc	-4
rcx            0xfffffffffffe876b	-96405
rdx            0xfffffffffffffffc	-4
rsi            0x5555557d776b	93824994867051
rdi            0x5555557ef000	93824994963456
rbp            0x5555557d74d0	0x5555557d74d0
rsp            0x7fffffffd4c8	0x7fffffffd4c8
r8             0x0	0
r9             0x5555557d76bb	93824994866875
r10            0x7fffffffd418	140737488344088
r11            0x0	0
r12            0x0	0
r13            0x71	113
r14            0x5555557cd8f0	93824994826480
r15            0x5555557d776f	93824994867055
rip            0x7ffff6bfa328	0x7ffff6bfa328 <__memset_avx2+392>
eflags         0x10287	[ CF PF SF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0