no reply from osd.X
Some slow requests and spontaneous restarts, which is not a critical event for ceph and quite frequent. In general, it happens. But, unfortunately, not all of the restarted osd were able to rise. Namely, the hero fell osd.45 on storage6. 017-10-12 12:10:01.756562 7f02459af880 0 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 2343 2017-10-12 12:10:01.767719 7f02459af880 0 filestore(/var/lib/ceph/osd/ceph-45) backend xfs (magic 0x58465342) 2017-10-12 12:10:01.769302 7f02459af880 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_features: FIEMAP ioctl is supported and appears to work 2017-10-12 12:10:01.769310 7f02459af880 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2017-10-12 12:10:01.785177 7f02459af880 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2017-10-12 12:10:01.785229 7f02459af880 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-45) detect_feature: extsize is disabled by conf 2017-10-12 12:10:01.788289 7f02459af880 0 filestore(/var/lib/ceph/osd/ceph-45) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled 2017-10-12 12:10:01.790182 7f02459af880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2017-10-12 12:10:01.790184 7f02459af880 1 journal _open /var/lib/ceph/osd/ceph-45/journal fd 19: 10485760000 bytes, block size 4096 bytes, directio = 1, aio = 0 2017-10-12 12:10:01.790428 7f02459af880 1 journal _open /var/lib/ceph/osd/ceph-45/journal fd 19: 10485760000 bytes, block size 4096 bytes, directio = 1, aio = 0 2017-10-12 12:10:01.790923 7f02459af880 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello 2017-10-12 12:10:01.796251 7f02459af880 0 osd.45 19681 crush map has features 1107558400, adjusting msgr requires for clients 2017-10-12 12:10:01.796260 7f02459af880 0 osd.45 19681 crush map has features 1107558400 was 8705, adjusting msgr requires for mons 2017-10-12 12:10:01.796263 7f02459af880 0 osd.45 19681 crush map has features 1107558400, adjusting msgr requires for osds 2017-10-12 12:10:01.796273 7f02459af880 0 osd.45 19681 load_pgs 2017-10-12 12:10:01.850570 7f02459af880 -1 *** Caught signal (Aborted) ** in thread 7f02459af880 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af) 1: /usr/bin/ceph-osd() [0xb35f7d] 2: (()+0x11390) [0x7f0244897390] 3: (gsignal()+0x38) [0x7f0242c8f428] 4: (abort()+0x16a) [0x7f0242c9102a] 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f02435d284d] 6: (()+0x8d6b6) [0x7f02435d06b6] 7: (()+0x8d701) [0x7f02435d0701] 8: (()+0x8d919) [0x7f02435d0919] 9: (object_info_t::decode(ceph::buffer::list::iterator&)+0xb7a) [0x7b47ba] 10: (object_info_t::object_info_t(ceph::buffer::list&)+0x1b7) [0x78cd07] 11: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xf52) [0x784c82] 12: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x24e) [0x81791e] 13: (OSD::load_pgs()+0xbb6) [0x6b9eb6] 14: (OSD::init()+0x137f) [0x6bdddf] 15: (main()+0x2c20) [0x646310] 16: (__libc_start_main()+0xf0) [0x7f0242c7a830] 17: (_start()+0x29) [0x65c779] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
root@storage6:~# /usr/bin/ceph-osd -f --cluster ceph --id 45 starting osd.45 at :/0 osd_data /var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal 2017-10-12 15:13:45.288995 7fc5b1b06880 -1 osd.45 22551 log_to_monitors {default=true} terminate called after throwing an instance of 'ceph::buffer::malformed_input' what(): buffer::malformed_input: __PRETTY_FUNCTION__ decode past end of struct encoding *** Caught signal (Aborted) ** in thread 7fc58e699700 terminate called recursively Aborted (core dumped)
recovery 3/3102030 unfound (0.000%)
root@volume1:~# ceph pg 2.67 list_missing { "offset": { "oid": "", "key": "", "snapid": 0, "hash": 0, "max": 0, "pool": -1, "namespace": "" }, "num_missing": 3, "num_unfound": 3, "objects": [ { "oid": { "oid": "rbd_data.8e53d86b8b4567.0000000000000220", "key": "", "snapid": -2, "hash": 2391933031, "max": 0, "pool": 2, "namespace": "" }, "need": "19232'24922826", "have": "19172'24922749", "locations": [] }, { "oid": { "oid": "rbd_data.42375f55c840f3.000000000000c6d3", "key": "", "snapid": -2, "hash": 1598576743, "max": 0, "pool": 2, "namespace": "" }, "need": "19229'24922824", "have": "19110'24922274", "locations": [] }, { "oid": { "oid": "rbd_data.b4b7816b8b4567.0000000000000298", "key": "", "snapid": -2, "hash": 2351466599, "max": 0, "pool": 2, "namespace": "" }, "need": "19229'24922823", "have": "19110'24921517", "locations": [] } ], "more": 0 }
pidof `start ceph-osd id=45` | xargs strace -p 2> 45-strace.log
ceph::buffer::malformed_input
thrown: stat("/var/lib/ceph/osd/ceph-45/current/2.c_head", {st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0 stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C", {st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0 stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0", {st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0 stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0/DIR_0", {st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0 stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0/DIR_0/DIR_D", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0/DIR_0/DIR_D/DIR_B", 0x7ffcccfdf330) = -1 ENOENT (No such file or directory) stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0/DIR_0/DIR_D/rbd\\udata.e1bde639901b63.0000000000000000__head_852BD00C__2", {st_mode=S_IFREG|0644, st_size=8388608, ...}) = 0 stat("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0/DIR_0/DIR_D/rbd\\udata.e1bde639901b63.0000000000000000__head_852BD00C__2", {st_mode=S_IFREG|0644, st_size=8388608, ...}) = 0 open("/var/lib/ceph/osd/ceph-45/current/2.c_head/DIR_C/DIR_0/DIR_0/DIR_D/rbd\\udata.e1bde639901b63.0000000000000000__head_852BD00C__2", O_RDWR) = 22 fgetxattr(22, "user.ceph._", 0x7ffcccfdf7c0, 100) = -1 ERANGE (Numerical result out of range) fgetxattr(22, "user.ceph._", NULL, 0) = 250 fgetxattr(22, "user.ceph._", "\17\10\r\1\0\0\4\3I\0\0\0\0\0\0\0(\0\0\0rbd_data.e1bde639901b63.0000000000000000\376\377\377\377\377\377\377\377\f\320+\205\0\0\0\0\0\2\0\0\0\0\0\0\0\6\3\34\0\0\0\2\0\0\0\0\0\0\0\377\377\377\377\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0M\311\274\2\0\0\0\0\34K\0\0L\311\274\2\0\0\0\0\34K\0\0\2\2\25\0\0\0\10h\361\374\0\0\0\0\0\341\244\v\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0\0\0\265?\336Y\256\270N\5\2\2\25\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0M\311\274\2\0\0\0\0\0\0\0", 250) = 250 futex(0x7f963ba19680, FUTEX_WAKE_PRIVATE, 2147483647) = 0 write(2, "terminate called after throwing "..., 48) = 48 write(2, "ceph::buffer::malformed_input", 29) = 29 write(2, "'\n", 2) = 2 write(2, " what(): ", 11) = 11 write(2, "buffer::malformed_input: __PRETT"..., 79) = 79 write(2, "\n", 1) = 1 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 tgkill(23708, 23708, SIGABRT) = 0 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=23708, si_uid=0} --- write(2, "*** Caught signal (Aborted) **\n "..., 55) = 55 futex(0x7f963b802110, FUTEX_WAKE_PRIVATE, 2147483647) = 0 write(2, " ceph version 0.94.10 (b1e053241"..., 1418) = 1418 futex(0x48d40b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x48d40b0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 futex(0x48d4030, FUTEX_WAKE_PRIVATE, 1) = 1 futex(0x48d4058, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
e1bde639901b63.0000000000000000__head_852BD00C__2
? Since the Streys is clearly seen that the exception is thrown after trying to process this object. Well, we are looking for his cue on other storajs: find /mnt/osd? -name *e1bde639901b63.0000000000000000__head_852BD00C*
pidof `start ceph-osd id=8` | awk {'print $1'} | xargs strace -fe open,fsetxattr -p 2>8-osd-xattr.strace
Source: https://habr.com/ru/post/353854/
All Articles