Posted by the Russian-language Linux community ([info]lj_ru_linux)
@ 2012-11-14 11:00:00


Has the hard drive's electronics gone haywire?
Greetings, fellow community members!

Last night I tried to copy some big, fat files onto one of a virtual machine's partitions, and it flat-out failed.


The host's logs show this:
Nov 13 23:52:29 reiss kernel: [2599179.840101] ata5: lost interrupt (Status 0x50)
Nov 13 23:52:29 reiss kernel: [2599179.840140] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 13 23:52:29 reiss kernel: [2599179.840156] ata5.00: failed command: WRITE DMA EXT
Nov 13 23:52:29 reiss kernel: [2599179.840165] ata5.00: cmd 35/00:58:20:df:35/00:00:54:00:00/e0 tag 0 dma 45056 out
Nov 13 23:52:29 reiss kernel: [2599179.840166] res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 13 23:52:29 reiss kernel: [2599179.840176] ata5.00: status: { DRDY }
Nov 13 23:52:29 reiss kernel: [2599179.840222] ata5: soft resetting link
Nov 13 23:52:34 reiss kernel: [2599185.004130] ata5.00: qc timeout (cmd 0x27)
Nov 13 23:52:34 reiss kernel: [2599185.004136] ata5.00: failed to read native max address (err_mask=0x4)
Nov 13 23:52:34 reiss kernel: [2599185.004140] ata5.00: HPA support seems broken, skipping HPA handling
Nov 13 23:52:34 reiss kernel: [2599185.004143] ata5.00: revalidation failed (errno=-5)
Nov 13 23:52:34 reiss kernel: [2599185.004200] ata5: soft resetting link
Nov 13 23:52:34 reiss kernel: [2599185.176485] ata5.00: configured for UDMA/100
Nov 13 23:52:34 reiss kernel: [2599185.176502] ata5: EH complete
Nov 13 23:53:05 reiss kernel: [2599215.840096] ata5: lost interrupt (Status 0x50)
Nov 13 23:53:05 reiss kernel: [2599215.840137] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 13 23:53:05 reiss kernel: [2599215.840152] ata5.00: failed command: WRITE DMA EXT
Nov 13 23:53:05 reiss kernel: [2599215.840161] ata5.00: cmd 35/00:58:20:df:35/00:00:54:00:00/e0 tag 0 dma 45056 out
Nov 13 23:53:05 reiss kernel: [2599215.840163] res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 13 23:53:05 reiss kernel: [2599215.840171] ata5.00: status: { DRDY }
Nov 13 23:53:05 reiss kernel: [2599215.840219] ata5: soft resetting link
Nov 13 23:53:05 reiss kernel: [2599216.020449] ata5.00: configured for UDMA/100
Nov 13 23:53:05 reiss kernel: [2599216.020464] ata5: EH complete
Nov 13 23:53:36 reiss kernel: [2599246.880101] ata5: lost interrupt (Status 0x50)
Nov 13 23:53:36 reiss kernel: [2599246.880141] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 13 23:53:36 reiss kernel: [2599246.880156] ata5.00: failed command: WRITE DMA EXT
Nov 13 23:53:36 reiss kernel: [2599246.880166] ata5.00: cmd 35/00:58:20:df:35/00:00:54:00:00/e0 tag 0 dma 45056 out
Nov 13 23:53:36 reiss kernel: [2599246.880167] res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 13 23:53:36 reiss kernel: [2599246.880176] ata5.00: status: { DRDY }
Nov 13 23:53:36 reiss kernel: [2599246.880222] ata5: soft resetting link
Nov 13 23:53:36 reiss kernel: [2599247.060459] ata5.00: configured for UDMA/100
Nov 13 23:53:36 reiss kernel: [2599247.060475] ata5: EH complete

... and so on, for screens and screens....
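All three episodes above carry the identical `cmd` line, so the kernel appears to be retrying one and the same write. A minimal Python sketch for decoding that line follows. It assumes the field order that libata uses in its error messages (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function name `decode_taskfile` and the 512-byte logical sector default are illustrative choices, not part of any tool.

```python
import re

# Assumed field order (libata error-handler message format):
# cmd/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)


def decode_taskfile(line, sector_size=512):
    """Decode a 48-bit (EXT) ATA taskfile from a kernel 'cmd ...' log line.

    Hypothetical helper for illustration. It does not handle the special
    case where a sector count of 0 means 65536 sectors.
    """
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no 'cmd ...' taskfile found in line")
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    # The "hob" (high-order byte) fields carry the upper halves of the values.
    sectors = (hob_nsect << 8) | nsect
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": command,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


LOG_LINE = "ata5.00: cmd 35/00:58:20:df:35/00:00:54:00:00/e0 tag 0 dma 45056 out"
info = decode_taskfile(LOG_LINE)
print(hex(info["command"]), info["lba"], info["sectors"], info["bytes"])
```

Under these assumptions the decoded transfer is 88 sectors, i.e. 45056 bytes, which agrees with the `dma 45056 out` part of the same log line and gives some confidence that the field order is right.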

The VM's logs show write errors for a whole bunch of sectors.

The drive's S.M.A.R.T. status reports the following:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-3-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: ST2000DM001-9YN164
Serial Number: Z1E0N13L
LU WWN Device Id: 5 000c50 04d7a86db
Firmware Version: CC4B
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed Nov 14 13:00:19 2012 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline
data collection:                 ( 584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 220) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   100   006    Pre-fail  Always   -           115806880
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           7
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always   -           211070
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always   -           2474
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           7
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always   -           0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   062   061   045    Old_age   Always   -           38 (Min/Max 32/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always   -           182
194 Temperature_Celsius     0x0022   038   040   000    Old_age   Always   -           38 (0 27 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           101094940213271
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           190605094874
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           5068997233

SMART Error Log Version: 1
No Errors Logged
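
To check a table like the one above without reading it by eye, the attribute rows can be parsed and a handful of counters inspected. The sketch below is only an illustration: the helper names and the `WATCHED` list are my own assumptions about which counters are worth a look for media, timeout and cable problems, not an official smartmontools recommendation. The sample rows are copied from the output above.

```python
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 062 061 045 Old_age Always - 38 (Min/Max 32/38)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""

# Assumed shortlist of counters to watch (illustrative, not authoritative).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Reported_Uncorrect",
    "Command_Timeout",
    "UDMA_CRC_Error_Count",
)


def parse_smart_attributes(text):
    """Parse 'smartctl -A' style rows into a dict keyed by attribute name."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row has a numeric ID, a hex flag and at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = {
                "id": int(fields[0]),
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                # Only the first token of RAW_VALUE; trailing notes are ignored.
                "raw": int(fields[9]),
            }
    return attrs


def nonzero_watched(attrs, names=WATCHED):
    """Return the watched attributes whose raw counter is not zero."""
    return {n: attrs[n]["raw"] for n in names if n in attrs and attrs[n]["raw"] != 0}


attrs = parse_smart_attributes(SAMPLE)
print(nonzero_watched(attrs))
```

For the rows shown, every watched counter has a raw value of 0, so the result is an empty dict. Note that this includes Command_Timeout, even though the kernel log reports timeouts.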

Am I right in thinking that this is a glitch in the electronics of the drive on the 5th SATA channel, given that there are no bad sectors and the drive has not been knocked or overheated?

UPD: I completely forgot to mention: right next to it, in the same drive cage, sit 2 other (1 TB) drives in a mirror. No complaints about them, and they are in heavy use, since the VM images live there.

