Transcend MSM362I mSATA mini (TS32GMSM362I) freezes during self-test · Issue #278 · smartmontools/smartmontools · GitHub

Open
wofferl opened this issue Aug 29, 2024 · 1 comment
Labels: drivedb (Entries to the drivedb.h)

wofferl commented Aug 29, 2024
=== START OF INFORMATION SECTION ===
Device Model:     TS32GMSM362I
Firmware Version: Q0407A
User Capacity:    32,017,047,552 bytes [32.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      mSATA
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5610
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 29 12:53:57 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Disabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     ------   100   100   050    -    0
  5 Reallocated_Sector_Ct   -O----   100   100   050    -    0
  9 Power_On_Hours          ------   100   100   050    -    66
 12 Power_Cycle_Count       ------   100   100   050    -    46
160 Unknown_Attribute       ------   100   100   050    -    0
161 Unknown_Attribute       ------   100   100   050    -    181
162 Unknown_Attribute       ------   100   100   050    -    1
163 Unknown_Attribute       ------   100   100   050    -    8
164 Unknown_Attribute       ------   100   100   050    -    547446
165 Unknown_Attribute       ------   100   100   050    -    281
166 Unknown_Attribute       ------   100   100   050    -    229
167 Unknown_Attribute       ------   100   100   050    -    260
168 Unknown_Attribute       ------   100   100   050    -    3000
169 Unknown_Attribute       ------   092   092   050    -    92
192 Power-Off_Retract_Count ------   100   100   050    -    29
194 Temperature_Celsius     ------   100   100   050    -    32
195 Hardware_ECC_Recovered  ------   100   100   050    -    961
196 Reallocated_Event_Count ------   100   100   050    -    0
199 UDMA_CRC_Error_Count    ------   100   100   050    -    0
241 Total_LBAs_Written      ------   100   100   050    -    255138
242 Total_LBAs_Read         ------   100   100   050    -    5181
245 Unknown_Attribute       ------   100   100   050    -    268145
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x31       GPL     -        4  Reserved
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

Warning! SMART Extended Self-test Log Structure error: invalid SMART checksum.
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        66         -
# 2  Short offline       Completed without error       00%        65         -
# 3  Short offline       Completed without error       00%        21         -
# 4  Extended offline    Completed without error       00%         4         -
# 5  Offline             Completed without error       00%         2         -
# 6  Offline             Completed without error       00%         2         -
# 7  Short offline       Completed without error       00%         0         -
# 8  Abort offline test  Aborted by host               00%         0         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              46  ---  Lifetime Power-On Resets
0x01  0x010  4              66  ---  Power-on Hours
0x01  0x018  6      3835883162  ---  Logical Sectors Written
0x01  0x020  6        11790225  ---  Number of Write Commands
0x01  0x028  6       339596151  ---  Logical Sectors Read
0x01  0x030  6         2700324  ---  Number of Read Commands
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4           12728  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              33  ---  Current Temperature
0x05  0x010  1              -2  ---  Average Short Term Temperature
0x05  0x018  1              97  ---  Average Long Term Temperature
0x05  0x020  1              67  ---  Highest Temperature
0x05  0x028  1              14  ---  Lowest Temperature
0x05  0x030  1              -1  ---  Highest Average Short Term Temperature
0x05  0x038  1               0  ---  Lowest Average Short Term Temperature
0x05  0x040  1              -2  ---  Highest Average Long Term Temperature
0x05  0x048  1               0  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1             100  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             556  ---  Number of Hardware Resets
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               8  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            0  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           28  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
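The Extended Self-test Log in the output above is preceded by the warning "invalid SMART checksum". In ATA log data structures, byte 511 of each 512-byte sector is a checksum chosen so that all 512 bytes sum to zero modulo 256. The minimal Python sketch below shows that check. The function names are hypothetical. Raw sectors could be taken, for example, from the hex dump printed by `smartctl -l gplog,0x07`, but parsing that dump is not shown here.

```python
SECTOR_SIZE = 512


def ata_log_checksum_ok(sector: bytes) -> bool:
    """Check the data structure checksum of one 512-byte ATA log sector.

    Byte 511 holds the two's complement of the sum of bytes 0..510,
    so a valid sector sums to zero modulo 256.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    return sum(sector) % 256 == 0


def with_valid_checksum(payload: bytes) -> bytes:
    """Build a sector from 511 payload bytes plus the matching checksum byte."""
    if len(payload) != SECTOR_SIZE - 1:
        raise ValueError(f"expected {SECTOR_SIZE - 1} payload bytes")
    # (-sum) % 256 is always in 0..255 in Python, so it fits in one byte.
    return payload + bytes([-sum(payload) % 256])


def bad_sectors(log_data: bytes) -> list[int]:
    """Return the indexes of sectors in a multi-sector log with an invalid checksum."""
    if len(log_data) % SECTOR_SIZE:
        raise ValueError("log data is not a whole number of sectors")
    return [
        i
        for i in range(len(log_data) // SECTOR_SIZE)
        if not ata_log_checksum_ok(log_data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE])
    ]
```

Each sector of a multi-sector log carries its own checksum, which is why `bad_sectors` checks them one at a time.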

This SSD seems to have a firmware bug that makes it unresponsive when I/O occurs during a self-test.
It is easy to reproduce with the following command:

smartctl -t short /dev/sda; dd if=/dev/sda of=/dev/null bs=1 count=1; sleep 120; dmesg -c; dd if=/dev/sda of=/dev/null bs=1 count=1; dmesg -c

The first dd reads one byte successfully while the self-test is running, but the second dd, run after the self-test has finished, times out. The unit must be disconnected from power to get it working again.

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-23-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Aug 29 12:58:10 2024 CEST
Use smartctl -X to abort test.
1+0 records in
1+0 records out
1 byte copied, 0.000761994 s, 1.3 kB/s

dd: error reading '/dev/sda': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 90.9276 s, 0.0 kB/s
[ 2249.883801] ata1.00: exception Emask 0x0 SAct 0x1000 SErr 0x50000 action 0x6 frozen
[ 2249.883900] ata1: SError: { PHYRdyChg CommWake }
[ 2249.883949] ata1.00: failed command: READ FPDMA QUEUED
[ 2249.883992] ata1.00: cmd 60/20:60:00:00:00/00:00:00:00:00/40 tag 12 ncq dma 16384 in
                        res 40/00:01:04:07:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2249.884114] ata1.00: status: { DRDY }
[ 2249.884155] ata1: hard resetting link
[ 2255.235787] ata1: link is slow to respond, please be patient (ready=0)
[ 2260.075789] ata1: softreset failed (device not ready)
[ 2260.075864] ata1: hard resetting link
[ 2265.427782] ata1: link is slow to respond, please be patient (ready=0)
[ 2270.267788] ata1: softreset failed (device not ready)
[ 2270.267854] ata1: hard resetting link
[ 2275.623786] ata1: link is slow to respond, please be patient (ready=0)
[ 2285.803786] ata1: link is slow to respond, please be patient (ready=0)
[ 2305.303785] ata1: softreset failed (device not ready)
[ 2305.303865] ata1: limiting SATA link speed to 3.0 Gbps
[ 2305.303873] ata1: hard resetting link
[ 2310.515786] ata1: softreset failed (device not ready)
[ 2310.515847] ata1: reset failed, giving up
[ 2310.515884] ata1.00: disable device
[ 2310.516993] sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=90s
[ 2310.517008] sd 0:0:0:0: [sda] tag#12 Sense Key : Not Ready [current]
[ 2310.517016] sd 0:0:0:0: [sda] tag#12 Add. Sense: Logical unit not ready, hard reset required
[ 2310.517026] sd 0:0:0:0: [sda] tag#12 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
[ 2310.517031] I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
[ 2310.517161] ata1: EH complete
[ 2310.517272] sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 2310.517283] sd 0:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 2310.517287] I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 2310.517351] Buffer I/O error on dev sda, logical block 0, async page read
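Checking the kernel log by eye gets tedious when the reproduction above is run repeatedly, for example to test a new firmware. The Python sketch below scans dmesg-style text for the libata messages seen in this log: the failed command, the link reset being abandoned, and the device being disabled. The function names are hypothetical, and the patterns only cover the message wording shown here (Linux 6.1); other kernel versions may phrase these messages differently.

```python
import re

# Patterns copied from the kernel log above; other kernels may word them differently.
FAILED_CMD_RE = re.compile(r"\b(ata\d+\.\d+): failed command: (.+)$", re.MULTILINE)
GAVE_UP_RE = re.compile(r"\b(ata\d+): reset failed, giving up\b")
DISABLED_RE = re.compile(r"\b(ata\d+\.\d+): disable device\b")


def summarize_ata_failure(log_text: str) -> dict:
    """Collect libata error-handling outcomes from dmesg-style text.

    Returns a dict with:
      - "failed_commands": (device, command) pairs in order of appearance
      - "links_given_up": ATA links whose reset attempts were abandoned
      - "disabled_devices": ATA devices that libata disabled
    """
    failed = [(m.group(1), m.group(2).strip()) for m in FAILED_CMD_RE.finditer(log_text)]
    return {
        "failed_commands": failed,
        "links_given_up": sorted(set(GAVE_UP_RE.findall(log_text))),
        "disabled_devices": sorted(set(DISABLED_RE.findall(log_text))),
    }


def device_hung(log_text: str) -> bool:
    """True if the log shows the pattern from this report: reset gave up, device disabled."""
    summary = summarize_ata_failure(log_text)
    return bool(summary["links_given_up"]) and bool(summary["disabled_devices"])
```

For example, the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; reading the kernel log may require root when the `kernel.dmesg_restrict` sysctl is set.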

Transcend replied with a standard answer telling us to contact the distributor. I don't know yet whether the distributor is willing to pass this problem on to Transcend, because he simply said "the device is not smartctl compatible".

Before that, we had a consumer SSD (MSM 320 or 370) that probably had the same problem. The distributor then exchanged it for the embedded model because they suspected a temperature problem.

But since this is a test unit that we want to deliver to our customers later, I have to prevent them from running the self-test.
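One way to keep users from starting a self-test on this drive is a small wrapper around smartctl. The Python sketch below is hypothetical: `SELF_TEST_BLOCKLIST`, `parse_identity`, `self_test_allowed`, and `guarded_self_test` are not part of smartmontools. It reads the "Device Model" and "Firmware Version" lines from `smartctl -i` and refuses `smartctl -t` when the pair matches a blocklist entry. Only TS32GMSM362I with firmware Q0407A is listed, because that is the only combination confirmed in this report.

```python
import re
import subprocess
import sys

# Hypothetical blocklist of (model regex, firmware regex) pairs. Whether other
# capacities or firmware revisions of this drive are affected is unknown.
SELF_TEST_BLOCKLIST = [
    (r"TS32GMSM362I", r"Q0407A"),
]


def parse_identity(info_text: str) -> tuple[str, str]:
    """Extract 'Device Model' and 'Firmware Version' from `smartctl -i` output."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields.get("Device Model", ""), fields.get("Firmware Version", "")


def self_test_allowed(model: str, firmware: str) -> bool:
    """Return False if model and firmware both fully match a blocklist entry.

    Drives that could not be identified (empty strings) are allowed.
    """
    for model_re, fw_re in SELF_TEST_BLOCKLIST:
        if re.fullmatch(model_re, model) and re.fullmatch(fw_re, firmware):
            return False
    return True


def guarded_self_test(device: str, test: str = "short") -> int:
    """Run `smartctl -t TEST DEVICE` only if the drive is not blocklisted."""
    info = subprocess.run(["smartctl", "-i", device],
                          capture_output=True, text=True).stdout
    model, firmware = parse_identity(info)
    if not self_test_allowed(model, firmware):
        print(f"Refusing self-test on {device}: {model} firmware {firmware} "
              "is known to hang during self-tests.", file=sys.stderr)
        return 1
    return subprocess.run(["smartctl", "-t", test, device]).returncode
```

For example, `guarded_self_test("/dev/sda")` would print a refusal and return 1 for the drive in this report. The wrapper does not cover smartd: smartd only schedules self-tests through the `-s` directive in smartd.conf, so leaving `-s` out for this drive keeps smartd from starting them.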

What would be the "official" way for smartmontools to handle this?
Should smartd/smartctl be patched so that the self-test is not started and a warning message is shown instead, or should only a message about the faulty firmware be added to the database?

MSM362M & MSM362I mSATA mini

chrfranke added the drivedb (Entries to the drivedb.h) label on Sep 23, 2024

chrfranke commented:

> Patching smartd/smartctl so that the self-test does not start with a warning message or only adding a message about the faulty firmware in the database?

We could add a warning message to the database.
