Friday, September 15, 2017

Best Practices for Running a File System Check on ext4 or XFS


Running a file system check/repair command in Linux is generally expected to repair at least some of the inconsistencies it finds automatically. Severely damaged inodes or directories may be discarded if they cannot be repaired, so a repair can make significant changes to the file system and may result in data loss.


To ensure that unexpected or undesirable changes are not permanently made, perform the following precautionary steps:



Dry run (read-only mode)
-----------------------------------------
Most file system checkers have a mode of operation which checks but does not repair the file system. In this mode, the checker prints any errors that it finds and the actions that it would have taken, without actually modifying the file system. Use "e2fsck" for the ext family of file systems (ext2, ext3, ext4) and "xfs_repair" for XFS.



--> From the man page of "e2fsck":

e2fsck -n <BlockDevice>

-n  Open the filesystem read-only, and assume an answer of ‘no’ to all questions. Allows e2fsck to be used non-interactively. This option may not be specified at the same time as the -p or -y options.


--> Running a file system check in dry-run mode with the "e2fsck" command:

[root@localhost ~]# e2fsck -n /dev/sdc1

e2fsck 1.41.12 (17-May-2010)

/dev/sdc1: clean, 11/76912 files, 19969/307200 blocks
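
When the dry run is scripted, the exit status of "e2fsck" is worth checking as well as its output. The e2fsck man page documents it as a bit mask (for example, 4 means errors were left uncorrected, which is what a dry run reports when it finds problems). Below is a minimal sketch of decoding it. The helper name explain_fsck_status is my own and is not part of e2fsprogs:

```shell
#!/bin/sh
# Decode the exit status of e2fsck, which the man page describes as a bit mask.
# explain_fsck_status is a hypothetical helper name, not part of e2fsprogs.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    # Each bit that is set adds one description to the result.
    [ $((status & 1)) -ne 0 ]   && out="$out; errors corrected"
    [ $((status & 2)) -ne 0 ]   && out="$out; errors corrected, reboot recommended"
    [ $((status & 4)) -ne 0 ]   && out="$out; errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && out="$out; operational error"
    [ $((status & 16)) -ne 0 ]  && out="$out; usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && out="$out; canceled by user"
    [ $((status & 128)) -ne 0 ] && out="$out; shared library error"
    # Strip the leading separator before printing.
    echo "${out#; }"
}

# Example use (needs root and an unmounted device):
#   e2fsck -n /dev/sdc1; explain_fsck_status $?
```

Since the status is a bit mask, several conditions can be reported at once, so the helper joins them with "; " instead of stopping at the first match.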


--> Run file system check in dry-run mode using "xfs_repair" (on XFS file systems):

[ command syntax: xfs_repair -n <BlockDevice> ; "xfs_repair" expects the block device (or an image file with -f), not a mount point, and the file system must be unmounted ]


[root@managed1 ~]# xfs_repair -n /dev/sdb1

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
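
Since the dry-run command differs between the ext family and XFS, a small wrapper can pick the right one from the file system type. This is only a sketch: dry_run_command is a hypothetical helper that prints the command rather than running it, so nothing touches the device until you decide to run it.

```shell
#!/bin/sh
# dry_run_command FSTYPE DEVICE
# Hypothetical helper: prints the read-only check command for the given
# file system type instead of executing it.
dry_run_command() {
    fstype=$1
    device=$2
    case "$fstype" in
        ext2|ext3|ext4)
            echo "e2fsck -n $device" ;;
        xfs)
            echo "xfs_repair -n $device" ;;
        *)
            echo "unsupported file system type: $fstype" >&2
            return 1 ;;
    esac
}

# Example use, letting blkid report the type of the device:
#   dry_run_command "$(blkid -o value -s TYPE /dev/sdb1)" /dev/sdb1
```

According to the xfs_repair man page, a run with -n exits with status 1 if corruption was detected and 0 otherwise, so the status can be tested in a script in the same way as for "e2fsck".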


Operate first on a file system image, if available
--------------------------------------------------------------------
Most file systems (ext2, ext3, ext4, XFS, etc.) support the creation of a metadata image: a sparse copy of the file system that contains only metadata, created with "e2image" for the ext family or "xfs_metadump" for XFS. Because file system checkers operate only on metadata, such an image can be used to rehearse an actual file system repair and evaluate what changes would be made. If the changes are acceptable, the repair can then be performed on the file system itself.

--> Using 'e2image' command (for ext file systems):

A metadata dump that was saved while the file system was in good condition can also be used to restore a corrupted file system structure, as demonstrated below. Here, I back up the file system metadata with "e2image" and then run a file system check on the metadata dump to see how it works:
--> Taking an image backup using "e2image" (the "-r" option creates a raw image of the metadata):
The file system must be unmounted or mounted read-only before proceeding:

[root@rhel200 ~]# mount -o remount,ro /boot

[root@rhel200 ~]# e2image -r /dev/sda1 /tmp/sda1.image.out
e2image 1.41.12 (17-May-2010)

[root@rhel200 ~]# du -sh /tmp/sda1.image.out
472K    /tmp/sda1.image.out

[root@rhel200 ~]# file /tmp/sda1.image.out
/tmp/sda1.image.out: Linux rev 1.0 ext4 filesystem data (extents) (huge files)

Now we can run the file system check on this image file instead of running it directly on the actual file system:

[root@rhel200 ~]# e2fsck /tmp/sda1.image.out
e2fsck 1.41.12 (17-May-2010)
/tmp/sda1.image.out: clean, 39/128016 files, 66044/512000 blocks
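
To rehearse a real repair (not just a check) without altering the saved image, the repair can be run against a scratch copy. The following is a sketch under my own assumptions: rehearse_on_copy is a hypothetical helper, and it relies on GNU "cp" for the --sparse option.

```shell
#!/bin/sh
# rehearse_on_copy IMAGE COMMAND [ARGS...]
# Copies IMAGE to a scratch file, runs COMMAND with the scratch path appended
# as its last argument, then deletes the scratch file. IMAGE is never written.
# Hypothetical helper; the name and interface are assumptions.
rehearse_on_copy() {
    image=$1
    shift
    scratch=$(mktemp "${TMPDIR:-/tmp}/fsck-rehearsal.XXXXXX") || return 1
    # --sparse=always (GNU cp) keeps the metadata image sparse on disk.
    cp --sparse=always "$image" "$scratch" || { rm -f "$scratch"; return 1; }
    "$@" "$scratch"
    rc=$?
    rm -f "$scratch"
    # Hand the checker's own exit status back to the caller.
    return "$rc"
}

# Example use: force a full check and answer "yes" to every repair question,
# but only on the scratch copy:
#   rehearse_on_copy /tmp/sda1.image.out e2fsck -fy
```

The output of the rehearsal shows which fixes the checker would apply, and the original image stays available for another attempt or for later analysis.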

This shows that the file system was clean, without errors, at the time the image backup was taken.


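--> Let's destroy the file system structure on a test block device (/dev/sdb1 here) and then try to recover it using an image file. The "dd" command below zeroes the first 2 KiB of the partition, which includes the primary ext4 superblock at byte offset 1024. The steps are recorded here: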
[root@managed1 ~]# dd if=/dev/zero of=/dev/sdb1 bs=1024 count=2
2+0 records in
2+0 records out
2048 bytes (2.0 kB) copied, 0.020512 s, 99.8 kB/s

[root@managed1 ~]# mount /dev/sdb1 /data
mount: /dev/sdb1 is write-protected, mounting read-only
mount: unknown filesystem type '(null)'

[root@managed1 ~]# mount /dev/sdb1 /data -t ext4
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.

[root@managed1 ~]# dumpe2fs /dev/sdb1
dumpe2fs 1.42.9 (28-Dec-2013)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sdb1
Couldn't find valid filesystem superblock.

--> Now, try to restore the metadata from the file "/tmp/sdb1.image.out" (an image of /dev/sdb1 that had been saved with "e2image" before the damage):
From the man page of "e2image":
RESTORING FILE SYSTEM METADATA USING AN IMAGE FILE
     
The -I option will cause e2image to install the metadata stored in the image file back to the device. It can be used to restore the filesystem metadata back to the device in emergency situations.

WARNING!!!! The -I option should only be used as a desperation measure when other alternatives have failed. If the filesystem has changed since the image file was created, data will be lost. In general, you should make a full image backup of the file system first, in case you wish to try other recovery strategies afterwards.

[root@managed1 ~]# e2image -I /dev/sdb1 /tmp/sdb1.image.out
e2image 1.42.9 (28-Dec-2013)
[root@managed1 ~]# echo $?
0
[root@managed1 ~]# mount /dev/sdb1 /data
[root@managed1 ~]# df -PTh /data
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb1      ext4  190M   41M  136M  24% /data

<<< The metadata was restored and the device could be mounted and accessed again. This approach only helps when the on-disk metadata (such as the superblock) is corrupted and the file system has not changed since the image was taken. It cannot recover a disk that is physically damaged or whose layout is damaged. >>>

--> Using the 'xfs_metadump' and 'xfs_mdrestore' commands on an XFS file system:

--> Taking image backup using "xfs_metadump":

[root@managed1 ~]# xfs_metadump /dev/sdb2 /tmp/sdb2.image.out
[root@managed1 ~]# du -sh /tmp/sdb2.image.out
3.5M    /tmp/sdb2.image.out
[root@managed1 ~]# file /tmp/sdb2.image.out
/tmp/sdb2.image.out: XFS filesystem metadump image

--> Destroy file system metadata and recover it later using image file:
[root@managed1 ~]# dd if=/dev/zero of=/dev/sdb2 bs=1024 count=2
2+0 records in
2+0 records out
2048 bytes (2.0 kB) copied, 0.0203458 s, 101 kB/s
[root@managed1 ~]# mount /dev/sdb2 /data2
mount: /dev/sdb2 is write-protected, mounting read-only
mount: unknown filesystem type '(null)'
[root@managed1 ~]# mount /dev/sdb2 /data2 -t xfs
mount: wrong fs type, bad option, bad superblock on /dev/sdb2,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.

--> Now, try to restore image data using "xfs_mdrestore" from the file "/tmp/sdb2.image.out":

[root@managed1 ~]# xfs_mdrestore /tmp/sdb2.image.out /dev/sdb2
[root@managed1 ~]# mount /dev/sdb2 /data2
[root@managed1 ~]# df -PTh /data2
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb2      xfs   197M   11M  187M   6% /data2

Save a file system image for support investigations
----------------------------------------------------------------------------
A pre-repair file system metadata image can often be useful for support investigations if there is a possibility that the corruption was due to a software bug. Patterns of corruption present in the pre-repair image may aid in root-cause analysis. Where possible, making image backups, or at least metadata backups, a regular practice can be a lifesaver.


Operate only on unmounted file systems
---------------------------------------------------------
A repair must be run only on an unmounted file system. The repair tool needs sole access to the file system, otherwise further damage may result. A read-only check run against a mounted file system may also report spurious errors.

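A script that runs a check or repair can refuse to start while the device is still mounted. Below is a minimal sketch that looks the device up in /proc/mounts. The helper is_mounted is hypothetical, and it compares the device path literally, so a device mounted under another name (for example through /dev/disk/by-uuid or /dev/mapper) would need extra handling.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears as a mount source in MOUNTS_FILE
# (default: /proc/mounts). Hypothetical helper; symlinked device names
# are not resolved.
is_mounted() {
    device=$1
    mounts=${2:-/proc/mounts}
    # The first field of each line in /proc/mounts is the mount source.
    while read -r source _rest; do
        [ "$source" = "$device" ] && return 0
    done < "$mounts"
    return 1
}

# Example use:
#   if is_mounted /dev/sdb1; then
#       echo "unmount /dev/sdb1 first" >&2
#   else
#       e2fsck -f /dev/sdb1
#   fi
```
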
Disk errors
-----------------
File system check tools cannot repair hardware problems. A file system must be fully readable and writable for a repair to succeed. If a file system was corrupted by a hardware error, first move it to a good disk, for example with the "dd" utility or, for XFS, with the "xfs_copy" command.
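
For the "dd" route, the options matter when the source disk has unreadable sectors. The sketch below shows one commonly used combination with GNU "dd". The helper name copy_device and the 64K block size are my own choices, not something the tools require.

```shell
#!/bin/sh
# copy_device SOURCE TARGET (hypothetical helper)
# Copies SOURCE to TARGET with GNU dd, continuing past read errors.
#   conv=noerror     keep going after a read error
#   conv=sync        pad short or failed input blocks with zeros so that
#                    later data stays at the right offset
#   iflag=fullblock  retry short reads, so padding is added only where
#                    data is really missing
# Because of conv=sync, the copy is rounded up to a multiple of the block
# size if SOURCE is not already one.
copy_device() {
    dd if="$1" of="$2" bs=64K conv=noerror,sync iflag=fullblock
}

# Example use (the target must be at least as large as the source):
#   copy_device /dev/sdb1 /dev/sdc1
```

Once the copy is on healthy hardware, run the dry run and the repair against the copy, not against the failing disk.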
