Filesystem Management in Linux

Prerequisites/Overview
Properties of Journaling Filesystems
Ext3
Ext4
XFS
Benchmarks - Summary
Literature and References
- Books
- Links

Prerequisites/Overview

Assumed knowledge

basic structure of traditional Unix filesystems
superblock, inodes, blocks, block groups, directories, free lists, bitmaps, mount points, sync, fstab, ...
tools for creating filesystems and checking their consistency

Focus:

Journaling filesystems: ext3 and XFS in combination with LVM2 and Device Mapper

Interesting, but (unfortunately) not covered here:

Journaling filesystems
- Namesys: Reiser4
- IBM: JFS
Cluster filesystems
Crypto filesystems
- eCryptfs
- ...
Distributed / network filesystems
- NFSv4
- OpenAFS

Properties of Journaling Filesystems

the consistency of traditional filesystems is at risk after a crash
changes to the filesystem usually require writes in several places (data blocks, inodes, directory entries)
danger: if an operation is not completed successfully, severe inconsistencies can result
time-consuming consistency checks and repairs are often necessary
the journal/log records changes before they are carried out
a completely recorded change is marked as committed in the journal
the journaled changes remain valid until the write to the final location has finished (transaction)
Types

Metadata journaling
ensures consistency of the metadata (only)

Full journaling
also ensures consistency of the file contents
the term journaling is mostly used as a synonym for metadata journaling
on mount / fsck the journal is replayed first: committed entries whose changes have not yet reached their final location are applied, incomplete entries are discarded

Ext3

Properties

extension of ext2
- journal
- online resize
fully compatible with ext2
- an ext3 filesystem can be mounted as an ext2 filesystem
- an existing ext2 filesystem can be converted to ext3 (tune2fs -j)
available in all current Linux distributions
Creating
- mke2fs -j
Journal modes
- full
  - mount option data=journal
  - recommendation: use a separate journal device
  - mke2fs -J device=external-journal
    - external-journal has to be set up beforehand:
      mke2fs -O journal_dev external-journal
- ordered
  - mount option data=ordered
  - metadata journal
  - ordered sequence of write operations
  - guarantee: file contents are written before the journal entry is committed
  - default mode
- writeback
  - mount option data=writeback
  - metadata journal
  - data blocks are written out by the regular sync, in no particular order relative to the journal
  - danger: a journal entry can be committed before the data are written (out-of-order writes)
the journal mode can be set for individual files
- chattr +j file
- mode data=journal for file, even if the filesystem is mounted with a different mode
one kernel thread [kjournald] for each mounted ext3 filesystem
mount option commit=nrsec: sync of all data and metadata every nrsec seconds (default: 5 seconds)
the journal is kept in a file (or on a separate journal device)
Resize
- grow: mounted/unmounted
- shrink: unmounted
undelete is harder: block addresses in the inode are overwritten with NULL on unlink(2)
http://de.wikipedia.org/wiki/Ext3
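
The journal modes and the commit interval listed above are plain mount options. The following is a minimal sketch that only prints a mount command for a chosen mode instead of running it; the function name ext3_mount_cmd and the device and mount point in the call are illustrative assumptions, not part of the original material.

```shell
# Print (do not run) a mount command for an ext3 filesystem.
# Args: journal mode, device, mount point, optional commit interval in seconds.
ext3_mount_cmd() {
    mode=$1 dev=$2 mnt=$3 commit=${4:-5}
    case "$mode" in
        journal|ordered|writeback) ;;   # the three data= modes of ext3
        *) echo "unknown journal mode: $mode" >&2; return 1 ;;
    esac
    printf 'mount -t ext3 -o data=%s,commit=%s %s %s\n' \
        "$mode" "$commit" "$dev" "$mnt"
}

# Example call with placeholder device and mount point
ext3_mount_cmd ordered /dev/t_local_01/stripe01 /mnt/stripe01
```

Printing the command keeps the sketch harmless; the printed line can be checked and then run as root.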

Tools

Management
- mke2fs
  - block size
    - 1K, 2K, 4K depending on the filesystem size
    - observed: 4K for filesystems larger than 512MB
  - alignment to a RAID system
    - option -E stride=stripe_size
    - stripe_size: number of filesystem blocks per stripe of the RAID system
  - number of inodes
    - option -i bytes-per-inode
    - one inode for every bytes-per-inode bytes in the filesystem
    - not smaller than the block size
    - alternatively: -N number_of_inodes
  - see also /etc/mke2fs.conf
- resize2fs
Information, tuning
- tune2fs, blkid, uuidgen, filefrag, findfs
Error analysis and repair
- badblocks, e2fsck, e2image, debugfs, dumpe2fs
Attribute handling
- chattr, lsattr
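
The stride and inode options of mke2fs come down to simple divisions. The sketch below works them through for a 20 GB volume with 4K blocks and 64K stripes; the function names are illustrative, and the ratio of 8192 bytes per inode is an assumption inferred from the inode count that mke2fs reports for such a volume, not a documented default. mke2fs rounds the inode count per block group, so the second result is only an estimate.

```shell
# stride: number of filesystem blocks per stripe of the RAID/LVM device.
# Args: stripe size in KB, filesystem block size in KB.
stride() {
    echo $(( $1 / $2 ))
}

# Rough inode count for a given bytes-per-inode ratio (option -i).
# Args: filesystem size in bytes, bytes per inode.
inode_estimate() {
    echo $(( $1 / $2 ))
}

stride 64 4                                          # value for -E stride=
inode_estimate $(( 20 * 1024 * 1024 * 1024 )) 8192   # 20 GB, assumed ratio
```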

Example: Creating a Filesystem

create ext3 in a striped logical volume
starting point: block devices /dev/sda, /dev/sdb8
1. initialize PVs
2. create VG
3. create LV
4. create filesystem
  - block size=4K, stripe size=64K, i.e. 16 filesystem blocks per stripe
5. mount filesystem
Benchmark: bonnie++

# pvcreate /dev/sda /dev/sdb8
  Physical volume "/dev/sda" successfully created
  Physical volume "/dev/sdb8" successfully created

# vgcreate t_local_01 /dev/sda /dev/sdb8
  Volume group "t_local_01" successfully created

# lvcreate -L 20G -i 2 -n stripe01 t_local_01
  Using default stripesize 64,00 KB
  Logical volume "stripe01" created

# mke2fs -j -L stripe01 -E stride=16 /dev/t_local_01/stripe01
mke2fs 1.39 (29-May-2006)
Dateisystem-Label=stripe01
OS-Typ: Linux
Blockgröße=4096 (log=2)
Fragmentgröße=4096 (log=2)
2621440 Inodes, 5242880 Blöcke
262144 Blöcke (5.00%) reserviert für den Superuser
erster Datenblock=0
Maximum filesystem blocks=0
160 Blockgruppen
32768 Blöcke pro Gruppe, 32768 Fragmente pro Gruppe
16384 Inodes pro Gruppe
Superblock-Sicherungskopien gespeichert in den Blöcken:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Schreibe Inode-Tabellen: erledigt
Erstelle Journal (32768 Blöcke): erledigt
Schreibe Superblöcke und Dateisystem-Accountinginformationen: erledigt

Das Dateisystem wird automatisch alle 31 Mounts bzw. alle 180 Tage überprüft,
je nachdem, was zuerst eintritt. Veränderbar mit tune2fs -c oder -t .

# tune2fs -l /dev/t_local_01/stripe01
tune2fs 1.39 (29-May-2006)
Filesystem volume name:   stripe01
Last mounted on:          <not available>
Filesystem UUID:          813b3521-bfbf-4162-a8c2-45d30e9a3d51
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2621440
Block count:              5242880
Reserved block count:     262144
Free blocks:              5116557
Free inodes:              2621429
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1022
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Tue Jun 19 14:08:30 2007
Last mount time:          n/a
Last write time:          Tue Jun 19 14:08:32 2007
Mount count:              0
Maximum mount count:      31
Last checked:             Tue Jun 19 14:08:30 2007
Check interval:           15552000 (6 months)
Next check after:         Sun Dec 16 13:08:30 2007
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      427bbaa6-4b11-45f6-ba80-f805302b5372
Journal backup:           inode blocks

# mkdir /mnt/stripe01

# mount /dev/t_local_01/stripe01 /mnt/stripe01

# df -k /mnt/stripe01
Dateisystem          1K-Blöcke   Benutzt Verfügbar Ben% Eingehängt auf
/dev/mapper/t_local_01-stripe01
                      20642428    176200  19417652   1% /mnt/stripe01

# cd /mnt/stripe01
# bonnie++ -u root
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
eldorado         4G 38903  96 95129  51 38720  26 44898  92 99621  25 185.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
eldorado,4G,38903,96,95129,51,38720,26,44898,92,99621,25,185.3,1,\\
            16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

# cp -pr /usr/bin /mnt/stripe01

# df -k /mnt/stripe01
Dateisystem          1K-Blöcke   Benutzt Verfügbar Ben% Eingehängt auf
/dev/mapper/t_local_01-stripe01
                      20642428    723364  18870488   4% /mnt/stripe01
# lvs
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%
  stripe01 t_local_01 -wi-a- 20,00G
# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  t_local_01   2   1   0 wz--n- 205,46G 185,46G
# pvs
  PV         VG         Fmt  Attr PSize   PFree
  /dev/sda   t_local_01 lvm2 a-    74,53G  64,53G
  /dev/sdb8  t_local_01 lvm2 a-   130,93G 120,93G
#

Example: Growing a Filesystem Online

grow the filesystem created above online by 10 GB
1. extend the logical volume (trivial here, since the PVs have enough free PEs)
2. grow the filesystem

# lvextend -L +10G t_local_01/stripe01
  Using stripesize of last segment 64,00 KB
  Extending logical volume stripe01 to 30,00 GB
  Logical volume stripe01 successfully resized

# resize2fs /dev/t_local_01/stripe01
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/t_local_01/stripe01 is mounted on /mnt/stripe01; on-line resizing required
Performing an on-line resize of /dev/t_local_01/stripe01 to 7864320 (4k) blocks.
The filesystem on /dev/t_local_01/stripe01 is now 7864320 blocks long.

# df -k /mnt/stripe01
Dateisystem          1K-Blöcke   Benutzt Verfügbar Ben% Eingehängt auf
/dev/mapper/t_local_01-stripe01
                      30963708    723364  28667608   3% /mnt/stripe01
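
The two steps of the online grow can be sketched as a small dry-run helper that prints the commands rather than executing them. The function names are illustrative assumptions; the second helper reproduces the block count that resize2fs reports for a 30 GB volume with 4K blocks.

```shell
# Print the commands for growing an ext3 filesystem on an LVM volume.
# Args: volume group, logical volume, size increment (e.g. +10G).
grow_ext3_plan() {
    printf 'lvextend -L %s %s/%s\n' "$3" "$1" "$2"
    # without a size argument resize2fs grows to the size of the device
    printf 'resize2fs /dev/%s/%s\n' "$1" "$2"
}

# Number of filesystem blocks for a volume size.
# Args: size in GB, block size in bytes.
blocks_for_size() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}

grow_ext3_plan t_local_01 stripe01 +10G
blocks_for_size 30 4096
```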

Example: Shrinking Filesystem and LV

shrink to 1 GB
1. unmount the filesystem
2. check the filesystem for consistency
3. shrink the filesystem
4. mount the filesystem
5. shrink the LV

# resize2fs /dev/t_local_01/stripe01 1G
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/t_local_01/stripe01 is mounted on /mnt/stripe01; on-line resizing required
On-line shrinking from 7864320 to 262144 not supported.

# umount /mnt/stripe01

# resize2fs /dev/t_local_01/stripe01 1G
resize2fs 1.39 (29-May-2006)
Bitte zuerst 'e2fsck -f /dev/t_local_01/stripe01 ' laufen lassen.

# e2fsck -f /dev/t_local_01/stripe01
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
stripe01: 3916/3932160 files (0.2% non-contiguous), 304234/7864320 blocks

# resize2fs /dev/t_local_01/stripe01 1G
resize2fs 1.39 (29-May-2006)
Resizing the filesystem on /dev/t_local_01/stripe01 to 262144 (4k) blocks.
The filesystem on /dev/t_local_01/stripe01 is now 262144 blocks long.

# mount /dev/t_local_01/stripe01 /mnt/stripe01

# df -k /mnt/stripe01
Dateisystem          1K-Blöcke   Benutzt Verfügbar Ben% Eingehängt auf
/dev/mapper/t_local_01-stripe01
                       1032088    698856    291292  71% /mnt/stripe01
# lvreduce -L -29G t_local_01/stripe01
  WARNING: Reducing active and open logical volume to 1,00 GB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce stripe01? [y/n]: y
  Reducing logical volume stripe01 to 1,00 GB
  Logical volume stripe01 successfully resized

# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  t_local_01   2   1   0 wz--n- 205,46G 204,46G
#
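
The shrink procedure can also be written as a dry-run plan. This sketch deliberately uses a slightly different order than the transcript above: it reduces the LV while the filesystem is still unmounted and mounts only at the end, which is an assumption about what is more cautious, not the procedure shown in the source. The function name is illustrative. The LV must never become smaller than the filesystem it holds.

```shell
# Print the commands for shrinking an ext3 filesystem and its LV.
# Args: LV device path, mount point, new size (e.g. 1G).
shrink_ext3_plan() {
    printf 'umount %s\n' "$2"
    printf 'e2fsck -f %s\n' "$1"          # resize2fs insists on a fresh check
    printf 'resize2fs %s %s\n' "$1" "$3"  # shrink the filesystem first
    printf 'lvreduce -L %s %s\n' "$3" "$1" # then the LV, to the same size
    printf 'mount %s %s\n' "$1" "$2"
}

shrink_ext3_plan /dev/t_local_01/stripe01 /mnt/stripe01 1G
```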

Ext4

available from kernel 2.6.19
under development / experimental
Properties:
- overcomes ext3 limits, e.g.
  - filesystem size up to 1EB
  - more than 32000 subdirectories per directory
- use of extents (contiguous runs of data blocks), cf. XFS
- new features
compatible with ext3 (as long as no extents are used)

XFS

History, Availability

from SGI
- developed for IRIX
- ported to Linux
- integrated directly into the kernel since 2.6
supported in SLES 10
enabled in Fedora Core
disabled in RHEL
- ... and in Scientific Linux, ...
available for Scientific Linux in the contrib area
- SL 4.4: e.g. http://ftp.tu-chemnitz.de/pub/linux/scientific/44/i386/contrib/RPMS/xfs/
- SL 5.0: SRPMs from CentOS: http://ftp.tu-chemnitz.de/pub/linux/centos/5.0/centosplus/SRPMS/

Properties

Limits
- 32-bit Linux
  - max. file size: 16TB
  - max. filesystem size: 16TB
- 64-bit Linux
  - max. file size: 9EB
  - max. filesystem size: 18EB
Journaling: quick recovery
- asynchronous logs (metadata only)
- the journal is a central component, not an add-on
- fsck.xfs: exists only to satisfy the naming convention, simply calls exit(0)
- the journal is replayed at mount time
Fast transactions
- tree structures (B+ trees) for directory contents, free lists, a file's extent lists, metadata
- time-optimized journal structures and algorithms
- transaction:
  - sequence of metadata changes
  - one logical operation in the filesystem
  - consistency after every transaction
- log:
  - ordered set of transactions
  - organized as a circular list
  - write ahead (transaction, metadata)
  - in-core log buffers (2-8)
  - on-disk log (write only), internal or on a separate disk
Online administration
- growing
- support for snapshots
  - ensures filesystem consistency in the snapshot
  - filesystem "freeze": all filesystem operations are brought to a halt
  - take the snapshot
  - filesystem "thaw"
POSIX ACLs
Quota
- user/group/project quota
- soft/hard limits
Support for HSM
- DMAPI/XDSM support (Data Management API)
Extended file attributes
- up to 64KB of arbitrary binary data
- name/value pairs
- user: protected by file permissions
- system: root only (e.g. protected metadata: ACLs, migration status of an HSM system)
- security: (SELinux)
Backup/restore
- dedicated tools
- including quota and extended file attributes
- endian neutral (dumps can be exchanged between different platforms)
Realtime subvolume
- special space allocator for realtime applications (continuous data streams)
- replaces the default allocator
- experimental only on Linux

  see the xfs_rtcp manual:
  Currently, realtime partitions are not supported under the Linux version of XFS, and use of a realtime partition WILL CAUSE CORRUPTION on the data partition. As such, this command is made available for curious DEVELOPERS ONLY at this point in time.
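
The freeze / snapshot / thaw sequence listed under online administration can be sketched as a dry-run helper. It prints the three commands in order; the function name, the snapshot name and the snapshot size in the call are illustrative assumptions. The filesystem has to be thawed again even if the snapshot step fails.

```shell
# Print the commands for a consistent LVM snapshot of a mounted XFS.
# Args: mount point, origin LV device, snapshot name, snapshot size.
xfs_snapshot_plan() {
    printf 'xfs_freeze -f %s\n' "$1"                   # freeze: suspend writes
    printf 'lvcreate -s -L %s -n %s %s\n' "$4" "$3" "$2" # take the snapshot
    printf 'xfs_freeze -u %s\n' "$1"                   # thaw: resume operation
}

xfs_snapshot_plan /mnt/striped /dev/t_local_01/lvol0 snap01 1G
```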

Structure of XFS

Sections

data section
- filesystem data and metadata
log
- changes to metadata (transaction-oriented)
- in normal operation: write-only
- only read at mount time
real-time section (optional)
- for files that require a constant I/O rate

Allocation Groups

the data section is divided into allocation groups (AG)
basis for growing a filesystem
comparable to cylinder groups in other filesystems
each AG has its own
- superblock
- block and inode free lists
- block and inode allocation
enables parallelism within a filesystem
size: 16MB ... 1TB
files can span several AGs
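
The relation between allocation groups and the size of the data section can be checked with a one-line calculation. The sketch below multiplies agcount and agsize as printed by mkfs.xfs or xfs_info; the function name is illustrative. The product is an upper bound, since the last AG may be shorter than the others.

```shell
# Upper bound for the number of blocks in the data section.
# Args: agcount, agsize in filesystem blocks.
xfs_max_data_blocks() {
    echo $(( $1 * $2 ))
}

xfs_max_data_blocks 16 262128    # 16 GB volume from the stripe example
xfs_max_data_blocks 16 819184    # 50 GB volume from the striped LVM example
```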

Blocks / Inodes

block size is a multiple of the (device-specific) sector size
max. size is tied to the page management of the kernel (max block size = page size)
- ia32/x86_64: 4KB
- ia64: 16KB
Inodes
- 256 bytes ... 2KB
- can hold contents (file/directory data, symbolic link)

Extents

traditional Unix filesystems point from the inode to individual, directly or indirectly addressed data blocks
in XFS, inodes point to so-called extents
extent: contiguous run of data blocks
higher performance
ideal case: one extent per file
unwritten extents
- marked as "not yet written"
- used for preallocation of file space
Tools

attr                 (1)  - extended attributes on XFS filesystem objects
fsck.xfs [fsck]      (8)  - do nothing, successfully
xfs_admin            (8)  - change parameters of an XFS filesystem
xfs_bmap             (8)  - print block mapping for an XFS file
xfs_check            (8)  - check XFS filesystem consistency
xfs_copy             (8)  - copy the contents of an XFS filesystem
xfs_db               (8)  - debug an XFS filesystem
xfs_freeze           (8)  - suspend access to an XFS filesystem
xfs_growfs           (8)  - expand an XFS filesystem
xfs_info [xfs_growfs] (8)  - expand an XFS filesystem
xfs_io               (8)  - debug the I/O path of an XFS filesystem
xfs_logprint         (8)  - print the log of an XFS filesystem
xfs_mkfile           (8)  - create an XFS file
xfs_ncheck           (8)  - generate pathnames from i-numbers for XFS
xfs_quota            (8)  - manage use of quota on XFS filesystems
xfs_repair           (8)  - repair an XFS filesystem
xfs_rtcp             (8)  - XFS realtime copy command
xfsdq, xfsrq         (8)  - XFS dump and restore quota
xfsdump              (8)  - XFS filesystem incremental dump utility
xfsinvutil           (8)  - xfsdump inventory database checking and pruning utility
xfsrestore           (8)  - XFS filesystem incremental restore utility

XFS Administration

mkfs.xfs / mkfs -t xfs accepts numerous layout parameters
- block size
- number / size of the AGs
- stripe alignment
- log device
- defaults are derived from the device parameters
analyze an existing filesystem with xfs_info
- xfs_info is equivalent to xfs_growfs -n
- filesystem must be mounted
change/display parameters with xfs_admin
- uses xfs_db
- parameters: label, UUID, journal version, ...
consistency check with xfs_check
- only needed if inconsistencies are suspected
- filesystem should not be mounted (read-only at most)
repair with xfs_repair
- filesystem must not be mounted
- different code base than xfs_check
- to cross-check findings: compare xfs_check with xfs_repair -n
- seven phases
- each phase relies on the previous phase having repaired its errors successfully
  1. find, check and correct superblocks
  2. allocation group headers
  3. inode trees, used data blocks
  4. inodes
  5. rebuild AG headers
  6. directories, move unattached inodes to lost+found
  7. link counts of the inodes
Stripe Alignment

options for mkfs.xfs
- -d sunit=sunitsize,swidth=swidthsize or
- -d su=su_value,sw=sw_value
sunitsize: size of a stripe unit in multiples of 512 bytes
swidthsize: stripe width (number of stripes (= number of disks/PVs) * sunitsize)
su_value: size of a stripe unit with a unit suffix (bytes, KB, ...)
sw_value: number of stripes
parameters are picked up automatically from LVM striped volumes
useful: specify them for hardware RAID
Example:
- the mkfs.xfs calls below all lead to the same result
- note: the output is expressed in filesystem blocks
- stripe size: 64KB = 128 * 512 bytes = 64 * 1KB = 16 * 4096 bytes
- stripe width: 128KB = 256 * 512 bytes = 2 * 64KB = 32 * 4096 bytes
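
The conversion between the su/sw notation and the sunit/swidth notation is easy to get wrong, so here is a minimal sketch of the arithmetic. It prints the values both in 512-byte sectors (as passed to mkfs.xfs and mount) and in filesystem blocks (as shown in the mkfs.xfs output); the function name and the default block size of 4096 bytes are assumptions.

```shell
# Convert stripe unit and stripe count into sunit/swidth values.
# Args: stripe unit in KB, number of stripes, optional block size in bytes.
xfs_stripe_params() {
    su_kb=$1 sw=$2 bsize=${3:-4096}
    sunit=$(( su_kb * 1024 / 512 ))      # stripe unit in 512-byte sectors
    swidth=$(( sunit * sw ))             # full stripe width in sectors
    printf 'sunit=%s swidth=%s (512-byte sectors)\n' "$sunit" "$swidth"
    printf 'sunit=%s swidth=%s (filesystem blocks)\n' \
        $(( su_kb * 1024 / bsize )) $(( su_kb * 1024 / bsize * sw ))
}

xfs_stripe_params 64 2      # two-way LVM stripe with 64KB stripe size
xfs_stripe_params 128 13    # RAID 6 (13+2) with 128KB stripe size
```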

# lvcreate -i 2 -L16G t_local_01
  Using default stripesize 64,00 KB
  Logical volume "lvol5" created

# mkfs.xfs -N -dsunit=128,swidth=256 /dev/t_local_01/lvol5
meta-data=/dev/t_local_01/lvol5  isize=256    agcount=16, agsize=262128 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=4194048, imaxpct=25
         =                       sunit=16     swidth=32 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=131072 blocks=0, rtextents=0

# mkfs.xfs -N -d su=64k,sw=2 /dev/t_local_01/lvol5
meta-data=/dev/t_local_01/lvol5  isize=256    agcount=16, agsize=262128 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=4194048, imaxpct=25
         =                       sunit=16     swidth=32 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=131072 blocks=0, rtextents=0

# mkfs.xfs -N /dev/t_local_01/lvol5
meta-data=/dev/t_local_01/lvol5  isize=256    agcount=16, agsize=262128 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=4194048, imaxpct=25
         =                       sunit=16     swidth=32 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=131072 blocks=0, rtextents=0

carry over the parameters chosen for mkfs.xfs to mount as options (see below)

Log Device

option for mkfs.xfs: -l subopt1[,subopt2,...]
- internal
- logdev=/path
- size=value
  - the size of an internal log is derived automatically from the filesystem size
  - a separate log device is used completely (max. 128MB)
- sunit/su=value
  - see above
separate log device: use the best disk available
carry over the parameters chosen for mkfs.xfs to mount as options (see below)
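
An external log has to be named both when the filesystem is created and every time it is mounted. The following dry-run sketch prints the matching pair of commands; the function name is illustrative and the device names in the call are placeholders.

```shell
# Print mkfs.xfs and mount commands for an XFS with an external log.
# Args: data device, log device, mount point.
xfs_extlog_plan() {
    printf 'mkfs.xfs -l logdev=%s %s\n' "$2" "$1"
    # the same log device must be passed again at mount time
    printf 'mount -o logdev=%s %s %s\n' "$2" "$1" "$3"
}

xfs_extlog_plan /dev/mapper/crypted_mirror /dev/t_local_01/xfslog01 /mnt/crypted_mirror
```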

Example: XFS in a Striped LVM Volume

Setup
1. striped volume
  - 2 PVs
  - stripe size: 64KB
2. create XFS
  - internal log
3. mount the filesystem
  - stripe size and width in 512-byte blocks
Benchmark: bonnie++

# pvcreate /dev/sda /dev/sdb8
  Physical volume "/dev/sda" successfully created
  Physical volume "/dev/sdb8" successfully created

# vgcreate t_local_01 /dev/sda /dev/sdb8
  Volume group "t_local_01" successfully created

# lvcreate -L 50G -i 2 t_local_01
  Using default stripesize 64,00 KB
  Logical volume "lvol0" created

# mkfs.xfs /dev/t_local_01/lvol0
meta-data=/dev/t_local_01/lvol0  isize=256    agcount=16, agsize=819184 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=13106944, imaxpct=25
         =                       sunit=16     swidth=32 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=6400, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=131072 blocks=0, rtextents=0

# mkdir /mnt/striped

# mount -o sunit=128,swidth=256 /dev/t_local_01/lvol0 /mnt/striped

# cd /mnt/striped
# bonnie++ -u root
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
eldorado         4G 45206  98 106983  49 47081  29 38910  82 107094  27 265.1   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  6220  58 +++++ +++  8341  66  6549  63 +++++ +++  4250  42
eldorado,4G,45206,98,106983,49,47081,29,38910,82,107094,27,265.1,1,\\
            16,6220,58,+++++,+++,8341,66,6549,63,+++++,+++,4250,42
#

Example: XFS on an encrypted LVM mirror volume on multipath iSCSI RAID devices with a separate log device in a striped LVM volume on local PVs

Resources
- two RAID devices in the SAN
  - RAID level 6 (13+2)
  - stripe size: 128KB
  - two I/O paths each
  - iSCSI gateway (Cisco MDS 9506)
- separate log device
  - striped LVM volume on local PVs
  - stripe size: 64KB
Setup
1. set up the striped volume for the log device
2. create the mirrored volume for the data
  - mirror log in RAM (otherwise a further PV is required)
3. set up the encrypted device via LUKS (Linux Unified Key Setup)
  1. create the container, store the keys
  2. map the container via Device Mapper
4. create XFS
5. mount the filesystem
Benchmark: bonnie++

# pvcreate /dev/sda /dev/sdb8
  Physical volume "/dev/sda" successfully created
  Physical volume "/dev/sdb8" successfully created
# vgcreate t_local_01 /dev/sda /dev/sdb8
  Volume group "t_local_01" successfully created
# lvcreate -n xfslog01 -L128M -i 2 t_local_01
  Using default stripesize 64,00 KB
  Logical volume "xfslog01" created
# lvdisplay -m
  --- Logical volume ---
  LV Name                /dev/t_local_01/xfslog01
  VG Name                t_local_01
  LV UUID                jWWXOR-F9QS-9l5d-7Shl-NssV-Omj2-loZ2xz
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                128,00 MB
  Current LE             32
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

  --- Segments ---
  Logical extent 0 to 31:
    Type                striped
    Stripes             2
    Stripe size         64 KB
    Stripe 0:
      Physical volume   /dev/sdb8
      Physical extents  0 to 15
    Stripe 1:
      Physical volume   /dev/sda
      Physical extents  0 to 15

# multipath -l
mpath2 (3600d02300069ca1009c18e2f8377ac01) dm-11 IFT,A16F-G2422
[size=1.9T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 16:0:0:2 sdc 8:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 15:0:0:2 sdg 8:96  [active][undef]
mpath4 (3600d0230006c1bef0c05093799154b06) dm-12 IFT,A24F-G2224-1
[size=2.0T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 14:0:0:6 sdf 8:80  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 12:0:0:6 sde 8:64  [active][undef]
# pvcreate /dev/mpath/mpath2 /dev/mpath/mpath4
  Physical volume "/dev/mpath/mpath2" successfully created
  Physical volume "/dev/mpath/mpath4" successfully created
# lvcreate -m 1 --corelog -L100G --nosync t_iscsi_01
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  Logical volume "lvol0" created

# lvs
  LV       VG         Attr   LSize   Origin Snap%  Move Log Copy%
  lvol0    t_iscsi_01 Mwi-a- 100,00G                        100,00
  xfslog01 t_local_01 -wi-a- 128,00M

# cryptsetup luksFormat -y -c aes-cbc-essiv:sha256 /dev/t_iscsi_01/lvol0

WARNING!
========
Daten auf /dev/t_iscsi_01/lvol0 werden unwiderruflich überschrieben.

Are you sure? (Type uppercase yes): YES
Enter LUKS passphrase:
Verify passphrase:
Command successful.

# cryptsetup luksOpen /dev/t_iscsi_01/lvol0 crypted_mirror
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful. 

# dmsetup table
t_iscsi_01-lvol0: 0 209715200 mirror core 3 1024 nosync block_on_error 2 253:1 0 253:2 0
mpath2: 0 4096000000 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:96 1000
t_local_01-xfslog01: 0 262144 striped 2 128 8:24 384 8:0 384
t_iscsi_01-lvol0_mimage_1: 0 209715200 linear 253:11 384
t_iscsi_01-lvol0_mimage_0: 0 209715200 linear 253:12 384
crypted_mirror: 0 209714168 crypt aes-cbc-essiv:sha256 00000000000000000000000000000000 0 253:3 1032
mpath4: 0 4294967295 multipath 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:64 1000

# mkfs.xfs -d su=128k,sw=13 -l logdev=/dev/t_local_01/xfslog01,su=64k /dev/mapper/crypted_mirror
log stripe unit specified, using v2 logs
meta-data=/dev/mapper/crypted_mirror isize=256    agcount=16, agsize=1638400 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=26214256, imaxpct=25
         =                       sunit=32     swidth=416 blks, unwritten=1
naming   =version 2              bsize=4096
log      =/dev/t_local_01/xfslog01 bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkdir /mnt/crypted_mirror
# mount -o sunit=256,swidth=3328,logdev=/dev/t_local_01/xfslog01 /dev/mapper/crypted_mirror /mnt/crypted_mirror
# cd /mnt/crypted_mirror
# bonnie++ -u root
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
eldorado         4G 28230  66 43016  11 11608   3 16159  38 17891   2 380.0   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  8609  52 +++++ +++  8485  48  9496  60 +++++ +++  5911  45
eldorado,4G,28230,66,43016,11,11608,3,16159,38,17891,2,380.0,2,\\
            16,8609,52,+++++,+++,8485,48,9496,60,+++++,+++,5911,45
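
The sizes in the dmsetup table above can be cross-checked with a little sector arithmetic: the crypt target is smaller than the mirror by the payload offset given as the last field of its table line. The sketch below redoes that calculation; the function names are illustrative, and reading the offset as the space taken by the LUKS header is an interpretation, not something the transcript states.

```shell
# Number of 512-byte sectors in a volume of the given size.
# Args: size in GB.
sectors_for_gb() {
    echo $(( $1 * 1024 * 1024 * 1024 / 512 ))
}

# Usable sectors of a crypt mapping.
# Args: sectors of the underlying device, payload offset in sectors.
crypt_payload_sectors() {
    echo $(( $1 - $2 ))
}

mirror=$(sectors_for_gb 100)          # the 100 GB mirrored LV
crypt_payload_sectors "$mirror" 1032  # offset taken from the table line
```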

Benchmarks - Summary

do not overrate the results
- only one test run per setup
- the machine was in use during the tests
- the second part (file create/read/delete) presumably ran entirely in the buffer cache, so those figures are meaningless
Benchmark bonnie++: http://www.coker.com.au/bonnie++/

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
linear-ext3      4G 35921  90 54807  37 23090  15 28292  69 57133  12 156.4   1
striped-ext3     4G 40288  97 93582  50 36063  23 45096  90 92818  23 226.8   1
mirrored-ext3    4G 39075  96 45241  30 24082  17 40536  82 53966  11 152.6   1
striped-xfs      4G 45206  98 106983 49 47081  29 38910  82 107094 27 265.1   1
crypt_mirror-xfs 4G 28230  66 43016  11 11608   3 16159  38 17891   2 380.0   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linear-ext3      16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
striped-ext3     16 29324  98 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
mirrored-ext3    16 25161  63 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
striped-xfs      16  6220  58 +++++ +++  8341  66  6549  63 +++++ +++  4250  42
crypt_mirror-xfs 16  8609  52 +++++ +++  8485  48  9496  60 +++++ +++  5911  45
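
For comparing several runs it can help to pull individual figures out of the CSV line that bonnie++ prints at the end of each run. The sketch below extracts the block write and block read rates with awk; the function name is illustrative, and the field positions are read off the bonnie++ 1.03 output shown in this document, so other versions may use a different layout.

```shell
# Extract block write (field 5) and block read (field 11) rates
# from bonnie++ 1.03 CSV lines on stdin.
bonnie_block_rates() {
    awk -F, '{ printf "%s write=%s K/sec read=%s K/sec\n", $1, $5, $11 }'
}

# First part of the CSV line from the striped ext3 run
echo 'eldorado,4G,38903,96,95129,51,38720,26,44898,92,99621,25,185.3,1' |
    bonnie_block_rates
```

Given the caveats above (single run, machine in use), such figures are only useful for rough comparison.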

Literature and References

Books

William von Hagen:
Linux Filesystems
Sams Publishing, 2002, ISBN 0-672-32272-2
