How to Use tar and gzip for Archiving Files

Learn to use tar and gzip to create, compress, and extract archives on Linux with clear, step-by-step commands and examples. Ideal for DevOps and Linux beginners managing files and backups.

Introduction

tar and gzip are the classic combination for packaging and compressing files on Unix-like systems. tar bundles multiple files and directories into a single archive, while gzip compresses that archive to save space. This guide shows practical commands, explanations, and examples so you can confidently create, inspect, and extract tar.gz files.

Understanding tar and gzip

What many beginners find confusing at first is that tar and gzip do two different jobs:

  • tar (tape archive) packages many files and directories into one file (commonly with a .tar extension).
  • gzip compresses a single file (commonly producing .gz).

You can use tar alone to create a .tar archive, gzip alone to compress a single file, or combine them so you end up with a compressed archive named something.tar.gz (or .tgz).

Example: create a .tar then compress it with gzip

# create a tar archive of "project" directory
tar -cf project.tar project/

# compress the tar file with gzip
gzip project.tar

# result: project.tar.gz
ls -l project.tar.gz

Or the most common shortcut: let tar call gzip for you (-z)

# create a gzip-compressed tarball in one step
tar -czf project.tar.gz project/

Flags explained (brief)

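  • -c create an archive, -x extract from one, -t list its contents
  • -f FILE name of the archive file to read or write
  • -z compress or decompress through gzip
  • -v verbose: print each file name as it is processed
  • -C DIR change to DIR before acting (useful when extracting)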
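
To check that the two-step and one-step routes end up with the same contents, a minimal sketch like the following may help. It works in a scratch directory from mktemp, and the sample file names and the archive names two-step.tar.gz and one-step.tar.gz are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample tree to archive.
mkdir -p project/src
echo 'hello' > project/README.txt
echo 'echo hi' > project/src/run.sh

# Route 1: tar first, then gzip (gzip replaces project.tar with project.tar.gz).
tar -cf project.tar project/
gzip project.tar
mv project.tar.gz two-step.tar.gz

# Route 2: let tar run gzip via -z.
tar -czf one-step.tar.gz project/

# Both archives should list the same member names.
two_step_list=$(tar -tzf two-step.tar.gz | sort)
one_step_list=$(tar -tzf one-step.tar.gz | sort)
if [ "$two_step_list" = "$one_step_list" ]; then
  echo "same contents"
fi
```

The compressed bytes of the two files may differ slightly (gzip records some metadata in its header), which is why the sketch compares listings rather than the files themselves.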

Creating archives with tar and gzip

Use tar to create archives and include options for verbosity, compression, and excluding files.

Basic examples

# create a compressed tarball with verbose output
tar -czvf backup-2025-10-01.tar.gz /home/you/documents

# same but without verbose (quieter)
tar -czf backup-2025-10-01.tar.gz /home/you/documents

Exclude patterns and controlling paths

# exclude node_modules and .git directories
tar -czf site.tar.gz --exclude='site/node_modules' --exclude='site/.git' site/

# create an archive without storing absolute paths (recommended)
cd /var/www
tar -czf /tmp/www.tar.gz html/ logs/
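
The cd step can also be replaced by -C, which makes tar change directory before it reads the paths that follow. The sketch below uses a scratch directory in place of /var/www and invented file names; it lists the result to confirm that the excluded directories are absent and the stored paths are relative.

```shell
#!/usr/bin/env bash
set -eu

# Scratch stand-in for a web root; all names here are illustrative.
webroot=$(mktemp -d)
mkdir -p "$webroot/site/node_modules/leftpad" "$webroot/site/.git" "$webroot/site/css"
echo 'body {}' > "$webroot/site/css/main.css"
echo 'module'  > "$webroot/site/node_modules/leftpad/index.js"
echo 'ref'     > "$webroot/site/.git/HEAD"

archive="$webroot/site.tar.gz"

# -C switches directory before "site/" is read, so members are stored as
# site/... instead of with the full scratch path.
tar -czf "$archive" \
  --exclude='site/node_modules' \
  --exclude='site/.git' \
  -C "$webroot" site/

members=$(tar -tzf "$archive")
printf '%s\n' "$members"
```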

Choosing a gzip compression level

# default compression (balanced)
tar -czf archive.tar.gz dir/

# use gzip level 9 (maximum compression, slower); -I is GNU tar's --use-compress-program
tar -I 'gzip -9' -cf archive.tar.gz dir/

# or pipe the tar stream through gzip yourself (works with any tar)
tar -cf - dir/ | gzip -9 > archive.tar.gz

Notes:

  • Using tar -z is convenient. For more control, pipe tar into gzip and pass gzip options (like -9) explicitly or use -I with GNU tar.
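
To get a feel for what the level buys you, one option is to compress the same tar stream at two levels and compare sizes. This is a rough sketch on generated sample data (a file of sequential numbers, chosen only because it compresses well); real results depend heavily on what you archive.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/dir"
# Generate some compressible sample data (illustrative only).
seq 1 200000 > "$workdir/dir/numbers.txt"

# Same tar stream, two gzip levels: -1 favors speed, -9 favors size.
tar -cf - -C "$workdir" dir/ | gzip -1 > "$workdir/fast.tar.gz"
tar -cf - -C "$workdir" dir/ | gzip -9 > "$workdir/best.tar.gz"

size_fast=$(wc -c < "$workdir/fast.tar.gz")
size_best=$(wc -c < "$workdir/best.tar.gz")
echo "gzip -1: $size_fast bytes"
echo "gzip -9: $size_best bytes"
```

Wrapping each pipeline in time shows the other half of the trade-off: higher levels usually cost more CPU time for a modest size gain.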

Extracting, listing, and testing archives

You’ll often want to peek at an archive before extracting, extract to a specific directory, or test integrity.

Listing contents

# list files inside a tar.gz without extracting
tar -tzvf archive.tar.gz

# show only names (less verbose)
tar -tzf archive.tar.gz

Extracting

# extract into current directory
tar -xzf archive.tar.gz

# extract to a specific directory
tar -xzf archive.tar.gz -C /tmp/extracted

# extract a single file from an archive
tar -xzf archive.tar.gz path/inside/archive/file.txt
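
One detail that trips people up is that tar does not create the directory given to -C. The following sketch, using a small made-up archive in a scratch directory, creates the targets first and then extracts both the whole archive and a single member.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Illustrative archive with two files.
mkdir -p docs/notes
echo 'keep me' > docs/notes/todo.txt
echo 'other'   > docs/readme.txt
tar -czf archive.tar.gz docs/

# tar does not create the -C target, so make it first.
mkdir -p extracted/all extracted/one

# Extract everything into one directory...
tar -xzf archive.tar.gz -C extracted/all

# ...and only a single member into another. The member name must match
# the path exactly as it appears in "tar -tzf" output.
tar -xzf archive.tar.gz -C extracted/one docs/notes/todo.txt

find extracted -type f | sort
```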

Testing integrity

# test gzip integrity (checks the compressed stream without writing anything to disk)
gzip -t archive.tar.gz && echo "gzip OK" || echo "gzip corrupt"

# listing reads the whole archive, so it also catches a damaged tar structure
tar -tzf archive.tar.gz > /dev/null && echo "tar OK" || echo "tar failed"
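
To see these checks catch a real problem, the sketch below builds a small archive, truncates a copy to simulate an interrupted download, and runs both tests on each file. The helper name check_archive and the file names are hypothetical.

```shell
#!/usr/bin/env bash
set -u

workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir data
seq 1 50000 > data/numbers.txt
tar -czf good.tar.gz data/

# Simulate an interrupted download by keeping only the first 1000 bytes.
head -c 1000 good.tar.gz > truncated.tar.gz

check_archive() {
  # Returns 0 only if both the gzip stream and the tar structure read cleanly.
  gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

if check_archive good.tar.gz; then good_status=ok; else good_status=corrupt; fi
if check_archive truncated.tar.gz; then bad_status=ok; else bad_status=corrupt; fi

echo "good.tar.gz: $good_status"
echo "truncated.tar.gz: $bad_status"
```

Using if/else instead of the "a && b || c" shortcut keeps the two outcomes separate, which matters once the success branch does more than echo.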

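Tip: tar records each file's mode and owner when it creates an archive, so nothing extra is needed at that stage. On extraction, an unprivileged user cannot hand files to other owners, so everything ends up owned by whoever ran tar; run the extraction with sudo when restoring system files. If the error is a plain "Permission denied", check first that you can write to the target directory.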

Common options, performance, and advanced tips

A set of options and approaches you’ll likely use repeatedly:

Common flags and what they mean

  • -v print each file name as it is processed (verbose)
  • -p preserve file permissions when extracting (useful when restoring backups)
  • --numeric-owner use numeric user/group IDs instead of names (helpful when user names differ between hosts)
  • --exclude=PATTERN skip files matching PATTERN; --exclude-from=FILE read the patterns from FILE, one per line
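
When the list of exclusions grows, keeping the patterns in a file is often easier to maintain. Here is a small sketch of --exclude-from on an invented project tree; the file name exclude.txt is arbitrary.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Illustrative project tree.
mkdir -p app/src app/tmp app/logs
echo 'code'  > app/src/main.py
echo 'junk'  > app/tmp/scratch.bin
echo 'noise' > app/logs/debug.log

# One pattern per line.
cat > exclude.txt <<'EOF'
app/tmp
*.log
EOF

tar -czf app.tar.gz --exclude-from=exclude.txt app/

members=$(tar -tzf app.tar.gz)
printf '%s\n' "$members"
```

Note that excluding *.log drops the log files but keeps the (now empty) app/logs directory, while excluding app/tmp removes the directory together with its contents.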

Parallel compression and faster performance

# use pigz (parallel gzip) if installed; it is usually much faster on multicore machines (-I needs GNU tar)
tar -I pigz -cf fast-archive.tar.gz large_directory/

# or with compression level
tar -I 'pigz -9' -cf fast-archive.tar.gz large_directory/
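
Since pigz is not installed everywhere, a script may want to fall back to gzip. The following is a sketch of one way to do that with a pipe, which also avoids depending on GNU tar's -I; the directory contents are generated sample data.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/large_directory"
seq 1 100000 > "$workdir/large_directory/data.txt"

# Prefer pigz when it is on PATH, otherwise fall back to plain gzip.
if command -v pigz >/dev/null 2>&1; then
  compressor=pigz
else
  compressor=gzip
fi
echo "compressing with: $compressor"

# Piping works with any tar; "tar -I pigz" is the GNU tar shortcut for this.
tar -cf - -C "$workdir" large_directory/ | "$compressor" > "$workdir/fast-archive.tar.gz"

# pigz writes standard gzip output, so ordinary tools can read it.
gzip -t "$workdir/fast-archive.tar.gz"
tar -tzf "$workdir/fast-archive.tar.gz"
```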

Splitting large archives

# split a tar.gz into 500MB chunks
tar -czf - big-data/ | split -b 500M - big-data.tar.gz.part-

# to reassemble and extract
cat big-data.tar.gz.part-* | tar -xzf -
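
A quick round trip is a reasonable way to convince yourself that reassembly works before relying on it. This sketch uses deliberately small 100 KB chunks and generated data so that it runs quickly; the chunk size and names are illustrative.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

mkdir big-data
seq 1 300000 > big-data/rows.txt
original_sum=$(cksum < big-data/rows.txt)

# Small 100 KB chunks so the sample data yields several parts.
tar -czf - big-data/ | split -b 100k - big-data.tar.gz.part-
part_count=$(ls big-data.tar.gz.part-* | wc -l)
echo "parts: $part_count"

# The shell expands the glob in sorted order (aa, ab, ...), which matches
# the order split wrote the pieces in.
mkdir restored
cat big-data.tar.gz.part-* | tar -xzf - -C restored

restored_sum=$(cksum < restored/big-data/rows.txt)
if [ "$original_sum" = "$restored_sum" ]; then
  echo "round trip OK"
fi
```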

Security considerations

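  • Beware extracting archives from untrusted sources: member names such as ../../evil or /etc/passwd are meant to write outside the target directory. GNU tar strips leading slashes and refuses ".." members unless you pass -P, but other implementations may behave differently, so review the listing first and extract into an empty directory. --strip-components only removes leading path elements and is not a safety feature.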
  • Use --numeric-owner and proper UID/GID mapping if restoring on different hosts.
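
The "review the listing first" advice can be partly automated. Below is a minimal sketch that scans member names for absolute paths and ".." components before extracting into a fresh directory. The function name has_unsafe_paths is hypothetical, and a name check like this does not cover every trick (symlink members, for example), so treat it as a first filter rather than a guarantee.

```shell
#!/usr/bin/env bash
set -u

# Reads member names on stdin; succeeds (exit 0) if any name is absolute
# or contains a ".." path component.
has_unsafe_paths() {
  grep -Eq '^/|(^|/)\.\.(/|$)'
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a downloaded archive (benign in this demo).
mkdir -p pkg/bin
echo 'tool' > pkg/bin/tool
tar -czf download.tar.gz pkg/

if tar -tzf download.tar.gz | has_unsafe_paths; then
  echo "refusing to extract: suspicious member names" >&2
else
  # Extract into a fresh, empty directory rather than over existing files.
  mkdir unpacked
  tar -xzf download.tar.gz -C unpacked
  echo "extracted into $workdir/unpacked"
fi
```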

Practical examples and workflows

Backup a home directory (excluding caches and big folders)

tar -czvf home-backup-$(date +%F).tar.gz \
  --exclude='*.cache' \
  --exclude='Downloads' \
  --exclude='node_modules' \
  /home/you

Pack a build directory but keep relative paths

cd /home/you/project
tar -czf ../project-build-1.0.tgz build/

Create a reproducible archive (deterministic timestamps)

# pin every member's modification time to a fixed date (GNU tar);
# file order, ownership, and the gzip header can still differ between runs
tar --mtime='2020-01-01' -czf reproducible.tgz src/
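
For byte-identical output, the other sources of variation usually need pinning as well. The sketch below is one commonly used recipe, not the only one: it assumes GNU tar 1.28 or newer (for --sort) and uses gzip -n so that no name or timestamp is written into the gzip header. The function name make_archive and the sample files are made up.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p src/lib
echo 'int main(void) { return 0; }' > src/main.c
echo '/* helper */' > src/lib/util.h

# Assumes GNU tar 1.28+; fixes member order, timestamps, and ownership.
make_archive() {
  tar --sort=name \
      --mtime='2020-01-01 00:00:00Z' \
      --owner=0 --group=0 --numeric-owner \
      -cf - src/ | gzip -n > "$1"
}

make_archive first.tgz
# Change on-disk timestamps; the archive bytes should not change.
touch src/main.c src/lib/util.h
make_archive second.tgz

if cmp -s first.tgz second.tgz; then
  echo "archives are byte-identical"
fi
```

File permissions are still taken from disk, so builds on machines with different umask settings may also need --mode to match.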

Sharing archives over the network (streaming)

# send archive over SSH without creating a local file
tar -czf - folder/ | ssh user@server 'cat > /path/on/server/folder.tgz'

# or pull a directory from the remote host and extract it locally
ssh user@server 'tar -czf - /path/to/data' | tar -xzf - -C /local/path
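
The same producer/consumer pattern can be tried locally without a server, which is handy for checking paths before running it over SSH. In this sketch both ends are scratch directories; over the network, one side of the pipe would simply be wrapped in ssh as above.

```shell
#!/usr/bin/env bash
set -eu

src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/folder/sub"
echo 'payload' > "$src/folder/sub/file.txt"

# Producer writes the archive to stdout ("-f -"); consumer reads it from stdin.
tar -czf - -C "$src" folder/ | tar -xzf - -C "$dest"

# -C on the producer keeps member names relative (folder/...), so the
# tree lands directly under the destination.
find "$dest" -type f
```

Using -C on the producing side avoids the path/to/data prefix that appears under the destination when an absolute path is archived.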

Commands table

Command example                                     Purpose
tar -czf archive.tar.gz dir/                        Create gzip-compressed tarball
tar -xzf archive.tar.gz                             Extract gzip-compressed tarball
tar -tzf archive.tar.gz                             List contents of a tar.gz
gzip file                                           Compress a single file (file -> file.gz)
gunzip file.gz                                      Decompress a single gzip file
gzip -t file.gz                                     Test gzip file integrity
tar -I pigz -cf out.tar.gz dir/                     Create archive using parallel gzip (pigz)
tar -czvf out.tar.gz --exclude='node_modules' dir/  Create archive excluding patterns
tar -xzf archive.tar.gz -C /target/path             Extract to specific directory
split -b 500M file                                  Split file into 500MB pieces
cat part-*                                          Reassemble split pieces for streaming/extraction

Common Pitfalls

  • Thinking gzip compresses multiple files directly: gzip compresses a single file. When you compress multiple files, you typically first tar them into one file then gzip that tar. Compressing many files individually gives many .gz files.
  • Extracting archives with absolute paths or parent-directory paths: entries like /etc/passwd or ../../etc/passwd are aimed at files outside the extraction directory. GNU tar guards against both by default, but not every tar implementation does, so always inspect with tar -tzf before extracting and extract into an empty directory. Do not rely on --strip-components for this; it only trims leading path elements.
  • Losing ownership/permissions when not using appropriate flags: Extracting as a non-root user or without -p may not preserve owner/group information. When restoring system backups, run tar as root or use --numeric-owner to preserve numeric mappings.

Next Steps

  • Practice creating and extracting archives on a test directory, trying flags like --exclude and -C.
  • Install pigz and benchmark compression speed vs gzip for large backups.
  • Learn other compression formats (bzip2 with -j, xz with -J) and when they are useful for space vs speed.
