Open
27 commits
0da9b06 feat: add large-file-script. (yuting1214, Feb 25, 2025)
268e0bd Merge branch 'main' into mark-chen-pr (michen00, Mar 9, 2025)
fea6565 Merge branch 'main' into mark-chen-pr (michen00, Mar 13, 2025)
42f2b3e Merge branch 'main' into mark-chen-pr (michen00, Mar 18, 2025)
1229cfb Merge branch 'main' into mark-chen-pr (michen00, Mar 18, 2025)
e53e0cb Merge branch 'main' into mark-chen-pr (michen00, Mar 18, 2025)
4113f8f Merge branch 'main' into mark-chen-pr (michen00, Mar 21, 2025)
e564c47 Merge branch 'main' into mark-chen-pr (michen00, Mar 24, 2025)
d5c04a3 Merge branch 'main' into mark-chen-pr (michen00, Apr 1, 2025)
4c6db2f Merge branch 'main' into mark-chen-pr (michen00, Apr 2, 2025)
dc7373b Merge branch 'main' into mark-chen-pr (michen00, Apr 22, 2025)
97c74bf Merge branch 'main' into mark-chen-pr (michen00, May 12, 2025)
c9e29ce Merge branch 'main' into mark-chen-pr (michen00, May 20, 2025)
4e8e66f Merge branch 'main' into mark-chen-pr (michen00, Jun 3, 2025)
d65f6d2 Merge branch 'main' into mark-chen-pr (michen00, Jun 3, 2025)
a3be28b Merge branch 'main' into mark-chen-pr (michen00, Jun 3, 2025)
9b5f72c chore: autofix via pre-commit hooks (pre-commit-ci[bot], Jun 3, 2025)
71c247d Merge branch 'main' into mark-chen-pr (michen00, Jul 4, 2025)
18acd2c Merge branch 'main' into mark-chen-pr (michen00, Oct 11, 2025)
446cb54 Merge branch 'main' into mark-chen-pr (michen00, Nov 26, 2025)
8f3cab6 fix(large-files): make the script executable (michen00, Nov 26, 2025)
b66c0b0 Merge branch 'main' into mark-chen-pr (michen00, Dec 8, 2025)
9baab2a Merge branch 'main' into mark-chen-pr (michen00, Dec 17, 2025)
f585fd5 Merge branch 'main' into mark-chen-pr (michen00, Dec 23, 2025)
4811f24 Merge branch 'main' into mark-chen-pr (michen00, Dec 25, 2025)
51b7259 Merge branch 'main' into mark-chen-pr (michen00, Jan 6, 2026)
a09b445 Merge branch 'main' into mark-chen-pr (michen00, Jan 14, 2026)
large-files (new file): 61 additions, 0 deletions
@@ -0,0 +1,61 @@
#!/bin/bash

set -e

# Function to display help message
usage() {
  cat <<EOF
Usage: large-files [N]

Find the top N largest files in a Git repository.

Arguments:
  N    Number of large files to display (default: 20)

Description:
  This script scans the entire Git history and identifies the largest files that
  have ever existed in the repository. It uses 'git rev-list' to extract object
Comment on lines +16 to +17
Owner

Considering all files that have ever existed significantly diminishes the utility of this tool. Consider the following output, which lists several refs to the same file:

[Screenshot: script output listing several entries for the same file]

(Side note: If we are going to claim 'to have ever existed', we should probably do an explicit fetch at the beginning.)
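One way to tame the repeated refs to the same file, sketched as a hypothetical helper (`largest_per_path` is not part of this PR): sort by size, keep only the first (largest) line seen for each path, and add an informative column counting how many distinct blobs the listing records for that path. It assumes input lines of the form `<size-in-bytes> <path>`, the same shape as the script's `%(objectsize:disk) %(rest)` output once non-blob objects are filtered out.

```bash
# Hypothetical helper (not part of this PR).
# Input lines:  "<size-in-bytes> <path>"   (paths may contain spaces)
# Output lines: "<largest-size> <number-of-versions> <path>", largest first.
largest_per_path() {
  sort -rn |
    awk '{
      path = substr($0, length($1) + 2)
      if (!(path in count)) {
        order[++n] = path
        size[path] = $1
      }
      count[path]++
    }
    END {
      for (i = 1; i <= n; i++) print size[order[i]], count[order[i]], order[i]
    }'
}

# Possible wiring (needs a Git repository; the blob filter drops commits and trees):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(rest)' |
#     awk '$1 == "blob" { sub(/^blob /, ""); print }' |
#     largest_per_path |
#     head -n 20
```

Because the input is sorted in descending order before `awk` sees it, the first line for each path is its largest version, and the `END` block prints paths in that same order.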

I can think of a couple of ways to make this script more useful:

  • explicitly limit scope: e.g., only consider files that exist at the HEAD of the current branch
  • refine the output: e.g., add distinguishing/informative column(s) to the output
  • parametrize the command: e.g., add more options to filter or sort output by different fields
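The first and third suggestions could be combined. The sketch below is hypothetical: the `-n` and `-a` options and the functions `parse_args`, `ls_tree_sizes` and `largest_in_head` are assumptions and do not exist in the PR. Restricting the scan to HEAD with `git ls-tree -r -l HEAD` lists each tracked path once together with its blob size, so the duplicate refs disappear without a history walk.

```bash
# Sketch only: hypothetical options, not part of this PR.
#   -n N   number of files to show (default: 20)
#   -a     scan all history (the PR's current behavior) instead of HEAD only

# Parse options into NUM_FILES and SCOPE; returns non-zero on bad input.
parse_args() {
  NUM_FILES=20
  SCOPE=head
  OPTIND=1
  while getopts 'n:a' opt; do
    case "$opt" in
      n) NUM_FILES=$OPTARG ;;
      a) SCOPE=all ;;
      *) return 1 ;;  # unknown option or missing argument
    esac
  done
  case "$NUM_FILES" in
    '' | *[!0-9]*) return 1 ;;  # reject empty or non-numeric N
  esac
  return 0
}

# Convert `git ls-tree -r -l` lines ("<mode> <type> <object> <size><TAB><path>")
# into "<size> <path>", skipping non-blob entries such as submodules.
ls_tree_sizes() {
  awk -F '\t' '{
    split($1, meta, " ")
    if (meta[2] == "blob") print meta[4] " " $2
  }'
}

# Largest files tracked at HEAD; each path appears at most once.
largest_in_head() {
  git ls-tree -r -l HEAD | ls_tree_sizes | sort -rn | head -n "$1"
}
```

With `SCOPE=all`, the existing `git rev-list --objects --all` pipeline would still apply. Note that `git ls-tree -l` reports the blob's uncompressed size, whereas `%(objectsize:disk)` reports the size as stored on disk, so the two modes are not directly comparable.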

@yuting1214, what do you think? Do any of the above (alone or in conjunction) sound more or less appealing to you?

Author

Hi Michael,
I'm more than happy to contribute more to this repo; it's just that I'm a little occupied at the moment.
Once I've settled a few things, I'll come back and write more code with you. Stay tuned.

  sizes (as stored on disk) and sorts them to display the largest files.

Requirements:
  - Ensure you're inside a Git repository.
  - Install 'numfmt' (part of GNU coreutils); it is required to format the sizes.

Examples:
  large-files       # Show top 20 largest files
  large-files 50    # Show top 50 largest files
EOF
  exit "${1:-0}"
}

# Default number of files to display
NUM_FILES=20

# Parse command-line argument
if [[ $# -gt 1 ]]; then
  usage 1
elif [[ $# -eq 1 ]]; then
  NUM_FILES="$1"
  if ! [[ "$NUM_FILES" =~ ^[1-9][0-9]*$ ]]; then
    echo "Error: N must be a positive integer." >&2
    usage 1
  fi
fi

# Ensure we are inside a Git repository
if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  echo "Error: Not inside a Git repository." >&2
  exit 1
fi

echo "🔍 Finding the $NUM_FILES largest files in the repository..."

# Extract the largest blobs (file contents) from Git history; commits and trees are skipped
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(rest)' |
  awk '$1 == "blob" { sub(/^blob /, ""); print }' |
  sort -rn |
  head -n "$NUM_FILES" |
  numfmt --to=iec-i --suffix=B --padding=7 --field=1

echo "✅ Done!"
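As written, the last pipeline stage always runs `numfmt`, so on a system without GNU coreutils (for example, a stock macOS install) the script stops there with a "command not found" error. Below is a minimal sketch of a fallback, assuming hypothetical `format_sizes` and `awk_iec` helpers that would replace the `numfmt` stage: use `numfmt` when `command -v` finds it, and otherwise approximate the IEC formatting in `awk`.

```bash
# Hypothetical fallback for the final pipeline stage (not part of this PR).
# Input lines: "<size-in-bytes> <path>"; output: size label padded to 7 columns.

# Approximates `numfmt --to=iec-i --suffix=B --padding=7 --field=1` in awk.
# Output can differ slightly from numfmt (rounding and number of decimals).
awk_iec() {
  awk 'BEGIN { split("B KiB MiB GiB TiB", units, " ") }
  {
    size = $1
    unit = 1
    while (size >= 1024 && unit < 5) { size /= 1024; unit++ }
    path = substr($0, length($1) + 2)
    if (unit == 1) label = size units[1]
    else label = sprintf("%.1f%s", size, units[unit])
    printf "%7s %s\n", label, path
  }'
}

# Prefer numfmt when it is installed; fall back to the awk approximation otherwise.
format_sizes() {
  if command -v numfmt >/dev/null 2>&1; then
    numfmt --to=iec-i --suffix=B --padding=7 --field=1
  else
    awk_iec
  fi
}
```

In the script, `format_sizes` would take the place of the `numfmt` line at the end of the pipeline, which would make 'numfmt' a recommendation rather than a hard requirement.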