arceuzvx/Scrubbing-sensitive-data-from-git-history

Scrubbing-sensitive-data-from-git-history

Git is a powerful tool, but sometimes mistakes happen: credentials, API keys, or sensitive files end up in commits. Even if you remove them in the latest commit, they still live in your Git history. I ran into this issue a few days ago 😭 and was frantically searching for a way to get rid of the sensitive data without installing additional software, purely using Git. (Learnt a lot that day 🤔)

This guide walks you through safely removing sensitive data from Git history, step by step, while keeping your repo intact and your workflow professional. ✨

Why This Matters

  • Exposed credentials in public repositories can be exploited by attackers.
  • Simply deleting a file or changing a password in the latest commit is not enough: every earlier commit still contains the old value.
  • Understanding Git history cleanup is essential for professional code hygiene.

Step 1: Backup Your Repository

Before touching history, create a mirror backup:

git clone --mirror <your-repo-url> repo-backup.git

This ensures you can restore your repo if something goes wrong.
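A backup only helps if it is intact. As a minimal sketch, the check below runs `git fsck` on the mirror and confirms it holds at least one ref. `verify_backup` is a hypothetical helper name of mine, not a Git command, and `repo-backup.git` is the path used in the clone command above.

```shell
# Hypothetical helper (not part of Git): sanity-check a mirror backup.
verify_backup() {
    backup_dir="$1"
    # fsck verifies object integrity and connectivity inside the backup
    git -C "$backup_dir" fsck --no-dangling >/dev/null 2>&1 || return 1
    # count refs; a mirror with no refs means the clone captured nothing
    ref_count=$(git -C "$backup_dir" for-each-ref | wc -l | tr -d ' ')
    [ "$ref_count" -gt 0 ]
}
```

Usage: `verify_backup repo-backup.git && echo "backup looks usable"`.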

Step 2: Identify What to Remove

Decide whether you want to remove:

  • A specific file (e.g., config.py containing passwords)
  • A specific string (e.g., postgres:postgres, API keys, tokens)
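To decide between the two, it helps to see which commits are actually affected. The sketch below uses plain `git log` for both cases; the function names are hypothetical, and `config.py` and `postgres:postgres` are just the examples from the list above.

```shell
# Hypothetical helpers (names are mine, not Git's).

# List abbreviated hashes of every commit, on any ref, that touched a path.
# The "--" lets Git accept paths that no longer exist in the working tree.
commits_touching_path() {
    git log --all --format=%h -- "$1"
}

# List commits whose diff adds or removes an occurrence of a literal string
# (-S is Git's "pickaxe" search; it matches the string literally by default).
commits_touching_string() {
    git log --all --format=%h -S"$1"
}
```

Usage: `commits_touching_path config.py` or `commits_touching_string 'postgres:postgres'`. An empty result means no commit on any ref matches.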

Step 3: Remove Files from History

If the file still exists in history:

git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch path/to/file" \
--prune-empty --tag-name-filter cat -- --all
  • --prune-empty removes empty commits
  • --tag-name-filter cat keeps tags intact
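If you want to rehearse this on a throwaway clone first, the same command can be wrapped in a small function and followed by a check. This is only a sketch: `purge_path` and `path_still_in_history` are hypothetical names, and it assumes a clean working tree, that you run it from the top level of the repo, and that the path contains no single quotes.

```shell
# Hypothetical helper: purge one path from every branch and tag.
purge_path() {
    path_to_purge="$1"
    # FILTER_BRANCH_SQUELCH_WARNING=1 skips filter-branch's warning and its pause
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
        "git rm --cached --ignore-unmatch -- '$path_to_purge'" \
        --prune-empty --tag-name-filter cat -- --all
}

# Hypothetical helper: succeeds if a branch or tag still has a commit touching the path.
# refs/original/ (filter-branch's own backup refs) is deliberately not searched.
path_still_in_history() {
    [ -n "$(git log --branches --tags --format=%h -- "$1")" ]
}
```

Usage: `purge_path path/to/file`, then `path_still_in_history path/to/file || echo "gone from branches and tags"`.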

Step 4: Remove Specific Strings

To scrub a string from every file in every commit, even if the file was later deleted or renamed:

git filter-branch --force --tree-filter '
find . -type f -exec sed -i "s/SECRET_STRING/REMOVED/g" {} +
' --prune-empty --tag-name-filter cat -- --all
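  • Replace SECRET_STRING with your sensitive value. sed treats it as a regular expression, so characters like . * / $ need a backslash in front of them
  • Replace REMOVED with a safe placeholder or nothing
  • sed -i without a suffix is GNU sed syntax; BSD/macOS sed expects a suffix argument (for example -i '')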
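Real secrets often contain characters that sed reads as regex syntax (a dot, a slash, a dollar sign), which can make the command above miss the secret or match too much. One possible way around that, as a sketch: escape the value first. `sed_escape` is a hypothetical helper, and it assumes `/` is the delimiter in your `s/.../.../` command.

```shell
# Hypothetical helper: escape a literal string so it can be used as a
# sed basic-regex pattern inside s/.../.../ (with "/" as the delimiter).
# Escaped characters: ] \ / $ * . ^ [
sed_escape() {
    printf '%s' "$1" | sed 's|[]\/$*.^[]|\\&|g'
}
```

Usage: `pattern=$(sed_escape 'a.b*c')`, then use `"s/$pattern/REMOVED/g"` as the sed expression. Without escaping, `a.b*c` would also match unrelated text such as `aXbbc`.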

Step 5: Clean Up Dangling Commits

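# filter-branch keeps the pre-rewrite refs under refs/original/; delete them first
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now --aggressive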

This expires the reflog and prunes the now-unreachable commits and objects from your local repository's storage.
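To confirm the cleanup really dropped something, note the hash of a commit (or blob) that held the secret before rewriting, then ask Git whether that object still exists. A minimal sketch, with `object_exists` as a hypothetical helper name:

```shell
# Hypothetical helper: succeeds while the object is still in the local object store.
object_exists() {
    # cat-file -e exits 0 only if the object exists and is valid
    git cat-file -e "$1" 2>/dev/null
}
```

Usage: save `old_sha=$(git rev-parse HEAD)` before the rewrite; after the cleanup, `object_exists "$old_sha" && echo "old commit is still stored locally"`.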

Step 6: Push Cleaned History

git push origin --force --all
git push origin --force --tags

⚠️ Force-pushing overwrites the remote history. Anyone else using the repo must re-clone; pushing from an old clone would bring the removed commits back.
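After pushing, it can be worth checking that every branch on the remote now points at the rewritten commits. The sketch below compares local branch tips with what `git ls-remote` reports; `remote_matches_local` is a hypothetical helper, and `origin` is assumed as the default remote name.

```shell
# Hypothetical helper: succeeds if the remote's branches match the local ones exactly.
remote_matches_local() {
    remote="${1:-origin}"
    local_refs=$(git for-each-ref --format='%(objectname) %(refname)' refs/heads | sort)
    # ls-remote separates hash and ref name with a tab; normalise it to a space
    remote_refs=$(git ls-remote "$remote" 'refs/heads/*' | tr '\t' ' ' | sort)
    [ "$local_refs" = "$remote_refs" ]
}
```

Usage: `remote_matches_local origin || echo "remote still differs from local branches"`.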

Step 7: Clear Stashes

git stash clear
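
Stashes may contain sensitive data too. The dropped stash objects are only deleted by garbage collection, so run the Step 5 commands again afterwards.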

Step 8: Prevent Future Leaks

  1. Move credentials to a .env file:
POSTGRES_USER=postgres
POSTGRES_PASSWORD=new_secure_password
  2. Add .env to .gitignore:
.env
  3. Update your app to read credentials via environment variables.
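A quick way to confirm the ignore rule is working is `git check-ignore`. This is a small sketch with a hypothetical helper name. As far as I know, check-ignore reports nothing for files that are already tracked, so a failure here can also mean .env was committed earlier and needs `git rm --cached .env`.

```shell
# Hypothetical helper: succeeds if Git would ignore the given file (default: .env).
env_file_is_ignored() {
    # -q prints nothing; exit status 0 means the path matches an ignore rule
    git check-ignore -q "${1:-.env}"
}
```

Usage: `env_file_is_ignored || echo ".env is NOT ignored, fix .gitignore before committing"`.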

Step 9: Verify

Locally

git log -p --all | grep SECRET_STRING

If nothing appears → success!

Repeat with variations if needed (e.g., different passwords or API keys).
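For a stricter check, you can search the full contents of every commit instead of only the diffs. The sketch below loops over all commits reachable from any ref and runs `git grep` on each one. `secret_in_history` is a hypothetical helper, and it will be slow on large repositories because it runs one search per commit.

```shell
# Hypothetical helper: succeeds if the literal string appears in any file
# of any commit reachable from any ref.
secret_in_history() {
    secret="$1"
    for rev in $(git rev-list --all); do
        # -F: fixed string, -q: no output, stop at the first match
        if git grep -qF -e "$secret" "$rev"; then
            return 0
        fi
    done
    return 1
}
```

Usage: `secret_in_history 'postgres:postgres' && echo "still in history"`.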

On GitHub / Remote

  • Go to your repository on GitHub.
  • Use the Search bar in your repo (make sure “In this repository” is selected).
  • Search for your sensitive string, e.g., postgres:postgres.
  • If no results appear, the secret is gone from the code GitHub currently shows in search.

You can also search for other common secrets like password, api_key, etc., to be thorough.

⚠️ Remember: as far as I know, GitHub code search only covers the current contents of the default branch, not older commits, and it never sees local dangling objects. Always combine this with the local verification step.

Conclusion

This workflow removes sensitive data from your Git history while keeping the history clean and your projects secure. No need to download additional software to wipe your Git history.


Thanks for reading 📖🙇🏻‍♀️ Have a great day! ( or night 🦉💻) You can follow me on twitter 🐧.
