Automate the removal of unnecessary files using Python

Remove Duplicate Files with Python


I am a software developer. I come from Lucknow, Uttar Pradesh, and hold a three-year diploma in Computer Science and Engineering. I have a strong knowledge of Python and expertise in Python Django. I believe in continuous learning and strive to acquire new skills every day. I am dedicated to delivering effective solutions and constantly improving my abilities in the field.

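Duplicate files can take up a lot of space on your computer. They also clutter folders and make the file you actually want harder to find. This code uses Python to remove duplicate files from a directory. Here, "duplicate" means a numbered copy of the same filename; the script compares names, not file contents.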

The code first walks the directory and uses a regular expression to split each filename into a name, a number, and an extension. Files that share a name are grouped together, and the number tells the copies apart. The code then goes through each group and deletes every file except the latest one.
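
A minimal sketch of that first step, assuming copies are named like `file_3.txt`, as in the example run further down. The helper name `split_numbered_name` and the exact pattern are illustrative assumptions, not part of the original script:

```python
import re

# Assumed naming scheme: "<name>_<number><extension>", e.g. "file_3.txt".
NUMBERED = re.compile(r"^(.+?)_(\d+)(\..*)?$")


def split_numbered_name(filename):
    """Return (name, number, extension), or None if the pattern does not match."""
    match = NUMBERED.match(filename)
    if match is None:
        return None
    name, number, ext = match.groups()
    # The extension group is optional, so fall back to an empty string
    return name, int(number), ext or ""


print(split_numbered_name("file_3.txt"))  # ('file', 3, '.txt')
print(split_numbered_name("notes.txt"))   # None
```

Files that do not fit the pattern return `None`, so they can simply be skipped rather than deleted or renamed.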

The code also renames the latest file in each group to the plain name, without the number. This ensures that only one file with a given name is left in the directory.
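
The keep-the-latest logic can be sketched without touching the disk. The function below is a hypothetical helper, assuming the `name_number.ext` naming from the example run and treating the highest number as the latest copy. It only builds a plan of what would be deleted and renamed:

```python
from collections import defaultdict


def plan_cleanup(entries):
    """entries: iterable of (name, number, ext) tuples from one folder.

    Returns (to_delete, to_rename): a list of filenames to delete and a
    dict mapping the kept filename to its new, number-free name.
    """
    groups = defaultdict(list)
    for name, number, ext in entries:
        groups[(name, ext)].append(number)

    to_delete, to_rename = [], {}
    for (name, ext), numbers in groups.items():
        numbers.sort()
        *older, latest = numbers  # highest number is assumed to be the latest
        to_delete += ["{}_{}{}".format(name, n, ext) for n in older]
        to_rename["{}_{}{}".format(name, latest, ext)] = name + ext
    return to_delete, to_rename


deleted, renamed = plan_cleanup(
    [("file", 3, ".txt"), ("file", 1, ".txt"), ("file", 2, ".txt")]
)
print(deleted)  # ['file_1.txt', 'file_2.txt']
print(renamed)  # {'file_3.txt': 'file.txt'}
```

Sorting the numbers explicitly matters because a directory listing is not guaranteed to come back in any particular order.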

Here is an example of how the code works:

$ python remove_duplicates.py ~/Downloads
Deleted duplicate file: ~/Downloads/file_1.txt
Deleted duplicate file: ~/Downloads/file_2.txt
Renamed ~/Downloads/file_3.txt to ~/Downloads/file.txt

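The code can be used to remove duplicate files from any directory. The re and os modules are part of Python's standard library, so there is nothing to install. As written, the script always processes ~/Downloads and ignores any path passed on the command line; to clean a different directory, change the path in the last line, then run the script from the command line.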

import os
import re

# NOTE: the pattern originally published here was garbled. This one is an
# assumption based on the "name_N.ext" naming in the example output
# (e.g. "file_3.txt"); adjust it to match how your copies are named.
PATTERN = re.compile(r"^(.+?)_(\d+)(\..*)?$")


def remove_duplicates(directory):
    filedict = {}
    for root, dirs, files in os.walk(directory):
        for filename in files:
            # Extract the name, number, and extension from the filename
            match = PATTERN.match(filename)
            if not match:
                continue
            name = match.group(1)
            number = int(match.group(2))
            ext = match.group(3) or ""

            # Group by folder, name, and extension so that files from
            # different folders are never treated as copies of each other
            key = (root, name, ext)
            filepath = os.path.join(root, filename)
            filedict.setdefault(key, []).append((number, filepath))

    # Iterate over the groups and remove duplicate files
    for (root, name, ext), files in filedict.items():
        # Sort by number so the latest file is last; os.walk() does not
        # guarantee any particular order
        files.sort()
        latest_path = files[-1][1]

        # Delete the duplicate files
        for number, filepath in files[:-1]:
            os.remove(filepath)
            print("Deleted duplicate file: {}".format(filepath))

        # Rename the latest file, unless that would overwrite another file
        target = os.path.join(root, "{}{}".format(name, ext))
        if os.path.exists(target):
            print("Skipped rename, {} already exists".format(target))
            continue
        os.rename(latest_path, target)
        print("Renamed {} to {}".format(latest_path, target))


if __name__ == "__main__":
    remove_duplicates(os.path.expanduser("~/Downloads"))
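
The example run passes a directory on the command line, while the script hard-codes `~/Downloads`. One way to close that gap, sketched here with the standard `argparse` module, is to read an optional path argument. The `parse_args` helper is an illustrative assumption, not part of the original script:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse an optional directory argument, defaulting to ~/Downloads."""
    parser = argparse.ArgumentParser(description="Remove numbered duplicate files.")
    parser.add_argument(
        "directory",
        nargs="?",
        default="~/Downloads",
        help="folder to clean (defaults to ~/Downloads)",
    )
    args = parser.parse_args(argv)
    # Expand "~" and make the path absolute before handing it on
    args.directory = os.path.abspath(os.path.expanduser(args.directory))
    return args


# Passing an explicit list here stands in for the real command line
args = parse_args(["~/Downloads"])
print("Directory to clean:", args.directory)
```

In the script itself, the last line would then become `remove_duplicates(parse_args().directory)`.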

Conclusion

This article has shown you how to use Python to remove duplicate files from a directory. The code is short and simple to understand, and it only needs the standard library.

If you are looking for a way to free up space on your computer and keep your files organized, this code is a great option. You can use it on your Downloads folder, your Photos folder, or any other directory. Because files are matched by name rather than by content, check first that the numbered files in the folder really are copies of each other.
