
Delete duplicate files based on modification time, keeping the first created file

I have video files recorded from a webcam. The recording produces many video files, and I want to delete duplicates based on modification time, at a granularity of one minute.

For example, I have 3 video files as below, with times given as hour:minute:second.

  1. Ek001.AVI – modification time is 08:30:15
  2. Ek002.AVI – modification time is 08:30:40
  3. Ek003.AVI – modification time is 08:32:55

I want the following files to remain:

  1. Ek001.AVI – modification time is 08:30:15 (the first file created is kept)
  2. Ek003.AVI – modification time is 08:32:55

I currently have the following code to find the modification time:

import os
import glob
from datetime import datetime

# print the modification time of every .AVI file in the folder
for file in glob.glob('C:\\Users\\xxx\\*.AVI'):
    time_mod = os.path.getmtime(file)
    print(datetime.fromtimestamp(time_mod).strftime('%Y-%m-%d %H:%M:%S'), '-->', file)

Please help me adapt my code to delete duplicate files based on modification time, at a granularity of one minute.

> Solution:

Here is my suggested solution. See the comments in the code itself for a detailed explanation, but the basic idea is that you build up a dictionary of lists of 2-element tuples, where the keys of the dictionary are the number of whole minutes since the start of Unix time, and the 2-tuples contain the filename and the remaining seconds. You then loop over the values of the dictionary (lists of tuples for files modified within the same calendar minute), sort these by the seconds, and delete all except the first. Note that the grouping is by calendar minute, so files modified at, say, 08:30:55 and 08:31:05 fall into different groups even though they are only 10 seconds apart.

The use of a defaultdict here is just a convenience to avoid the need to explicitly add new lists to the dictionary when looping over files, because these will be added automatically when needed.
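As a small aside on the defaultdict point, here is a sketch that compares it with a plain dict, using the three example files from the question. The `samples` list is made up for illustration and stores times as seconds since midnight rather than real Unix timestamps.

```python
from collections import defaultdict

# Hypothetical sample data: (filename, seconds since midnight)
samples = [
    ("Ek001.AVI", 8 * 3600 + 30 * 60 + 15),  # 08:30:15
    ("Ek002.AVI", 8 * 3600 + 30 * 60 + 40),  # 08:30:40
    ("Ek003.AVI", 8 * 3600 + 32 * 60 + 55),  # 08:32:55
]

# Plain dict: the list for each new key has to be created explicitly.
plain = {}
for name, t in samples:
    minute = t // 60
    if minute not in plain:
        plain[minute] = []
    plain[minute].append(name)

# defaultdict(list): a missing key gets an empty list automatically.
grouped = defaultdict(list)
for name, t in samples:
    grouped[t // 60].append(name)

print(dict(grouped))
# {510: ['Ek001.AVI', 'Ek002.AVI'], 512: ['Ek003.AVI']}
```

Both loops build the same mapping; the defaultdict version just drops the membership check.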

import os
import glob
from collections import defaultdict

files_by_minute = defaultdict(list)

# group together all the files according to the number of minutes since the
# start of Unix time, storing the filename and the number of remaining seconds
for filename in glob.glob("C:\\Users\\xxx\\*.AVI"):
    time_mod = os.path.getmtime(filename)
    mins = time_mod // 60
    secs = time_mod % 60
    files_by_minute[mins].append((filename, secs))

# go through each of these lists of files, removing the newer ones if
# there is more than one
for fileset in files_by_minute.values():
    if len(fileset) > 1:
        # sort tuples by second element (i.e. the seconds)
        fileset.sort(key=lambda t:t[1])
        # remove all except the first
        for file_info in fileset[1:]:
            filename = file_info[0]
            print(f"removing {filename}")
            os.remove(filename)
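
Since os.remove is irreversible, it may be worth checking what would be deleted before deleting anything. The following is one possible sketch of the same grouping idea, split into a selection step and a deletion step. The names `files_to_remove`, `remove_duplicates`, `get_mtime` and `dry_run` are my own and not part of any library; the `get_mtime` parameter exists only so the selection logic can be tried without real files.

```python
import glob
import os
from collections import defaultdict


def files_to_remove(paths, get_mtime=os.path.getmtime):
    """Return the paths that are not the oldest file in their calendar minute."""
    by_minute = defaultdict(list)
    for path in paths:
        mtime = get_mtime(path)
        # key: whole minutes since the start of Unix time
        by_minute[int(mtime // 60)].append((mtime, path))

    doomed = []
    for entries in by_minute.values():
        entries.sort()  # oldest modification time first
        doomed.extend(path for _, path in entries[1:])
    return doomed


def remove_duplicates(pattern, dry_run=True):
    """Print the duplicates matching pattern and, unless dry_run, delete them."""
    for path in files_to_remove(glob.glob(pattern)):
        print(f"removing {path}")
        if not dry_run:
            os.remove(path)


if __name__ == "__main__":
    # Dry run by default: only prints what would be removed.
    remove_duplicates("C:\\Users\\xxx\\*.AVI")
```

Once the printed list looks right, calling `remove_duplicates(pattern, dry_run=False)` performs the actual deletion.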