ACL Scripts: Directory Command

2012-02-01

The ACL Directory command produces a directory of files and folders stored on disk. It also reports file properties, such as size, attributes and dates.

In the following script, we use the Directory command to identify possible duplicate files on the C: drive. The example is written for ACL users who have attended an ACL 302 or ACL 303 course and who have some experience in writing scripts.

Step 1: Setting up the script

First of all, it’s good practice to add some header comments to the beginning of the script. This part is optional, but a description is useful for future reference. We will also change some of the project settings.

Step 1

COMMENT
*************************************************************
*
* DIRECTORY command example – Identify duplicate files
*
* Description: This script identifies possible duplicate files
* on the C drive based on file name and file size
*
* Created By: Alex Psarras
* Created On: 16/11/2011
*
* version 1.0
*
*************************************************************

SET SAFETY OFF
SET EXACT ON
CLOSE PRI SEC

Step 2: Using the DIR command

In the next step, we will use the DIRECTORY or DIR command to produce a directory of files and folders stored on the C: drive. We will then extract all records where the value of the file attribute is “A”.

Step 2

COMMENT Use the DIR command to create the directory of the C: drive
DIR C:\*.* SUB TO x_Files

COMMENT Extract records where the file attribute is "A"
OPEN x_Files
EXTRACT RECORD IF FILE_ATTRIBUTES = "A" TO "A_Files"
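
Listing an entire drive can take a while, so it may help to trial the script on a single folder first. The sketch below is only an illustration: the folder C:\Temp and the table name x_Files_Test are hypothetical and do not appear in the original script. The wildcard restricts the listing to files with an .xls extension.

```
COMMENT Trial run on a hypothetical folder, including its subfolders
DIR "C:\Temp\*.xls" SUB TO x_Files_Test

COMMENT Check how many files were listed before running against the full drive
OPEN x_Files_Test
COUNT
```

Once the remaining steps behave as expected on the smaller listing, the file specification can be switched back to the whole drive.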

Step 3: Create a summary of the directory

We will now use this newly extracted table to create a summary by file name and file size in order to identify files that are saved on the C: drive more than once.

Step 3

COMMENT Create two computed fields that will be used in the summary
OPEN A_Files

COMMENT Strip the name of the file from the full file name, i.e. path + file name
DELETE FIELD c_File OK
DEFINE FIELD c_File COMPUTED SPLIT(LOWER(FILE_NAME), "\", (OCCURS(FILE_NAME, "\") + 1))

COMMENT Convert the file size into a character field so we can summarise on it
DELETE FIELD c_Size OK
DEFINE FIELD c_Size COMPUTED ALLTRIM(STRING(FILE_SIZE, 20))

COMMENT Summarise on the file name and size where the size is greater than zero
SUMMARIZE ON c_File c_Size OTHER FILE_SIZE TO "x_Summary.FIL" IF FILE_SIZE > 0 PRESORT
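
To see how the c_File expression works, it can help to evaluate it against a single value. The sketch below uses a hypothetical path and hypothetical variable names (v_Path and v_File) that are not part of the original script. OCCURS counts the backslashes in the path, and SPLIT then returns the segment after the last one.

```
COMMENT Hypothetical path used only to illustrate the expression
ASSIGN v_Path = "C:\Reports\2011\Budget.xls"

COMMENT The path contains 3 backslashes, so the file name is segment 4
ASSIGN v_File = SPLIT(LOWER(v_Path), "\", OCCURS(v_Path, "\") + 1)

COMMENT v_File should now hold budget.xls
DISPLAY v_File
```

Converting the name to lower case means that Budget.xls and budget.xls are treated as the same file name when the table is summarised.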

Step 4: Extract possible duplicates

The fourth step extracts records from the summary table where the file name / file size combination appears more than once, i.e. where the file is possibly duplicated in another location.

Step 4

COMMENT Extract records where the count in the summary table is greater than one and where the total size of the files is greater than 50 MB (1 MB = 1,048,576 bytes)
OPEN x_Summary
EXTRACT RECORD IF COUNT > 1 AND ((FILE_SIZE * COUNT)/1048576 > 50) TO "x_Duplicates"

COMMENT Create computed fields that will be used in the results
OPEN x_Duplicates
COMMENT Calculate the file size in MB
DELETE FIELD c_FileSizeMB OK
DEFINE FIELD c_FileSizeMB COMPUTED FILE_SIZE/1048576

COMMENT Calculate the total file size in MB
DELETE FIELD c_TotalSizeMB OK
DEFINE FIELD c_TotalSizeMB COMPUTED (FILE_SIZE * COUNT)/1048576

COMMENT Use the record number as a case number
DELETE FIELD c_CaseNo OK
DEFINE FIELD c_CaseNo COMPUTED RECNO()
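
One point to watch with the size calculations: ACL uses fixed-point arithmetic, and the result of a calculation keeps the number of decimal places of the operand with the most decimals. If FILE_SIZE is defined with zero decimal places (an assumption here; check the table layout), FILE_SIZE/1048576 returns whole megabytes only. A minimal sketch of a variant that keeps two decimal places, using a hypothetical field name c_FileSizeMB2:

```
COMMENT Multiplying by 1.00 first gives the result two decimal places
DEFINE FIELD c_FileSizeMB2 COMPUTED 1.00 * FILE_SIZE / 1048576
```

The same approach could be applied to the total size field if fractional megabytes matter in the results.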

Step 5: Join back to the original directory table

Now that we have identified the possible duplicates, we will join back to the original directory table to get the path of each duplicated file.

Step 5

COMMENT Join the possible duplicates back to the directory extract to pull through the full location of each file
OPEN A_Files
OPEN x_Duplicates SECONDARY
JOIN PKEY c_File c_Size SKEY c_File c_Size FIELDS FILE_NAME WITH c_FileSizeMB c_TotalSizeMB COUNT c_CaseNo TO "x_DuplicateJoin" OPEN PRESORT SECSORT

Step 6: Sort the results

Finally, we will sort the results on the total file size and the case number (to group results together). In the final part of step 6 we will clean up the project by deleting the intermediate tables (this is optional).

Step 6

COMMENT Sort on the total size and the case number
OPEN x_DuplicateJoin
SORT ON c_TotalSizeMB D c_CaseNo TO "Duplicates"

COMMENT Delete the intermediate tables
DELETE FORMAT x_FILES OK
DELETE x_FILES.FIL OK
DELETE FORMAT A_Files OK
DELETE A_Files.FIL OK
DELETE FORMAT x_DuplicateJoin OK
DELETE x_DuplicateJoin.FIL OK
DELETE FORMAT x_Summary OK
DELETE x_Summary.FIL OK
DELETE FORMAT x_Duplicates OK
DELETE x_Duplicates.FIL OK

SET SAFETY ON

Full Script

COMMENT
*************************************************************
*
* DIRECTORY command example – Identify duplicate files
*
* Description: This script identifies possible duplicate files
* on the C drive based on file name and file size
*
* Created By: Alex Psarras
* Created On: 16/11/2011
*
* version 1.0
*
*************************************************************

SET SAFETY OFF
SET EXACT ON
CLOSE PRI SEC

COMMENT Use the DIR command to create the directory of the C: drive
DIR C:\*.* SUB TO x_Files

COMMENT Extract records where the file attribute is "A"
OPEN x_Files
EXTRACT RECORD IF FILE_ATTRIBUTES = "A" TO "A_Files"

COMMENT Create two computed fields that will be used in the summary
OPEN A_Files

COMMENT Strip the name of the file from the full file name, i.e. path + file name
DELETE FIELD c_File OK
DEFINE FIELD c_File COMPUTED SPLIT(LOWER(FILE_NAME), "\", (OCCURS(FILE_NAME, "\") + 1))

COMMENT Convert the file size into a character field so we can summarise on it
DELETE FIELD c_Size OK
DEFINE FIELD c_Size COMPUTED ALLTRIM(STRING(FILE_SIZE, 20))

COMMENT Summarise on the file name and size where the size is greater than zero
SUMMARIZE ON c_File c_Size OTHER FILE_SIZE TO "x_Summary.FIL" IF FILE_SIZE > 0 PRESORT

COMMENT Extract records where the count in the summary table is greater than one and where the total size of the files is greater than 50 MB (1 MB = 1,048,576 bytes)
OPEN x_Summary
EXTRACT RECORD IF COUNT > 1 AND ((FILE_SIZE * COUNT)/1048576 > 50) TO "x_Duplicates"
OPEN x_Duplicates

COMMENT Create computed fields that will be used in the results
COMMENT Calculate the file size in MB
DELETE FIELD c_FileSizeMB OK
DEFINE FIELD c_FileSizeMB COMPUTED FILE_SIZE/1048576

COMMENT Calculate the total file size in MB
DELETE FIELD c_TotalSizeMB OK
DEFINE FIELD c_TotalSizeMB COMPUTED (FILE_SIZE * COUNT)/1048576

COMMENT Use the record number as a case number
DELETE FIELD c_CaseNo OK
DEFINE FIELD c_CaseNo COMPUTED RECNO()

COMMENT Join the possible duplicates back to the directory extract to pull through the full location of each file
OPEN A_Files
OPEN x_Duplicates SECONDARY
JOIN PKEY c_File c_Size SKEY c_File c_Size FIELDS FILE_NAME WITH c_FileSizeMB c_TotalSizeMB COUNT c_CaseNo TO "x_DuplicateJoin" OPEN PRESORT SECSORT

COMMENT Sort on the total size and the case number
OPEN x_DuplicateJoin
SORT ON c_TotalSizeMB D c_CaseNo TO "Duplicates"

COMMENT Delete the intermediate tables
DELETE FORMAT x_FILES OK
DELETE x_FILES.FIL OK
DELETE FORMAT A_Files OK
DELETE A_Files.FIL OK
DELETE FORMAT x_DuplicateJoin OK
DELETE x_DuplicateJoin.FIL OK
DELETE FORMAT x_Summary OK
DELETE x_Summary.FIL OK
DELETE FORMAT x_Duplicates OK
DELETE x_Duplicates.FIL OK

SET SAFETY ON

DISCLAIMER:
DataConsulting Ltd. provide the script “as is” and free of charge. DataConsulting: (a) Do not provide support for these scripts; (b) Make no warranties or representations, expressed or implied, with respect to the script, including its fitness for a particular purpose, merchantability, durability, quality or its non-infringement; (c) Do not warrant that the script is free from errors; and (d) Will not be liable for any damages (including, but not limited to indirect damages such as lost profits and lost data) arising out of the use of, or the inability to use the script. You agree to assume all risk of loss or damage arising from the use of the script.
