Flexense Data Management Software

Duplicate Files Finder and Cleaner

DiskBoss includes a built-in duplicate files finder allowing one to search duplicate files, generate various types of charts showing duplicate disk space, remove duplicate files and save reports into a number of different formats. In order to search duplicate files in one or more disks or directories, select the required directories in the DiskBoss file navigator and press the 'Duplicates' button located on the main toolbar. DiskBoss will scan the selected disks and directories and display a dialog showing the list of detected duplicate files.

DiskBoss Search Duplicate Files

For each duplicate file set, DiskBoss shows the name of the original file, the number of duplicate files in the set, the size of each file in the set, the amount of wasted disk space and the currently selected duplicates removal action. In order to see all duplicate files related to a set, click on the set item in the set list.

DiskBoss Duplicate Files Results

The duplicate set dialog shows all duplicate files related to the set and allows one to select the original file, the duplicate files and the duplicates removal action. In order to select a file as the original, select the file item, press the right mouse button and select the 'Set as Original File' menu item. In order to see more information about a file, just click on the file item in the file list. Once finished selecting the duplicate files, use the removal actions combo box located in the bottom-left corner of the dialog to select an appropriate duplicates removal action.

Duplicate Files Pie Charts

The DiskBoss duplicate files finder allows one to display charts showing the amount of wasted disk space and the number of duplicate files per extension, file type, file size, user name, etc. In order to open the charts dialog, press the 'Charts' button located on the duplicate files search results dialog toolbar.

DiskBoss Duplicate Files Chart Categories

The charts dialog displays information for the displayed duplicate files and the currently selected categories of duplicate files. In order to display a chart for another category of duplicates, select an appropriate category in the categories combo box and then open the charts dialog.

DiskBoss Duplicate Files Chart Dates

The charts dialog allows one to copy the displayed chart image to the clipboard, making it very easy to integrate DiskBoss charts into the user's reports and presentations. Finally, the user is provided with the ability to customize the information displayed on the chart's status bar.

File Filters and File Categories

The DiskBoss duplicate files finder allows one to categorize and filter duplicate files by the file type, extension, category, size, user name, etc. In order to change the current duplicate files categorization mode, click on the file categories combo box located in the top-left corner of the categories view.

DiskBoss Duplicate Files Categories

The user is provided with the ability to apply multiple file filters, display specific types of duplicate files, apply duplicate files removal actions to the filtered files only and export reports showing only the filtered duplicate files.

DiskBoss Duplicate Files Filter

In order to set one or more file filters, select an appropriate type of file categories in the categories combo box, select one or more file filters in the filters view, press the right mouse button and select the 'Apply Selected Filters' menu item.

DiskBoss Duplicate Files Filter Active

With active file filters, DiskBoss shows duplicate files matching the selected filters, exports reports showing matching files only and significantly simplifies selection of duplicates removal actions for specific file types or file categories. In order to clear the selected file filters, just press the 'Clear' button located on the right side of the categories selector.

Searching Files in Duplicate Files Search Results

DiskBoss provides the ability to search files in duplicate files results by the file name, extension, full path, file category, file size, file attributes, creation, last modification and last access dates. In order to start a file search operation, search duplicate files in one or more disks or directories and press the 'Search' button located on the main toolbar.

Duplicate Files Results Search Files

By default, DiskBoss will search duplicate files matching the user-specified rules in all duplicate file sets in the current duplicate files report. Once the file search operation is completed, DiskBoss will display the search results dialog showing all duplicate files matching the rules.

The search results dialog allows one to filter and categorize file search results, display various types of pie charts, copy, move and/or delete files, and export file search results to a number of standard report formats including PDF, HTML, text, Excel CSV and XML. In addition, advanced users are provided with the ability to export file search results to an SQL database.

Selecting Duplicate Files Removal Actions

The DiskBoss duplicate files finder allows one to delete duplicate files, move duplicate files to another directory, replace duplicate files with shortcuts pointing to the original file, replace duplicate files with hard links, compress duplicate files, or compress and move duplicate files to another directory.
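To illustrate what replacing a duplicate with a hard link means at the file system level, here is a minimal Python sketch. It is not how DiskBoss implements the action; the function name and the temporary '.linktmp' suffix are hypothetical, and the sketch assumes both files are on the same volume and that the file system supports hard links.

```python
import os


def replace_with_hard_link(original, duplicate):
    """Replace 'duplicate' with a hard link to 'original'.

    Assumption: both paths are on the same volume, since hard links
    cannot cross volumes. The '.linktmp' suffix is a hypothetical
    temporary name used only for this illustration.
    """
    temp_link = duplicate + ".linktmp"
    # Create the new link first, so the duplicate is never lost
    # if link creation fails.
    os.link(original, temp_link)
    # Swap the link into place, overwriting the duplicate file.
    os.replace(temp_link, duplicate)
```

After the call, both names refer to the same data on disk, so the space used by the duplicate copy is released while both paths remain usable.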

DiskBoss Duplicate Files Removal Actions

In order to select a specific duplicates removal action for one or more sets of duplicate files, select the sets in the set list, press the right mouse button and select an appropriate duplicate files removal action.

WARNING: There are many duplicate files in the Windows system directory, which are important for proper operation of the operating system. Removal of duplicate files located in the Windows system directory may permanently damage the operating system and render the computer completely non-functional.

By default, DiskBoss selects the oldest file in each set as the original file and all other files in the set as duplicates. In order to change that, select one or more sets, press the right mouse button and select the 'Select Oldest Files as Duplicates' menu item.

DiskBoss Duplicate Files Set

Alternatively, open the duplicate files set dialog, select any arbitrary duplicate file in the set as the original file, select an appropriate duplicate files removal action that should be executed for this specific duplicate files set and select one or more duplicate files in the set that the removal action should be applied to.
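The default selection of the original file can be pictured with a short Python sketch. This is an illustration under stated assumptions, not DiskBoss code: the FileInfo type and both functions are hypothetical, "oldest" is assumed to mean the earliest modification time, the inverted mode is assumed to keep the newest file as the original, and wasted space is assumed to be the combined size of all files in the set except the original.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int        # size in bytes
    modified: float  # modification time as a timestamp (assumed criterion)


def split_original(files, oldest_is_original=True):
    """Return (original, duplicates) for one duplicate file set."""
    ordered = sorted(files, key=lambda item: item.modified)
    # Default: the oldest file is the original. The inverted mode is
    # assumed to treat the newest file as the original instead.
    original = ordered[0] if oldest_is_original else ordered[-1]
    duplicates = [item for item in files if item is not original]
    return original, duplicates


def wasted_space(files):
    """Bytes occupied by every file in the set except the original (assumption)."""
    _original, duplicates = split_original(files)
    return sum(item.size for item in duplicates)
```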

Executing Duplicate Files Removal Actions

Once finished selecting duplicates and removal actions, press the 'Execute' button to see the duplicate files removal actions preview dialog. The duplicates removal actions preview dialog shows the selected duplicate files and removal actions that will be executed and allows one to review and manually confirm each specific action before execution.

DiskBoss Duplicate Files Removal Actions Preview

The operating system and other system applications may have a large number of duplicate files located in various system directories. These duplicate files may be very important for proper operation of the operating system and other system applications and it is highly dangerous to remove these duplicate files. To be on the safe side, use the duplicates removal actions only for your own documents, music files, videos, etc.

DiskBoss Duplicate Files Removal Process

In order to execute the selected duplicates removal actions, press the 'Execute' button located in the bottom-right corner of the 'Preview' dialog. DiskBoss will process the selected duplicate files and execute the specified duplicates removal actions.

Saving Duplicate Files Search Reports

DiskBoss allows one to save duplicate files search reports into a number of standard formats including HTML, PDF, Excel, XML, text and CSV. In the simplest case, perform a duplicate files search operation and press the 'Save' button located on the duplicate files search results dialog. On the save report dialog, select an appropriate report format, enter a report file name and press the 'Save' button.

DiskBoss Duplicate Files Save Report

For the HTML, PDF, Excel, text, CSV and XML report formats, the user is provided with the ability to save a short summary report or a longer detailed report, which may be very long for large file systems containing millions of files. By default, DiskBoss will save a short summary duplicate files search report in the HTML report format, which will include a list of the top 20 duplicate file sets sorted by the amount of duplicate disk space and a list of tables showing the amount of duplicate disk space and the number of duplicate files per file extension, file type, top-level directory, user name, etc.
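The kind of aggregation a summary report contains can be sketched in Python as follows. This is only an illustration: the input layout (a file size plus a list of paths with the original first), the function name, and the definition of duplicate disk space as the size of every file except the original are assumptions, not the DiskBoss report format.

```python
import os
from collections import Counter


def summarize(duplicate_sets, top=20):
    """Rank duplicate sets and total duplicate space and counts per extension.

    duplicate_sets: list of (file_size, [paths]) tuples, where the first
    path in each list is treated as the original (assumed layout).
    """
    # Duplicate space of a set: file size times the number of extra copies.
    ranked = sorted(
        duplicate_sets,
        key=lambda item: item[0] * (len(item[1]) - 1),
        reverse=True,
    )[:top]

    space_per_extension = Counter()
    files_per_extension = Counter()
    for size, paths in duplicate_sets:
        for path in paths[1:]:  # skip the original file
            extension = os.path.splitext(path)[1].lower() or "(none)"
            space_per_extension[extension] += size
            files_per_extension[extension] += 1
    return ranked, dict(space_per_extension), dict(files_per_extension)
```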

DiskBoss Duplicate Files HTML Report

In addition, the user is provided with the ability to save duplicate files search results to the DiskBoss native report format, which preserves all information related to each specific duplicate files search operation and may be loaded at any time just by clicking on a report file in the DiskBoss file navigator.

Microsoft Excel Reports

Sometimes, it may be required to perform additional analysis of duplicate files search results using external tools such as Microsoft Excel. In order to export duplicate files search results to the Excel report format, perform a duplicate files search operation, press the 'Save' button located on the duplicate files search results dialog, select the 'Excel Summary' report format for a short summary report or the 'Excel Report' format for a detailed duplicate files search report.

DiskBoss Duplicate Files Save Excel Report

A summary Excel report will include a list of top 20 duplicate file sets sorted by the amount of duplicate disk space and a number of tables showing the amount of duplicate disk space and the number of duplicate files per file extension, file category, file creation time, last modification time, top-level directory, user name, etc.

DiskBoss Duplicate Files Excel Report

A detailed Excel report will include the list of duplicate file sets sorted by the amount of the duplicate disk space followed by lists of duplicate files in each set, which may be very long for large file systems containing millions of files. In order to control how many duplicate file sets are exported in the detailed report, press the 'Advanced Options' button located on the 'Save Report' dialog and customize the duplicate files search report for your specific needs.

Saving Graphical PDF Reports

One of the most useful ways to export duplicate files search results is to use the PDF summary or the PDF report formats. Both of these report formats include various types of graphical pie charts showing the amount of duplicate disk space and the number of duplicate files per file extension, file category, creation time, last modification time, user name, etc. In order to save duplicate files search results to a PDF report file, press the 'Save' button located on the duplicate files search results dialog and select the 'PDF Summary' report format for a short, summary report or the 'PDF Report' format for a detailed duplicate files search report.

DiskBoss Duplicate Files Save PDF Report

A summary PDF report will include a list of the top 20 duplicate file sets sorted by the amount of duplicate disk space followed by a number of pie charts showing the amount of duplicate disk space and the number of duplicate files per file extension, file category, file creation time, last modification time, user name, etc. A detailed PDF report will include a list of duplicate file sets sorted by the amount of duplicate disk space followed by lists of duplicate files in each set, which may be very long for large file systems containing millions of files.

DiskBoss Duplicate Files PDF Report

In addition to the list of duplicate file sets sorted by the amount of duplicate disk space, detailed PDF reports include pie charts showing the duplicate disk space per file category and the number of duplicate files per file category according to the currently selected file categorization mode. For example, if the second-level file categories mode is set to categorize duplicate files search results by the file extension, the PDF report will display pie charts showing the amount of duplicate disk space and the number of duplicates per file extension.

Exporting Reports to an SQL Database

IT professionals and storage administrators are provided with the ability to submit reports listing duplicate files detected on multiple storage systems, servers and desktop computers to a centralized SQL database. This enables system and storage administrators to gain in-depth visibility into the amounts of duplicate files and wasted disk space across the entire enterprise.

DiskBoss Duplicate Files Save SQL Database Report

In order to submit a report to an SQL database, press the 'Save' button located on the duplicate files search results dialog toolbar, select the 'SQL Database' report format and press the 'Save' button. Before exporting a report to an SQL database, the user needs to open the options dialog, enable the ODBC interface and specify the name of the ODBC data source, the database user name and password to use for database export operations.

DiskBoss SQL Database Configuration

For each report in the database, DiskBoss shows the report date and time, the name of the host computer the operation was performed on, disks and directories that were processed, the total amount of disk space and the number of files that were processed and the report title. In order to open a report, just click on the report item in the report list.

Analyzing Duplicate Files Per Host

DiskBoss Server and DiskBoss Enterprise provide the ability to automatically detect all servers and NAS storage devices on the network, search duplicate files in hundreds of servers and/or NAS storage devices via the network, submit duplicate files search reports to a centralized report database and display charts showing the number of duplicate files and the amount of duplicate disk space per server or NAS storage device across the entire enterprise.

Analyzing Duplicate Files Per Server

In order to analyze duplicate files per server, perform one or more duplicate files search operations on multiple servers and/or NAS storage devices, open the 'Reports' dialog, press the 'Analyze' button and select the 'Analyze Disk Space Usage Per Host' menu item. DiskBoss will analyze all reports saved in the reports database and display the hosts analysis dialog showing the number of duplicate files and the amount of duplicate disk space per host.

DiskBoss Duplicate Files Per Server

The hosts analysis dialog provides the ability to display pie charts and bar charts showing the number of duplicate files and the amount of duplicate disk space per host according to duplicate files search reports saved in the reports database. The user is provided with the ability to select the types of duplicate files search operations and file system locations to analyze, edit the chart header and footer, copy the chart image to the clipboard and export graphical PDF reports including pie charts.

Analyzing Duplicate Files Per User

DiskBoss Server and DiskBoss Enterprise provide the ability to automatically detect all servers and NAS storage devices on the network, search duplicate files in hundreds of servers and/or NAS storage devices via the network, submit duplicate files search reports to the reports database and display charts showing the number of duplicate files and the amount of duplicate disk space per user across the entire enterprise.

In order to analyze duplicate files per user, perform one or more duplicate files search operations on multiple servers and/or NAS storage devices, open the 'Reports' dialog, press the 'Analyze' button and select the 'Analyze Disk Space Usage Per User' menu item. DiskBoss will analyze all reports saved in the reports database and display the users analysis dialog showing the number of duplicate files and the amount of duplicate disk space per user.

DiskBoss Duplicate Files Per User

The users analysis dialog provides the ability to display pie charts and bar charts showing the number of duplicate files and the amount of duplicate disk space per user according to duplicate files search reports saved in the reports database. The user is provided with the ability to select the types of duplicate files search operations and file system locations to analyze, edit the chart header and footer, copy the chart image to the clipboard and export graphical PDF reports including pie charts.

IMPORTANT: In order to be able to display duplicate files per user, the duplicate files search operation should be configured to process and display file user names.

Search Duplicate Files in Network Servers and NAS Storage Devices

DiskBoss allows one to scan the network, discover network servers and NAS storage devices, automatically detect all accessible network shares and search duplicate files in hundreds of network servers and NAS storage devices. In addition, the user is provided with the ability to export the list of detected servers and NAS storage devices (including lists of network shares for each server) into HTML, PDF, text and Excel CSV reports.

Search Duplicate Files in Network Servers

In order to discover all network servers and NAS storage devices on the network, press the 'Network' button located on the main toolbar and wait while DiskBoss scans the network and shows a list of detected network servers and NAS storage devices. In order to search duplicate files in one or more servers or NAS storage devices, select the required servers and NAS storage devices and press the 'Duplicates' button.

Search Duplicate Files in Network Shares

DiskBoss will show all accessible network shares hosted on the selected servers and NAS storage devices, allowing one to search duplicate files and save various types of duplicate files pie charts and reports. In addition, the user is provided with the ability to customize a large number of advanced duplicate files search options allowing one to tune duplicate files search operations for user-specific needs and hardware configurations.

Batch Duplicate Files Search Operations

DiskBoss Server and DiskBoss Enterprise provide the ability to execute one or more pre-configured duplicate files search operations on all network servers and NAS storage devices on the network and generate an individual duplicate files search report for each server and NAS storage device. In order to be able to use batch duplicate files search operations, the user needs to pre-configure one or more duplicate files search commands customized to generate duplicate files search reports according to user-specific needs and requirements.

In order to start a batch duplicate files search operation, press the 'Network' button located on the main toolbar, search all servers and NAS storage devices on the network, select one or more servers and NAS storage devices, press the right mouse button and select the 'Execute Batch Command' menu item.

Select Batch Duplicate Files Search Commands

DiskBoss will display a list of pre-configured duplicate files search commands allowing one to select one or more commands to be executed on all selected servers and NAS storage devices. In addition, the user is provided with the ability to select how to save duplicate files search reports - for each server or for each network share. By default, all duplicate files search reports will be saved in the DiskBoss internal reports database allowing one to open each report, review results, generate various types of pie charts and export reports into a number of standard formats including HTML, PDF, text, Excel CSV and XML.

Searching Specific Types of Duplicate Files

One of the most powerful capabilities of DiskBoss is the ability to search specific types of duplicate files according to one or more user-specified file matching rules. Files not matching the specified rules will be skipped by the duplicate files search process.

DiskBoss Duplicate Files Rules

In order to add one or more file matching rules to a duplicate files search operation, open the command dialog, select the rules tab and press the 'Add' button located on the right side of the dialog. Once finished adding file matching rules, select an appropriate rules logic and press the 'Save' button.

DiskBoss Duplicate Files Negative Rules

Another option is to exclude specific types of duplicate files from the search process using one or more negative file matching rules. For example, in order to exclude all types of programs and executable files from the duplicate files search process, add a file category rule, select the 'Not Categorized As' rule operator and select the 'Programs and Executable Files' file category.

Excluding Subdirectories from Duplicate Files Search Process

Sometimes, it may be required to exclude one or more subdirectories from the duplicate files search process. For example, if you need to search duplicate files in a disk excluding one or two special directories, you may specify the whole disk as an input directory and add the directories that should be skipped to the exclude list. By default, in order to prevent accidental deletion of critical system files, DiskBoss automatically adds the operating system directory to the list of exclude directories in all duplicate files search commands.

DiskBoss Duplicate Files Exclude Directories

In order to add one or more directories to the exclude list, open the duplicate files search command dialog, press the 'Options' button, select the 'Exclude' tab and press the 'Add' button. All files and subdirectories located in the specified exclude directory will be excluded from the duplicate files search process. In addition, advanced users are provided with a number of exclude directories macro commands allowing one to exclude multiple directories using a single macro command.

DiskBoss provides the following exclude directories macro commands:

  • $BEGINS <Text String> - this macro command excludes all directories beginning with the specified text string.
  • $CONTAINS <Text String> - this macro command excludes all directories containing the specified text string.
  • $ENDS <Text String> - this macro command excludes all directories ending with the specified text string.
  • $REGEX <Regular Expression> - this macro command excludes directories matching the specified regular expression.

For example, the exclude macro command '$CONTAINS Temporary Files' will exclude all directories with 'Temporary Files' in the full directory path and the exclude macro command '$REGEX \.(TMP|TEMP)$' will exclude directories ending with '.TMP' or '.TEMP'.
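The matching behavior described for the four macro commands can be approximated with a short Python sketch. It is an illustration, not the DiskBoss implementation: the function name is hypothetical, and the sketch assumes that every macro is tested against the full directory path and that matching is case-sensitive, neither of which is stated above for all macros.

```python
import re


def is_excluded(directory_path, macro):
    """Return True if directory_path matches an exclude macro command.

    Assumptions: the macro is applied to the full directory path and
    comparisons are case-sensitive.
    """
    keyword, _, argument = macro.partition(" ")
    if keyword == "$BEGINS":
        return directory_path.startswith(argument)
    if keyword == "$CONTAINS":
        return argument in directory_path
    if keyword == "$ENDS":
        return directory_path.endswith(argument)
    if keyword == "$REGEX":
        return re.search(argument, directory_path) is not None
    raise ValueError("unknown exclude macro: " + keyword)
```

With these assumptions, the two examples from the text behave as described: a path containing 'Temporary Files' matches the '$CONTAINS' macro, and a directory ending with '.TMP' matches the '$REGEX' macro, while one of its subdirectories does not match the regular expression by itself.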

Automatic Duplicate Files Removal Actions

DiskBoss Ultimate and DiskBoss Server provide the user with the ability to automatically execute one or more duplicate files removal actions for files matching user-specified rules. In order to define one or more automatic duplicate files removal actions, open the duplicate files search command dialog, select the 'Actions' tab and press the 'Add' button.

DiskBoss Duplicate Files Removal Actions

On the 'Action' dialog, select the original file detection mode and an appropriate duplicates removal action, and specify one or more file matching rules defining the files the action should be applied to. During runtime, DiskBoss will process detected duplicate files, apply the specified file matching rules, detect the original file and execute the duplicates removal actions for files matching the specified rules and policies.

DiskBoss Duplicate Files Removal Action

By default, DiskBoss executes automatic duplicates removal actions in the 'Auto-Select' mode, which selects the specified actions and displays the duplicates removal actions preview dialog allowing one to review and manually confirm each specific action. After testing the duplicate files search command in the preview mode, change the actions mode to 'Execute' to automatically execute the specified duplicates removal actions without showing the actions preview dialog.

Duplicate Files Search Performance

DiskBoss detects duplicate files by calculating hash signatures for files with an identical file size; files with the same hash signature are reported as duplicates. There are many different hash signature algorithms providing different reliability and performance levels. Simple algorithms are usually faster but less reliable, while more complicated algorithms are very reliable but require more computational resources. DiskBoss supports 5 different hash signature algorithms, allowing one to select an appropriate algorithm according to user-specific needs.
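The size-then-hash approach described above can be sketched in Python using the standard hashlib module. This is a minimal illustration of the general technique, not the DiskBoss implementation; the function names, the chunk size and the choice to skip symbolic links are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_sets(root, algorithm="sha256"):
    """Group files under root by size first, then by hash signature."""
    by_size = defaultdict(list)
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            # Assumption for this sketch: symbolic links are ignored.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicate_sets = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have duplicates, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path, algorithm)].append(path)
        duplicate_sets.extend(
            sorted(group) for group in by_hash.values() if len(group) > 1
        )
    return duplicate_sets
```

Grouping by size first means that only files sharing a size with at least one other file are ever read and hashed, which is where most of the time is saved on large file systems.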

DiskBoss Duplicate Files Search Hash Signatures

In order to change the hash signature algorithm, open the duplicate files search command dialog, press the 'Options' button, select the 'General' tab and click on the 'Hash Signature Type' combo box. By default, DiskBoss uses the SHA256 signature type, which is the slowest of all the supported algorithms, but the most reliable one. The MD5 algorithm is the fastest one, but it is less reliable, which may cause some inaccuracies when processing very large amounts of data. The BLAKE2-B algorithm is one of the newly added algorithms, which is supposed to provide the same reliability as the SHA256 algorithm and the same performance as the MD5 algorithm, but only on 64-bit operating systems using the native 64-bit version of the product.

DiskBoss Duplicate Files Search Performance

According to our performance tests, when searching duplicate files stored on a fast SSD disk, the SHA256 algorithm is two times slower than the SHA1 algorithm and more than two times slower than the newly added BLAKE2-B algorithm. However, when searching duplicate files stored on a NAS device, the selected hash signature algorithm matters less because a regular NAS device usually delivers data much more slowly than even the slowest SHA256 hash signature algorithm can process it.
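Readers who want a rough feel for the relative cost of these algorithms on their own hardware can time them with a small Python sketch such as the one below. It measures the hashlib implementations shipped with Python, not DiskBoss itself, so the numbers are not comparable with the test results above; the function name, the algorithm list and the payload size are assumptions of this example, and some Python builds may not provide every algorithm.

```python
import hashlib
import os
import time


def time_hash_algorithms(payload, algorithms=("md5", "sha1", "sha256", "blake2b"), repeats=3):
    """Return the best-of-N hashing time in seconds for each algorithm."""
    timings = {}
    for name in algorithms:
        best = float("inf")
        for _ in range(repeats):
            started = time.perf_counter()
            hashlib.new(name, payload).hexdigest()
            best = min(best, time.perf_counter() - started)
        timings[name] = best
    return timings


if __name__ == "__main__":
    # 64 MB of random data held in memory, so disk speed does not affect the result.
    data = os.urandom(64 * 1024 * 1024)
    for name, seconds in time_hash_algorithms(data).items():
        print(f"{name}: {seconds:.3f} s")
```

Because the payload is held in memory, the sketch isolates the CPU cost of hashing; on a slow disk or a NAS device the read time would dominate instead.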

Advanced Duplicate Files Search Options

The DiskBoss duplicate files finder provides a large number of advanced options allowing one to customize duplicate files search operations for user-specific hardware and storage configurations. The 'General' tab allows one to control the file signature type, the file scanning mode and the maximum number of duplicate file sets to display in the results dialog.

DiskBoss Duplicate Files Removal Action

The 'Advanced' tab provides the ability to intentionally slow down the duplicate files search process in order to minimize the potential performance impact on running production systems. The 'Exclude' tab allows one to define one or more subdirectories to be excluded from the duplicate files detection process.

Pre-Configured Duplicate Files Search Commands

One of the most powerful and flexible capabilities of DiskBoss is the ability to pre-configure custom duplicate files search operations as user-defined commands and execute such commands in a single mouse click using the DiskBoss GUI application or direct desktop shortcuts.

User-defined commands may be managed and executed through the commands dialog or the commands tool pane. In order to add a new command through the commands pane, press the right mouse button over the pane and select the 'Add New - Duplicate Files Search Command' menu item. In order to execute a previously saved command, just click on the command item in the commands tool pane or create a direct desktop shortcut on the Windows desktop.

Searching Duplicate Files Using DiskBoss Command Line Utility

In addition to the DiskBoss GUI application, DiskBoss Ultimate and DiskBoss Server provide a command line utility allowing one to search and remove duplicate files from batch files and shell scripts. The command line tool is located in the '<ProductDir>\bin' directory.

diskboss -duplicates -dir <Directory 1> [ ... <Directory X> ] [ Options ]

This command searches duplicate files in the specified disks, directories or network shares.

diskboss -duplicates -server <Host Name 1> [ ... <Host Name X> ] [ Options ]

This command searches duplicate files in all network shares in the specified servers.

diskboss -duplicates -network [ Options ]

This command searches duplicate files in all network shares in all servers on the network.

diskboss -execute <User-Defined Duplicate Files Search Command>

This command executes the specified user-defined duplicate files search command.

Parameters:

-dir <Directory 1> [ ... <Directory X> ]

This parameter specifies the list of input disks or directories to search. In order to ensure proper parsing of input directories, directories containing space characters should be double quoted. By default, DiskBoss will generate a combined duplicate files report showing information about all processed disks, directories and network shares. In order to generate an individual report for each input directory, use the '-batch' command line option to enable the batch report generation mode.

-server <Host Name 1> [ ... <Host Name X> ]

This parameter specifies the list of host names or IP addresses of servers or NAS storage devices to search. DiskBoss will enumerate all network shares accessible in the specified servers or NAS storage devices, search duplicate files and generate reports if required. By default, DiskBoss will generate a combined duplicate files report showing information about all processed network shares. In order to generate an individual report for each network share, use the '-batch' command line option to enable the batch report generation mode.

-network

In the network-wide duplicate files search mode, DiskBoss will discover servers and NAS storage devices accessible on the network, enumerate all network shares available in all detected servers and NAS storage devices, search duplicate files and generate reports if required. By default, DiskBoss will generate a combined duplicate files report showing information about all processed network shares. In order to generate an individual report for each network share, use the '-batch' command line option to enable the batch report generation mode.

Options:

-signature_type <MD5 | SHA1 | SHA256>

This option sets the type of the algorithm used to calculate signatures of files. By default, DiskBoss uses the SHA256 algorithm.

-exclude_dir <Exclude Directory 1> [ ... <Exclude Directory X> ]

This option specifies the list of directories that should be excluded from the duplicate files search operation. In order to ensure proper parsing of command line arguments, directories containing space characters should be double quoted.

-save_html_report [ Report File Name ]

This option saves duplicate files search results to an HTML report file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_[date]_[time].html and save a report file in the user's home directory.

-save_csv_report [ Report File Name ]

This option saves duplicate files search results to an Excel CSV file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_[date]_[time].csv and save a report file in the user's home directory.

-save_text_report [ Report File Name ]

This option saves duplicate files search results to a text report file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_[date]_[time].txt and save a report file in the user's home directory.

-save_pdf_report [ Report File Name ]

This option saves duplicate files search results to a PDF report file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_[date]_[time].pdf and save a report file in the user's home directory.

-save_xml_report [ Report File Name ]

This option saves duplicate files search results to an XML report file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_[date]_[time].xml and save a report file in the user's home directory.

-save_report [ Report File Name ]

This option saves duplicate files search results to a native DiskBoss report file, which may be later loaded in the DiskBoss GUI application for future review and analysis. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_[date]_[time].flr

-save_to_database

This option saves duplicate files search results to an SQL Database using the ODBC interface configured in the DiskBoss GUI application options dialog.

-title <Report Title>

This option sets a custom report title.

-label <Report Label>

This option sets a custom report label.

-compress

This option instructs DiskBoss to save compressed report files.

-batch

This option enables the batch report generation mode. In the batch report generation mode DiskBoss saves an individual report file for each input disk, directory or network share.

-v

This option shows the product's major and minor versions.

-help

This option shows the command line usage information.
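When the command line utility is driven from a script, the argument list can be assembled programmatically. The Python sketch below uses only parameters and options documented above; the helper function and its parameter names are hypothetical, and it assumes the 'diskboss' executable from the '<ProductDir>\bin' directory can be found on the search path. The sketch only builds the argument list and does not run the tool.

```python
def build_duplicates_command(directories, signature_type=None, exclude_dirs=(),
                             html_report=None, batch=False):
    """Build an argument list for a 'diskboss -duplicates -dir' invocation.

    Hypothetical helper: only flags described in this document are used.
    """
    if not directories:
        raise ValueError("at least one input directory is required")
    if signature_type is not None and signature_type not in ("MD5", "SHA1", "SHA256"):
        raise ValueError("unsupported signature type: " + signature_type)

    command = ["diskboss", "-duplicates", "-dir", *directories]
    if signature_type is not None:
        command += ["-signature_type", signature_type]
    if exclude_dirs:
        command += ["-exclude_dir", *exclude_dirs]
    if html_report is not None:
        command += ["-save_html_report", html_report]
    if batch:
        command.append("-batch")
    return command
```

Passing the resulting list to subprocess.run lets Python quote arguments containing space characters, which covers the double-quoting requirement described for the '-dir' and '-exclude_dir' arguments.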