Utilizing additional network mechanisms such as the IP protocols, and other approaches to further boost File Transfer performance
If you are transmitting across a WAN, there are a number of WAN acceleration/optimization products (such as those from Riverbed, Juniper, etc.) that can do this for you on the fly. Some use Molecular Sequence Reduction algorithms to identify large chunks of data that have already been transmitted and merely send a "token" to the other side of the WAN link instead of transmitting that same data chunk again.
In addition, some WAN acceleration/optimization products perform ("spoof") the TCP handshake and acknowledgment interaction locally in order to reduce latency. This can have a greater effect than switching to a huge pipe, especially where the far (receiving) end cannot send its TCP responses back fast enough.
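To make the "token" idea concrete, here is a minimal sketch. It is not how any particular vendor implements it: the fixed 4096-byte chunk size, the SHA-256 digests and the `encode`/`decode` names are all assumptions made for illustration, and real products choose chunk boundaries far more cleverly.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def encode(data: bytes, seen: dict) -> list:
    """Sender side: emit ("raw", chunk) the first time a chunk is seen,
    and ("token", digest) for every chunk already sent before."""
    messages = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            messages.append(("token", digest))
        else:
            seen[digest] = chunk
            messages.append(("raw", chunk))
    return messages


def decode(messages: list, seen: dict) -> bytes:
    """Receiver side: remember raw chunks and expand tokens from the cache."""
    out = bytearray()
    for kind, payload in messages:
        if kind == "raw":
            seen[hashlib.sha256(payload).digest()] = payload
            out += payload
        else:
            out += seen[payload]
    return bytes(out)
```

Each side keeps its own cache (`seen`), so a 32-byte digest crosses the link in place of a 4096-byte chunk whenever the chunk repeats.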
Another way to improve effective throughput is to employ a UDP-based network protocol (instead of a TCP-based protocol). UDP allows data packets to be sent back-to-back without a connection handshake and without waiting for each packet to be acknowledged. It is also used to transmit latency-sensitive data, such as video or voice, but can be used for other purposes as well. For example, TFTP (Trivial File Transfer Protocol) is UDP-based. There can still be throttling/windowing mechanisms pushed back from the computer receiving the data packets (via transmitting messages on the same or a different port number or via a CONTROL message vs a DATA BEARING message). This method could also be used to indicate packets that never arrived (so that they can be retransmitted, for example). However, these retransmit and windowing requests could take place at the application layer, for example, or somewhere in between that layer and layer 3 of the network stack.
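As a rough sketch of the application-layer bookkeeping described above, the receiver can number the DATA packets and answer with CONTROL messages naming the ones that never arrived. The header layout, the `DATA`/`NACK` message types and the `Receiver` class are hypothetical; in practice the datagrams would travel over a UDP socket via `sendto` and `recvfrom`.

```python
import struct

# Hypothetical header: 1-byte message type + 4-byte sequence number.
HEADER = struct.Struct("!BI")
DATA, NACK = 0, 1


def make_data(seq: int, payload: bytes) -> bytes:
    """Build a DATA BEARING datagram."""
    return HEADER.pack(DATA, seq) + payload


def make_nack(seq: int) -> bytes:
    """Build a CONTROL datagram asking for packet `seq` again."""
    return HEADER.pack(NACK, seq)


class Receiver:
    def __init__(self, total_packets: int):
        self.total = total_packets
        self.chunks = {}

    def on_datagram(self, datagram: bytes) -> None:
        kind, seq = HEADER.unpack_from(datagram)
        if kind == DATA:
            self.chunks[seq] = datagram[HEADER.size:]

    def missing(self) -> list:
        """Sequence numbers that have not arrived yet."""
        return [s for s in range(self.total) if s not in self.chunks]

    def nacks(self) -> list:
        """CONTROL messages to send back to the source."""
        return [make_nack(s) for s in self.missing()]

    def assemble(self) -> bytes:
        return b"".join(self.chunks[s] for s in range(self.total))
```

The source keeps blasting DATA packets back-to-back and only resends the sequence numbers it is asked for.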
A high-performance IP protocol, such as DDS, is an example of a very low latency protocol that uses UDP and can provide sub-millisecond response times across the same Ethernet switch. While DDS may not be oriented toward large file transfers, a two-way DDS session might be used as an out-of-band communication network to control rate of transfer, request retransmissions, etc.
If you have an Enterprise-grade Ethernet switch (such as from Cisco or Juniper), utilizing jumbo frames can help reduce the percentage of the data stream that is made up of headers, squeezing even more performance out of the system.
Another mechanism found in Enterprise-grade Ethernet switches is the network QoS settings that could prioritize certain network traffic (e.g., the file copy protocol) over less-critical, less-time-sensitive data traffic.
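A quick back-of-the-envelope calculation shows how much the header share shrinks. The sketch assumes TCP over IPv4 with no header options and counts the standard Ethernet per-frame overhead (preamble, header, frame check sequence and inter-frame gap); the function name is just for illustration.

```python
# Per-frame Ethernet overhead in bytes:
# preamble + start delimiter (8), header (14), FCS (4), inter-frame gap (12).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12
# Assumed IPv4 (20) + TCP (20) headers, no options.
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of the bytes on the wire that are actual file data."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload")
```

With a standard 1500-byte MTU about 94.9% of the wire carries payload; with a 9000-byte jumbo MTU that rises to about 99.1%.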
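On the sending host, one common way to let the switch recognize the file copy traffic is to mark its packets with a DSCP value. The sketch below assumes a platform that exposes `socket.IP_TOS` (Linux does), uses AF41 purely as an example class, and the helper names are made up. Whether the switch honors the marking depends entirely on how its QoS policy is configured.

```python
import socket

DSCP_AF41 = 34  # example class only; pick whatever your network policy defines


def tos_from_dscp(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IP TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the OS to stamp outgoing packets on this socket with `dscp`."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_from_dscp(dscp))
```

Calling `mark_socket(sock, DSCP_AF41)` before connecting is enough on the host side; the prioritization itself still has to be set up on the switch.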
If you want to send a SINGLE file image to MULTIPLE receivers, one area where the network switch MAY be able to provide some additional mechanisms to keep file transfer times to a minimum is to utilize IP MULTICAST. The beauty of IP MULTICAST is that ONE data stream can be effectively "split" into two or more data streams by the Layer 3 routing function within the Enterprise Ethernet Switch. This means that the SOURCE end of the file copying transaction would only send a SINGLE data stream with the file contents while MULTIPLE servers could receive their own individual copies of that exact same data stream. In Multicast, each receiver "subscribes" to that same multicast data stream that is being transmitted as ONE stream into the Layer 3 Enterprise-Grade switch. Assuming that the Multicast function of the data switch is capable of handling these extremely high-speed data streams, you would only transmit the file content ONCE to the switch and it would then distribute that stream to each receiving server that subscribed to that multicast. In the case of retransmit requests, they could be handled out-of-band via a high performance protocol, such as DDS, which could request the specific data block that was not received. This requires some programming to be performed that would coordinate the "out-of-band" DDS-based CONTROL messaging from the "BEARER" protocol that is used to transmit the file image data. Also, the software would need to take the steps to either sign up as the SOURCE of the multicast or as one of the DESTINATIONS of the multicast. Once initiated, a virtually unlimited number of DESTINATIONS could subscribe to the multicast and they would all receive a copy of the data stream SIMULTANEOUSLY (think of it as a limited broadcast).
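For what "signing up" as the SOURCE or as a DESTINATION looks like in code, here is a minimal sketch using plain UDP sockets. The group address `239.1.2.3`, port `5007` and the function names are hypothetical, and the out-of-band retransmit channel is left out entirely.

```python
import socket
import struct

GROUP = "239.1.2.3"  # hypothetical administratively scoped multicast group
PORT = 5007          # hypothetical port


def make_sender(ttl: int = 1) -> socket.socket:
    """SOURCE side: an ordinary UDP socket with a multicast TTL set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the group address and local interface for IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def make_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """DESTINATION side: bind to the port, then subscribe to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

The source then calls `make_sender().sendto(chunk, (GROUP, PORT))` ONCE per chunk, and every server that called `make_receiver()` gets its own copy from `recvfrom`. A TTL of 1 keeps the stream on the local subnet; it would need to be raised for the stream to cross routers.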
NOTE: Even if the IP Multicast processing presented a performance slowdown, the overall transmission time should still be less than sending multiple unicasts one machine at a time. In the case of MULTICAST, we let the NETWORK perform part of the heavy lifting. Interestingly, that is how video data streams are delivered to thousands of endpoints simultaneously over large Enterprise networks - like the one where I work.
Also, you might want to consider creating a delta file PRIOR to transmitting to the destination servers. There are technologies, such as those used by Harman RedBend MSM, that pre-process the before version and the updated version of the file to create a DELTA file. This technology was originally developed as a way to minimize network impact on cellular carrier networks when transmitting new firmware updates over the air to mobile devices.
One last technology that can be interesting is that which is used to keep remote SAN (Storage Area Network) clusters synchronized with each other. In those cases, only disk BLOCKS that have been altered since the last transmission are sent (another method of a differential update).
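A block-level differential update can be sketched in a few lines. This is only an illustration of the idea, not how any SAN replication product or delta tool actually works: the 4096-byte block size, the SHA-256 digests and the function names are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_digests(data: bytes) -> list:
    """One digest per block; the destination sends these to the source."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old_digests: list, new_data: bytes) -> dict:
    """Source side: map block index -> new contents for blocks that differ."""
    changes = {}
    for index, offset in enumerate(range(0, len(new_data), BLOCK_SIZE)):
        block = new_data[offset:offset + BLOCK_SIZE]
        if (index >= len(old_digests)
                or hashlib.sha256(block).digest() != old_digests[index]):
            changes[index] = block
    return changes


def apply_changes(old_data: bytes, changes: dict, new_length: int) -> bytes:
    """Destination side: patch the altered blocks into the old copy."""
    out = bytearray(old_data[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * BLOCK_SIZE
        out[start:start + len(block)] = block
    return bytes(out)
```

Only the entries in `changes` (plus the new length) have to cross the network, so a small edit to a large file costs a few blocks rather than the whole image.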
Anyway, it would be VERY interesting to experiment with a HYBRID approach using more than just one of these technologies to improve overall throughput of these file transfers in a complementary manner.
We were a bit surprised when we looked at UDP protocols before writing FDT. Specifically, we found that there was no advantage to using UDP in the cases we tested. We didn't drill deeply into the reasons why, but I assume that, once you get up to speeds well above 1 Gbps, there's a higher per-packet overhead in UDP because the OS provides you less support for sending groups of packets in bulk.
Thanks for the suggestion about Molecular Sequence Reduction algorithms.