bup split concatenates the contents of the given files
(or if no filenames are given, reads from stdin), splits the
content into chunks of around 8k using a rolling checksum
algorithm, and saves the chunks into a bup repository.
Chunks which have previously been stored are not stored again (ie.
they are "deduplicated").
Because of the way the rolling checksum works, chunks tend to be
very stable across changes to a given file, including adding,
deleting, and changing bytes.
For example, if you use bup split to back up an XML dump
of a database, and the XML file changes slightly from one run to
the next, nearly all the data will still be deduplicated and the
size of each backup after the first will typically be quite small.
Another technique is to pipe the output of the tar(1) or
cpio(1) programs to bup split.
When individual files in the tarball change slightly or are added
or removed, bup still processes the remainder of the tarball
efficiently.
(Note that bup save is usually a more efficient way to
accomplish this, however.)
save the backup set to the given remote server.
If path is omitted, uses the default path on the remote
server (you still need to include the ':')
-b, ---blobs
output a series of git blob ids that correspond to the chunks in
the dataset.
-t, ---tree
output the git tree id of the resulting dataset.
-c, ---commit
output the git commit id of the resulting dataset.
-n, ---name=name
after creating the dataset, create a git branch named name
so that it can be accessed using that name.
If name already exists, the new dataset will be considered
a descendant of the old name.
(Thus, you can continually create new datasets with the same name,
and later view the history of that dataset to see how it has
changed over time.)
-q, ---quiet
disable progress messages.
-v, ---verbose
increase verbosity (can be used more than once).
---git-ids
stdin is a list of git object ids instead of raw data.
bup split will read the contents of each named git object
(if it exists in the bup repository) and split it.
This might be useful for converting a git repository with large
binary files to use bup-style hashsplitting instead.
This option is probably most useful when combined with
--keep-boundaries.
---keep-boundaries
if multiple filenames are given on the command line, they are
normally concatenated together as if the content all came from a
single file.
That is, the set of blobs/trees produced is identical to what it
would have been if there had been a single input file.
However, if you use --keep-boundaries, each file is split
separately.
You still only get a single tree or commit or series of blobs, but
each blob comes from only one of the files; the end of one of the
input files always ends a blob.
---noop
read the data and split it into blocks based on the
"bupsplit" rolling checksum algorithm, but don't do
anything with the blocks.
This is mostly useful for benchmarking.
---copy
like ---noop, but also write the data to stdout.
This can be useful for benchmarking the speed of
read+bupsplit+write for large amounts of data.
---bench
print benchmark timings to stderr.
---max-pack-size=bytes
never create git packfiles larger than the given number of bytes.
Default is 1 billion bytes.
Usually there is no reason to change this.
---max-pack-objects=numobjs
never create git packfiles with more than the given number of
objects.
Default is 200 thousand objects.
Usually there is no reason to change this.
---fanout=numobjs
when splitting very large files, never put more than this number of
git blobs in a single git tree.
Instead, generate a new tree and link to that.
Default is 4096 objects per tree.
---bwlimit=bytes/sec
don't transmit more than bytes/sec bytes per second to the
server.
This is good for making your backups not suck up all your network
bandwidth.
Use a suffix like k, M, or G to specify multiples of 1024,
10241024, 10241024*1024 respectively.
EXAMPLE
$ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
tar: Removing leading /' from member names
Indexing objects: 100% (196/196), done.