Version Data
With the log_data_version and log_s3_data_version helpers you can log the data location and the data hash to Neptune. Both are stored as experiment properties and can be viewed in the Details section of an experiment and as a column in the experiment dashboard.
Check this example project to see more.
File data version
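import neptune
from neptunecontrib.versioning.data import log_data_version

FILEPATH = '/path/to/data/my_data.csv'
# Assumes neptune.init(...) has already been called for your project.
with neptune.create_experiment():
    # Logs the file location and its hash as experiment properties.
    log_data_version(FILEPATH)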
Folder data version
import neptune
from neptunecontrib.versioning.data import log_data_version

DIRPATH = '/path/to/data/folder'
with neptune.create_experiment():
    # Logs the folder location and its hash as experiment properties.
    log_data_version(DIRPATH)
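To make the idea of a data hash concrete, here is a minimal sketch of how a content hash could be computed for either a file or a folder. It is an illustration under stated assumptions, not the neptunecontrib implementation: the function name content_hash and the choice of SHA-256 are hypothetical, and the hash that log_data_version logs may be computed differently.

```python
import hashlib
import os


def content_hash(path, chunk_size=1 << 20):
    """Return a hex digest that changes whenever the data under path changes.

    Hypothetical helper for illustration; not part of neptunecontrib.
    """
    digest = hashlib.sha256()
    if os.path.isfile(path):
        files = [path]
    else:
        # Walk the folder in sorted order so the result is deterministic.
        files = []
        for root, dirs, names in os.walk(path):
            dirs.sort()
            files.extend(os.path.join(root, name) for name in sorted(names))
    for filepath in files:
        # Include the relative path so renaming a file changes the version.
        digest.update(os.path.relpath(filepath, path).encode('utf-8'))
        # Read in chunks so large files do not have to fit in memory.
        with open(filepath, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                digest.update(chunk)
    return digest.hexdigest()
```

Calling content_hash on the same unchanged file or folder twice gives the same digest, while editing, adding, or renaming a file gives a different one. That is the property that makes a hash usable as a data version.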
S3 bucket data version
You can log the version of a particular key, which is similar to file versioning.
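import neptune
from neptunecontrib.versioning.data import log_s3_data_version

BUCKET = 'my-bucket'
PATH = 'training_dataset.csv'
with neptune.create_experiment():
    # Logs the location and version of a single key.
    log_s3_data_version(BUCKET, PATH)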
You can also log a combined version of all the keys that start with a particular string, which is similar to versioning a directory.
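import neptune
from neptunecontrib.versioning.data import log_s3_data_version

BUCKET = 'my-bucket'
PATH = 'train_dir/'
with neptune.create_experiment():
    # Logs one combined version for every key that starts with PATH.
    log_s3_data_version(BUCKET, PATH)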
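One way to picture a combined version is as a single hash over every key that shares the prefix. The sketch below is an assumption-laden illustration, not what log_s3_data_version does internally: combined_version is a hypothetical function, the dict stands in for a real bucket listing, and using each key's ETag as its fingerprint is an assumption.

```python
import hashlib


def combined_version(etags_by_key, prefix):
    """Combine the ETags of all keys that start with prefix into one hash.

    etags_by_key is a hypothetical stand-in for an S3 listing: a dict that
    maps each key name to its ETag string.
    """
    digest = hashlib.sha256()
    # Sort the keys so the result does not depend on listing order.
    for key in sorted(etags_by_key):
        if key.startswith(prefix):
            digest.update(key.encode('utf-8'))
            digest.update(etags_by_key[key].encode('utf-8'))
    return digest.hexdigest()


# Made-up listing for illustration only.
listing = {
    'train_dir/part-0.csv': 'etag-a',
    'train_dir/part-1.csv': 'etag-b',
    'test_dir/part-0.csv': 'etag-c',
}
version = combined_version(listing, 'train_dir/')
```

With this scheme the version changes when any key under train_dir/ changes, and stays the same when only keys outside the prefix change.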
Prefixing
If you want to track multiple data sources in one experiment, give each of them its own prefix when logging so that they are stored under separate properties. For example:
import neptune
from neptunecontrib.versioning.data import log_data_version

FILEPATH_TABLE_1 = '/path/to/data/my_table_1.csv'
FILEPATH_TABLE_2 = '/path/to/data/my_table_2.csv'
with neptune.create_experiment():
    # Each data source gets its own prefix.
    log_data_version(FILEPATH_TABLE_1, prefix='table_1_')
    log_data_version(FILEPATH_TABLE_2, prefix='table_2_')
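The following sketch shows why prefixes matter, using a plain dict in place of the experiment's properties. The property names data_path and data_version, and the helper versioning_properties, are assumptions made for illustration. Check the Details section of an experiment for the names actually used.

```python
def versioning_properties(path, version, prefix=''):
    """Build the properties a prefixed call might produce.

    Assumption: properties are named '<prefix>data_path' and
    '<prefix>data_version'. This helper is hypothetical.
    """
    return {
        prefix + 'data_path': path,
        prefix + 'data_version': version,
    }


# Without prefixes the second source overwrites the first.
unprefixed = {}
unprefixed.update(versioning_properties('/path/to/data/my_table_1.csv', 'hash-1'))
unprefixed.update(versioning_properties('/path/to/data/my_table_2.csv', 'hash-2'))

# With prefixes both sources are kept side by side.
prefixed = {}
prefixed.update(versioning_properties('/path/to/data/my_table_1.csv', 'hash-1', prefix='table_1_'))
prefixed.update(versioning_properties('/path/to/data/my_table_2.csv', 'hash-2', prefix='table_2_'))
```

Here unprefixed ends up with two entries that describe only the second table, while prefixed holds four entries, two per table.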