Archive and Import a Dataset
Beginning with Brainspace v6.2, Brainspace Administrators can archive datasets to free up disk space and then import the archived datasets and their associated work products back into Brainspace at a later date.
Note
If you are using Brainspace v6.0 through Brainspace v6.1.6, you must upgrade to Brainspace v6.2 to use the Archive and Import feature. If you have a separate Postgres server, this feature is not currently supported.
Archiving and importing datasets must be performed by Brainspace Administrators who have a basic understanding of Linux file management and command-line access to the Brainspace servers’ operating systems and database.
If this is your first attempt to archive and import a dataset, we recommend archiving and then importing a test dataset before deleting any archived datasets from Brainspace. A dataset that is deleted from Brainspace cannot be recovered unless it was archived correctly.
The archive and import process involves the following high-level steps:
Disable a dataset in the Brainspace user interface.
Archive a dataset in the command-line interface.
Note
The archive/import script cannot handle build directories with spaces in their names, such as those generated in Discovery 5.5 and earlier; archiving these datasets requires updates to the database. Contact Brainspace Support before attempting to archive them.
Import a dataset in the Brainspace user interface and command-line interface.
Activate a dataset in the Brainspace user interface.
Remap a dataset's fields.
Disable a Dataset
The first step in the archive process is to disable a dataset by changing its status from Active to Inactive.
To disable a dataset:
In the Brainspace user interface, click Administration. The Datasets screen will open.
Locate the dataset in the list, and then click the Change Dataset Status icon. A confirmation dialog will open.
Click the Disable button.
The Datasets screen will refresh, and the dataset's status will change from Active to Inactive, which indicates that the dataset has been disabled.
Archive a Dataset
After disabling a dataset, you are ready to archive it.
Note
The archive/import script cannot handle build directories with spaces in their names, such as those generated in Discovery 5.5 and earlier; archiving these datasets requires updates to the database. Contact Brainspace Support before attempting to archive them.
To archive a dataset:
As the root user in the command-line interface, run
/var/lib/brains/scripts/archive-brainspace-dataset.sh --archive
Type your brsarchive user password. A list of datasets available in your environment will appear.
Note
If a user password was not previously created, the password entered will be used to create the user. Please remember this password.
Type the dataset’s ID number, and then press Enter on your keyboard.
Type the archive directory path (e.g., /data/brainspace/archive) where you would like to keep the archive, and then press Enter on your keyboard. After the script runs through the archive process and compresses the archive directory, a confirmation message similar to the following will display:
[2019-04-24 10:53:09] - INFO - Archive Completed. Please find all files located in /opsshared_data/apollo-data/brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz
[2019-04-24 10:53:09] - INFO -
[2019-04-24 10:53:09] - INFO - the checksum of the file is /opsshared_data/apollo-data/brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.shazam
[2019-04-24 10:53:09] - INFO - To complete the process manually remove the files located in /opsshared_data/apollo-data/brainspace/archive/TestingScript-04-24-2019_1040/TestingScript
[2019-04-24 10:53:09] - INFO - And return to the Brainspace user interface
Navigate to the archive location identified in the confirmation message, and then verify that the dataset has been archived successfully (see the example commands after the note below).
Note
Copy and store the path to the archived tar.gz file for future reference (e.g., /brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz). You will need this path when running the import script.
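If you want to spot-check the archive from the shell before deleting anything, listing the tarball's contents is usually enough. The commands below are a minimal sketch that reuses the example archive path above; substitute the path reported in your own environment.

# Confirm the archive file exists and note its size (example path; substitute your own).
ls -lh /brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz

# List the first few entries in the tarball to confirm it is readable.
tar -tzf /brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz | head

If tar cannot read the file, do not delete the dataset from Brainspace until the archive has been recreated successfully.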
After successfully archiving a dataset, you can delete it from the Brainspace user interface to free up disk space; however, a dataset that is deleted from Brainspace without being correctly archived cannot be recovered.
Import a Dataset
After archiving a dataset, you can import it into Brainspace at any time.
Note
If your archive is stored on a separate or remote storage volume, the directory must be mounted or linked within /data/brainspace, or the archive tar.gz file must be copied to /data/brainspace.
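How you make the archive visible under /data/brainspace depends on your storage layout. The following is a rough sketch that assumes a hypothetical remote volume already mounted at /mnt/archive_storage; adjust the paths for your environment.

# Link the remote archive directory into /data/brainspace (hypothetical paths).
ln -s /mnt/archive_storage/brainspace-archives /data/brainspace/archive-imports

# Or copy the archive tar.gz directly into /data/brainspace.
cp /mnt/archive_storage/brainspace-archives/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz /data/brainspace/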
To import a dataset:
As the root user in the command-line interface, run
/var/lib/brains/scripts/import-brainspace-dataset.sh --expand-compressed-archive
Type your brsarchive user password. A list of datasets available in your environment will appear.
Note
If a user password was not previously created, the password entered will be used to create the user. Please remember this password.
Type the path of the archived tar.gz file as noted during the archive process, and then press Enter on your keyboard. You will be prompted to create a new dataset using a provided path ending in /data, as shown in the following example:
[2019-04-24 11:08:32] - INFO - Please use the following path in Brainspace UI to Load From Disk in newly created Dataset.
[2019-04-24 11:08:32] - INFO - /brainspace/archive/TestingScript-04-24-2019_1040/04-24-2019_1104/TestingScript/data
Note
The archived dataset can take an extended period of time to uncompress.
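If you want to confirm that decompression has finished before creating the new dataset, you can watch the size of the expanded directory from a second shell. This is only a sketch that reuses the example path from the output above; substitute the path reported in your own environment.

# Re-check the size of the expanded archive every 30 seconds (example path; substitute your own).
watch -n 30 du -sh /brainspace/archive/TestingScript-04-24-2019_1040/04-24-2019_1104/TestingScript

# After the script finishes, confirm that the /data directory it reported exists.
ls /brainspace/archive/TestingScript-04-24-2019_1040/04-24-2019_1104/TestingScript/data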
Create a dataset:
Important
When creating a new dataset, do not click the Build button.
In the Brainspace user interface, click Administration. The Datasets screen will open.
Click the Add Dataset button. The New Dataset dialog will open.
In the New Dataset dialog, type a dataset name, and then toggle switches in the Dataset Groups pane to add the new dataset to one or more groups.
Click the Create button.
Click the Choose Connector button, scroll to the bottom of the list, and then click Load Existing Dataset.
Type the path for the archived build folder.
Click the Create Dataset button.
Important
Do not click the Build button.
Close the window and wait for the dataset to finish enabling.
Disable the dataset (see Disable a Dataset).
After creating and disabling the new dataset, return to the command-line interface, and then choose either Option 1 or Option 2.
When prompted, type the brsarchive user password, and then press Enter on your keyboard.
Type the new dataset ID, and then press Enter on your keyboard. The import process will begin.
After the import process has completed, you can remove the uncompressed directory and enable the dataset in the Brainspace user interface.
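Removing the uncompressed directory is ordinary Linux file management. The command below is a sketch that reuses the example path from the import output above; verify the path reported in your own environment before running it.

# Remove the uncompressed working copy after the import has completed (example path; verify before running).
rm -rf /brainspace/archive/TestingScript-04-24-2019_1040/04-24-2019_1104/TestingScript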
Activate a Dataset
After importing an archived dataset, you must enable the dataset to use it in Brainspace.
To enable a dataset:
In the Brainspace user interface, click Administration. The Datasets screen will open.
Locate the dataset in the list, and then click the Change Dataset Status icon. A confirmation dialog will open.
Click the Enable button.
The Datasets screen will refresh, and the dataset’s status will change from Inactive to Active. After activating the dataset, you must remap the dataset's fields.
Remap Dataset Fields
After activating the dataset, you must remap the dataset's fields.
To remap the dataset's fields:
Download the dataset's Schema XML report as described in Dataset Reports.
Navigate to the Field Mapping dialog:
In the user drop-down menu, click Administration. The Datasets screen will open.
In the Datasets screen, locate the dataset, and then click the Settings icon. The Dataset Settings dialog will open.
Click the Reconfigure Data Source icon. The dataset configuration dialog will open.
Click the Proceed button. The License Checks dialog will open.
Click the Proceed button. The Field Mapping dialog will open.
Using the Schema XML report, remap the fields to recreate the original dataset mappings that existed before the dataset was archived, and then click the Continue button.
The Dataset Settings dialog will refresh.
Click the Run This Build Type button next to Full Analytics with Ingest or Full Analytics without Ingest. The Schedule Build dialog will open.
Choosing Full Analytics with Ingest will re-ingest the documents from the data source using the new field mapping that was configured previously.
Choosing Full Analytics without Ingest will rebuild the dataset with the fields that were mapped.
Click the Build as soon as possible button.
Note
If you prefer to build the dataset at a later time, click the Schedule Build Time field, select a date and time, and then click the Save button.
The Datasets page will refresh and show the build in progress in the Dataset Queue.
While the build is in progress, you can click the View Status button to view the build steps in progress. For information on each step in the build process, see Build Steps.
After the build completes successfully, the dataset will move from the Dataset Queue to the list of active datasets in the Datasets page.