This user manual will guide you through the process of uploading data to the DataHub infrastructure via the drop zones method.
Before you start!
Prerequisites and general conditions for uploading new data are:
- You have been assigned a project following the DataHub intake interview. If not, please refer to this page: Starting a new project
- You are connected to the UM or MUMC network, either physically or via VPN.
- You are using a recent version of a compatible web browser.
- You have a UM or MUMC account.
- You have the drop zones network drive mounted:
- (Windows users) A drive W:\ should be visible from Windows Explorer - My Computer
- (Mac and Linux users) Make sure to mount the SMB/CIFS network location \\um-nas201.unimaas.nl\RIT-iRODS-ingestzones\prod
- The data that you are uploading does not contain any personally identifiable information, such as patient names, addresses, BSN numbers, etc.
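The mounted-drive prerequisite can also be checked from a script before you start. The following is a minimal sketch and not part of the DataHub tooling: the function name `drop_zone_root_available` is hypothetical, and the path you pass in (for example `W:\` on Windows, or the directory where you mounted the SMB share on Mac or Linux) is an assumption about your local setup.

```python
from pathlib import Path


def drop_zone_root_available(root):
    """Return True if `root` is an existing directory whose contents can be listed."""
    path = Path(root)
    if not path.is_dir():
        return False
    try:
        # Reading the first entry fails early if the mount is stale or access is denied.
        next(path.iterdir(), None)
    except OSError:
        return False
    return True
```

On Windows this could be called as `drop_zone_root_available("W:\\")`. A result of `False` suggests that the network drive still has to be mounted.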
The end result will be a data set that is:
- annotated with ontology-enriched metadata;
- safely stored in the iRODS storage environment;
- findable (by its metadata) for all MUMC+ users;
- accessible by the members of your project.
If you meet the requirements listed above, you can begin uploading a new data set (also called a Collection) to the DataHub infrastructure.
Step 1 - Open the portal
Point your browser to https://datahub.mumc.maastrichtuniversity.nl/
Use the menu on the right side to log in with your UM or MUMC+ credentials.
Don't forget to tick the "I'm not a robot" CAPTCHA checkbox.
Step 2 - Listing drop zones
After logging in, you should be able to click the Drop zones item in the menu on the left side. This will show a list of the drop zones that are currently open for you.
If this is your first time, you will see an empty screen with a parachute.
Step 3 - Create a new drop zone and enter metadata
Click on New drop zone. This will open a new form. In this form you fill out the metadata associated with the data set that you want to store in the DataHub infrastructure.
The first option requires you to select the project. Here you choose the project to which the to-be-uploaded data set belongs.
Should your project not be listed here, please create a new one or contact your DataHub contact person.
After you have picked a project, you have to enter metadata fields that are described in the table below.
|Field in the form|Explanation|
|Title|Enter the title that best describes your data.|
|Description|Enter a description of your data.|
|Date|Enter the date on which your data set was collected or finished. (Required)|
|Factors|Enter the variables that influenced the outcome of your experiment, for example age, gender or chemicals.|
Now we arrive at the Organism field.
This field is special because the value you enter is enriched with information from an ontology.
The advantages of using an ontology are:
- Less chance of typing errors;
- Consistent naming;
- The ability to enrich the data with external knowledge and to add semantics (= machine readability) to it.
For more information about ontologies, please visit Wikipedia.
Please enter the main organism of your data set here.
For example, if your data set mainly focuses on humans, choose the term for Homo sapiens.
The autocomplete box also supports the use of synonyms.
You can also type "human", "rat" or "mouse" (etc.) to retrieve the proper ontology terms.
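To illustrate how a synonym-aware autocomplete can map everyday words to ontology terms, here is a small sketch. It does not show how the portal is implemented: the `SYNONYMS` table and the `suggest_terms` function are hypothetical, and the three entries are only examples of common names and their scientific names.

```python
# Illustrative synonym table; the ontology actually used by the portal is not shown here.
SYNONYMS = {
    "human": "Homo sapiens",
    "rat": "Rattus norvegicus",
    "mouse": "Mus musculus",
}


def suggest_terms(typed):
    """Return the labels whose name or synonym starts with the typed text."""
    needle = typed.strip().lower()
    if not needle:
        return []
    matches = set()
    for synonym, label in SYNONYMS.items():
        # Match on either the everyday synonym or the scientific label.
        if synonym.startswith(needle) or label.lower().startswith(needle):
            matches.add(label)
    return sorted(matches)
```

With this table, both `suggest_terms("human")` and `suggest_terms("Homo")` lead to the same term, which is what keeps the naming consistent across data sets.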
The next two fields, Organ and Technology, also have an ontology lookup mechanism.
In the field Organ you can enter the main body part from which your data set was generated.
Examples include heart, lung or bone.
The field Technology is meant to store the measurement technique that is primarily used in your data set.
Examples are Surgery, Western blot analysis or RNA sequencing.
Next, you can enter Related Publications.
Here you define the publication(s) relevant to your data set. These publications may be written by yourself, but may also be third-party publications.
Enter a valid DOI for the publication and press Add article. You can add as many related publications as you like.
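If you collect DOIs in a list before entering them, a quick shape check can catch copy-and-paste mistakes. The sketch below is an assumption-laden helper, not the portal's own validation: `looks_like_doi` is a hypothetical name, the pattern only tests the common "10.<registrant>/<suffix>" shape, and it does not confirm that the DOI actually resolves.

```python
import re

# Common DOI shape: "10.", a 4 to 9 digit registrant code, a slash, then a non-empty suffix.
DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+")
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def looks_like_doi(value):
    """Return True if `value` has the usual shape of a DOI (no online lookup is done)."""
    candidate = value.strip()
    lowered = candidate.lower()
    for prefix in RESOLVER_PREFIXES:
        if lowered.startswith(prefix):
            # Drop a resolver or "doi:" prefix so only the bare identifier is checked.
            candidate = candidate[len(prefix):].strip()
            break
    return DOI_SHAPE.fullmatch(candidate) is not None
```

For example, `looks_like_doi("10.1000/182")` passes the check, while a plain title or a truncated identifier such as `"10.1000/"` does not.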
The Creator field will be automatically filled based on the user that is currently logged in.
The Contacts table allows you to specify contact persons for this data set.
This is very important because people who find your data set interesting may want to contact you.
Finally, you can specify your Protocol on this form.
A name and a filename are requested.
If you have specified a filename here, you MUST copy that file into your drop zone.
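Once the drop zone folder exists and you have copied your files into it, you can verify this requirement with a few lines of code. This is only a sketch: `protocol_file_present` is a hypothetical helper, and searching the whole folder tree (rather than only the top level) is an assumption about where the protocol file may be placed.

```python
from pathlib import Path


def protocol_file_present(drop_zone_dir, protocol_filename):
    """Return True if a file with the given name exists inside the drop zone folder."""
    root = Path(drop_zone_dir)
    if not root.is_dir():
        return False
    # Walk the folder tree and compare file names exactly (case-sensitive).
    return any(
        entry.is_file() and entry.name == protocol_filename
        for entry in root.rglob("*")
    )
```

Pass the path of your drop zone folder and the exact filename you typed into the Protocol field.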
Press Submit to store the metadata and return to the drop zones listing.
Step 4 - Access the new drop zone
After you press Submit in step 3, you return to the listing of drop zones and see a new card containing the information of your drop zone and, most importantly, the internal name that this drop zone has been assigned (e.g. prickly-porcupine).
The metadata that you have just entered has been saved in an XML file, and the new drop zone has also been created on your network drive W:.
Now when you go to your W:\ network drive using Windows Explorer, you will see this folder appear.
Windows users who do not have the W:\ network drive can also access the drop zones manually by typing the following address into the address bar of Windows Explorer.
Linux and/or Mac users must create an SMB connection to the address below.
Step 5 - Copy data to the drop zone
Now add the files that are part of your data set to the newly created folder. This may include, for example, raw data, derived data and/or protocols.
Just copy-and-paste the data from your local hard drive (or USB-drive) to this drop zone folder.
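For larger data sets, the same copy step can be scripted instead of done by hand. The following is a minimal sketch under stated assumptions: `copy_into_drop_zone` is a hypothetical helper, both paths are supplied by you, and the drop zone folder must already exist (it is created by the portal in step 3, not by this script).

```python
import shutil
from pathlib import Path


def copy_into_drop_zone(source_dir, drop_zone_dir):
    """Copy all files under `source_dir` into `drop_zone_dir`, keeping subfolders."""
    source = Path(source_dir)
    target = Path(drop_zone_dir)
    if not target.is_dir():
        # The drop zone folder is created by the portal, so never create it here.
        raise FileNotFoundError(f"Drop zone folder not found: {target}")
    copied = []
    for item in sorted(source.rglob("*")):
        if item.is_file():
            destination = target / item.relative_to(source)
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, destination)  # copy2 also preserves timestamps
            copied.append(destination)
    return copied
```

The returned list of copied paths can be compared with your local file listing before you continue with the ingest.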
Step 6 - Ingest the data
Now that you have finished copying data to the appropriate drop zone, the data can be ingested into the iRODS system for persistent storage.
To do so, choose the appropriate drop zone card (here: prickly-porcupine) and click Edit.
Please make sure that all the metadata is still correct and click Ingest at the very end of the form.
Your data set is now moved from the temporary drop zone to a permanent place in the DataHub infrastructure.
You will also see the status of your data set change sequentially to validating, ingesting and finally ingested.
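The status sequence can be read as a simple one-way progression. The sketch below encodes only the three statuses named above; `is_expected_progression` is a hypothetical helper for illustration, and any other status values the portal may show are not covered.

```python
# The three statuses mentioned in this manual, in the order they are expected to appear.
EXPECTED_ORDER = ["validating", "ingesting", "ingested"]


def is_expected_progression(observed):
    """Return True if the observed statuses never move backwards in the expected order."""
    positions = []
    for status in observed:
        if status not in EXPECTED_ORDER:
            return False  # unknown status, outside the scope of this sketch
        positions.append(EXPECTED_ORDER.index(status))
    # Repeated statuses are fine; going back to an earlier status is not.
    return positions == sorted(positions)
```

A list of statuses noted while watching the card, such as `["validating", "ingesting", "ingested"]`, passes this check.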
After a successful ingest, the card will disappear from the list and the drop zone will be deleted from the network drive.
You can view your data via the Cloud Browser or the Data Warehouse applications. Please see Research data warehouse.