Remote access to data¶
Video
Watch this in our courses: 2022 February, 2021 January
The cluster is just one part of your research: most people are constantly transferring data back and forth. Unfortunately, this can be a frustrating experience if you haven’t got everything running smoothly. In this tutorial, we’ll explain some of the main methods. See the main storage tutorial first.
Abstract
Data is also available from other places in Aalto, such as desktop workstations in some departments, shell servers, and https://vdi.aalto.fi.
Transferring data is available via ssh (the standard rsync and sftp).
Data can be mounted remotely using ssh (sshfs, from anywhere with ssh access) and SMB mounting on your own computer (within Aalto networks; Linux/Mac: smb://data.triton.aalto.fi/PATH, Windows: \\data.triton.aalto.fi\PATH, which uses \ as the separator; PATH could be work/USERNAME or scratch/DEPT/GROUPNAME).
History and background¶
Historically, ssh transfers have been the most common (this includes rsync (recommended these days), scp, sftp, and various graphical programs that use these protocols), and this is still the most robust and reliable method. There are other, more modern methods, but they come with additional requirements.
There are two main styles of remote data access:
Transferring data makes a new copy on the other computer. This is generally efficient for large data.
Remote mounting makes a view of the data on the other computer: when you access/modify the data on the other computer, it transparently accesses/modifies in the original place without making a copy. This is very convenient, but generally slow.
We have this already set up for you from many computers at Aalto.
Data availability throughout Aalto¶
Data is the basis of almost everything we do, and accessing it seamlessly throughout Aalto is a great benefit. Various other Aalto systems have the data available. However, this varies per department: each department can manage its data as it likes. So, we can’t make general promises about what is available where.
Linux shell server mounts require a valid Kerberos ticket (usually generated when you log in). On long sessions these might expire, and you have to renew them with kinit to keep going. If you get a permission denied, try kinit.
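The check-and-renew step can be wrapped in a small shell function. This is a minimal sketch, assuming the MIT Kerberos client tools, where `klist -s` reports ticket validity through its exit status; the function name `ensure_kerberos_ticket` is made up for this example.

```shell
# Renew the Kerberos ticket only when the current one is missing or expired.
# Assumption: MIT Kerberos tools, where `klist -s` prints nothing and
# signals a valid ticket with exit status 0.
ensure_kerberos_ticket() {
    if klist -s; then
        echo "Kerberos ticket is still valid"
    else
        echo "No valid Kerberos ticket, running kinit"
        kinit    # prompts for your password
    fi
}
```

Calling `ensure_kerberos_ticket` before touching the mounted directories in a long session is one way to avoid a surprise permission denied.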
Virtual desktop interface¶
VDI, vdi.aalto.fi, is a Linux workstation accessible via your web browser, and useful for a lot of work. It is not Triton, but has scratch mounted at /m/triton/scratch/. Your work folder can be accessed at /m/triton/scratch/work/USERNAME. For SCI departments, the standard paths you have on your workstations also work: /m/{cs,nbe}/{scratch,work}/.
Shell servers¶
Departments have various shell servers, see below. There isn’t a generally available shell server anymore.
NBE¶
On workstations, including the shell server amor.org.aalto.fi, work directories are available at /m/nbe/work and group scratch directories at /m/nbe/scratch/PROJECT/.
PHYS¶
Directories available on demand through SSHFS. See the Data transferring page at PHYS wiki.
CS¶
On workstations, work directories are available at /m/cs/work/, and group scratch directories at /m/cs/scratch/PROJECT/. The department shell server is magi.cs.aalto.fi and has these available.
Remote mounting¶
There are many ways to access Triton data remotely. These days, we recommend figuring out how to mount the data remotely, so that it appears as local data but is accessed over the network. This saves copying data back and forth and is better for data security, but is slower and less reliable than local data.
Remote mounting using SMB¶
Remote mounting is by far the easiest method to transfer files. If you are not on the Aalto networks (wired, eduroam, or aalto with an Aalto-managed laptop), connect to the Aalto VPN first. Note that this is automatically done on some department workstations (see above); if not, request it!
The scratch filesystem can be remote mounted using SMB inside secure Aalto networks at the URLs
Windows:
scratch: \\data.triton.aalto.fi\scratch\
work: \\data.triton.aalto.fi\work\%username%\
To access these folders: Windows Explorer → Computer → Map network drive → select a free letter.
Mac:
scratch: smb://data.triton.aalto.fi/scratch/
work: smb://data.triton.aalto.fi/work/USERNAME/
To access these folders: Finder → Go menu item → Connect to server → use the URLs above.
Linux:
scratch: smb://data.triton.aalto.fi/scratch/
work: smb://data.triton.aalto.fi/work/USERNAME/
To access these folders: Files → Left sidebar → Connect to server → use the URLs above. On other Linux desktops the steps vary; look for a similar option in your file manager.
From Aalto managed computers, you can use lgw01.triton.aalto.fi instead of data.triton.aalto.fi and it might auto-login.
Depending on your OS, you may need to use either your username directly or AALTO\username.
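On Linux, the same SMB locations can also be mounted from a terminal. The sketch below assumes a GNOME-based desktop where the `gio` tool and the gvfs SMB backend are installed; it is the command-line counterpart of "Connect to server". The function names are made up for this example, and the server name and paths are the ones listed above.

```shell
# Mount or unmount a Triton folder over SMB from a terminal.
# Assumption: `gio` (part of GLib) and the gvfs SMB backend are installed.
triton_smb_url() {
    # $1 is the path on the server, e.g. "scratch" or "work/USERNAME"
    printf 'smb://data.triton.aalto.fi/%s/\n' "$1"
}

mount_triton_smb() {
    gio mount "$(triton_smb_url "$1")"
}

unmount_triton_smb() {
    gio mount --unmount "$(triton_smb_url "$1")"
}
```

For example, `mount_triton_smb work/USERNAME` would ask for your credentials and then make the folder appear in the file manager, as if it had been connected through the graphical dialog.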
Warning
In the future, you will only be able to do this from Aalto managed computers. This remote mounting will really help your work, so we recommend that you request an Aalto managed computer (citing this section) to make your work as smooth as possible (or use vdi.aalto.fi, see above).
Remote mounting using SFTP (Linux and Mac)¶
You can use sftp and one of the shell servers to mount Triton directly on your machine.
The easiest way is to set up your ssh config (.ssh/config) on your machine as follows (replace USERNAME with your username):
Host kosh
    User USERNAME
    Hostname kosh.aalto.fi
Host triton_via_kosh
    User USERNAME
    Hostname triton.aalto.fi
    ProxyJump kosh
Instead of kosh, you can also use taltta or any other shell server (see Remote Access) as a proxy to jump the firewall.
You can now open a graphical file manager that supports the sftp protocol (e.g. Files on Aalto Linux), and open:
sftp://triton_via_kosh
which will direct you to the root folder of triton. To access scratch use:
sftp://triton_via_kosh/scratch
And to access your home folder use:
sftp://triton_via_kosh/home/USERNAME
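If the file manager cannot connect, it can help to check which settings OpenSSH actually resolves for the alias. A minimal sketch, assuming OpenSSH 6.8 or later (where `ssh -G` prints the effective configuration without connecting); `show_ssh_route` is a made-up helper name.

```shell
# Print the user, host name and jump host that OpenSSH resolves for an alias.
# `ssh -G` only evaluates the configuration; it does not open a connection.
show_ssh_route() {
    ssh -G "$1" | grep -E '^(user|hostname|proxyjump) '
}
```

Running `show_ssh_route triton_via_kosh` should list triton.aalto.fi as the host name and kosh as the jump host if the config file is being read.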
Remote mounting using sshfs¶
sshfs is a neat program that lets you mount remote filesystems via ssh only. It is well-supported on Linux, and somewhat on other operating systems. Its true advantage is that you can mount any remote ssh server: it doesn’t have to be set up specially for SMB or any other type of mounting. On Ubuntu, you can mount by “File → Connect to server” and using sftp://triton.aalto.fi/scratch/work/USERNAME.
This also works from any shell server with data (see previous section).
The commands below use command line programs to do the same, and make the triton_work directory on your local computer access all files in /scratch/work/USERNAME. The same can be done with other folders:
mkdir triton_work
sshfs USERNAME@triton.aalto.fi:/scratch/work/USERNAME triton_work
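The two commands above can be wrapped into a pair of functions that also cover unmounting. This is a sketch under some assumptions: sshfs is installed, and the mount is released with `fusermount -u` as on Linux (on macOS, `umount` is the usual equivalent). The function names and the default mount point are made up for this example.

```shell
# Mount and unmount a Triton work directory with sshfs.
mount_triton_work() {
    user=$1
    mountpoint=${2:-triton_work}
    mkdir -p "$mountpoint"
    # -o reconnect asks sshfs to re-establish the connection if it drops
    sshfs -o reconnect "$user@triton.aalto.fi:/scratch/work/$user" "$mountpoint"
}

unmount_triton_work() {
    # Linux only; assumption: the mount was created with sshfs/FUSE
    fusermount -u "${1:-triton_work}"
}
```

Usage would be `mount_triton_work USERNAME` followed, when done, by `unmount_triton_work`.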
Note that ssh binds together many ways of accessing Triton (and other servers), with a similar syntax and options. Learning to use it well is a great investment in your future. Learn more about ssh on the ssh page.
For Aalto Linux workstation users: it is recommended that you mount /scratch/ under the local disk /l/. You should be able to create a subfolder under /l/ and point sshfs to that subfolder as in the example above.
Transferring data¶
This section describes ways to copy data back and forth between Triton and your own computers. This may be more annoying for day-to-day work but is better for transferring large data.
Using sftp¶
The SFTP protocol uses ssh to transfer files. On Linux and Mac, the sftp command line program is the most fundamental way to do this, and is available everywhere.
A more user-friendly way of doing this (with a nice GUI) is the Filezilla program. Make sure you are using Aalto VPN, then you can put triton.aalto.fi as SFTP server with port 22.
Below is an example of the “raw” SFTP usage:
# Copying from HOME to local PC
user@pc123 $ sftp user12@triton.aalto.fi:filename
Connected to triton.aalto.fi.
Fetching /home/user12/filename to filename
# Copying to HOME
user@pc123 $ sftp -b - user12@triton.aalto.fi <<< 'put testCluster.m'
sftp> put testCluster.m
# Copying to WRKDIR
user@pc123 $ sftp -b - user12@triton.aalto.fi:/scratch/work/user12/ <<< 'put testCluster.m'
...
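For more than one file, the sftp commands can be collected in a batch file and passed with `-b`, which also works in shells without the `<<<` syntax. A minimal sketch, assuming non-interactive (for example key-based) authentication, since batch mode cannot prompt for a password; `sftp_upload` is a made-up helper name.

```shell
# Upload several files in one sftp session using a temporary batch file.
sftp_upload() {
    remote=$1      # e.g. user12@triton.aalto.fi:/scratch/work/user12/
    shift
    batch=$(mktemp)
    for f in "$@"; do
        # one "put" command per file; quotes protect names with spaces
        printf 'put "%s"\n' "$f" >> "$batch"
    done
    sftp -b "$batch" "$remote"
    status=$?
    rm -f "$batch"
    return "$status"
}
```

For example, `sftp_upload user12@triton.aalto.fi:/scratch/work/user12/ testCluster.m results.csv` would send both files to the work directory (the second file name is only an illustration).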
On most modern operating systems, you can also open your file manager (e.g. Nautilus on Linux) and enter this address in the address bar:
sftp://triton.aalto.fi
If you are connecting from remote and cannot use the VPN, you can connect instead to department machines like kosh.aalto.fi, amor.org.aalto.fi (for NBE). The port is 22. Note: If you do not see your shared folder, you need to manually specify the full path (i.e. the folder is there, just not yet visible).
Using rsync¶
Prerequisites
To install rsync on Windows, please refer to this guide.
Rsync is similar to sftp, but is smarter at restarting interrupted transfers. Use rsync for large file transfers. rsync actually uses the ssh protocol, so you can rsync from anywhere you can ssh from. rsync is installed by default on Linux and Mac terminals. On Windows machines we recommend using Git Bash.
While there are better places on the internet to read about rsync, it is good to try it out by synchronising a local folder to your scratch folder on Triton. Sometimes problems with copying files are related to group permissions. This command takes care of permissions and makes sure that all your local files are identical (= same checksum) to your remote files:
rsync -avzc -e "ssh" --chmod=g+s,g+rw --chown=:GROUPNAME PATHTOLOCALFOLDER USERNAME@triton.aalto.fi:/scratch/DEPT/PROJECTNAME/REMOTEFOLDER/
Replace the bits in CAPS with your own case. Briefly, -a tries to preserve all attributes of the file, -v increases verbosity to see what rsync is doing, -z uses compression, -c compares files by checksum and skips those that are identical on both sides, -e specifies to use ssh (not necessary but needed for the commands coming after), --chmod sets the group permissions to shared (as is common practice on scratch project folders), and --chown=:GROUPNAME sets the group of the transferred files to the group you belong to (note that GROUPNAME == PROJECTNAME on our scratch filesystem).
If you just want to check whether your local files differ from the remote ones, you can run rsync in “dry run” mode, so that you only see what the command would do, without actually doing anything:
rsync --dry-run -avzc ...
Sometimes you want to copy only certain files. E.g. go through all folders, consider only files ending with .py:
rsync -avzc --include '*/' --include '*.py' --exclude '*' ...
Sometimes you want to copy only files under a certain size (e.g. 100MB):
rsync -avzc --max-size=100m ...
Rsync does NOT delete files by default, i.e. if you delete a file from the local folder, the remote file will not be deleted automatically, unless you specify the --delete option.
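Because --delete removes remote files, it pairs well with a dry run. The sketch below wraps both in one function that previews by default and only transfers when explicitly asked to; `mirror_to_triton` and its `--apply` argument are made up for this example, while the rsync options themselves are standard.

```shell
# Mirror a local folder to Triton, previewing the changes first.
# Without "--apply" as third argument, nothing is transferred or deleted.
mirror_to_triton() {
    src=$1      # local folder; a trailing slash copies its contents
    dest=$2     # e.g. USERNAME@triton.aalto.fi:/scratch/DEPT/PROJECTNAME/REMOTEFOLDER/
    if [ "${3:-}" = "--apply" ]; then
        rsync -avz --delete "$src" "$dest"
    else
        # --itemize-changes lists what would change for each file
        rsync -avz --delete --dry-run --itemize-changes "$src" "$dest"
    fi
}
```

A cautious workflow is to run the function once without `--apply`, read the list of files marked for deletion, and only then repeat the call with `--apply`.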
Please note that when working with files containing code or simple text, git is a better option for synchronising your local folder with your remote one: not only will it keep the two folders in sync, but you will also gain version control, so that you can revert to previous versions of your code or txt/csv files.
Version control¶
Don’t forget that you can use version control (git, etc.) for your code and other small files. This way, you transfer to/from Triton via a version control server (Aalto Gitlab, Github, etc). Often, one would develop locally (committing often, of course), pull on Triton, do whatever minor development is needed directly on Triton to make it work there, then push back to the server.
Exercises¶
RemoteData-1: Mounting your work directory
Mount your work directory by SMB (or alternatively by sftp or sshfs) and transfer a file to Triton.
Note that you must be connected to the Aalto VPN (from outside campus), or be on eduroam or on the aalto network with an Aalto laptop (on campus).
(advanced) RemoteData-2: rsync
If you have a Linux or Mac computer, study the rsync manual page and try to transfer a file.
What’s next?¶
The next tutorial is about interactive jobs.