Effortless File Download: Transferring Files from Databricks File System to Your Local Computer

Introduction

Databricks provides FileStore, a special folder within the Databricks File System (DBFS). Files saved there are accessible to web-based content, particularly HTML and JavaScript. This makes it possible to download output files to your local desktop and to upload files from your local desktop to Databricks for processing.

FileStore and its functionality

The FileStore comprises essential directories where specific files are stored:

  • /FileStore/jars: Contains uploaded libraries. Deleting files in this directory might break the libraries in your workspace that depend on them.
  • /FileStore/tables: Holds files imported via the UI. Deleting files here may make the tables created from them inaccessible.

Save Files to FileStore

In this example, we create a DataFrame with 3 columns and 3 rows and save it as a CSV file in the FileStore directory within DBFS. The DataFrame is written with the write.csv method, with the “header” option set to “true” so that the column headers are included in the CSV file.
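The step described above can be sketched roughly as follows. This is a minimal illustration rather than the original notebook: the column names, the sample rows, the helper name save_employees_csv, and the "overwrite" mode are all assumptions, and `spark` stands for the SparkSession that a Databricks notebook normally provides.

```python
# Hypothetical sample data: column names and values are illustrative only.
COLUMNS = ["id", "name", "department"]
ROWS = [
    (1, "Alice", "Engineering"),
    (2, "Bob", "Finance"),
    (3, "Carol", "Marketing"),
]

# Assumed target location inside FileStore.
OUTPUT_PATH = "dbfs:/FileStore/employees_filestore.csv"


def save_employees_csv(spark, path=OUTPUT_PATH):
    """Create the 3x3 sample DataFrame and write it to FileStore as CSV.

    `spark` is expected to be the SparkSession available in the notebook.
    """
    df = spark.createDataFrame(ROWS, COLUMNS)
    # header=True writes the column names as the first line of the output.
    # mode="overwrite" is an assumption so the cell can be re-run safely.
    df.write.csv(path, header=True, mode="overwrite")
    return df
```

In a notebook cell, calling save_employees_csv(spark) would perform the write. Keep in mind that Spark's CSV writer usually produces a directory containing one or more part files rather than a single file, so the name of the file you eventually download may differ from the path passed in here.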

To access the stored file in FileStore, navigate to Catalog, click Browse DBFS, and open the FileStore folder.

Note: If the “Browse DBFS” option is not available, click Admin Settings in the right corner, next to your username. Then navigate to Workspace Settings -> Advanced and enable the DBFS File Browser.

Download Files from FileStore

To download a file stored in the /FileStore directory, use the URL pattern https://<databricks-instance>/files/<filename>. Replace <databricks-instance> with the URL of your Databricks workspace and <filename> with the name of the file you want to download. Note that the /FileStore prefix of the DBFS path becomes /files in the URL.

Suppose we want to retrieve the file ‘employees_filestore.csv’, stored at ‘/FileStore/employees_filestore.csv’. Although the file’s actual path is ‘/FileStore/employees_filestore.csv’, it is accessed and downloaded through the URL ‘https://adb-XXXXXXXXXXXXXX.azuredatabricks.net/files/employees_filestore.csv’.
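The path-to-URL mapping described above can be expressed as a small helper. This is a sketch under stated assumptions: the function name filestore_download_url is hypothetical, and it assumes that files in subdirectories of /FileStore follow the same /files/ pattern as the top-level example.

```python
from urllib.parse import quote

FILESTORE_PREFIX = "/FileStore/"


def filestore_download_url(instance: str, dbfs_path: str) -> str:
    """Map a /FileStore path to its /files/ download URL (hypothetical helper)."""
    # Accept both "dbfs:/FileStore/..." and "/FileStore/..." forms.
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    if not path.startswith(FILESTORE_PREFIX):
        raise ValueError(f"not a FileStore path: {dbfs_path!r}")
    relative = path[len(FILESTORE_PREFIX):]
    # Percent-encode unsafe characters but keep "/" between path segments.
    return f"https://{instance}/files/{quote(relative)}"


print(filestore_download_url(
    "adb-XXXXXXXXXXXXXX.azuredatabricks.net",
    "/FileStore/employees_filestore.csv",
))
```

Pasting the resulting URL into a browser that is already signed in to the workspace should start the download. The prefix check is case-sensitive, so a path spelled /Filestore/ is rejected rather than silently mapped.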

In conclusion, FileStore is a useful component of DBFS that simplifies moving files between your local system and the Databricks platform.