Downloading a Dataset in Salesforce Analytics Cloud

When using Salesforce Analytics Cloud there maybe instances where you require to download data held in Salesforce Analytics Cloud. Using the example of the OnTime dataset - which can be found here. Downloading a CSV of a dataset in Salesforce Analytics cloud, can now be done using the Salesforce Dataset Utility.

The Salesforce Dataset Utility can be found here:

The Salesforce Dataset Utility, is a client side tool, with many features and functions to assist with datasets on Salesforce Analytics Cloud. Features include:

1. Uploading Datasets
2. Delete a Dataset
3. Edit Dataset XMD
4. Edit Dataflow JSON

To start Salesforce Dataset Utility , follow the instructions on the tools github page. Found Here:

Once authenticated you will be met with a list of all datasets included in your instance, select which dataset you would like to download, and click on the dataset name.

Once you have selected a dataset ( I have selected OnTime ), You will be met with a default query including a number of columns and 100 rows. To access to look or edit the query that is pulling the data from Salesforce Analytics Cloud, click on the Action Button drop down in the top right hand corner and select "EDIT SAQL".

Once the "Edit SAQL" button is pressed you will see the SAQL Query Editor, allowing you to adjust the query on the page. In order for you to download the dataset all of the columns will need to be included as well as the rows.

In order for you to download the dataset you will need to understand how many rows the dataset has, this can be done by looking at the "count of rows" in the dataset's lens. The maximum number of rows that I have been able to download at once is 75000. Anything above this amount times out, and therefore requires that seperate queries be created to ensure that you can download a complete dataset.

To adjust the number of rows being called at a single time the query limit needs to be adjusted. In this example the query looks like the above Gist, however to download more rows the query needs to be adjusted:

From:

q = limit q 100;

To:

q = limit q 3000;

Once adjusted rerun the query, when the dataset has been queried with the new SAQL limit and the query data has been downloaded locally, it will be presented in the dataset utility app. You can now use the "Export to CSV" button to export the query results to a CSV file.

Below is the results of using a query for 3000 rows on the OnTime dataset and exporting the results to CSV. As you can see in the image below, 3000 rows have been downloaded.

Uploading a Dataset into Salesforce Analytics

One of the first steps when looking to utilise and work with Salesforce Analytics is investigating how to get data into the platform. There are a few ways to upload a dataset into Salesforce Analytics:

  1. ELT Platform - hosted on Salesforce Analytics instance the ELT tool is used to pull data from the Salesforce instance it is connected to
  2. Dataset UI - Via the UI you can goto the upload page for the Dataset and choose a local file to upload
  3. Dataset Utility - A Salesforce built JAVA utility that will use the Salesforce API to upload a dataset into Salesforce Analytics
  4. A Middleware or ETL tool - (Informatica, Mulesoft, Snaplogic) A third party integration tool of which many have connectors that will automatically integrate into Salesforce Analytics and can take current data being loaded and transformed and can push it into Salesforce Analytics.

The easiest way to get data into the Salesforce Analytics platform is to use the UI and upload a CSV through the Dataset Utility. Choose CSV under "Select a Data Source"

 
 

Once you have chosen to upload a CSV , you will be met with the dataset page where you can detail:

  • The CSV file to be uploaded.
  • A custom JSON file
  • Dataset Name
  • App - Where you would like to store the dataset

There is a current limit on the size of the CSV file you can upload via the UI process - 500MB. You can use a variety of other options for example the Dataset Utility , API, or Middleware tools if you require a larger dataset to be imported into Salesforce Analytics.

We will now look at the easiest of all the ways to upload a dataset into Salesforce Analytics, upload a CSV. The first step is to find an appropriate dataset to upload. Any CSV dataset is fine, one that is commonly used for testing and navigating through a variety of measures and attributes on the platform is a dataset called ontime. Ontime is a dataset including measures and attributes from the US airline industry. You can find this demo dataset here:

US Airlines Ontime Dataset

Once you have the dataset downloaded, you can go to the Dataset UI and choose to upload this dataset. Give the dataset a name on the left handside and also a App or place where to store the dataset. Your private app is a good place, this way no one can access the dataset automatically. When you start the uploading process you have the option to keep the data in a .ZIP format or upload as a CSV file. If you would like to have the measures and attributes configured for you , you will need to upload as a CSV file.

Once the file has been uploaded, it may take some time dependent on the size of the dataset. Navigate to where you stored the dataset and open it up. You will be met with your dataset with the results of how many rows of data are in the dataset. To verify that the measures and attributes have been generated by Salesforce Analytics, create a few groupings and measures to see if the data is correct. In the case of the dataset Ontime - Choose the following:

  • Sum of Flights
  • Sum of DepartureDelayMinutes
  • Group by Carrier

Looking at the results, you can see the total amount of minutes that American Airlines was delayed for during the total amount of flights held in the dataset. In this case it looks like the dataset has been uploaded and measures and attributes have been defined successfully.

 

Salesforce Analytics - On time Dataset - Measures and Attributes