Creating a dataset and uploading the data
Let's have a look at the available data before we define the corresponding dataset object.
Customers table
The data we will visualize in this tutorial represents our customers. The anonymized table contains each customer's internal ID, city, address, sex, age group and, most importantly, the latitude and longitude of the address. The table also contains the code of the neighborhood to which the address belongs, which we'll use later.
The data in this tutorial is already geocoded: we know the latitude and longitude of each address.
This is not always the case. If you need to geocode your data, use one of the web services, e.g. OpenCage Geocoder, or contact us.
The CSV file can be downloaded here: customers.csv
Name | Title | Data type |
---|---|---|
customer_id | Customer ID | integer |
neighborhood_code | Neighborhood code | string |
city | Customer's city | string |
address | Customer's address | string |
sex | Customer's sex | string |
age_group | Customer's age group | string |
lat | Address latitude | latitude |
lng | Address longitude | longitude |
Download the CSV file and put it in the /data folder of your dump.
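Because the dotmap and heatmap depend entirely on the coordinates, it can be worth sanity-checking them before uploading. The following is a minimal sketch, not part of the CleverMaps tooling. It assumes the CSV has a header row whose coordinate columns are named lat and lng (the names used in the dataset definition later in this chapter), and the sample rows are made up for illustration.

```python
import csv
import io


def find_bad_coordinates(csv_text, lat_col="lat", lng_col="lng"):
    """Return (row_number, reason) pairs for rows with unusable coordinates.

    Assumes the first CSV row is a header containing lat_col and lng_col.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        try:
            lat = float(row[lat_col])
            lng = float(row[lng_col])
        except (KeyError, TypeError, ValueError):
            problems.append((row_number, "missing or non-numeric coordinate"))
            continue
        if not -90.0 <= lat <= 90.0:
            problems.append((row_number, "latitude out of range"))
        elif not -180.0 <= lng <= 180.0:
            problems.append((row_number, "longitude out of range"))
    return problems


# Made-up sample rows, not taken from customers.csv.
sample = "customer_id,lat,lng\n1,49.19,16.61\n2,149.19,16.61\n3,,16.61\n"
print(find_bad_coordinates(sample))
```

To check the real file, pass its contents instead of the sample, for example the text read from data/customers.csv inside your dump directory.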
Creating a dataset
Now, we will create the corresponding dataset. A dataset object differs from other metadata objects in two ways: it contains a properties object with featureTitle and featureSubtitle settings, and it has a ref object instead of content.
The properties.featureTitle and properties.featureSubtitle properties specify the content of the tooltip shown when hovering over the dataset's features on the map. In this case, it will be the customer_id and the address.
Now, to the ref object. The type of the dataset is dwh, and the subtype is geometryPoint, because the table represents customers' addresses (points) that have a latitude and longitude. The table's primaryKey is the customer_id property. In the visualizations array, we say that we want to visualize it as a dotmap and a heatmap (the only two available for geometryPoint). A dataset is not categorizable by default; this example sets categorizable to true. Most of its properties are not filterable and will not appear in filters; only sex and age_group are marked as filterable (more about filters later). Its data also cannot be searched by full-text search, because the fullTextIndex property is set to false.
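The relationships described above can be sketched as a small consistency check. This is an illustrative helper, not something the CleverMaps tools provide: the function name check_dataset_ref is hypothetical, the rule that featureTitle and featureSubtitle should name existing properties is an assumption drawn from this example, and the sample dictionary is a trimmed-down version of the dataset used in this tutorial.

```python
# Visualizations that the text names as available for the geometryPoint subtype.
POINT_VISUALIZATIONS = {"dotmap", "heatmap"}


def check_dataset_ref(dataset):
    """Return a list of human-readable problems found in a dataset dict."""
    problems = []
    ref = dataset.get("ref", {})
    names = [prop.get("name") for prop in ref.get("properties", [])]

    # The primary key should be one of the listed properties.
    if ref.get("primaryKey") not in names:
        problems.append("primaryKey does not name a property in ref.properties")

    # Point datasets are described as supporting only dotmap and heatmap.
    if ref.get("subtype") == "geometryPoint":
        for vis in ref.get("visualizations", []):
            if vis.get("type") not in POINT_VISUALIZATIONS:
                problems.append(f"unexpected visualization: {vis.get('type')}")

    # Assumption: tooltip settings should point at existing properties.
    for key in ("featureTitle", "featureSubtitle"):
        value = dataset.get("properties", {}).get(key, {}).get("value")
        if value not in names:
            problems.append(f"{key} refers to unknown property: {value}")
    return problems


# Trimmed-down sample for illustration only.
sample_dataset = {
    "properties": {
        "featureTitle": {"type": "property", "value": "customer_id"},
        "featureSubtitle": {"type": "property", "value": "address"},
    },
    "ref": {
        "type": "dwh",
        "subtype": "geometryPoint",
        "visualizations": [{"type": "dotmap"}, {"type": "heatmap"}],
        "primaryKey": "customer_id",
        "properties": [{"name": "customer_id"}, {"name": "address"}],
    },
}
print(check_dataset_ref(sample_dataset))
```

An empty list means none of these particular checks failed; it does not mean the server will accept the dataset.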
The zoom object at the end can be used to modify the zoom levels for the dotmap visualization. This can be handy when there are many dots, which could cause performance problems.
The dwh ref.properties list must correspond to the columns in the CSV file, including their order and data types.
Customers dataset syntax
{
    "name": "customers",
    "type": "dataset",
    "title": "Customers",
    "description": "Customers registered in the loyalty program.",
    "properties": {
        "featureTitle": {
            "type": "property",
            "value": "customer_id"
        },
        "featureSubtitle": {
            "type": "property",
            "value": "address"
        }
    },
    "ref": {
        "type": "dwh",
        "subtype": "geometryPoint",
        "visualizations": [
            {
                "type": "dotmap"
            },
            {
                "type": "heatmap"
            }
        ],
        "primaryKey": "customer_id",
        "categorizable": true,
        "fullTextIndex": false,
        "properties": [
            {
                "name": "customer_id",
                "title": "Customer ID",
                "column": "customer_id",
                "type": "integer",
                "filterable": false
            },
            {
                "name": "neighborhood_code",
                "title": "Neighborhood code",
                "column": "neighborhood_code",
                "type": "string",
                "filterable": false
            },
            {
                "name": "city",
                "title": "City",
                "column": "city",
                "type": "string",
                "filterable": false
            },
            {
                "name": "address",
                "title": "Address",
                "column": "address",
                "type": "string",
                "filterable": false
            },
            {
                "name": "sex",
                "title": "Sex",
                "column": "sex",
                "type": "string",
                "filterable": true
            },
            {
                "name": "age_group",
                "title": "Age group",
                "column": "age_group",
                "type": "string",
                "filterable": true
            },
            {
                "name": "lat",
                "title": "Address latitude",
                "column": "lat",
                "type": "latitude",
                "filterable": false
            },
            {
                "name": "lng",
                "title": "Address longitude",
                "column": "lng",
                "type": "longitude",
                "filterable": false
            }
        ],
        "zoom": {
            "min": 7,
            "optimal": 9,
            "max": 18
        }
    }
}
Using your text editor, save this dataset as customers.json to the /metadata/datasets subdirectory in your dump directory.
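Since the ref.properties list has to match the CSV columns in order, a quick local comparison can catch a mismatch before uploading. The sketch below is one possible way to do it, under the assumption that the first row of the CSV is a header holding the column names; the helper name column_mismatches and the sample values are hypothetical. It compares names and order only, not data types.

```python
from itertools import zip_longest


def column_mismatches(dataset, header):
    """Compare ref.properties columns with a CSV header, position by position.

    Returns (position, expected_column, found_column) tuples for every
    position where the two differ; a missing entry is reported as None.
    """
    expected = [prop["column"] for prop in dataset["ref"]["properties"]]
    return [
        (position, want, got)
        for position, (want, got) in enumerate(zip_longest(expected, header), start=1)
        if want != got
    ]


# Trimmed-down sample for illustration only.
sample_dataset = {
    "ref": {
        "properties": [
            {"name": "customer_id", "column": "customer_id"},
            {"name": "address", "column": "address"},
            {"name": "lat", "column": "lat"},
            {"name": "lng", "column": "lng"},
        ]
    }
}
swapped_header = ["customer_id", "lat", "address", "lng"]
print(column_mismatches(sample_dataset, swapped_header))
```

For the real files, load the dataset with json.load from metadata/datasets/customers.json and read the first row of data/customers.csv with csv.reader, then pass both to the function.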
Run the status command; the dataset and the corresponding CSV file will be listed as new.
Use addMetadata to add the dataset to the project, and pushProject to upload the CSV file.
tomas.schmidl@secure.clevermaps.io/project:k5t8mf2a80tay2ng/dump:$ status
Checking status of project k5t8mf2a80tay2ng (First project) against dump ...
No files have been modified locally
No files have been modified on the server
2 new files have been detected:
/var/local/metadata/k5t8mf2a80tay2ng/metadata/datasets/customers.json
/var/local/metadata/k5t8mf2a80tay2ng/data/customers.csv
tomas.schmidl@secure.clevermaps.io/project:k5t8mf2a80tay2ng/dump:$ addMetadata
Adding all new objects to the server...
Added object customers.json
1 new object has been successfully uploaded to project k5t8mf2a80tay2ng
tomas.schmidl@secure.clevermaps.io/project:k5t8mf2a80tay2ng/dump:$ pushProject
No metadata objects were changed - nothing to push
Asynchronous data upload started...
CSV file customers.csv successfully loaded into dataset customers (4822 rows loaded)
DWH data of project k5t8mf2a80tay2ng successfully updated from dump
Validating DWH model/data integrity of project k5t8mf2a80tay2ng...
OK
That's it! In the next chapter of this tutorial, we will define a metric and an indicator to finally see the data on the map.