The Data Quantity Report page allows you to query the number of data records each participant has uploaded per unit of time during their participation period, for a given data source. This provides a quick way to monitor the data quantity and spot non-compliant situations which might require intervention.
One of the main uses of these graphs is to show whether data is available for a given time window and, if not, whether data is missing only for specific data sources or for all of them. This information can be interpreted as follows:
- If the data is available for all data sources, it means the app was functioning as expected during that time window (though the quality of data should be investigated separately).
- If the data is missing for some data sources only, the app was likely functioning properly, but an external factor prevented it from collecting data from those sources. For example, the user may have manually turned off GPS while the app was running, so the app could still collect motion sensor data, but not GPS data.
- If the data is missing from all data sources, it's possible that the app was not operational during that time window.
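The three cases above can be expressed as a small decision rule. The sketch below is a minimal illustration, assuming you have already exported the per-source record counts for one time window into a dictionary; the function name `classify_window` and the input format are hypothetical and not part of Ethica.

```python
def classify_window(counts: dict[str, int]) -> tuple[str, list[str]]:
    """Classify one time window from the number of records per data source.

    `counts` maps a data source name (for example "GPS") to the number of
    records uploaded for it in the window. Hypothetical helper, not an Ethica API.
    """
    if not counts:
        raise ValueError("at least one data source is required")
    missing = sorted(source for source, n in counts.items() if n == 0)
    if not missing:
        label = "all_present"  # app likely working; data quality still needs a separate check
    elif len(missing) < len(counts):
        label = "partial"  # app likely running, but something blocked these sources
    else:
        label = "all_missing"  # app possibly not operational in this window
    return label, missing


print(classify_window({"GPS": 0, "Accelerometer": 1200, "Wi-Fi": 85}))
# ('partial', ['GPS'])
```

A count of zero is treated as "missing"; in practice you may prefer a small study-specific threshold instead.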
The graphs on this page do not represent the quality of the data. For example, assume your study requires capturing GPS data. The data quantity report can easily show how often the participant provided GPS data, but it cannot distinguish between cases where the participant forgot to carry their phone and left it on the desk, and cases where the participant carried the phone with them at all times.
To access the graphs on data quantity, navigate to the Participation -> Data Quantity Report page for your study. The page allows you to select one or more participants, one or more data sources, a time period, and a unit of time, and then plots the graph for those parameters. For example, in the image below we request the compliance report for participant #231, from May 8th to June 7th, 2016, for Wi-Fi data sources. We also ask for the data to be aggregated per day.
Go will extract the number of records uploaded by participant #231 for each of these data sources, for each day in the specified period, as shown in the image below. Each data source generates a different number of records per day. For example, Wi-Fi generated between 5,000 and 15,000 records per day (10,000 Wi-Fi records, for instance, means the device recorded access points in proximity 10,000 times, with many access points likely counted more than once), while GPS usually produced fewer than 10,000 records per day, and Bluetooth mostly fewer than 1,500.
Note that the numbers shown for one data source cannot be compared with those of other data sources. For example, recording 100 Bluetooth records in 1 hour indicates that the participant's device was in proximity of at most 100 distinct Bluetooth devices (many may have been recorded multiple times), and this cannot be compared in any form with recording 1,000 GPS locations in the same time window.
In each graph, hover your cursor over a bar to see the time and the number of records represented by that data point. You can also drag your cursor over a specific period to zoom in, or double-click on the graph to zoom out.
The following image shows the number of GPS records uploaded by a given participant on May 8th, 2016:
The first spike in the number of collected records is at 7 am, with more than 600 GPS records in that hour. This means the participant uploaded more than 600 GPS records between May 8th, 2016, 7:00 am and May 8th, 2016, 7:59 am, inclusive. All times are shown in the participant's local time zone.
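To make the hourly buckets concrete, the sketch below counts records per local hour, where each bucket covers hh:00 through hh:59 inclusive. It assumes the upload timestamps are available as timezone-aware `datetime` values; `count_per_local_hour` is a hypothetical helper. The example uses a fixed UTC-4 offset for simplicity; for a real participant you would more likely pass a DST-aware zone such as `zoneinfo.ZoneInfo("America/Toronto")` (an illustrative zone name).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone, tzinfo


def count_per_local_hour(timestamps: list[datetime], tz: tzinfo) -> Counter:
    """Count records per local hour; each key is the local start of the hour.

    Timestamps must be timezone-aware. A record at 07:59:59 local time lands in
    the 07:00 bucket, and one at 08:00:00 lands in the 08:00 bucket.
    """
    counts: Counter = Counter()
    for ts in timestamps:
        local = ts.astimezone(tz)  # convert to the participant's local time
        hour_start = local.replace(minute=0, second=0, microsecond=0)
        counts[hour_start] += 1
    return counts


# Illustrative only: a fixed UTC-4 offset (Eastern Daylight Time in May).
edt = timezone(timedelta(hours=-4))
uploads = [
    datetime(2016, 5, 8, 11, 0, tzinfo=timezone.utc),       # 07:00:00 local
    datetime(2016, 5, 8, 11, 59, 59, tzinfo=timezone.utc),  # 07:59:59 local
    datetime(2016, 5, 8, 12, 0, tzinfo=timezone.utc),       # 08:00:00 local
]
for hour, n in sorted(count_per_local_hour(uploads, edt).items()):
    print(hour.strftime("%H:%M"), n)
# 07:00 2
# 08:00 1
```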
The graph also shows that from 7 am until 4 pm the participant's device uploaded considerably more GPS data, after which it again reported a modest number of GPS records. This is consistent with the GPS data collection logic (explained in the GPS data source description), which monitors the participant's mobility and records GPS data only when necessary.
Therefore, this graph can be interpreted as follows:
The participant visited many locations during the day (from 7 am to 4 pm), while before 7 am and after 4 pm she mostly stayed in one place and did not move as much.
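The interpretation above is made by eye. If you want to make it repeatable, one option is to look for the longest run of consecutive hours whose GPS count is above a threshold. This is a minimal sketch under that assumption: the threshold is a hypothetical, study-specific choice (not an Ethica setting), the counts are made up to mimic the shape of the day described above, and because GPS collection adapts to mobility, a high count is only a proxy for movement.

```python
from typing import Optional


def longest_active_run(hourly_counts: list[int], threshold: int) -> Optional[tuple[int, int]]:
    """Return (first_hour, last_hour) of the longest run of consecutive hours
    whose record count is at least `threshold`, or None if no hour qualifies.

    hourly_counts[h] is the GPS record count for local hour h. The threshold is
    a hypothetical, study-specific choice, not an Ethica setting.
    """
    best: Optional[tuple[int, int]] = None
    run_start: Optional[int] = None
    for hour, count in enumerate(hourly_counts):
        if count >= threshold:
            if run_start is None:
                run_start = hour  # a new active run begins
            if best is None or hour - run_start > best[1] - best[0]:
                best = (run_start, hour)
        else:
            run_start = None  # the run is broken by a quiet hour
    return best


# Made-up counts: quiet until 7:00, busy from 7:00 to 15:59, quiet afterwards.
counts = [5] * 7 + [300] * 9 + [10] * 8
print(longest_active_run(counts, threshold=100))
# (7, 15)
```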
The following image shows the number of (potentially duplicate) Wi-Fi access points observed by the participant's device per hour on May 8th.
The peak in the graph shows that the participant's device recorded 1,400 data points indicating proximity to access points between 4:00 pm and 4:59 pm, inclusive. Note that these access points are not unique, so this does not mean the participant was in proximity of 1,400 unique access points. Ethica scans for access points in proximity about every 5 minutes on average, so if the participant stayed in the same place for the whole hour, the same access point could be recorded about 12 times (60 / 5) in that hour.
Although the number shown here does not count unique access points, a higher number usually indicates denser Wi-Fi networks in proximity, which is a good indication of a densely populated, mostly commercial, area.
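The 12-scans-per-hour reasoning can be turned around to get a rough lower bound on the number of distinct access points behind a count. This minimal sketch assumes each access point is recorded at most once per scan and that scans happen exactly every 5 minutes; since the interval is only 5 minutes on average, treat the result as approximate. The same arithmetic applies to Bluetooth devices. The function name `min_unique_devices` is hypothetical.

```python
import math


def min_unique_devices(record_count: int, hours: float = 1.0,
                       scan_interval_min: float = 5.0) -> int:
    """Rough lower bound on the number of distinct access points (or Bluetooth
    devices) behind `record_count` records in a window of `hours` hours.

    Assumes each device is recorded at most once per scan and scans happen
    every `scan_interval_min` minutes, so one device contributes at most
    hours * 60 / scan_interval_min records.
    """
    max_records_per_device = hours * 60 / scan_interval_min
    return math.ceil(record_count / max_records_per_device)


print(min_unique_devices(1400))  # the 4:00-4:59 pm peak described above
# 117
```

The true number of distinct access points can be much higher than this bound, since most access points are not in range for the whole hour.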
The survey responses plot shows the number of responses recorded in each time unit (set to Day for the following graph). In this graph, responses that were fully or partially Completed by the participant are shown in green, survey prompts that were Canceled are shown in red, and prompts that Expired are shown in gray. You can read more about what counts as a completed, expired, or canceled survey in the Surveys section.
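The per-day survey bars can be reproduced with a simple tally. This is a sketch, assuming you have a list of (day, status) pairs whose status is one of the three labels used in the plot, with partially completed responses already labeled "Completed"; the input format and `tally_survey_statuses` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Status labels as shown in the plot: Completed (green), Canceled (red), Expired (gray).
STATUSES = ("Completed", "Canceled", "Expired")


def tally_survey_statuses(responses):
    """Count survey prompts per day and status.

    `responses` is an iterable of (day, status) pairs, where partially
    completed responses are assumed to already be labeled "Completed".
    """
    per_day = defaultdict(lambda: dict.fromkeys(STATUSES, 0))
    for day, status in responses:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        per_day[day][status] += 1
    return dict(per_day)


prompts = [
    (date(2016, 5, 8), "Completed"),
    (date(2016, 5, 8), "Expired"),
    (date(2016, 5, 8), "Completed"),
    (date(2016, 5, 9), "Canceled"),
]
for day, tally in sorted(tally_survey_statuses(prompts).items()):
    print(day, tally)
# 2016-05-08 {'Completed': 2, 'Canceled': 0, 'Expired': 1}
# 2016-05-09 {'Completed': 0, 'Canceled': 1, 'Expired': 0}
```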
Selecting Bluetooth in the data sources section plots the number of Bluetooth devices scanned in proximity per selected unit of time. As with Wi-Fi access points, this number does not represent unique devices: each device can be scanned up to about 12 times per hour on average, once for every 5-minute interval during which it is in proximity of the participant's device.
The following graph shows the number of devices detected by a participant's device on May 7th, 2016. No device is recorded from 3:00 am to 2:00 pm, which can have several causes:
- There have been no Bluetooth devices in proximity.
- The participant turned off Bluetooth on her device to save battery.
- The phone or the app was turned off and not operating at that time.
To distinguish between these scenarios, we can cross-check with another data source. Battery is usually a good choice, as it is always available and cannot be disabled by the participant. The following graph shows the number of records per hour for both the Battery and Bluetooth data sources:
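The Battery cross-check can be sketched as follows: find each run of consecutive hours with no Bluetooth records and look at the Battery counts for the same hours. If Battery data is present, the device and app were running, so Bluetooth was either off or had nothing nearby (counts alone cannot tell these apart). If Battery data is also absent, the phone or app was likely off. This is a minimal sketch under the assumption that both series are plain lists of hourly counts; `explain_gaps`, its labels, and the example counts are hypothetical.

```python
def explain_gaps(bluetooth: list[int], battery: list[int]) -> list[tuple[int, int, str]]:
    """Find runs of consecutive hours with no Bluetooth records and label each
    run using the Battery record counts for the same hours.

    Both lists are indexed by local hour and must have the same length.
    Returns (first_hour, last_hour, label) tuples.
    """
    if len(bluetooth) != len(battery):
        raise ValueError("both series must cover the same hours")
    gaps = []
    start = None
    for hour, count in enumerate(bluetooth + [1]):  # sentinel closes a trailing gap
        if count == 0 and start is None:
            start = hour
        elif count > 0 and start is not None:
            window = battery[start:hour]
            if all(n > 0 for n in window):
                label = "sensor_off_or_nothing_nearby"  # device and app were running
            elif all(n == 0 for n in window):
                label = "device_or_app_off"  # no Battery data either
            else:
                label = "mixed"  # inspect this gap hour by hour
            gaps.append((start, hour - 1, label))
            start = None
    return gaps


# Made-up counts: no Bluetooth records from 3:00 am until 2:00 pm.
bluetooth = [4] * 3 + [0] * 11 + [6] * 10
battery = [2] * 24
print(explain_gaps(bluetooth, battery))
# [(3, 13, 'sensor_off_or_nothing_nearby')]
```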
Using the plots shown here, you can find out how much data each participant uploaded during any period. Here we describe the main reasons for gaps in a participant's data.
Some sensors, such as Wi-Fi, Bluetooth, or GPS, can be turned off outside the app, which prevents data collection from that sensor. In this case, Ethica shows a notification letting the user know that the study is partially interrupted and that they need to turn the sensor back on to resume data collection. This article explains how you can use the participant's history report to check whether a given participant had disabled a given sensor at a certain time.
To protect participants' privacy, Ethica allows participants to snooze the app. Snoozing puts the app to sleep for 1 hour, and participants can snooze the app as many times as needed.
While the app is snoozed, no data is collected from any data source. Participants still receive previously scheduled surveys during this time and can respond to them, but none of the other data sources are operational. You can check the participant's history report to see whether the app was snoozed at a given time.
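When checking a gap against the history report, it helps to turn individual snooze events into the time periods they cover. This sketch assumes you have copied the snooze start times from the history report into a list of `datetime` values (the report's export format is not described here) and that each snooze covers exactly one hour from the moment it starts; `snoozed_periods` and `was_snoozed` are hypothetical helpers. Survey responses inside a snoozed period are expected, since scheduled surveys are still delivered.

```python
from datetime import datetime, timedelta

SNOOZE_LENGTH = timedelta(hours=1)  # each snooze pauses the app for 1 hour


def snoozed_periods(snooze_starts: list[datetime]) -> list[tuple[datetime, datetime]]:
    """Merge 1-hour snooze windows into non-overlapping (start, end) periods.

    Assumes each snooze covers exactly one hour from the moment it starts.
    """
    periods: list[tuple[datetime, datetime]] = []
    for start in sorted(snooze_starts):
        end = start + SNOOZE_LENGTH
        if periods and start <= periods[-1][1]:
            # Overlaps or touches the previous period: extend it.
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            periods.append((start, end))
    return periods


def was_snoozed(t: datetime, periods: list[tuple[datetime, datetime]]) -> bool:
    """True if time t falls inside a snoozed period (end time excluded)."""
    return any(start <= t < end for start, end in periods)


starts = [datetime(2016, 5, 8, 9, 0), datetime(2016, 5, 8, 10, 0),
          datetime(2016, 5, 8, 14, 30)]
periods = snoozed_periods(starts)
print(was_snoozed(datetime(2016, 5, 8, 10, 45), periods))  # True
print(was_snoozed(datetime(2016, 5, 8, 11, 0), periods))   # False
```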
When selecting data sources for a new study, you can mark each data source as optional or mandatory. If a data source is marked as optional, participants can opt out of it completely and not provide the requested data.
If participants opt out of a data source, no data from that source is recorded or uploaded, so the data quantity plot for that source will be blank.
Arguably, the most common reason for a participant not uploading data is that their device ran out of battery and was off for some time. Naturally, no data is collected or uploaded during that period.
If a device turns off for any reason, none of the data collected before that point is lost. Any surveys in progress resume when the device turns back on, and participants can continue answering them. When the device turns on, Ethica starts automatically, so participants don't have to remember to start the app.
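The reasons above can be combined into a simple checklist for a single gap. This is a minimal sketch, assuming you have gathered the inputs by hand from the study setup (optional data sources) and the participant's history report; the parameter names and messages are hypothetical. Snoozing is checked before the Battery count because a snoozed app also stops Battery data, so an empty Battery series alone does not prove the device was off.

```python
def likely_gap_cause(*, opted_out: bool, snoozed: bool,
                     sensor_disabled: bool, battery_records: int) -> str:
    """Suggest the most likely reason a data source has no records in a window.

    Inputs are assumed to be gathered by hand from the study setup and the
    participant's history report; the parameter names are hypothetical.
    """
    if opted_out:
        return "opted out of this optional data source"
    if snoozed:
        return "app snoozed: only scheduled surveys were active"
    if sensor_disabled:
        return "sensor turned off on the device"
    if battery_records == 0:
        return "device likely off, for example out of battery"
    return "device on: possibly nothing to record, check other sources"


print(likely_gap_cause(opted_out=False, snoozed=False,
                       sensor_disabled=False, battery_records=0))
# device likely off, for example out of battery
```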