Mobile health and privacy: cross sectional study

Copyright © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Associated Data

Supplementary material: appendices A-G

Abstract

Objectives

To investigate whether and what user data are collected by health related mobile applications (mHealth apps), to characterise the privacy conduct of all the available mHealth apps on Google Play, and to gauge the associated risks to privacy.

Design

Cross sectional study

Setting

Health related apps developed for the Android mobile platform, available in the Google Play store in Australia and belonging to the medical and health and fitness categories.

Participants

Users of 20 991 mHealth apps (8074 medical and 12 917 health and fitness) found in the Google Play store; in-depth analysis was done on 15 893 apps that did not require a download or subscription fee, compared with 8468 baseline non-mHealth apps.

Main outcome measures

Primary outcomes were characterisation of the data collection operations in the app code and of the data transmissions in the app traffic; analysis of the primary recipients for each type of user data; presence of adverts and trackers in the app traffic; audit of the app privacy policy and compliance of the privacy conduct with the policy; and analysis of complaints in negative app reviews.

Results

88.0% (n=18 472) of mHealth apps included code that could potentially collect user data. 3.9% (n=616) of apps transmitted user information in their traffic. Most data collection operations in app code and data transmissions in app traffic involved external service providers (third parties). The top 50 third parties were responsible for most of the data collection operations in app code and data transmissions in app traffic (68.0% (2140), collectively). 23.0% (724) of user data transmissions occurred over insecure communication protocols. 28.1% (5903) of apps provided no privacy policy, whereas 47.0% (1479) of user data transmissions complied with the privacy policy. 1.3% (3609) of user reviews raised concerns about privacy.

Conclusions

This analysis found serious problems with privacy and inconsistent privacy practices in mHealth apps. Clinicians should be aware of these and articulate them to patients when determining the benefits and risks of mHealth apps.

Introduction

With the improved accessibility of smartphone devices, the number of mobile applications (or apps) available through a variety of marketplaces has grown exponentially. As of 2021, almost 2.87 million apps were available on the Google Play store alone. 1 Two popular app categories are medical and health and fitness. Referred to collectively as mobile health or mHealth apps, such apps encompass a wide range of functions, from the management of health conditions and symptom checking to step and calorie counters and menstruation trackers. 2 Mobile health is a booming market that targets not only patients and clinicians but also those with an interest in health and fitness.

Although the potential of mHealth apps to improve access to real time monitoring and health care resources is well established, 3 4 they pose problems concerning data privacy because of the sensitive information they can access, the use of a business model that is centred on selling subscriptions or sharing user data, 5 and the lack of enforcement of privacy standards around the world. For example, the European Union General Data Protection Regulation 6 (GDPR) defines eight rights of individual users, and several rules implemented under the US Health Insurance Portability and Accountability Act 7 (HIPAA) establish a baseline of privacy protection and patient rights.

In line with the HIPAA, the US Food and Drug Administration released guidance for the postmarket management of cybersecurity in medical devices in 2016. 8 The FDA recommended that manufacturers of medical devices (ie, app developers) should incorporate risk management into the life cycle of their products and implement controls to ensure that the devices were secure and protected patients. Specifically, the guidance covers cybersecurity and privacy factors and stipulates risk management programmes that “address vulnerabilities which may permit the unauthorized access, modification, misuse, or the unauthorized use of information that is stored, accessed, or transferred from a medical device to an external recipient, and may result in patient harm.”

However, regulation and guidance are difficult to enforce in practice. Several recent episodes have highlighted the problem of app data being collected and shared in an unauthorised manner. For example, a Norwegian not-for-profit organisation found that 10 popular apps, including one on health and fitness, shared data with advertising companies without informed user consent, in a clear breach of GDPR. 9 Forty one popular apps, some developed by leading technology companies, have been called out by the Chinese Ministry of Industry and Information Technology for illegal data collection. 10 A 2019 decision by CNIL, the French data protection authority, found Google to be in breach of the principle of transparency 11 because the information on the use of personal data was presented in a vague manner that was difficult to understand.

Because of the inadequate privacy disclosures of top mHealth apps, 4 12 we used a suite of app collection and analysis tools to carry out a large scale privacy analysis of mHealth apps and performed a privacy audit of more than 20 000 mHealth apps available in the Google Play store, the largest mobile app marketplace. 13

Compared with previous analyses, 4 12 14 15 our study covers virtually all the Google Play store mHealth apps accessible from Australia, as a proxy for the worldwide Google Play app marketplace. Google Play store 16 provides various filters and configurations to developers, facilitating the localisation and distribution of releases of Android apps to specific countries or geographical locations. 17 From this information we determined that most of the collected (91.1% (19 101)) and analysed (75.7% (15 893)) mHealth apps were not specific to Australia but are also present and available in other locations such as Europe and the US. Besides its scale, our study also refined the granularity and depth of the analysis. For example, Dehling et al categorised mHealth apps into low, medium, and high privacy risk groups, 18 disregarding the type of user information being leaked, the recipients of the information, and whether this was disclosed in the app’s privacy policy. We considered the security of the communication protocols used by the apps, the presence of advertising and tracking libraries in the app code, and the users’ reviews on the app’s privacy conduct.

Methods

Since 2015, app marketplaces such as Google Play and the Apple App Store have grown by about 38%, and are expected to generate 111.1 billion apps by 2025. 19 The number of mHealth apps available in app stores continues to increase. 20 Of the 2.8 million apps on Google Play and the 1.96 million apps on the Apple App Store, an estimated 99 366 belong to the medical and the health and fitness categories. These apps account for 2% (47 890) of those available through Google Play and 3% (51 476) of those available through the Apple App Store. 21 22 Our analysis focused on Google Play, the largest app store, and covers virtually all the Google Play mHealth apps accessible from Australia, as a proxy for the worldwide Google Play app marketplace.

mHealth app dataset

Google Play does not provide a complete list of mHealth apps and its search functionality does not show all the available apps. To overcome this problem and to detect as many mHealth apps as possible, we developed a crawler that interacted directly with the app store’s interface. 23 Starting from the top 100 medical and health and fitness apps on Google Play, the crawler systematically searched through other apps considered to be similar by Google Play. For each app, the crawler collected several items of metadata: app category and price, locations where the app is available, app description, number of installs, developer information, user reviews, and app rating. From 1 October to 15 November 2019, the crawler searched through more than 1.7 million apps.
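The crawl described above can be read as a breadth-first traversal of the store's "similar apps" graph. The sketch below illustrates that idea only; it is not the crawler used in the study. The callable `fetch_similar`, the app identifiers, and the toy graph are hypothetical stand-ins for live store pages.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl_similar_apps(
    seeds: Iterable[str],
    fetch_similar: Callable[[str], Iterable[str]],
    max_apps: int = 2_000_000,
) -> list[str]:
    """Visit apps breadth-first, following 'similar apps' links from the seeds."""
    seen: set[str] = set()
    queue: deque[str] = deque()
    for app_id in seeds:
        if app_id not in seen:
            seen.add(app_id)
            queue.append(app_id)

    visited: list[str] = []
    while queue and len(visited) < max_apps:
        app_id = queue.popleft()
        visited.append(app_id)  # metadata collection would happen here
        for neighbour in fetch_similar(app_id):
            if neighbour not in seen:  # never enqueue the same app twice
                seen.add(neighbour)
                queue.append(neighbour)
    return visited


# Toy graph standing in for store pages (hypothetical app identifiers).
TOY_GRAPH = {
    "com.example.steps": ["com.example.diet", "com.example.sleep"],
    "com.example.diet": ["com.example.steps", "com.example.pills"],
    "com.example.sleep": [],
    "com.example.pills": ["com.example.sleep"],
}


def toy_fetch_similar(app_id: str) -> list[str]:
    """Return the apps listed as similar in the toy graph."""
    return TOY_GRAPH.get(app_id, [])
```

Filtering the visited apps by category afterwards would yield the mHealth subset, while the remaining apps could serve as candidates for a baseline sample.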

We selected apps belonging to the medical and health and fitness categories on Google Play. Overall, we identified 20 991 mHealth apps, of which 15 893 (75.7%) were free to download, 3228 (15.4%) required an instore purchase, and 1872 (8.9%) were geoblocked (that is, could not be downloaded in Australia). In addition, we used the crawler to sample a random set of popular non-mHealth apps to be used as a baseline comparator. This set contained 8468 apps from the tools, communication, personalisation, and productivity categories. Table 1 shows the dataset characteristics.

Table 1

Characteristics of the 20 991 mHealth apps and 8468 baseline (non-mHealth) apps, collected from the Google Play store

| Characteristics | No (%) of mHealth apps (n=20 991) | No (%) of non-mHealth apps (n=8468) |
|---|---|---|
| Medical | 8074 (38) | - |
| Health and fitness | 12 917 (62) | - |
| Download status: | | |
| Instore purchase | 3228 (15.4) | - |
| Free | 15 893 (75.7) | 8468 (100) |
| Geoblocked | 1872 (8.9) | - |
| No of downloads: | | |
| ≥500 | 7481 (35.6) | 1394 (17.3) |
| ≥1000 | 4009 (19.1) | 74 (0.9) |
| ≥5000 | 1683 (8.0) | 37 (0.4) |
| ≥10 000 | 3582 (17.1) | 206 (2.4) |
| ≥50 000 | 1253 (6.0) | 206 (2.4) |
| ≥100 000 | 1882 (9.0) | 1625 (19.2) |
| ≥500 000 | 375 (1.8) | 820 (9.7) |
| ≥1 000 000 | 462 (2.2) | 2512 (29.7) |
| ≥5 000 000 | 127 (0.6) | 1527 (18.1) |
| Contains adverts and includes tracking and analytics services (yes/no): | | |
| All (non) mHealth apps | 13 063 (63.0)/7928 (37.0) | 7960 (94.0)/508 (6.0) |
| Medical apps | 4516 (55.9)/3558 (44.1) | - |
| Health and fitness apps | 8547 (66.2)/4370 (33.8) | - |
| Includes privacy policy link on Google Play’s webpage (yes/no): | | |
| All (non) mHealth apps | 15 088 (71.9)/5904 (28.1) | 6329 (74.7)/2140 (25.3) |
| Medical apps | 5439 (67.4)/2635 (32.6) | - |
| Health and fitness apps | 9649 (74.7)/3269 (25.3) | - |
| Users’ perception (% range)*: | | |
| 0-20 | 10 371 (49.4) | 1437 (17.0) |
| 21-40 | 4157 (19.8) | 30 (0.4) |
| 41-60 | 2663 (12.7) | 337 (4.0) |
| 61-80 | 1474 (7.0) | 2125 (25.1) |
| 81-100 | 2326 (11.1) | 4539 (53.6) |

* Determined by 100% × number of negative reviews/total number of reviews.

Statistical analysis

We analysed the mHealth app files and source code (static analysis), investigated the network traffic generated during execution of the app (dynamic analysis), and inspected reviews provided by users of the apps (fig 1).

Fig 1 Privacy analysis of mobile health (mHealth) apps

App files and code analysis—of the initial set of 20 991 apps, we downloaded all 15 893 (75.7%) free apps and excluded the instore purchasable and geoblocked ones. To access the apps’ resources, we processed the downloaded app packages using apktool, a tool that reverse engineers Android apps and decodes them to nearly their original form. 24 In addition, for all 15 893 mHealth apps, we extracted the app’s publicly available privacy policy, which discloses the collection and use of personal data and describes the app’s privacy practices. Typically, the link to the privacy policy is included in the app page on Google Play. If the link was broken or directed users to a page with no text, we considered the app to have no privacy policy. We analysed the extracted resources as follows:

Third party presence in app resources—to retrieve and classify all third party libraries included in the app, we performed a dictionary based search of the folder containing the decoded app files and embedded libraries. To achieve this, we used a comprehensive dictionary of third party libraries, 25 which comprises 338 third parties, including adverts (eg, GoogleAds); analytics (eg, GoogleAnalytics); utilities (eg, Github); and other social, banking, and gaming services (eg, Facebook or PayPal).
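A dictionary based search of this kind can be sketched as a prefix match between known library package paths and the directories of the decoded app. The example below is a minimal illustration under stated assumptions: the three dictionary entries and their package paths are a hypothetical excerpt, not the 338 entry dictionary cited above, and the function names are invented for the example.

```python
from __future__ import annotations

import os
from typing import Iterable

# Hypothetical excerpt of a library dictionary:
# package path prefix -> (library name, category).
LIBRARY_DICTIONARY = {
    "com/google/android/gms/ads": ("GoogleAds", "adverts"),
    "com/google/android/gms/analytics": ("GoogleAnalytics", "analytics"),
    "com/facebook": ("Facebook", "social"),
}


def list_relative_dirs(decoded_dir: str) -> list[str]:
    """List every directory under decoded_dir, relative to it, with '/' separators."""
    return [
        os.path.relpath(root, decoded_dir).replace(os.sep, "/")
        for root, _dirs, _files in os.walk(decoded_dir)
    ]


def match_libraries(
    relative_dirs: Iterable[str],
    dictionary: dict[str, tuple[str, str]] = LIBRARY_DICTIONARY,
) -> dict[str, str]:
    """Return {library: category} for each dictionary prefix present in the paths."""
    found: dict[str, str] = {}
    for path in relative_dirs:
        padded = f"/{path}/"
        for prefix, (library, category) in dictionary.items():
            # Pad with slashes so that only whole path components match.
            if f"/{prefix}/" in padded:
                found[library] = category
    return found
```

Calling `match_libraries(list_relative_dirs(path))` on a folder produced by a decoder such as apktool would then list the libraries found; counting the entries per app gives a library count of the kind reported in the results.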

Data collection operations in the app code—we extracted the set of Android operating system functions associated with access to users’ personal data. For example, the presence of the function android.telephony.TelephonyManager.getLine1Number in the app code indicates the retrieval of the user’s contact phone number. In addition, we extracted the set of permissions requested by the app to access components of the operating system such as contact list or global positioning system (GPS) location. Using the permissions, we checked whether each data collection function had all the required authorisations for execution, and, if not, it was discarded. The final set of functions represented all the potential data collection in the app: in practice, it is a superset of the actual user data collection, because some parts of the app code might rarely (or never) be triggered during execution of the app.
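The permission check described above amounts to keeping a data collection call only when every permission it needs appears among those requested by the app. The sketch below illustrates this filter. The mapping from functions to permissions is a simplified assumption made for the example (real Android requirements vary by API level), and `potential_data_collection` is a hypothetical name.

```python
from __future__ import annotations

from typing import Iterable

# Illustrative mapping only: API call -> (data type, permissions assumed required).
# Actual Android permission rules are more nuanced and depend on the API level.
DATA_COLLECTION_APIS = {
    "android.telephony.TelephonyManager.getLine1Number": (
        "phone number",
        {"android.permission.READ_PHONE_STATE"},
    ),
    "android.location.LocationManager.getLastKnownLocation": (
        "location",
        {"android.permission.ACCESS_FINE_LOCATION"},
    ),
}


def potential_data_collection(
    calls_in_code: Iterable[str],
    requested_permissions: Iterable[str],
) -> dict[str, str]:
    """Keep data collection calls whose required permissions were all requested."""
    requested = set(requested_permissions)
    kept: dict[str, str] = {}
    for call in calls_in_code:
        if call not in DATA_COLLECTION_APIS:
            continue  # not a known data collection function
        data_type, required = DATA_COLLECTION_APIS[call]
        if required <= requested:  # subset test: every required permission requested
            kept[call] = data_type
        # otherwise the call lacks authorisation and is discarded
    return kept
```

The result remains a superset of what the app actually collects, because a retained call may sit in code that is never executed.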

Privacy policy analysis—the disclosure of privacy practices is a legal requirement set by privacy regulations (eg, GDPR), and Google Play store has mandated the inclusion of app privacy policies since 2018. Manually reviewing and annotating the app privacy policies was not feasible owing to the scale of the dataset. Instead, we automated the privacy policy analysis, using supervised machine learning to predict the disclosure of personal data in the privacy policy text. 26 We trained the classifier on a large public dataset of annotated privacy policies, APP-350, 27 which is a set of 350 privacy policies of popular mobile apps annotated by legal experts. In validation, the method achieved an accuracy of more than 97% for all disclosure types, with an average precision of 87% and an average recall of 77%. Supplementary appendix B presents the detailed prediction performance.

Traffic analysis—we intercepted and analysed all the network traffic generated by the apps during the execution of automated app testing. 28 To achieve this, we built a dedicated testbed composed of a smartphone that connects to the internet through a computer configured as a WiFi access point, which runs a tool 29 intercepting all the traffic transmitted to the internet. Each of the 15 893 downloaded free apps was individually tested (apps purchased in-store or geoblocked were excluded): for each app, we performed on average 35 different activities (eg, opening the app, opening a menu, clicking a button) in a 180 second test session.

The intercepted traffic was analysed as follows:

Adverts and trackers in app traffic—we extracted the communications with external advert and tracking services—most likely third party recipients of personal data. 30 To isolate the traffic components associated with adverts and trackers, we used two comprehensive filter lists: EasyList, 31 an advert block list, and EasyPrivacy, 32 a supplementary block list for tracking.
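The filtering step can be illustrated with a much simplified matcher that labels each intercepted request by the domain it contacts. This is a sketch under stated assumptions: real EasyList and EasyPrivacy rules use a richer filter syntax that is not parsed here, and the two small domain sets are illustrative excerpts (the domains themselves are the ones named in the results of this article).

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

# Illustrative excerpts standing in for the full block lists.
ADVERT_DOMAINS = {"doubleclick.net", "googlesyndication.com"}
TRACKER_DOMAINS = {"google-analytics.com"}


def _matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_request(url: str) -> str:
    """Label a request URL as 'advert', 'tracker', or 'other'."""
    host = (urlsplit(url).hostname or "").lower()
    if _matches(host, ADVERT_DOMAINS):
        return "advert"
    if _matches(host, TRACKER_DOMAINS):
        return "tracker"
    return "other"


def request_share(urls: Iterable[str]) -> dict[str, float]:
    """Percentage of requests in each category for one app's test session."""
    labels = [classify_request(u) for u in urls]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}
```

Per app percentages of this kind correspond to the "% of requests" bands used for adverts and trackers in the results tables.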

Personal data transmission in app traffic—we identified the transmissions of user data performed by the apps during testing. A machine learning method 33 was used to find personally identifiable information in the app traffic, defined as a device identifier (eg, Android ID), a user identifier (eg, name or email), credentials (eg, password), or location. The model was trained on a large public dataset of annotated mobile app traffic flows 34 and yielded a validation accuracy of 97%, with 97% precision and 96% recall. The result includes only data collection practices that are actually performed when the app is used; this set is, however, not complete owing to the coverage limitations of dynamic app testing, which might not trigger some menus, views, or functionalities of the app. For this reason, we studied the user data collection in mHealth apps by leveraging both the app code and the app traffic.

Secure transmission of user data—we measured the fractions of user data transmissions made over the HTTP and HTTPS protocols. Whereas HTTP based communications are unencrypted, HTTPS encrypts all messages to protect app users from malicious data interception and content tampering. In the light of recent reports of widespread internet surveillance 35 and legislation permitting internet service providers to sell user information extracted from network traffic, 36 the adoption of the HTTPS protocol is essential to protect users’ privacy. 30
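Measuring the share of unencrypted transmissions reduces to counting URL schemes. The helper below is a minimal sketch with a hypothetical function name; it assumes that each detected transmission is available as a full URL.

```python
from __future__ import annotations

from typing import Iterable
from urllib.parse import urlsplit


def insecure_share(urls: Iterable[str]) -> float:
    """Percentage of HTTP(S) transmissions sent over unencrypted HTTP."""
    schemes = [urlsplit(u).scheme.lower() for u in urls]
    # Ignore anything that is neither HTTP nor HTTPS.
    relevant = [s for s in schemes if s in ("http", "https")]
    if not relevant:
        return 0.0
    return 100.0 * relevant.count("http") / len(relevant)
```

Applied to the figures reported in this article, 724 insecure transmissions out of 3148 detected transmissions gives 100 × 724/3148 ≈ 23.0%.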

App review analysis—to obtain the complete list of reviews for each app, we downloaded the content of the app’s page in the Google Play store. After excluding those reviews with no text, we obtained a dataset of 2 130 684 reviews for 6938 mHealth apps, of which 366 198 (17.2%) referred to medical apps and 1 764 486 (82.8%) to health and fitness apps. We categorised these reviews as positive (4 or 5 stars), negative (1 or 2 stars), or neutral (3 stars), resulting in 1 788 463 (83.9%) positive reviews and 235 210 (11.0%) negative reviews.
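The star based categorisation, together with the users' perception score defined in the footnote to table 1 (100% × number of negative reviews/total number of reviews), can be expressed in a few lines. This is an illustrative sketch; the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def categorise_review(stars: int) -> str:
    """Map a star rating to positive (4-5), neutral (3), or negative (1-2)."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"star rating must be 1 to 5, got {stars!r}")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


def negative_review_share(star_ratings: Iterable[int]) -> float:
    """Users' perception score: 100 x negative reviews / total reviews."""
    ratings = list(star_ratings)
    if not ratings:
        raise ValueError("at least one review is required")
    negative = sum(1 for s in ratings if categorise_review(s) == "negative")
    return 100.0 * negative / len(ratings)
```

For example, an app with ratings of 5, 4, 1, 2, and 3 stars has two negative reviews out of five, a score of 40%, which falls in the 21-40 band.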

Patient and public involvement

No patients or members of the public were directly involved in the study. The subject of the study was mHealth mobile apps publicly available on Google Play. The data collection and analysis methods leveraged an automated testing platform designed by the authors, not requiring the involvement of mHealth app users or developers. Likewise, we analysed public app reviews from Google Play, which were voluntarily contributed by the app users. To raise awareness of privacy risks in mHealth, we plan on sharing the collected datasets, the analyses library, and our findings with clinicians, patients, app developers, and the public.

Results

Personal data collection practices

The analysis of app files and code identified 65 068 data collection operations, on average four for each app. This result provided the broad set of all information that the apps can potentially access and share with third parties. At the same time, analysis of app traffic identified 3148 transmissions of user data across 616 (3.9%) different apps. The main types of data collected by mHealth apps include contact information, user location, and several device identifiers. Some of these identifiers (specifically, international mobile equipment identity (IMEI), a unique identifier used for fingerprinting mobile phones; media access control (MAC), a unique identifier of the network interface in the user’s device; and international mobile subscriber identity (IMSI), a number that uniquely identifies every user of a cellular network) are unique and persistent (ie, they cannot be changed or replaced) and can be used by third parties to track users across networks and applications. Supplementary appendix A provides further details about the collected data types.

Most of the mHealth apps included code for collecting MAC identifiers (67.0% (14 064) of apps) and app cookies, which are small text files used for customising web browsing and app experience but also for generating online user profiles (64.0% (13 434) of apps; fig 2). Other common types of data were the user’s email address and current cell tower location (33.0% (6927) and 25.0% (5248) of apps, respectively). User data transmissions were observed in 3.9% (616) of mHealth apps, mostly for health and fitness apps (fig 3). This percentage is substantial and should be taken as a lower bound for the real data transmissions performed by the apps, because some transmissions might not be triggered in automated app testing. The most common transmissions were for contact (user’s first or full name) and location (eg, zipcode; fig 3). When compared with baseline (non-mHealth) apps, mHealth apps, especially medical ones, were considerably less likely to collect personal data (fig 2).

Fig 2 Data collection operations in mobile health (mHealth) apps files and code. IMEI=international mobile equipment identity; SSID BSSID=service set identifier basic service set identifier; MAC=media access control; SIM=subscriber identity module; IMSI=international mobile subscriber identity

Fig 3 Personal user data transmissions in mobile health (mHealth) app traffic. MAC=media access control; GPS=global positioning system

Third parties that can access the personal data were also studied by distinguishing between collection on behalf of the first party (app’s own entities and domains) and collection on behalf of third party services (eg, external adverts, analytics, and tracking providers). The results show a predominant role of third parties (fig 4); 54 155 of 61 920 data collection operations in the app code (87.5%, fig 4) were related to third party services, that is, they originated from third party libraries embedded in the apps. The result might in part overestimate the actual role of these services, as some embedded libraries may never be used. The strong presence of third parties, however, was confirmed by the apps’ traffic, where 1756 of 3148 detected transmissions of user data (55.8%, fig 5) were towards third party servers.
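The first party versus third party distinction for traffic can be sketched as a comparison between each destination host and the domains owned by the app developer. The example below is a naive illustration, not the attribution procedure used in the study: it assumes the developer's domains are already known, and the host names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def is_first_party(host: str, developer_domains: Iterable[str]) -> bool:
    """True if host is one of the developer's own domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in developer_domains)


def third_party_share(hosts: Iterable[str], developer_domains: Iterable[str]) -> float:
    """Percentage of transmissions whose destination is not a developer domain."""
    domains = set(developer_domains)
    destinations = list(hosts)
    if not destinations:
        return 0.0
    third = sum(1 for h in destinations if not is_first_party(h, domains))
    return 100.0 * third / len(destinations)
```

The reported proportions follow the same arithmetic: 100 × 54 155/61 920 ≈ 87.5% for operations in app code and 100 × 1756/3148 ≈ 55.8% for transmissions in app traffic.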

Fig 4 Personal data recipients in mobile health (mHealth) app files and code. IMEI=international mobile equipment identity; SSID BSSID=service set identifier basic service set identifier; MAC=media access control; SIM=subscriber identity module; IMSI=international mobile subscriber identity

Fig 5 First party and third party personal data transmission in mobile health (mHealth) app traffic. MAC=media access control; GPS=global positioning system

Third party data recipients

Overall, 665 unique third party entities were identified, of which a small list of prominent third parties (the top 50) were responsible for most data collection operations in app code, and data transmissions in app traffic (68.0% (2140), collectively).

Third party presence—in general, a strong integration (in app code and files) and interaction (in app traffic) with third parties indicated an increased collection of user data by these services. This is crucial, as these entities might also share personal information with commercial partners or transfer the information as a business asset.

To quantify the third parties in the app code, the number of third party libraries for each app was measured across the different app categories. Although 63.0% (13 224) of mHealth apps embedded at least one third party service, this proportion was substantially lower than for non-mHealth apps (table 2). In particular, only 6.0% (1260) of mHealth apps included six or more third party libraries compared with 43.0% (3641) of non-mHealth apps. Although medical and health and fitness categories showed similar trends, health and fitness apps integrated slightly more third party libraries. This difference could explain why data collection operations were less common in medical apps (fig 2).

Table 2

Number of third party libraries found in app code and percentage network traffic related to advert and tracker services in mobile health (mHealth) apps

| | No (%) of mHealth apps (n=20 991) | No (%) of medical apps (n=8074) | No (%) of health and fitness apps (n=12 917) | No (%) of non-mHealth apps (n=8468) |
|---|---|---|---|---|
| No of embedded third party libraries: | | | | |
| 0 | 7928 (37.8) | 3558 (44.1) | 4370 (33.8) | 508 (6.0) |
| 1 | 4618 (22.0) | 1857 (23.0) | 2713 (21.0) | 423 (5.0) |
| 2 | 2729 (13.0) | 969 (12.0) | 1679 (13.0) | 847 (10.0) |
| 3 | 1889 (9.0) | 565 (7.0) | 1292 (10.0) | 1101 (13.0) |
| 4 | 1469 (7.0) | 404 (5.0) | 1033 (8.0) | 1016 (12.0) |
| 5 | 1250 (6.0) | 323 (4.0) | 1033 (8.0) | 931 (11.0) |
| ≥6 | 1250 (6.0) | 404 (5.0) | 775 (6.0) | 3641 (43.0) |
| Adverts in network traffic (% of requests): | | | | |
| 0.0 | 19 888 (94.7) | 7696 (95.3) | 12 087 (93.6) | 6942 (82.0) |
| 0.0-1.9 | 183 (0.9) | 116 (1.4) | 111 (0.9) | 431 (5.1) |
| 2.0-4.9 | 181 (0.9) | 58 (0.7) | 143 (1.1) | 382 (4.5) |
| 5.0-9.9 | 206 (1.0) | 44 (0.5) | 189 (1.5) | 116 (1.4) |
| 10.0-19.9 | 165 (0.8) | 24 (0.3) | 143 (1.1) | 332 (3.9) |
| ≥20.0 | 368 (1.8) | 136 (1.7) | 255 (2.0) | 265 (3.1) |
| Trackers in network traffic (% of requests): | | | | |
| 0.0 | 19 075 (90.9) | 7395 (91.6) | 11 534 (89.3) | 6759 (79.8) |
| 0.0-1.9 | 161 (0.8) | 58 (0.7) | 113 (0.9) | 340 (4.0) |
| 2.0-4.9 | 426 (2.0) | 117 (1.4) | 324 (2.5) | 398 (4.7) |
| 5.0-9.9 | 381 (1.9) | 107 (2.0) | 263 (2.0) | 373 (4.4) |
| 10.0-19.9 | 401 (1.9) | 165 (2.0) | 274 (2.1) | 232 (2.7) |
| ≥20.0 | 545 (2.6) | 232 (2.9) | 409 (3.2) | 366 (4.3) |

Table 2 also reports the fractions of communications with third party services in the app traffic, focusing on advert and tracking services (other third-party services (eg, social, widgets) have negligible presence in the intercepted traffic). mHealth apps tended to have fewer interactions with advert and tracking services than non-mHealth apps. For example, advert related traffic was observed for only 5.3% (1103) of mHealth apps compared with 18.0% (1526) of non-mHealth apps. Supplementary appendix C shows the top 10 mHealth apps for presence of adverts, along with popular health and fitness apps.

Most common third parties—third party libraries Google Ads (adverts) and Google Analytics (analytics) were detected in mHealth app code and files in 45.3% (3659) of medical apps and almost 50.0% (6453) of health and fitness apps (fig 6). Results were mainly consistent across the two mHealth app categories, although medical apps incorporated fewer Facebook widgets. Similarly, compared with non-mHealth apps, mHealth apps adopted SquareApp payment and Amazon services less often. The most common advert and tracking services contacted by the apps were Google ads (domains googlesyndication.com and doubleclick.net, which indicate the use of Google AdSense or Google Ad Manager for loading and managing adverts) and trackers (domain google-analytics.com) (fig 7).

Fig 6 Third party libraries in mobile health (mHealth) app categories and non-mHealth apps. *For example, social networks, banking, games