GitHub Healthcare Data Leaks Impact Credentials, Company Data and 150,000+ Patients’ PHI

A new report shows that patients’ personal and protected health information (PHI), along with other sensitive data, has been exposed on the internet through public GitHub repositories without the knowledge of the covered entities and business associates involved.

Jelle Ursem, a security researcher from the Netherlands, found that nine entities in the US, including HIPAA-covered entities and business associates, were leaking sensitive information through GitHub. The nine leaks involved roughly 150,000 to 200,000 patient records. Scanning for further exposed information was halted so that the affected entities could be contacted and a report could be written to highlight the risks to the healthcare community.

Even if your provider doesn’t use GitHub, you may still be affected. The actions of a single employee or a third-party contractor may have given unauthorized people access to sensitive information.

PII and PHI Exposed in Public GitHub Repositories

Ursem has identified many data leaks on GitHub in the past, some of them involving Fortune 500 companies, publicly traded firms, and government institutions. He ran a search to see whether medical information was also being leaked on GitHub. Within 10 minutes he had confirmation, and it was not an isolated instance.

Ursem ran queries such as “medicaid password FTP” and “companyname password” and found a number of hard-coded usernames and passwords that had been uploaded to GitHub. Using those credentials, he was able to sign in to Google G Suite and Microsoft Office 365 accounts and access a variety of sensitive data, including user information, contracts, activities, internal records, group chats, and patients’ PHI. GitHub search can be a dangerous tool in an attacker’s hands because simple queries like these are enough to surface leaked company data.

Ursem tried to contact the companies involved to notify them about the exposed data and have it secured. Reaching them proved difficult, so he approached DataBreaches.net.

DataBreaches.net’s Dissent Doe and Ursem then worked together to contact and notify the companies involved. They succeeded with some, but other companies’ data remains unsecured.

9 Data Leaks Identified

According to the report, the U.S. entities affected by the data leaks were MedPro Billing, Xybion, Texas Physician House Calls, MaineCare, VirMedica, Waystar, AccQData, Shields Health Care Group, and one entity that was not named because its data remains accessible.

The following were the common reasons for the GitHub data leaks:

  • Developers committed hard-coded credentials to public GitHub repositories
  • Public repositories were used instead of private ones
  • Developers abandoned repositories instead of deleting them once they were no longer needed
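
The first cause, hard-coded credentials, is the kind of mistake that can often be caught before code is ever pushed. The following is a minimal Python sketch of such a check, not a method described in the report. The patterns, the CREDENTIAL_PATTERNS list, and the find_hardcoded_credentials function are all hypothetical names chosen for illustration, and a real secret scanner would use a much larger and better tuned rule set.

```python
import re

# Hypothetical patterns for illustration only; real scanners use far more rules.
CREDENTIAL_PATTERNS = [
    # Assignments of a quoted literal to a sensitive-looking name,
    # e.g. password = "hunter2" or api_key: 'abcd1234'
    re.compile(
        r"""(password|passwd|pwd|secret|api_?key|token)\s*[:=]\s*["']([^"']{4,})["']""",
        re.IGNORECASE,
    ),
    # Credentials embedded in a connection URL, e.g. ftp://user:pass@host
    re.compile(
        r"(?:ftp|https?|postgres|mysql)://[^\s:/@]+:[^\s:/@]+@",
        re.IGNORECASE,
    ),
]


def find_hardcoded_credentials(text):
    """Return (line_number, line) pairs for lines that look like they hold a credential."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CREDENTIAL_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Run against a file’s contents before a commit, the function flags lines such as password = "s3cret!" or a URL containing user:pass@host, while a line that reads the value from the environment (for example password = os.environ["DB_PASSWORD"]) is not flagged. A check like this only reduces the risk; it does not replace keeping repositories private and removing them when they are no longer needed.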

For instance, Ursem discovered that a developer at Xybion had left hard-coded credentials in a public GitHub repository in February 2020. The credentials enabled Ursem to access Xybion’s billing back-office systems, which contained the PHI of 7,000 patients and over 11,000 insurance claims dating back to October 31, 2018.

The same thing happened with MaineCare. The leaked hard-coded credentials gave Ursem administrative access to the website, along with access to its internal server infrastructure, the MaineCare SQL data sources, and the PHI of 75,000 people.

The Typhoid Mary of Data Leaks

The report focused on one developer whose GitHub practices affected a large number of clients of the healthcare companies the developer worked for. The credentials and PHI of about 200,000 clients were exposed, which is why the developer was labeled the “Typhoid Mary of Data Leaks.”

The developer made many mistakes that led to the exposure of client data on GitHub, including leaving the credentials of 5 employers fully accessible in GitHub repositories after the work had ended. In one instance, the developer’s actions permitted access to a large debt collection provider’s central telephone system and exposed highly sensitive data about individuals with a history of substance abuse.

Though it wasn’t possible to get in touch with the developer directly, it seems that the message from DataBreaches.net and Ursem got through. The repositories have since been deleted or made private, but not before a third party had cloned the data.

This developer was just one of a few outsourced or contracted developers whose practices exposed information without the knowledge of the HIPAA-covered entities and business associates concerned.

The joint report by Jelle Ursem and DataBreaches.net explains how the leaks happened and why they went unnoticed for so long, and offers a number of recommendations for preventing data breaches on GitHub and for addressing them quickly. The full report is available as a PDF.