HospitalSites.org

Linguistic Features of Sites

6/12/2018

Every hospital website conveys a different message through the images and text on its home page.  Some want to talk about the quality of care, others about the compassion and bedside manner of their staff.  In this post, I use text analysis to uncover themes across sites and show the most common ones.
I've analyzed 4,138 sites from seed #4 using the natural language processing library spaCy.  I looked at both named entity recognition and part-of-speech tagging. The first identifies spans of text that refer to real-world objects such as people, places, or organizations. The second labels each word as a verb, pronoun, adjective, and so on.

Here's a description of the four entity types I extracted from each hospital site's main page:

ORG: Companies, agencies, institutions, etc.
LOC: Non-GPE locations, mountain ranges, bodies of water.
PERSON: People, including fictional.
GPE: Countries, cities, states.
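
As a rough sketch of the entity-labeling step, the snippet below keeps only the four labels above and tallies them for a single page. It assumes the page text has already been run through a spaCy pipeline; the name `count_entity_labels` is a hypothetical helper for illustration, not necessarily what the notebook does.

```python
from collections import Counter

# The four entity labels discussed in this post.
KEPT_LABELS = {"ORG", "LOC", "PERSON", "GPE"}


def count_entity_labels(doc):
    """Tally how often each kept entity label occurs in one parsed page.

    `doc` is expected to be a spaCy Doc; any object exposing `.ents`,
    where each entity has a `.label_` string, works as well.
    """
    return Counter(ent.label_ for ent in doc.ents if ent.label_ in KEPT_LABELS)
```

With spaCy and an English model installed, this could be called as `count_entity_labels(nlp(page_text))`, where `nlp = spacy.load("en_core_web_sm")` and `page_text` is a hypothetical string holding the home page text.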

Looking across all the websites, here's the distribution.  The organization entity (ORG) comes up in large numbers:
[Figure: distribution of the four entity types across all sites]
Organization (ORG)

spaCy's pre-trained model detects these organizations, but it's not perfect, and some of the detected organizations may be classified incorrectly.  Across the sites, here's the distribution of the top ten:
[Figure: top ten ORG entities across sites]
The most recognized organization (ORG) is Facebook, which indicates that most sites link to it from their home page. The second is 'patient portal', which is technically not an organization but nonetheless appears on many of the sites.  One that stands out as a real organization is the Joint Commission, which accredits and certifies many hospitals.
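
A top-ten ranking like this one can be sketched as below. This version counts each entity once per site, so a site that links to Facebook five times still contributes a single vote. That choice, along with the hypothetical names `top_entities` and `site_entities`, is an assumption for illustration; the notebook may count raw mentions instead.

```python
from collections import Counter


def top_entities(site_entities, label, n=10):
    """Return the n most widespread entity strings for one label.

    `site_entities` maps a site URL to a list of (text, label) pairs,
    e.g. built from `[(ent.text, ent.label_) for ent in doc.ents]`.
    """
    sites_per_entity = Counter()
    for pairs in site_entities.values():
        # A set ensures each entity is counted at most once per site.
        seen = {
            text.strip().lower()
            for text, ent_label in pairs
            if ent_label == label
        }
        sites_per_entity.update(seen)
    return sites_per_entity.most_common(n)
```

Lowercasing before counting merges variants such as "Facebook" and "facebook", but it will not merge aliases such as "TX" and "Texas".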
​

Location (LOC)

The most frequently detected location (LOC) entity was Google, because many sites link to Google Maps with a latitude and longitude, which spaCy identifies correctly as a location.  The second and third entries are incorrectly classified as locations, but the next four are correct: valley, south, east, and southern california. Here's the figure showing these results:

[Figure: top LOC entities across sites]
Country, City or State (GPE)

The entity classification for countries, cities, or states (GPE) missed many, but it got some right. For some reason, it treats youtube as one of these entities, and because youtube appears on many sites, it ranks number one. It also mistook 'md' for Maryland, and because these are healthcare sites, 'md' shows up as one of the top results. The list also includes sitemap, a common word on websites, as a location.  Among real locations, Texas ranks highest, appearing as both TX and Texas.
[Figure: top GPE entities across sites]
Person

The PERSON entity was not classified correctly for the 10 most common results. spaCy was confused by 'Bill Pay', presumably treating Bill as a person's name, and likewise by the term 'bill' on its own.  Here's the distribution of the top 10 detected entities:
[Figure: top 10 PERSON entities across sites]
Verbs, Adjectives and Nouns

Now let's take a look at the parts of speech tagged by spaCy (verbs, adjectives and nouns).  This analysis did a little better than the entity analysis. Starting with adjectives, number one is 'best', followed by 'free'. Here's the top 10 list:
[Figure: top 10 adjectives across sites]
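
A minimal sketch of how such a list could be produced, assuming each home page has already been parsed by spaCy. It filters tokens by their coarse part-of-speech tag (`"ADJ"`, `"VERB"`, `"PRON"`) and counts the lowercased text. The function name `top_words_by_pos` and the decision to skip non-alphabetic tokens are assumptions made for illustration.

```python
from collections import Counter


def top_words_by_pos(docs, pos, n=10):
    """Return the n most common lowercased words tagged with `pos`.

    `docs` is an iterable of spaCy Doc objects (or any iterables of
    tokens exposing `.pos_`, `.text` and `.is_alpha`).
    """
    counts = Counter()
    for doc in docs:
        counts.update(
            token.text.lower()
            for token in doc
            # Skip numbers, punctuation and mixed tokens such as "24/7".
            if token.pos_ == pos and token.is_alpha
        )
    return counts.most_common(n)
```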
It's no surprise that many hospitals claim the best program or the best doctors, and so 'best' appears as one of the most used adjectives.  Here's a sample of how it's used on the Yale New Haven Hospital site:
[Figure: sample from the Yale New Haven Hospital site]
Now let's take a look at the top 100 sites with the most adjectives on their page.  Here's a list of the top 10 and the number of adjectives on each:
Hospital URL                      Adjective Count
www.tbhcare.com                   301
www.ormc.org                      240
www.susquehannahealth.org         208
www.charlotteregional.com         182
ksmedcenter.com                   180
bestheartcare.com                 178
www.nm.org                        178
www.faithcommunityhospital.com    177
www.nmbhi.org                     165
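
A ranking like the one above could be computed along these lines. This is a sketch under the assumption that every site's parsed page is available in a dictionary; `rank_sites_by_pos` and `site_docs` are hypothetical names, not taken from the notebook.

```python
def rank_sites_by_pos(site_docs, pos="ADJ", n=10):
    """Return the n sites with the most tokens tagged `pos`, highest first.

    `site_docs` maps a site URL to its parsed page (a spaCy Doc or any
    iterable of tokens exposing `.pos_`).
    """
    totals = {
        url: sum(1 for token in doc if token.pos_ == pos)
        for url, doc in site_docs.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

Passing `pos="VERB"` would give the analogous ranking of sites by verb count.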
Among these sites, the distribution of adjectives is very similar: 'best' is number one again, but there are a few new ones such as 'many', 'first' and 'better':
[Figure: top adjectives among the top 100 sites]
Verbs

Looking at the top verbs, there's not much of a surprise here:
[Figure: top verbs across sites]
Pronouns

The most common pronoun is 'you', which fits with hospitals putting patients first; 'us' and 'we' follow shortly after:
[Figure: most common pronouns across sites]

This was a quick look at analyzing hospital websites' main pages with spaCy, a quick and effective way to do natural language processing.

The Jupyter notebook used to crunch some of this data can be found here, and it takes a deeper look at the sites with a large number of entities and parts of speech.  The data files used to extract this information can also be found in this directory.

This one lists all parts of speech found in each of the 4,138 sites, and this one lists all the entities per site. There's a lot of information in there. For instance, the website with the most verbs was ocalahealthsystem.com, with 16,370 verbs on the site.
    Author

    I've been working in the healthcare space for more than 10 years and I am passionate about the technology that runs it.
