This latest dataset removes many of the hospital websites that no longer exist or that have merged, reducing the list to 4,150 hospitals. It was created by starting with Seed #3 and merging it with hospital data found on the data.world site.
Page speed, the amount of time it takes to load a web page, is very important for user satisfaction. If a site takes too long to load, it can frustrate users and send them somewhere else. Sites that depend on too many heavy resources tend to have slow-loading pages. In this post, I use the listing in Seed #3 and compare the sites by how long they take to load.
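As a rough illustration of this kind of comparison, here is a minimal sketch that times one download of each page and sorts the results. It is not the method used in the post: it only measures how long the HTML document takes to arrive, not images, CSS, or scripts, and the names `fetch_body`, `time_fetch`, and `rank_by_speed` are made up for this example.

```python
import time
from urllib.request import urlopen


def fetch_body(url, timeout=10):
    """Download the raw HTML for a URL (no images, CSS, or scripts)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def time_fetch(url, fetch=fetch_body):
    """Return (elapsed_seconds, byte_count) for a single fetch of url."""
    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)


def rank_by_speed(urls, fetch=fetch_body):
    """Time each URL once and return (url, seconds, bytes), fastest first."""
    results = []
    for url in urls:
        try:
            seconds, size = time_fetch(url, fetch)
        except OSError:
            # URLError and socket timeouts are OSError subclasses:
            # skip sites that cannot be reached.
            continue
        results.append((url, seconds, size))
    return sorted(results, key=lambda row: row[1])
```

A single request is a noisy measurement, so a real comparison would probably repeat each fetch several times and use the median.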
One of the questions that I wanted to answer when I started hospitalsites.org was which technologies healthcare sites run on. Taking a look at HTTP response headers not only answers this question, but also gives an interesting insight into other factors that make up these websites.
This post finds answers to multiple questions through the analysis of the HTTP headers that are returned when requesting hospital websites. Through multiple scripts, I collected the response headers into JSON objects and grouped them to extract the following information.
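The grouping step could look something like the sketch below. It assumes the collected headers are stored as one JSON-style object per site, mapping header names to values; that layout, the function name `count_header_values`, and the sample data are assumptions made for illustration, not the scripts used for the post.

```python
from collections import Counter


def count_header_values(responses, header_name):
    """Count how often each value of one header appears across all sites.

    responses maps a site URL to a dict of its response headers.
    Header names are compared case-insensitively, as HTTP requires.
    """
    wanted = header_name.lower()
    counts = Counter()
    for headers in responses.values():
        for name, value in headers.items():
            if name.lower() == wanted:
                counts[value.strip()] += 1
    return counts


# Made-up sample data in the assumed layout.
sample = {
    "https://hospital-a.example": {"Server": "nginx", "X-Powered-By": "PHP/7.4"},
    "https://hospital-b.example": {"server": "Apache"},
    "https://hospital-c.example": {"Server": "nginx "},
}

server_counts = count_header_values(sample, "Server")
```

Calling `server_counts.most_common()` then lists the values from most to least frequent, and the same function works for any other header, such as `X-Powered-By`.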
Heading tags are an important piece of a well-formed HTML page. They are especially important because they signal to search engines the relevance of the content they enclose. Among these tags, <h1> is considered of great significance, as it signals the main idea of a web page.
As such, it is usually best practice to have one and only one <h1> tag that represents the main idea of the page being presented. Using all the sites in Seed #3, this post analyzes the use of the tag on hospital sites’ home pages and compares the sites by how often they use it and by the content it encloses.
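For a sense of how such a count can be made, the sketch below pulls the text of every <h1> out of a page using Python's standard `html.parser` module. It is only an illustration under that assumption; the class and function names are hypothetical, and the post's own tooling may differ.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text inside every <h1> element of a page."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # greater than 0 while inside an <h1>
        self.headings = []   # one text entry per <h1> found

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._depth += 1
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span>) still belongs to the heading.
        if self._depth > 0:
            self.headings[-1] += data


def extract_h1_text(html):
    """Return the whitespace-normalized text of each <h1> in html."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return [" ".join(text.split()) for text in parser.headings]
```

The length of the returned list gives the number of <h1> tags on a page, so a page following the one-and-only-one practice returns a list with a single entry.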
After months of carefully gathering hospital information from different sources, I have cleaned it up and produced the latest dataset, which I will refer to as ‘Seed #3’. This is a more comprehensive list of hospitals across the U.S., and it’s five times the size of ‘Seed #2’, with a total of 5,942 hospitals included in this seed. Analysis of hospital sites moving forward will use this dataset.
In the first post of this year, I present the homepages of 879 hospitals in Seed #2. The screenshots were taken on the first day of this month. A quick note of interest is that more and more sites are using fixed top navigation bars that stay in place as the user scrolls down the page. Because of this, some screenshots appear to have the top navigation bar duplicated.
In this post I explore the site structure of 474 sites from Seed 1 and look at things such as the most common folder names, the number of branches from each site’s root, and the number of links on each site.
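One way to approximate folder names and root branches is to split the path of every link found on a site, as in the sketch below. It is an assumption about how such an analysis might be done, not the post's actual script; the function names are hypothetical, and treating a final segment that contains a dot as a file is only a heuristic.

```python
from collections import Counter
from urllib.parse import urlsplit


def folder_names(url):
    """Return the folder segments of a URL path, lowercased."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    # Heuristic: a final segment with a dot (e.g. index.html) is a file.
    if segments and "." in segments[-1] and not path.endswith("/"):
        segments.pop()
    return [s.lower() for s in segments]


def count_folders(urls):
    """Count how often each folder name appears across all URLs."""
    counts = Counter()
    for url in urls:
        counts.update(folder_names(url))
    return counts


def root_branches(urls):
    """Return the distinct first-level folders, i.e. branches from the root."""
    return {names[0] for names in map(folder_names, urls) if names}
```

Given the links of one site, `len(root_branches(urls))` gives its number of branches from the root, and `count_folders(urls).most_common()` ranks its folder names.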
This is the last post in a series that analyzes images on hospitals’ home pages, and it makes use of face detection with IBM’s Watson Visual Recognition library. The idea is to look at the largest image on each home page and see how many of them contain faces. The visual recognition library can not only detect a face, but also determine whether the face is male or female.
Continuing the analysis of images on hospital sites’ home pages, this post takes a look at the types of images that are presented. It does so by making use of IBM’s Watson Visual Recognition library, which can identify objects in an image and classify them accordingly. Each of the images referred to in the previous post was used to come up with the following classifications.
This is the first in a series of posts that analyzes the images on the home pages of hospital sites. This post in particular takes a look at all the images that make up a web page and finds the largest on each page, as determined by its size in kilobytes. Using the list in Seed #2, I created a script that accomplishes this task.
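The original script is not shown here, but the general idea can be sketched as follows: collect every `<img>` source on a page, look up each file's size, and keep the largest. The names `ImageCollector`, `download_size`, and `largest_image` are hypothetical, and this version only sees `<img>` tags, so images loaded through CSS or JavaScript would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageCollector(HTMLParser):
    """Collect absolute URLs for every <img src=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def download_size(url, timeout=10):
    """Return the size of a resource in bytes by downloading it."""
    with urlopen(url, timeout=timeout) as response:
        return len(response.read())


def largest_image(html, base_url, size_of=download_size):
    """Return (url, size_in_kb) of the largest image, or None if no images."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    sizes = {url: size_of(url) for url in collector.urls}
    if not sizes:
        return None
    url = max(sizes, key=sizes.get)
    return url, sizes[url] / 1024
```

Passing a different `size_of` function makes it possible to reuse sizes that were already recorded instead of downloading each image again.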