What movie will you see this weekend? That depends on where you live.

What do people do before they go see a movie?

The movie industry tries to answer this question through proxies employed by marketers: surveys, data on past successes, search data, and more recently social media listening or interaction tools.

Given our dataset of billions of internet visitors per month to the largest media properties in the world, we thought we’d try to visualize actual reader attention, as measured by page views, for movies. We removed the need for online audiences to take an action in order to measure their behavior and instead focused on information they’re taking in.

What happens when you remove the need for proxies and focus on actual attention?

To start, we examined the amount of attention a movie receives in the media and the correlation to box office success. In the scatterplot below, each dot represents a single film. Dots located further to the right received more internet attention in the three days prior to their release, and those located towards the top received more total US box office revenue.

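[Figure: scatterplot of internet attention in the three days before release (horizontal axis) versus total US box office revenue (vertical axis), one dot per film]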

The likely explanation for the high correlation between pageviews and box office revenue is simple: people tend to read articles about a movie before buying a ticket. So, the more readers a movie’s articles receive, the more money the movie makes. The exception, PG movies (represented by the hollow blue dots in the scatterplot), were less correlated, which also makes perfect sense: kids are less likely to read an article about a movie before attending it.
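For readers who want to try this kind of comparison on their own numbers, here is a minimal sketch of a Pearson correlation between pre-release pageviews and box office revenue. The post does not say which correlation measure was used or whether the values were log-scaled, so treat the choice of Pearson as an assumption. The sample figures are invented for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per film (NOT real figures).
pageviews_3_days_before = [1_000, 5_000, 20_000, 80_000]
box_office_millions = [2.0, 9.0, 45.0, 150.0]

r = pearson(pageviews_3_days_before, box_office_millions)
print(f"correlation: {r:.3f}")
```

A value near 1 would indicate that films with more pre-release readership also tend to earn more. It says nothing on its own about cause and effect.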

What else can we see by analyzing reader interest and attention in movies? One thing became clear when digging through the data: geography affects audience attention for movies and entertainment online.

Captain America vs. Deepwater Horizon in the USA

Movies, at least as measured by their box office earnings, still require an understanding of localized viewing habits. For movies released from 2016 through August 2017, our team analyzed how many total views each movie’s articles received in each U.S. media market area. For each movie, we found every article in our database where the movie was mentioned in the text or headline. Then, we matched each visit to these articles with the visitor’s geographic location, extracted from the IP address.
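The matching steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layouts (`id`, `headline`, `text`, `article_id`, `ip`), the simple case-insensitive substring match, and the `ip_to_market` lookup are all hypothetical stand-ins, not the actual pipeline. The IP addresses come from ranges reserved for documentation.

```python
from collections import Counter


def mentions(movie_title, article):
    """True if the title appears in the article's headline or text."""
    title = movie_title.lower()
    return title in article["headline"].lower() or title in article["text"].lower()


def views_by_market(movie_title, articles, visits, ip_to_market):
    """Count visits to articles mentioning the movie, grouped by media market."""
    matching_ids = {a["id"] for a in articles if mentions(movie_title, a)}
    counts = Counter()
    for visit in visits:
        if visit["article_id"] not in matching_ids:
            continue
        market = ip_to_market(visit["ip"])
        if market is not None:  # skip visits whose location is unknown
            counts[market] += 1
    return counts


# Hypothetical sample data.
articles = [
    {"id": 1, "headline": "Deepwater Horizon review", "text": "A tense drama."},
    {"id": 2, "headline": "Weekend box office", "text": "Captain America: Civil War leads."},
    {"id": 3, "headline": "Oil news", "text": "The film Deepwater Horizon opens Friday."},
]
visits = [
    {"article_id": 1, "ip": "192.0.2.1"},
    {"article_id": 3, "ip": "192.0.2.1"},
    {"article_id": 3, "ip": "198.51.100.7"},
    {"article_id": 2, "ip": "198.51.100.7"},
    {"article_id": 1, "ip": "203.0.113.9"},
]
# Stand-in for a real IP geolocation lookup (assumption).
market_table = {"192.0.2.1": "Market A", "198.51.100.7": "Market B"}

result = views_by_market("Deepwater Horizon", articles, visits, market_table.get)
print(dict(result))
```

In practice a plain substring match would be too loose for short or ambiguous titles, so a real system would likely need stricter matching rules.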

Here are two examples of the results. (You can explore all movies in the visualization at the end of this post.)

We can see that the regions around the Gulf of Mexico, which were most heavily affected by the real-life Deepwater Horizon disaster, paid the most attention to the film, which makes sense and is a good sanity check for our data. The film also received high levels of attention in Western North Dakota and Eastern Montana. The two deep-green media markets in this area are at the epicenter of the shale oil boom, and are presumably home-away-from-home to many oil workers.

Compare that map to the people reading about Captain America: Civil War in the US. Captain America: Civil War grossed the third-highest total box office revenue in 2016, which presumably means it had very broad appeal. With the map of audience attention for this movie, we see a broad spread across the United States, though with slightly higher concentration in the Midwest.

What movies were you most likely to read about?

Data science can quickly identify patterns that we wouldn’t be able to see just by looking at individual movies. We used a technique called latent Dirichlet allocation (LDA) to find these hidden geographic trends in how online readers pay attention to movies in the United States.
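To give a feel for what LDA does, here is a small, self-contained sketch using collapsed Gibbs sampling. The post does not describe how the model was set up, so everything here is an assumption: each media market is treated as a "document," each pageview as a "word" naming a movie, and the hyperparameters, market names, and counts are made up for the example.

```python
import random


def lda_gibbs(docs, n_topics, n_words, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word ids in range(n_words).
    Returns (doc_topic, topic_word), each a list of probability rows.
    """
    rng = random.Random(seed)
    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics
    assignments = []
    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + beta * n_words)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1

    doc_topic = [
        [(c + alpha) / (len(doc) + alpha * n_topics) for c in row]
        for row, doc in zip(doc_topic_counts, docs)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + beta * n_words) for c in row]
        for k, row in enumerate(topic_word_counts)
    ]
    return doc_topic, topic_word


# Hypothetical data: movie ids 0-1 are "blockbusters", 2-3 are "indies".
market_pageviews = {
    "Market A": [0] * 30 + [1] * 25 + [2] * 2,
    "Market B": [0] * 28 + [1] * 20 + [3] * 3,
    "Market C": [2] * 26 + [3] * 22 + [0] * 4,
    "Market D": [2] * 24 + [3] * 27 + [1] * 2,
}
docs = list(market_pageviews.values())
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_words=4)

for market, mix in zip(market_pageviews, doc_topic):
    print(market, [round(p, 2) for p in mix])
```

Each row of `doc_topic` is a market's mix of audience groups, and each row of `topic_word` is a group's distribution over movies. A production analysis would more likely rely on an established library implementation than on a hand-written sampler like this one.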

We uncovered five distinct groups. Each group has three distinct components:

  1. Patterns based on geography, or visually, where the most views for the group took place. The maps below show the geographic clusters. We compared these to census data for area density, race and other factors to help describe each group.
  2. Most-read-about films in the cluster: this simply shows what percentage of its pageviews each audience gives to each movie. Because this ranking is based on absolute volume of pageviews, large-budget, popular films show up in this list.
  3. Most characteristic films: this compares each audience’s most popular list above to the average across all audiences. Because this ranking is relative (based on comparing interest to the average), both small- and large-budget movies have a chance of appearing here, so this list highlights what makes each cluster unique.
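The difference between the "most read" and "most characteristic" rankings can be shown with a short sketch. The post only says the second ranking compares each audience to the average across all audiences; dividing an audience's share by the average share is one plausible way to do that and is an assumption here, as are the audience names, film names, and counts.

```python
def pageview_shares(views):
    """Convert raw pageview counts per movie into shares that sum to 1."""
    total = sum(views.values())
    return {movie: count / total for movie, count in views.items()}


def characteristic_scores(audience_views):
    """Ratio of each audience's share for a movie to the average share."""
    shares = {a: pageview_shares(v) for a, v in audience_views.items()}
    movies = {m for v in audience_views.values() for m in v}
    average = {
        m: sum(s.get(m, 0.0) for s in shares.values()) / len(shares)
        for m in movies
    }
    return {
        a: {m: s.get(m, 0.0) / average[m] for m in movies if average[m] > 0}
        for a, s in shares.items()
    }


# Hypothetical pageview counts for two audiences.
audience_views = {
    "Audience A": {"Blockbuster": 80, "Small Film": 20},
    "Audience B": {"Blockbuster": 90, "Small Film": 10},
}

shares_a = pageview_shares(audience_views["Audience A"])
scores_a = characteristic_scores(audience_views)["Audience A"]

most_read = max(shares_a, key=shares_a.get)
most_characteristic = max(scores_a, key=scores_a.get)
print(most_read, most_characteristic)
```

In this toy example the blockbuster dominates Audience A's absolute ranking, while the small film tops the relative ranking because Audience A reads about it more than the average audience does.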

Using this information, we’ve described each group below. Read on to see the groups for yourself, and see if we matched the movies you watched to the area where you live.

Group 1: Tentpole movies and comic book heroes: Middle America loves you

The largest grouping (or cluster) of views included 38% of all the pageviews analyzed. The movies that this segment was uniquely interested in were action films that appeal to different age groups, especially comic-book action films. This audience has a large presence in most media markets, but is especially prevalent in rural areas.

We described this grouping as “Mainstream,” based on the size and the spread of the interest across the country. The films most characteristic of this cluster are extremely expensive to produce: the average production budget for this group’s ten most characteristic films was $150.3 million, indicating that movies with widespread appeal have their price.

Group 2: Urbanites like their indie flicks

The second grouping the algorithm showed us generates 22% of all movie-focused pageviews. In an almost inverse pattern of the previous group, reader interest comes mostly from urban, coastal regions. The movies, like The Zookeeper’s Wife and The Big Sick, focus more on adult themes and less on special effects.

The average production budget for this group’s ten most characteristic films was $22.9 million, or 85% less than the budget of the films most characteristic of the mainstream group.
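The percentage can be checked with a quick calculation from the two averages quoted in this post. The variable names are just for illustration.

```python
# Average production budgets in millions of dollars, as quoted in the post.
mainstream_avg_budget = 150.3
urban_avg_budget = 22.9

# Relative difference, expressed as a percentage of the mainstream figure.
percent_less = (mainstream_avg_budget - urban_avg_budget) / mainstream_avg_budget * 100
print(f"{percent_less:.1f}% less")
```

The result comes to a little under 85%, which matches the rounded figure above.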

Group 3: Southern Corridor of the US, the most profitable and very, very interested

In the third grouping, we discovered interest in movies with largely black casts, including When the Bough Breaks and Girls Trip. We also observed that the geographic trends very closely resemble the distribution of the African American population according to the U.S. census.

Using the relative interest levels as an indicator, it appears these films aren’t generally popular across the U.S., but in this group, they’re very popular. Additionally, when we looked at the budget vs. the box office gross for these movies, the movies that this group was interested in were especially profitable, often beating industry predictions.

Group 4: Southwest interest in Hispanic casts, but very little inventory

Somewhat similar to the previous group, this pattern of readership closely resembles the geographic distribution of a distinct ethnic group, specifically the Hispanic population (we compared it to census data compiled by Pew Research). While 7% of total pageviews about movies went to these films, it appears that this market is underserved: very few Hollywood films feature predominantly Hispanic casts.

The movies that did do well include Lowriders and How to Be a Latin Lover, along with Phoenix Forgotten, which centers on events in Phoenix, Arizona.

Group 5: Brrr, it’s cold in here

The final audience, which accounts for 24% of pageviews, is the most difficult to interpret. It is more prevalent in the northern half of the US, and many of its most characteristic films were winter releases, so we tentatively call this the Winter audience. The movies unique to this cluster included A Dog’s Purpose and Fist Fight.



Now that we’ve taken a shot at defining these groups, take yours! Explore the data below.

See more data on the movies and grouping on Parse.ly.

Interested in Readership and the Movies?

We’ll be tracking online readership of movies each week going forward through the Parse.ly Network. Interested? Sign up to receive the live update and see if your favorite films are the ones that are getting the most attention online.