Machine Learning Newsletter

IMDB Top 100K Movies Analysis in Depth Part 4

This is the fourth post in the series; see the first, second, and third posts. I explained the data sources in the third post. In this post, I will look at the age and height distributions of actresses and actors over time, and at movie revenue per category. This is the last post in this series, I promise.

Mean Ages over Years

The large age difference in the early movies may be due to selection bias. After 1970 the ages still change over time but remain more or less stable. Because there are more actors than actresses, the overall average stays close to the actors' average.
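
The pull of the overall average toward the actors' average is simply a weighted mean. Here is a minimal sketch with made-up numbers; the counts and ages below are hypothetical and are not taken from the IMDB data.

```python
# Hypothetical counts and mean ages for a single year (not real IMDB figures).
n_actors, mean_age_actors = 300, 40.0
n_actresses, mean_age_actresses = 100, 30.0

# The combined mean is the group means weighted by group size.
combined_mean = (
    n_actors * mean_age_actors + n_actresses * mean_age_actresses
) / (n_actors + n_actresses)

print(combined_mean)  # 37.5, closer to the actors' 40 than to the actresses' 30
```

With three times as many actors as actresses, the combined mean sits three quarters of the way toward the actors' mean.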

Median Ages over Years

The actors' ages still dominate the overall figure, but the medians show the age distribution much more clearly. The median age for actresses never rises above 32, whereas the median age for actors never falls below 32 in any year!
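
A per-year, per-gender median like the one plotted here could be computed roughly as follows. This is only a sketch: the records and the helper `median_age_by_year` are hypothetical stand-ins, not the code or data behind the graphs.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (year, gender, age) records; the real data would have one row
# per credited performer per movie.
records = [
    (1990, "actress", 24), (1990, "actress", 30), (1990, "actress", 61),
    (1990, "actor", 33), (1990, "actor", 41), (1990, "actor", 45),
    (2000, "actress", 27), (2000, "actress", 31),
    (2000, "actor", 36), (2000, "actor", 44),
]

def median_age_by_year(rows):
    """Group ages by (year, gender) and return the median of each group."""
    groups = defaultdict(list)
    for year, gender, age in rows:
        groups[(year, gender)].append(age)
    return {key: median(ages) for key, ages in groups.items()}

medians = median_age_by_year(records)
print(medians[(1990, "actress")])  # 30: the single 61-year-old does not move it
```

The median is less sensitive to a few unusually old or young performers than the mean, which is one reason it can show the gap between the two groups more clearly.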

The standard deviation grows over time, which supports the selection-bias explanation for the early years.
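
To check the spread alongside the sample size for each year, something like the sketch below would do. The ages and the helper `spread_by_year` are hypothetical; the only assumption is that each year has at least two recorded ages, which `statistics.stdev` requires.

```python
from statistics import stdev

# Hypothetical ages per year: a small, narrow early sample and a wider later one.
ages_by_year = {
    1930: [28, 30, 31],
    1980: [19, 27, 35, 48, 62],
}

def spread_by_year(ages):
    """Return {year: (sample size, sample standard deviation)}."""
    return {year: (len(values), stdev(values)) for year, values in ages.items()}

spread = spread_by_year(ages_by_year)
for year, (count, sd) in sorted(spread.items()):
    print(year, count, round(sd, 2))
```

A small count together with a small standard deviation in the early years would be consistent with only a narrow slice of performers being recorded.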

One thing is clear from these graphs: older actresses and actors appear in more and more movies over the years. The change in age is more drastic for actors than for actresses. In the third post, we saw that the age distributions of actresses and actors differ a lot. Over time, the gap between actors and actresses widens, which is consistent with the age distributions in the third post.

Mean Height

Median Height

Taller actresses appear in more movies over time, whereas actors seem to be getting shorter, although the change in height is small (1-2 inches over 70 years).

< $400M Revenue Movies per Category

In the following graphs, purple is the revenue distribution of the movies in each category and turquoise is the average revenue distribution across all movies.

In this section, we will look at the revenue distribution per category for movies whose revenue is below $400M. In the next one, we will look at those whose revenue is above $400M.
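
The split and the per-category comparison can be sketched as below. Everything here is illustrative: the movie list, the bin width, and the helper `revenue_histogram` are hypothetical, and assigning a revenue of exactly $400M to the high group is my own assumption, since the text only says "below" and "above".

```python
from collections import Counter

THRESHOLD = 400_000_000  # the $400M split used in this post
BIN_WIDTH = 100_000_000  # hypothetical bin width for the histogram

# Hypothetical (category, revenue in dollars) pairs.
movies = [
    ("Drama", 20_000_000), ("Drama", 150_000_000), ("Comedy", 90_000_000),
    ("Animation", 320_000_000), ("Animation", 750_000_000),
    ("Family", 410_000_000),
]

def revenue_histogram(rows):
    """Return the share of movies falling in each revenue bin."""
    counts = Counter(revenue // BIN_WIDTH for _, revenue in rows)
    total = sum(counts.values())
    return {bin_index: n / total for bin_index, n in sorted(counts.items())}

# Assumption: a revenue of exactly $400M goes to the high group.
low = [m for m in movies if m[1] < THRESHOLD]
high = [m for m in movies if m[1] >= THRESHOLD]

overall_low = revenue_histogram(low)  # plays the role of the turquoise curve
drama_low = revenue_histogram([m for m in low if m[0] == "Drama"])  # purple
print(overall_low)
print(drama_low)
```

Normalizing each histogram to shares rather than raw counts is what makes a small category such as Film-Noir comparable with a large one such as Drama.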

  • Drama and Comedy fit quite well in the low-revenue area.
  • Biography, Music, Romance, War and Western have quite a lot of movies in the very low revenue range.
  • Film-Noir has the lowest revenue of all categories. This may be why it went extinct. (See the second post for the category distribution over time.)
  • Animation has the best middle-to-low revenue distribution of all the categories, as we will see more clearly in the second graph.
  • Family generally has higher revenue than the other categories except in the very low revenue range, which is somewhat surprising.

> $400M Revenue Movies per Category

  • Animation is the best category in terms of high-revenue distribution.
  • Family, Adventure and Fantasy also do well in the high revenue range.
  • Most of the categories are absent from the high-revenue range, which is somewhat expected.

What is next?

That is about it. Unless I pull different data for the movies, this post is likely to be the last one in this series.
