Hollywood vs Bollywood: Endgame
A deep dive into what makes them different, and the reductionist view of Indian cinema.
Bollywood and Hollywood are two of the best-known film industries today. A Westerner might assume that Bollywood has a long way to go before matching the American industry in terms of production. Yet the numbers show that the Indian movie industry is extremely popular in India, a country of more than 1.4 billion people (more than 1/8 of the population on Earth! (source)). Exports of Indian films to other countries are also growing, notably to China, and Indian productions have appeared more and more often in Western cinemas in recent years. It is therefore worth deconstructing this bias and asking ourselves: what assumptions do we hold as Westerners?
For example, what comes to your mind when I talk about Indian cinema?
Drama?
Singing?
Dancing?
Romantic?
Foreign?
Unlike Hollywood, Bollywood is not a real place. The name is a portmanteau of Hollywood, the reference location for American cinema, and Bombay (now Mumbai), the historical base of Indian cinema. The association between these two names may lead one to believe that Indian cinema is strongly influenced by American cinema. Thus, in this project we look at the features of their films and analyze the similarities and differences between these two world-famous film factories.
Are Indian films heavily influenced by American production or do they have their own completely different identity?
To answer these questions, we use data from the CMU Movie Summary Corpus, which contains a wealth of information on films from around the world, as well as their plot summaries. We take the American and Indian films and extract the most interesting features, such as movie genres, the topics discussed in the summaries and the characteristics of the actors, and then compare them.
As the plot shows, "drama" is the most prominent movie genre in both American and Indian movies.
American movies seem to be spread more evenly across genres than Indian movies, which are more concentrated around "drama".
Some questions arise here:
Is American drama the same as Indian drama?
What if these notions are completely different in the two cultures?
We will investigate these questions further towards the end of our analysis.
Furthermore, both Indian and American movies carry a large number of genre labels, so it is useful to select a restricted group of genres that allows more precise results in the rest of our analysis.
For example, a genre like "world cinema" does not seem specific enough to categorize movies. Indeed, the term "world cinema" refers to films produced outside of the United States and Europe, or to films that are made in a particular country or region but are intended for international audiences. World cinema includes a wide variety of film genres, such as action, drama, comedy, romance, horror, and more. It encompasses films from many different countries and cultures, each with their own unique traditions and styles of storytelling.
It is rather reductive to put every genre produced outside of the West under a global term like "world cinema". We hence lose a lot of information about the spectrum of differences and themes of movies produced around the world. This could be another bias in our dataset: Indian movies are not represented in the same way as American movies.
This could explain why more movie genres are attributed to American movies than to Indian movies, although we have to be careful, since we have fewer Indian movies than American movies.
Finally, some genres are similar to each other, and some combine several genres (e.g. "action/adventure"). We split these compound genres and re-classified the movies under each of the single genres.
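The splitting step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `split_genres`, the "/" separator and the three example movies are all assumptions made for the example.

```python
from collections import Counter


def split_genres(labels, separator="/"):
    """Split compound labels such as "Action/Adventure" into single genres.

    Genres are lower-cased and de-duplicated per movie, keeping first-seen order.
    """
    singles = []
    for label in labels:
        for part in label.split(separator):
            part = part.strip().lower()
            if part and part not in singles:
                singles.append(part)
    return singles


# Hypothetical movies with their raw genre labels (made up for illustration).
movies = {
    "Movie A": ["Action/Adventure", "Drama"],
    "Movie B": ["Drama", "World cinema"],
    "Movie C": ["Action", "Romantic comedy"],
}

# Count each single genre once per movie after splitting.
genre_counts = Counter()
for raw_labels in movies.values():
    genre_counts.update(split_genres(raw_labels))

print(genre_counts.most_common())
```

With this kind of re-classification, a movie labeled "Action/Adventure" contributes to both the "action" and the "adventure" counts instead of forming a separate category.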
We notice that Indian movies tend to be longer than American movies (by ~48 minutes).
There could be a bias in our dataset, because early Indian cinema might be under-represented compared to early American cinema.
We can add that, in general, Indian movies tend to have more elaborate plots and subplots, and they often include song and dance sequences, which can contribute to their longer runtime. This is particularly characteristic of Bollywood movies, which are known for their elaborate storylines and large casts. American movies, on the other hand, may be more focused on action and special effects, which may not require as much screen time.
These word clouds represent the prevalence of words in the movie plots for Indian and American movies respectively. The size of a word is proportional to its frequency in the movie summaries.
We notice that the words family, love and father occur much more often in Indian movies than in American ones. This might reflect underlying cultural differences, but it would not be appropriate to conclude anything at this level of the analysis. It is also surprising to see that the word woman is more prominent than the word girl in the US, whereas the opposite holds in India.
This can perhaps be put in perspective with what we observed in the previous section: the mean female actor age in the American-dominated cluster of romantic movies is 33 years, whereas it is 27 years in Indian-dominated clusters (a 6-year gap!). Moreover, the topics of family and marriage are also much more important in Indian romance than in American romance.
Although this representation gives us a good idea of the differences between the subjects covered by the two industries, it stays shallow and doesn't give us information about the context or the type of movie in which those words appear.
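To make the link between word frequency and word size concrete, here is a small sketch of the counting that sits behind a word cloud. It is only an illustration: the stop-word list is deliberately tiny, the two summaries are invented, and the helper names `word_frequencies` and `relative_sizes` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real analysis needs a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "his", "her", "to", "of", "in", "with", "is"}


def word_frequencies(summaries):
    """Count lower-cased words across all summaries, ignoring stop words."""
    counts = Counter()
    for text in summaries:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


def relative_sizes(counts):
    """Scale counts so that the most frequent word gets size 1.0."""
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}


# Made-up summaries, for illustration only.
summaries = [
    "A father opposes the love of his daughter.",
    "The family gathers and love wins.",
]
counts = word_frequencies(summaries)
sizes = relative_sizes(counts)
print(sizes)
```

Because only raw counts are used, this view says nothing about which words appear together or in what kind of movie, which is the limitation noted above.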
We can observe a significant difference in actress age between Indian and American movies (we ran a t-test between the two samples and obtained a significant p-value, c.f. notebook). Actresses are younger in Indian movies than in American ones.
The significance might mainly be driven by the sample size, but we can propose some possible factors to explain why Indian actresses might be younger than American actresses:
It is also worth noting that there is a wide range of ages among actresses in Indian movies, and many actresses in India continue to work in the film industry well into their 40s, 50s, and beyond.
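The age comparison above can be sketched with a Welch t statistic together with an effect size. This is a hedged illustration: the notebook's exact test variant is not shown here, so Welch's unequal-variance form is an assumption, the ages are invented, and the helper names are hypothetical. In practice the p-value could be obtained with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def cohens_d(sample_a, sample_b):
    """Standardized mean difference based on the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


# Made-up actress ages at movie release, for illustration only.
american_ages = [29, 35, 41, 33, 27, 38, 31, 36]
indian_ages = [24, 27, 30, 22, 26, 29, 25, 28]

t_stat, dof = welch_t(american_ages, indian_ages)
effect = cohens_d(american_ages, indian_ages)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, Cohen's d = {effect:.2f}")
```

Reporting an effect size next to the p-value helps with the sample-size concern: with many thousands of actresses even a tiny age gap becomes statistically significant, whereas the effect size does not grow with the number of observations.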
Our dataset is probably biased in the quantity of information it holds about American films: we are far more likely to have the names of actors who make a single brief appearance in an American film. At the other extreme, however, we can see that there are many more ultra-prolific Indian film actors than American ones.
In both film industries, however, we see that most of the hyper-prolific actors are men, which is probably linked to the difference in career prospects with age.
Also, there is some evidence to suggest that actresses tend to have fewer opportunities to play leading roles than male actors in both Hollywood and Bollywood. Indeed, in 2015, the Center for the Study of Women in Television and Film at San Diego State University published a report titled "It's a Man's (Celluloid) World," which found that women made up just 12% of protagonists in the top 100 grossing films of 2014.
This phenomenon, known as the "gender gap" or the "female movie deficit," refers to the underrepresentation of women in the film industry, both in front of and behind the camera.
There are also more factors that may contribute to the gender gap in the film industry, including societal and cultural biases, the lack of diverse and complex female characters in film scripts, and the limited number of female directors and producers.
This is once again an angle that could be included in further analysis regarding trends and film success (ratings) prediction.
India is a country where more than a dozen languages are spoken! Silent films brought all audiences together! But when Indian cinema entered the sound era, the use of music and dance became a way to homogenise the national market across linguistic divides!
This topic analysis was really successful in detecting and separating topics. We find a diversity of topics that span a wide range of lexicons. However, a limitation of this analysis is that we are not able to detect the semantic relationship between the words in a given topic, nor can we infer the context in which they appear.
This plot, while being very informative, doesn't inform us about the use of said topics in each movie industry. We extracted those topics from a dataset containing both movie industries, and we are not able to see how they are used in each industry.
This is why our next step is to explore the
Let us identify the topics and label them. To do that, we have observed every topic bubble and identified the keywords that made the most sense to describe the topic.
This enables us to label each topic specifically rather than assign it a general genre, since we already have that in our features. It was difficult to pinpoint a specific theme since movies' stories can be very broad and unique, so we decided to choose the keywords as the most relevant terms to describe the topics.
If at any point you need a refresher as to what each topic is about, you can click on the blue square at the bottom right of your screen.
We therefore came up with a small description for each topic:
This plot shows the distribution of the mean normalized topic prevalence in each movie industry. We replaced the topic numbers with the words that are most representative of each topic.
We can see that the topics are not distributed uniformly across the movie industries. For example, the topic containing love, marriage, wedding is significantly more prevalent in the Indian movie industry than in the American one. We can also see that the topic containing home, school, work, party is more prevalent in the American movie industry than in the Indian one.
Those differences could be due to underlying cultural differences between the two countries, but could also be due to a bias in our dataset.
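One way to read "mean normalized topic prevalence" is sketched below. It assumes that "normalized" means a z-score computed over all movies pooled together, which is then averaged per country; that interpretation, the function names, the topic keys and the numbers are all assumptions for the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of values (returns zeros if there is no spread)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mean_normalized_prevalence(rows, topics):
    """Average the pooled z-score of each topic within each country.

    rows: list of dicts with a 'country' key and one prevalence value per topic.
    """
    result = {}
    for topic in topics:
        z_scores = standardize([row[topic] for row in rows])
        by_country = {}
        for row, score in zip(rows, z_scores):
            by_country.setdefault(row["country"], []).append(score)
        result[topic] = {c: mean(s) for c, s in by_country.items()}
    return result


# Invented per-movie topic prevalences, for illustration only.
rows = [
    {"country": "India", "love_marriage": 0.6, "school_party": 0.1},
    {"country": "India", "love_marriage": 0.5, "school_party": 0.2},
    {"country": "USA", "love_marriage": 0.2, "school_party": 0.5},
    {"country": "USA", "love_marriage": 0.1, "school_party": 0.4},
]
prevalence = mean_normalized_prevalence(rows, ["love_marriage", "school_party"])
print(prevalence)
```

Under this reading, a positive value means a topic is more prevalent in that country's movies than in the pooled average, and a negative value means it is less prevalent.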
Topics indirectly inform us about the content of the movies through their summaries, but so do their genres. How does the movie genre relate to the topics covered in the movie? Can we identify differences between the movie industries and refine our understanding of them through the intersection of the topics and the genre?
In this part, we offer you an in-depth study of 4 movie genres that we have selected to show you that the notion of movie genre is not uniform across the globe. This part of the analysis can be explored in many ways and we encourage you to play with the different visualizations and the different genres. We also include an analysis of one genre below the plot to help you understand how to navigate through the visualizations.
To change the focus from one movie genre to another, simply click on the button of interest just below; it will update the left and right panels accordingly.
The left panel contains a plot of the t-SNE (t-Distributed Stochastic Neighbor Embedding) latent space for Indian and American movies. This plot was created using standardized actor data and the prevalence of various topics in the movies. The t-SNE latent space is a way of visualizing high-dimensional data in a lower-dimensional space.
There are two visualization options
On the right panel, you can switch between 3 cluster representations. You can relate each cluster to the data that describe it via its color.
There are three visualization options
You will find many examples by navigating through this plot. We will detail one of them here
Let's take the genre 'comedy'. The K-Means algorithm was set to 4 clusters. By selecting Countries on the right panel, we can observe that Indian movies are largely over-represented in cluster 1. In cluster 2, on the other hand, American movies are largely over-represented.
Now let's focus on the standardized topic prevalence.
(You can access topic differences by clicking on the box with the downward arrow on the top left of the right-side plot and selecting)
The most important topic in cluster 1 is the one centered on family and daily life, whereas in cluster 2 this topic doesn't seem to be covered more than average. Moreover, we see that in cluster 2 the most important topic is about ships, aliens and discoveries, and in cluster 1 this topic is much less present than average (negative mean normalized topic prevalence).
We can deduce from this analysis that in India, more comedies involve a family's daily life and very rarely add a sci-fi/adventure component. In the US, comedies mixed with some action seem to be very frequent. However, it is also important to notice that other clusters, or sub-genres, such as cluster 0 or cluster 3, are represented equally in both countries.
We can also notice that, when comparing cluster 1 and cluster 2 in the Actor visualization, women represent a smaller proportion of the cast in cluster 2 than in cluster 1.
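The notion of a country being "over-represented" in a cluster, as used in the comedy example above, can be made explicit by comparing a country's share inside a cluster with its share among all movies of the genre. The sketch below is one possible way to do this and is not necessarily how the right panel was computed; the cluster assignments are invented and the function names are hypothetical.

```python
from collections import Counter


def country_share_by_cluster(assignments):
    """Share of each country within each cluster.

    assignments: list of (cluster_id, country) pairs, one per movie.
    """
    totals = Counter(cluster for cluster, _ in assignments)
    pairs = Counter(assignments)
    return {
        (cluster, country): n / totals[cluster]
        for (cluster, country), n in pairs.items()
    }


def over_representation(assignments):
    """Ratio of a country's share in a cluster to its overall share.

    A ratio above 1 means the country is over-represented in that cluster.
    """
    overall = Counter(country for _, country in assignments)
    n_movies = len(assignments)
    shares = country_share_by_cluster(assignments)
    return {
        (cluster, country): share / (overall[country] / n_movies)
        for (cluster, country), share in shares.items()
    }


# Invented K-Means assignments for eight comedies, for illustration only.
assignments = [
    (1, "India"), (1, "India"), (1, "India"), (1, "USA"),
    (2, "USA"), (2, "USA"), (2, "USA"), (2, "India"),
]
ratios = over_representation(assignments)
print(ratios[(1, "India")])  # 1.5: Indian movies are over-represented in cluster 1
print(ratios[(1, "USA")])    # 0.5: American movies are under-represented there
```

Dividing by the overall share matters here because the dataset contains more American than Indian movies, so raw proportions inside a cluster would otherwise lean towards American movies by default.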
With our study, we can conclude that “Bollywood” and “Hollywood” films are indeed different. A first analysis of different characteristics shows that they differ notably in their genres, actor features and runtimes. The topics covered in the movies show similarities, but some subjects, such as family or weddings, are more present in Indian movies, while American movies have more stories about school or parties. This analysis shows that there is a cultural difference between American and Indian cinema, even when movies are labeled as belonging to the same genre.
We have seen that drama is the most common genre in both America and India. However, in the last part of our analysis, we see that these movies share similar topic keywords in some of the clusters, but also have their differences. Indeed, two clusters stand out for each industry. We have observed that one type of Indian drama is more focused on love, romance and relationships, whereas one type of American drama revolves more around action, with crime and violence (c.f. t-SNE results and country proportion/topic distribution for the drama genre, cluster ids 0 and 3). Furthermore, it is interesting to see that drama contains many subgenres and a whole spectrum of different stories.
This result supports the idea that the movie industry cultures of two radically different countries are rather difficult to compare and quite complex. Much care needs to be taken when trying to capture the subtle differences that lie between the two cultures.
It is hence reductive to try to label each movie (movies being, to some extent, a representation of a country's culture: c.f. « Movies and Culture: The Role of Films in Shaping Societal Norms and Values » by Jennifer A. Fritsche, published in the Journal of Social and Political Psychology in 2016) and expect a meaningful comparison. It was important for us here to observe the differences, to deconstruct our biases towards a different culture and to try to take a step back. It was extremely fulfilling to take a critical view of how heavily Hollywood, or American cinema, influences Western culture (c.f. "The Americanization of European Cinemas: Hollywood's Influence on Local Film Industries" by Mark Jancovich, published in the journal Screen in 2002).
In addition, we have to keep in mind the bias in our dataset. Indeed, there are many more American films than Indian ones in our data, although the Indian industry is known for the large number of films it produces. The films in our data are potentially just the tip of the iceberg. We may also only have the famous films, which are not representative of the entire Indian film culture. In support of the idea that the Indian film selection in the dataset was shaped by the Western view of Indian cinema, we noted the overwhelming presence of the (reductive) genre 'world cinema' among the labels of the Indian movies.
Finally, we started this analysis with the assumption that Bollywood, by the very nature of its name, was inevitably influenced and shaped by Hollywood. Going through the analysis and learning more and more about the Indian movie industry was enlightening, and we realized how much these two industries can be considered as entities with their own identities.
Indian cinema, now a $2 billion industry, has much to offer. A treasure trove of stories and songs, its films are an art form that has grown to encompass every facet of the nation's culture. From the epic historical films to the glossy masala movies, from the arthouse parallel cinema to the song-and-dance spectaculars, Indian cinema has something for everyone. In fact, it is the largest and most diverse film industry in the world, and it is now finding an increasingly global audience.
Shah Rukh Khan, Indian actor and film producer