After some data cleaning, our final graph consists of 514 nodes and 1891 edges.
Let's compute some statistics on our network data and study some concepts from graph theory that we have seen in this course!
Let's plot and reflect on the distribution of our 4 character attributes.
Looking at the gender distribution, we clearly observe more male than female characters: there are precisely 4.4x more males than females.
Therefore, Rick and Morty is far from respecting gender equity... BUT! There is a "BUT"!
In addition, we tagged as "genderless" the characters that are not gendered, such as some aliens or robots. The only character tagged with "Both" genders is the duo Dipper and Mabel Mortys, which represents a little Morty boy and a little Morty girl found at The Citadel of Ricks.
Knowing the Rick and Morty cartoon show and all of Rick's enemies, we would honestly expect more deaths. Still, some mystery remains in the "Unknown" fates of characters, and we keep in mind that the "universe" is large.
Then, we could wonder: is the anti-hero we're all thinking about responsible for these deaths? Let the data speak... By tracking the "Category: Killed by Rick" in Wiki pages (yes, this category exists), we learn that 39% of deceased characters were actually killed by Rick!
We note how diverse this universe is! That said, the Rick and Morty universe is mostly populated by Human beings - from Earths of different dimensions - and Aliens.
Origin is the attribute with the highest ratio of "Unknown" values... Indeed, neither the Wiki nor the external API we used provides enough answers to this question. Most of the time, the origin of a character is not mentioned: either the sources assume that knowing the associated species is sufficient, or the show isn't explicit enough about this characteristic. Therefore, it was quite difficult to populate this attribute manually, compared with the previously presented attributes.
Naturally, we also created network visualisations, using the Force Atlas layout, to present the Rick and Morty characters.
First, we generated a visualisation in which nodes are coloured by the status attribute.
We can distinguish some clusters of dead characters, like the Parasites on the right side, or The Council of Ricks - destroyed in The Rickshank Rickdemption (S03E01).
Besides, we could use this data to predict the status of unknown characters.
In the second network visualisation below, node color encodes the gender attribute and node size is based on degree.
The two biggest nodes are Rick and Morty, the two main protagonists and adventurous explorers of our project. It is easy to spot the rest of the Smith family as well: the two biggest yellow nodes are Summer and Beth, and - of course, last but not least - Jerry (the stupid husband) is the third biggest purple node, on the right side, between his wife and his daughter.
Now, your task is to find the only green node - the duo Dipper and Mabel Mortys, mentioned previously in the gender attribute section!
These are the IN and OUT degree distributions of our "Rick and Morty" network.
The two distributions are clearly different: IN-degrees span a wide range (from 0 to 250), with most nodes concentrated at the lowest values. In contrast, OUT-degrees only range from 0 to almost 40, a much narrower spread.
In other words, a lot of characters are targeted by only a few other characters, while a handful of very popular characters exist - those should be the main protagonists. In addition, the way characters target other ones is less unbalanced: a Wiki character page targets about 3.7, so roughly 4, other characters on average (std = 4). We also note that the mean value is the same for in- and out-degree, but the variance is much greater in the in-degree case, and the median is lower.
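The equal means are no coincidence: every edge adds exactly one to some in-degree and one to some out-degree, so both means equal the number of edges divided by the number of nodes, here 1891 / 514 ≈ 3.7. Below is a minimal sketch of how such statistics can be computed from an edge list with the Python standard library only. The toy edge list and the function name degree_stats are illustrative assumptions, not the code used for this project.

```python
from collections import Counter
from statistics import mean, median, pstdev

def degree_stats(nodes, edges):
    """Return mean, std and median of in- and out-degrees of a directed graph."""
    in_deg = Counter(target for _, target in edges)
    out_deg = Counter(source for source, _ in edges)
    stats = {}
    for name, counts in (("in", in_deg), ("out", out_deg)):
        # Include nodes with degree 0, which the Counter does not list.
        values = [counts.get(node, 0) for node in nodes]
        stats[name] = {
            "mean": mean(values),
            "std": pstdev(values),
            "median": median(values),
        }
    return stats

# Hypothetical toy graph: three pages link to "Rick", who links back to one.
nodes = ["Rick", "Morty", "Summer", "Beth"]
edges = [("Morty", "Rick"), ("Summer", "Rick"), ("Beth", "Rick"), ("Rick", "Morty")]

stats = degree_stats(nodes, edges)
print(stats["in"])
print(stats["out"])

# Mean degree implied by the graph size reported above (edges / nodes).
print(round(1891 / 514, 2))
```

Even on this toy graph, the pattern described above shows up: both means coincide, while the in-degree has a larger spread and a lower median than the out-degree.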
Looking at the overall degree distribution, in particular on a log scale, we recognize the manifestation of the scale-free property, with a power-law shape characteristic of the "Barabási-Albert" network - where nodes follow a preferential attachment law (source: Network Science Book, by Barabási).
Thus, we can fit a power law to compute the degree exponent γ of both the IN and OUT degree distributions, using the powerlaw Python package.
We computed the optimal xmin to minimize the error σ, which is low enough in both fits to validate the model.
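To illustrate what such a fit estimates, here is a minimal sketch of the maximum-likelihood estimate of γ for a fixed xmin, using the standard continuous approximation for integer data (Clauset, Shalizi and Newman). This is a simplified stand-in and does not reproduce the powerlaw package, which also searches for xmin; the function name estimate_gamma and the toy degree sequence are hypothetical.

```python
import math

def estimate_gamma(degrees, xmin):
    """Approximate MLE of the power-law exponent for degrees >= xmin (xmin >= 1)."""
    tail = [k for k in degrees if k >= xmin]
    n = len(tail)
    if n == 0:
        raise ValueError("no degree is >= xmin")
    # Continuous approximation for integer data: xmin is shifted by 1/2.
    log_sum = sum(math.log(k / (xmin - 0.5)) for k in tail)
    gamma = 1.0 + n / log_sum
    # Standard error of the estimate; it shrinks as the tail gets larger.
    sigma = (gamma - 1.0) / math.sqrt(n)
    return gamma, sigma, n

# Hypothetical toy degree sequence, not the project's data.
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
gamma, sigma, n = estimate_gamma(toy_degrees, xmin=1)
print(f"gamma = {gamma:.2f}, sigma = {sigma:.2f}, tail size = {n}")
```

The sketch makes the trade-off behind the choice of xmin visible: a larger xmin keeps only the part of the distribution that looks like a power law, but leaves fewer points in the tail and therefore a larger σ. The approximation itself tends to be less accurate for small xmin.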
Both degree exponents γ are between 2 and 3, so we are in the theoretical "Ultra Small World" regime: the highly connected nodes (hubs) have a significant influence on the connections in the network, and hence they radically reduce the path length.
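For reference, the regimes described in the Network Science Book relate the average distance ⟨d⟩ to the network size N as follows. These are scaling relations only, with constants omitted, so the numbers are indicative: with N = 514, ln N ≈ 6.2 whereas ln ln N ≈ 1.8.

```latex
\langle d \rangle \sim
\begin{cases}
\text{const.} & \gamma = 2 \\
\ln \ln N & 2 < \gamma < 3 \\
\dfrac{\ln N}{\ln \ln N} & \gamma = 3 \\
\ln N & \gamma > 3
\end{cases}
```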
In addition, the in-degree exponent is higher than the out-degree exponent: this fact confirms our interpretation based on the degree distributions.
From the plots, we can see that the top-ranked characters are the same for in- and out-degree: we find the Smith family in both distributions.
Then, in the in-degree case, we find Evil Morty as the first non-Smith C-137 character.
In the out-degree case, Memory Parasites appear as a highly connected node. Indeed, this page probably refers to all the specific Parasites.
We also drew a scatter plot of IN versus OUT degree to study reciprocity and answer the question: if a character is targeted a lot, do they target a lot of characters as well?
In general, as we saw in the scale-free analysis with the degree exponents for both IN and OUT degrees, characters are more likely to be targeted than to target other characters. However, we detect a joint increase of both degrees at some point, especially for the highest values of IN degree.
Thus, we tried to fit a linear regression on the last nodes (those with the highest IN degree). We chose how many nodes to include by computing the R-squared for different subsets, starting from the last nodes and adding nodes progressively. Based on this coefficient of determination, we concluded that the last 9 nodes could be described by a linear model.
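The selection procedure can be sketched as follows, assuming each node is an (in-degree, out-degree) pair, that "last" means highest in-degree, and that the scan starts at 3 nodes because two points always give a perfect fit. The function names and the toy data are hypothetical, and this is not the exact code used for the project.

```python
def r_squared(points):
    """R-squared of an ordinary least-squares line fitted to (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case: no variance to explain
    # For a simple linear regression, R-squared is the squared correlation.
    return sxy ** 2 / (sxx * syy)

def tail_r_squared(points, max_k):
    """R-squared of the fit on the last k points, for k = 3..max_k."""
    ordered = sorted(points)  # sorts by in-degree first
    return {k: r_squared(ordered[-k:]) for k in range(3, max_k + 1)}

# Hypothetical (in-degree, out-degree) pairs, not the project's data.
toy_points = [(0, 1), (1, 5), (2, 0), (3, 6), (10, 10), (20, 20), (30, 30), (40, 40)]
scores = tail_r_squared(toy_points, max_k=6)
for k, score in scores.items():
    print(k, round(score, 3))
```

Reading the scores from the smallest k upwards shows where adding one more node starts to degrade the fit, which is the criterion used above to stop at 9 nodes.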