Let's see how we collected the data used to build our Rick and Morty network!
We decided to use the characters of the Rick and Morty universe as nodes, and the references made from one character's Wiki description page to another as edges.
We started from the list of all characters available on the Fandom Wiki.
We parsed the HTML pages using the BeautifulSoup Python package. For each item in the character list, we checked that it actually redirected to a character's Wiki description page, not to a Category page. We extracted the Wiki page title as the character's name, along with the Wiki page hyperlink, which is the unique key we later use to collect edges.
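Below is a minimal sketch of this scraping step, assuming the Fandom category URL and the convention that character pages live under `/wiki/`; the actual page layout and selectors may differ slightly.

```python
import requests
from bs4 import BeautifulSoup

# Assumed URL of the characters list; the real Wiki layout may differ slightly.
CATEGORY_URL = "https://rickandmorty.fandom.com/wiki/Category:Characters"

soup = BeautifulSoup(requests.get(CATEGORY_URL).text, "html.parser")

characters = {}
for link in soup.find_all("a", href=True):
    href = link["href"]
    # Keep only links to actual character description pages, not Category pages.
    if href.startswith("/wiki/") and not href.startswith("/wiki/Category:"):
        title = link.get("title")  # the page title becomes the character's name
        if title:
            characters[href] = title  # the hyperlink is our unique key for edges
```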
Some manual operations were required, like removing "Evil_Morty/Theories", which appears as if it were an actual character, or adding "Jerry Smith", who is missing from the Wiki's character list!
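In code, these manual fixes boil down to a couple of dictionary operations on the `characters` mapping from the previous snippet (the hyperlinks shown here are illustrative, following the Wiki's usual URL scheme):

```python
# Drop the page that is listed as if it were an actual character.
characters.pop("/wiki/Evil_Morty/Theories", None)
# Add the character that is missing from the Wiki's character list.
characters["/wiki/Jerry_Smith"] = "Jerry Smith"
```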
This gave us an initial base of 547 characters.
To obtain the nodes' attributes, we called an API that provides each character's origin, species, gender, and status (dead or alive).
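The snippet below sketches this step assuming the public Rick and Morty API (rickandmortyapi.com), which exposes exactly these attributes; treat it as illustrative rather than the exact client code we ran.

```python
import requests

# Assumed: the public Rick and Morty API, which paginates its character listing.
url = "https://rickandmortyapi.com/api/character"
attributes = {}

while url:
    payload = requests.get(url).json()
    for char in payload["results"]:
        attributes[char["name"]] = {
            "origin": char["origin"]["name"],
            "species": char["species"],
            "gender": char["gender"],
            "status": char["status"],  # "Alive", "Dead" or "unknown"
        }
    url = payload["info"]["next"]  # None on the last page, which ends the loop
```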
Unfortunately, a lot of information was missing, so we populated the remaining "Unknown" attributes manually.
Another issue appeared: both the number of characters and the character names returned by the API differed from the Wiki. Consequently, we set specific rules to match our initial characters dataframe, based on the Wiki information, with the API output.
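The exact matching rules are specific to our dataset, but a minimal sketch of the idea looks like this (the `normalize` helper is hypothetical; `characters` and `attributes` come from the previous snippets):

```python
import re
import pandas as pd

def normalize(name: str) -> str:
    """Lowercase and strip parentheticals/punctuation so that small naming
    differences between the Wiki and the API do not break the match."""
    name = re.sub(r"\(.*?\)", "", name)             # drop parentheticals
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())  # drop punctuation
    return " ".join(name.split())

# One dataframe from the scraped Wiki characters (hyperlink -> name)...
characters_df = pd.DataFrame(list(characters.items()), columns=["hyperlink", "name"])
characters_df["key"] = characters_df["name"].map(normalize)

# ...and one from the API attributes (name -> attribute dict).
api_df = pd.DataFrame.from_dict(attributes, orient="index")
api_df["key"] = api_df.index.map(normalize)

# Left join: every Wiki character is kept, API attributes attach where a match exists.
merged = characters_df.merge(api_df, on="key", how="left")
```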
After collecting the nodes and their attributes, we were still missing how these nodes are connected!
We consider that a relation exists from one character to another when the first character's Wiki page links to the page of the other, as we did with the Zelda case study during the Social Graphs & Interactions course.
But getting the edges in the Rick and Morty universe was a bit tricky! We realized that the hyperlink of a character's Wiki page was the only unique piece of information distinguishing characters. We therefore used those hyperlinks to create relations between characters' Wiki pages, instead of relying on the character name alone.
First, we searched for all references ("[[<aCharacter>]]") in the source code of the studied character's Wiki page, as we did for Zelda in the course. In parallel, we looked for all hyperlinks (href tags) by analysing the HTML code. To filter out unnecessary data, we intersected those two lists with the displayed text ("value") of each link.
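A rough sketch of this filtering, assuming the Fandom host and MediaWiki's standard `action=raw` endpoint for fetching a page's source:

```python
import re
import requests
from bs4 import BeautifulSoup

BASE = "https://rickandmorty.fandom.com"  # assumed Wiki host

def page_references(page_path: str) -> set:
    """Return the hyperlinks on one character page whose displayed text
    also appears among the [[...]] references of the page's wikitext."""
    # 1. [[...]] references in the page source; keep both target and alias
    #    of piped links such as [[Rick Sanchez|Rick]].
    wikitext = requests.get(BASE + page_path, params={"action": "raw"}).text
    wiki_refs = set()
    for ref in re.findall(r"\[\[([^\]]+)\]\]", wikitext):
        wiki_refs.update(part.strip() for part in ref.split("|"))

    # 2. href targets and their displayed text in the rendered HTML.
    soup = BeautifulSoup(requests.get(BASE + page_path).text, "html.parser")
    hrefs = {a["href"]: a.get_text(strip=True) for a in soup.find_all("a", href=True)}

    # 3. Intersection: keep a hyperlink only when its displayed text matches
    #    one of the wikitext references.
    return {href for href, text in hrefs.items() if text in wiki_refs}
```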
Lastly, we extracted the title of each linked Wiki page: if it matched the name of a character in our characters dataset, a relation from the studied character's page to the mentioned character was established.
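Putting it together: since the hyperlink already acts as the unique key for a character, matching on it is equivalent to matching page titles here. The edge construction can then be sketched as follows, reusing `characters` and `page_references` from above, with networkx assumed for the graph itself:

```python
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from(characters.values())

for href, name in characters.items():
    for target in page_references(href):
        # An edge is created only when the linked page belongs to a known character.
        if target in characters:
            G.add_edge(name, characters[target])
```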