In this project, I aimed to expand the use of computational language processing tools to analyze Dickens’ opinions of certain people groups, categorized by occupation. Across all his novels, does Dickens reveal a dislike for clergymen? Does he appear to have a remarkable respect for grocers?
The way we decided to determine how a group was described was to search for all references to that group in Dickens' literature, and then find all of the adjectives which are used with those nouns. 
The overall approach to performing the analysis was as follows:
1) Collected raw text data from the Project Gutenberg database, using python’s requests library.
2) Cleaned the data to remove non-english, duplicate and non-literature files. Also remove all headings and text which are not a part of the work.
3) Use the NLTK library to break data into word and punctuation tokens.
4) Break the larger text into a list of sentences which can be used as context for keywords, these keywords are a list of professions and their synonyms. 
5) Find all occurrences of a given list of keywords, synonyms arranged by professional category.

After running our analysis we made a series of word clouds to visualize our results, one of which can be seen below for lawyers, a group Dickens was known for disliking. We also ran sentiment analysis on the adjectives to see how positive or negative Dickens was with his descriptions of different groups. One of the most significant results we saw was that Dickens tended to use more positive language when describing people who were higher ranking or of the upper class while using more negative language to describe pheasants or foot soldiers.  

If you're interested you can view this project repository on GitHub here, or read more about it through a computational essay here. This was a three person group project for my class Software Design. 

Wordmap of adjectives Dickens used to describe lawyers in his writing

Back to Top