Rap music (commonly referred to as hip hop music) is a very popular genre of music that has been around since the 1970s. Its origins are generally credited to African-Americans in the South Bronx area in New York City (East side!!)

Did you know: Hip hop music developed as part of hip hop culture. The culture comprised: deejaying (turntabling), MCing (rapping), graf (graffiti paiting), and B-boying (Break dancing).

Of all the components of hip hop culture, rap music has turned out to be the most dominant aspect. It has taken over the world by storm!!!

You can read detailed articles on hip hop, here and here. Also, Why there will never be a GOAT in hip hop

Over time, rap music has come to be associated with violence, sex, drugs, extreme luxury etc. Anectdotally, this might be due to the topics that early hip hop artists were singing about (a case of art of imitating life) and the propagation of myths and stereotypes. Beefs and murders involving people in the hip hop industry has not helped hip hop's case.

But is rap music primarily about drugs, sex, money, and violence? Or is it one of those things we all believe since we hear it all the time? I set out to answer this question by examining music from some 16 rap artists who have been in the industry both new-age and seasoned rappers.

If you can name all the rappers shown in the collage above, comment below for a gift.

The data

Lyrics from all available songs off metrolyrics.com (about 6000 songs) was collected for the following artists (the choice was arbitrary):

Drake, Lil Wayne, Cardi B, Kanye West, Future, Rick Ross, Migos, Eminem

Kendrick Lamar, Dr. Dre, Lil Yatchy, J Cole, Nas, Wiz Khalifa, Jay Z, Nicki Minaj

Join me, and let's find out how inciteful this data can get…

But first, what is in the data?

Let's dig into the data a bit, and see what we find…

Artist activity by year

Psst.. You can click on the legend or the chart

How long are the songs?

Most hiphop songs range from 3 to 5 minutes long, but how many words do they really contain? Excluding common stopwords, here's how the length compared

I understand how Eminem's songs get their length, you know, lyricsm… but Cardi B? That was a shocker!!!

Rappers are beefing aaaaalll the time!! It's pretty much what the rap game thrives on. Are they really that different? Are their public differences really founded, lyrically speaking? Comparisons were done on the 100 most frequently uttered words for the following:

Jay Z vs Nas

Not much difference… at face value anyway

Cardi B vs Nicki Minaj

Cardi's really into money, bags, and herself.. One could attribute that to her still being fresh in the game.

Notice how the ladies have higher occurence of the words b**** while the guys utter ni**** more.

You can view all word maps, here. J. Cole's map will shock you!!

Artist's music sentiment

Rap music is generally considered negative. But how negative is it, really?

Each word (exlcuding stopwords) was given valence (-5 most negative to +5 most positive) and an aggregate value for all lyrics calculated.

Excluding Cardi B and Lil Yatchy (since they don't have as many songs as the rest) Kanye is quite positive!! Hell yea, Yeezy. Lil Wayne is (not surprisingly) not doing very well.

Onto the question of this post…

What is hiphop?

Let's start with the simplest approach… What are the most frequently used words by ALL the rappers under examination?

Well, that doesn't tell us much, now does it?

To have an (approximately accurate) answer in the easiest way possible, let's break down the lyrical content:

Hiphop music. A lyrical breakdown

For all the music collected, the average length of a song was computed i.e. how many words a song contains (excluding stopwords). The lyrical content was then broken down into drugs, sex, money, and the rest. Each category was computed by checking for existence of words of that category per song e.g. count the number of times 'weed, joint, henny' etc exist in a song, to check for drugs.

Here are the results…

Each block is one word.

For an average length of 128 words, only 7 words are about drugs, sex, and money. I was surprised too!!!

Worst offenders

Now that we've seen what hiphop really is about, which turned out not to be what 'facts' have been telling us this whole time, who are the worst offenders? How do the artists rate in their usage of lyricsm about sex, drugs, and money?

Excluding Cardi B and Lil Yatchy (since they don't have as many songs as the rest) notice how Kanye has the least occurence, and also the most positive music? Again, go Yeezy!!!!

So, is hiphop only about drugs, sex, and money?

A word

16 artists cannot represent the entirety of the hiphop industry, so take these results with a grain (or two)of salt!!!

Kanye rocks!!!

The following section is an in-depth explanation/ tutorial of how everything in this post was done… code, technique and all. If uninterested, you may skip to the comments section here.

THE TUTORIAL

In this section, I provide a full tutorial on how to recreate exactly what I did

Prerequisites:

Python >2.7

The code repository off Github, here

Basics of web crawling using Scrapy (start here)

1. DATA COLLECTION

Refer to LyricCrawler/

A. SETUP

- For this project, Scrapy was the primary data collection tool. Set-up, create project and move into it

pip install scrapy
scrapy startproject scraperName
cd scraperName

B. METROLYRICS.COM SITE STRUCTURE

- To collect the lyrics, first, the artists were chosen. I ended up with a list of 16 artists… a mixture of new and old, male and female.
- Several lyrics sites were studied. I ended picking Metrolyrics due to its simplicity.
- For each musican, Metrolyrics has a list of their music (every song ever registered on there) and the year of production.
* For instance, Kanye's page… http://www.metrolyrics.com/kanye-west-lyrics.html
Note: The name of the song is a link to a page with the full lyrics of the song

C. CRAWLING

- To crawl and collect all the lyrics

Open the grand list of music by each artist (the URLs were collected manually)

Collect the years of production and the links to the lyrics off the list

Go to the lyrics page and copy the lyrics

- Scrapy makes this process very easy. Refer to LyricCrawler/spiders/lyricsbot.py
The output of the spider:

2. DATA CLEANING AND PREPROCESSING

Refer to eda.py cleanDataset() method
-The data collected cannnot be used as is. Our primary concern is words therefore we need to remove entries without lyrics, remove stopwords and other unneccessary characters
a) Remove empty entries

import pandas as pd
import numpy as np
#Check for, and remove empty lyrics
df['lyric'].replace('',np.nan,inplace=True)
df.dropna(subset=['lyric'], inplace=True )

b) Remove unneccesary characters and stopwords from lyrics, song and artist name

from sklearn.feature_extraction import text 
#Stopwords
stopWords = text.ENGLISH_STOP_WORDS.union(["I", "like", "I'm", "don't", "ain't", "got", "know", "You", "And", "But", "can't", "just"])
##Remove '\n' from lyric and song name, and stopwords
#From lyric
df['lastLyric'] = df['lyric'].apply(lambda x: ' '.join([word for word in x.split() if word not in stopWords]))
#From name
df['cleanName'] = df['name'].apply(lambda x: x.replace('\n',''))
##Remove the word 'Lyrics'
df['lastName'] = df['cleanName'].apply(lambda x: x.replace('Lyrics', ''))
#From artist
df['lastArtist'] = df['artist'].apply(lambda x: x.replace('Lyrics', ''))

3. EXPLORATORY DATA ANALYSIS

Refer to eda.py

STACKED AREA PLOT VIZ ON TABLEAU

Refer to eda.py genStackedPlotData() method
- The stacked area plot of artist activity was plotted using Tableau.
- To plot, the data needs to have year, artist name and number of songs per year
- The data was extracted from our raw dataset using Pandas, as follows

df.drop(['lyric','name','link'], axis=1, inplace=True) #Remove unneeded columns from dataframe
fd = df.groupby(['year','artist']).agg(np.size).reset_index(name='count') #Group data by year and artist, get the aggregate (count) and name the resulting column 'count'
fd.to_csv('data/forChart.csv') #Write data to a CSV file for Tableau

- The generated CSV file is plugged into Tableau to plot a stacked area plot as explained here: Tableau stacked area plot

SONG LENGTH

Refer to eda.py lenComp() method

- To compute the average song length of each artist, group lyrics by artist, and get median length of the lyrics.

df['lyricLength'] = df['lyric'].apply(lambda y: len(y.split())) #Get length of each song (Split lyrics into array and count)
df.drop(['year','name','link', 'lyric'], axis=1, inplace=True) #Remain with lyrics and artists columns only
fd = df.groupby(['artist']).agg(np.median).reset_index() #Group by artist and get median lenght. Create new dataframe

- Plot the data using Seaborn

import seaborn as sns
import matplotlib.pyplot as plt
sf = fd.sort_values('lyricLength', ascending=False) #Sort by length of lyrics
g = sns.barplot(x=sf['artist'], y=sf['lyricLength'], data=sf) #plot
g.set_xticklabels(sf['artist'], rotation=15)
plt.title("ARTIST SONG LENGTH")
plt.ylabel("SONG LENGTH (MEDIAN)")
plt.xlabel("ARTIST")
plt.show()

WORD CLOUDS

- Frequency of words used by an artist is an interesting aspect to investigate. One of the best method to do so is by plotting wordclouds

CREATING WORD MAP

Refer to eda.py drawMap() method
- To draw compelling and insightful wordclouds, clouds that match what is being represented are very ideal e.g. having a world cloud spelling out 'Nas' when doing a frequency analysis for the artist Nas.
- The word to be spelled out is called a map, and can be generated from text using Open CV.
a) Install OpenCV

pip install opencv-python

b) Generate map
- The most important function is the putText() function that generates an image when given a string.

import cv2
ln = len(nm)
#nm is the string to draw 
img = np.zeros((612, ln*420, 3), np.uint8)

 
font = cv2.FONT_HERSHEY_SIMPLEX
 
cv2.putText(img,nm,(0,500), font, 18, (255,255,255),85)
 
img = 255-img
 
cv2.imwrite('name.jpg', img)

- The generated image by OpenCV is then written to a file 'name.jpg' for later use

POPULATING MAP

Refer to eda.py drawCloud() method

- Once the map (the image) is generated, it's time to use it to show frequencies on it.

- The WorldCloud Library comes in handy

pip install wordcloud

- Populate the map

from wordcloud import WordCloud
img = Image.open("name.jpg")
 
hcmask = np.array(img)

 
wordcloud = WordCloud(mask=hcmask, min_font_size=10, background_color=None, mode='RGBA', colormap='gist_rainbow').generate(corpus)

MUSIC SENTIMENT

Refer to eda.py senti() method

- Sentiment in words can be approached in so many ways, everything from asking people to building machine learning models

- For this post, I wanted to do the bare minimum… calculate the level of negativity (or positivity) in a song. This too can be approached in so many ways. I chose the easiest way possible.. what if you assign each word a level of valence (from –5 most negative to +5 most positive) then aggregate? Well, this guy did exactly that, and he wrote a python module for it!!!

Here's the idea:

a) Get all music for each unique artist as a single string

b) Calculate the score(sentiment) for each string

c) Store the artist and score in a dictionary

d) Plot that mafaka!!

from afinn import Afinn

affin = Affin()
#The code goes here. It refuses to render properly on here. Check out Github

LYRICAL BREAKDOWN

Refer to eda.py occurences() method

- The idea behind lyrical breakdown is simple: Add the existence of words about sex, drugs, or money per song the do an analysis

#Define words to look for. This should be expanded.
drugs = ['weed', 'blunt', 'pot', 'chronic', 'ganja', 'cannabis', 'henny', 'hennessy', 'cristal', 'patron', 'bacardi', 'moet', 'molly', 'crack', 'cocaine', 'ganja', 'mdma', 'ecstasy', 'purple', 'coke']
 
sex = ['pussy', 'hoes', 'bitches', 'bitch', 'booty', 'balls deep', 'sex', 'ass', 'bone', 'DTF', 'fucking']
 
money = ['money', 'dough', 'bands', 'benjamins', 'cheddar', 'buck', 'bucks', 'dinero', 'pesos', 'quid']

 
#Calc item occurence
 
df['lyricLength'] = df['lyric'].apply(lambda y: len(y.split()))

 
df['drugs'] = df['lyric'].apply(lambda x: hesabu(drugs, x))
 
#Repeat for sex and money.
#The new columns contain a set,containing the total occurence of all words and a dict of the specific words and their counts. Access them appropriately for percentages n shii

#The function hesabu()...
from Collections import Counter
hesabu(needle, haystack):
 
 count = dict(Counter(haystack.split()).most_common())
 
 items = dict()
 
 total = 0
 
 for key in count.keys():
  
  if(key in needle):
   
   items[key] = count.get(key)
   
   total = total + count.get(key)
 
 return items, total

VISUALISING

- Let's draw some charts!!!!

THE WAFFLE CHART

- This is a chart that literally looks like a waffle and is a fantastic replacement for waffle charts. Check the Python implementation here

Install the bitch

pip install pywaffle

- Calculate the values to be represented i.e. average values of song length, occurence of sex, drugs, and money

medianLen = math.ceil(np.average(df['lyricLength']))
 
drugs = math.ceil(np.average(df['drugsTotal']))
 
sex = math.ceil(np.average(df['sexTotal']))
 
money = math.ceil(np.average(df['moneyTotal']))
 
rest= medianLen - (drugs+sex+money)

 
data = {'Drugs' : drugs,'Sex' : sex, 'Money': money, 'Others' :rest}

- Build the chart

from pywaffle import Waffle
fig = plt.figure(
  
 FigureClass=Waffle, 
  
 rows=4, 
  
 columns=71,
  
 values = data, 
  
 colors = ('#ff0000', '#0000cc', '#006600', '#000000'),
  
 legend = {'loc': 'lower left', 'bbox_to_anchor': (0, -0.4), 'ncol': len(data), 'framealpha': 0}
  
)
 
plt.title('A BREAKDOWN OF LYRICAL CONTENT FOR A SONG')
 
plt.show()

THE BAR CHART

- You're all grown up now. You can figure this one out!!!

THE END

Comments

Puppetter5 October 2018 at 10:20
I'm super stoked you did this looking forward to read more of your work.
Unknown11 October 2018 at 04:31
This comment has been removed by the author.
Unknown11 October 2018 at 04:34
Data analytics is super magic!! I mean who'd have thought that hip hop is not really about money,sex and drugs? Until you read between the "python" lines(if you know what i mean :) )
Unknown14 February 2019 at 00:30
Python is magic...The new oil in any company ....On a light note..Eminem be the inspiring for years..no nude,money,sex or drugs...just super words that touches my heart.
Mwadime22 February 2019 at 22:25
Great Work Gaks...Analysis is spot on

Search

I HAVE MOVED TO kelvingakuo.github.io

HIP HOP MUSIC. IS IT ONLY ABOUT DRUGS, SEX AND MONEY?

The data

Artist activity by year

How long are the songs?

Jay Z vs Nas

Artist's music sentiment

What is hiphop?

Hiphop music. A lyrical breakdown

Worst offenders

A word

THE TUTORIAL

1. DATA COLLECTION

A. SETUP

B. METROLYRICS.COM SITE STRUCTURE

C. CRAWLING

2. DATA CLEANING AND PREPROCESSING

3. EXPLORATORY DATA ANALYSIS

STACKED AREA PLOT VIZ ON TABLEAU

SONG LENGTH

- To compute the average song length of each artist, group lyrics by artist, and get median length of the lyrics.

WORD CLOUDS

CREATING WORD MAP

POPULATING MAP

MUSIC SENTIMENT

LYRICAL BREAKDOWN

Comments

Post a Comment