Bigram formation from given a Python list

A bigram is formed by creating a pair of words from every two consecutive words from a given sentence. In python, this technique is heavily used in text analytics. Below we see two approaches on how to achieve this.

Using enumerate and split

Using these two methods we first split the sentence into multiple words and then use the enumerate function to create a pair of words from consecutive words.

Example

list = ['Stop. look left right. go']
print ("The given list is : \n" + str(list))
# Using enumerate() and split() for Bigram formation
output = [(k, m.split()[n + 1]) for m in list for n, k in enumerate(m.split()) if n < len(m.split()) - 1]
print ("Bigram formation from given list is: \n" + str(output))

Output

Running the above code gives us the following result −

The given list is :
['Stop. look left right. go']
Bigram formation from given list is:
[('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]

Using zip() and split()

We can also create the biagram using zip and split function. The zip() function puts tithers the words in sequence which are created from the sentence using the split().

Example

list = ['Stop. look left right. go']
print ("The given list is : \n" + str(list))
# Using zip() and split() for Bigram formation
output = [m for n in list for m in zip(n.split(" ")[:-1], n.split(" ")[1:])]
print ("Bigram formation from given list is: \n" + str(output))

Output

Running the above code gives us the following result −

The given list is :
['Stop. look left right. go']
Bigram formation from given list is:
[('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]