A bigram is formed by creating a pair of words from every two consecutive words from a given sentence. In python, this technique is heavily used in text analytics. Below we see two approaches on how to achieve this.
Using enumerate and split
Using these two methods we first split the sentence into multiple words and then use the enumerate function to create a pair of words from consecutive words.
Example
list = ['Stop. look left right. go'] print ("The given list is : \n" + str(list)) # Using enumerate() and split() for Bigram formation output = [(k, m.split()[n + 1]) for m in list for n, k in enumerate(m.split()) if n < len(m.split()) - 1] print ("Bigram formation from given list is: \n" + str(output))
Output
Running the above code gives us the following result −
The given list is : ['Stop. look left right. go'] Bigram formation from given list is: [('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]
Using zip() and split()
We can also create the biagram using zip and split function. The zip() function puts tithers the words in sequence which are created from the sentence using the split().
Example
list = ['Stop. look left right. go'] print ("The given list is : \n" + str(list)) # Using zip() and split() for Bigram formation output = [m for n in list for m in zip(n.split(" ")[:-1], n.split(" ")[1:])] print ("Bigram formation from given list is: \n" + str(output))
Output
Running the above code gives us the following result −
The given list is : ['Stop. look left right. go'] Bigram formation from given list is: [('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]