
Files in Python

The document discusses various file operations in Python such as opening, reading, writing, and closing files. It provides examples of how to create and write to a file, append data to an existing file, read a file line by line, split lines into words, and remove common stop words. The key points covered are how to open a file in different modes, write and read text from files, iterate through lines, and perform basic natural language processing techniques like word counting and stop word filtering on file contents.

Uploaded by

Riya Ram
Copyright: © All Rights Reserved

Files - Python

Mrs.S.Karthiga
Files

• A file is a contiguous set of bytes used to store data.
• This data is organized in a specific format and can be anything as simple as a text file or as complicated as a program executable.
• In the end, these bytes are stored and processed by the computer as binary 1s and 0s.
Opening a File
• Python needs no external library to read and write files. The built-in open() function handles creating, writing, and reading them.

Syntax:-
file = open("a.txt")

Opening a file is done by invoking the built-in open() function, which returns a file object.
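
As a minimal sketch of the call above, the snippet below first creates a small file so that there is something to open, then opens it without a mode argument. The file name a.txt comes from the slide; the temporary directory and the sample text are assumptions made only for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched (an assumption
# for this sketch; the slide simply opens a.txt in the current directory).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "a.txt")

# Create the file first: opening a missing file for reading raises an error.
f = open(path, "w")
f.write("sample text")
f.close()

# open() with no mode argument behaves like mode "r" (read text).
file = open(path)
default_mode = file.mode
text = file.read()
file.close()

print(default_mode)
print(text)
```

Closing the file object when finished releases it back to the operating system.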


Create a file
f = open("hi.txt", "w+")
• We declared the variable f to open a file named hi.txt. open() takes two arguments: the file we want to open and a string that states the mode, that is, the kind of operation we want to perform on the file.
• Here we used "w", which indicates write. In this mode Python creates the file if it does not exist in the current directory and empties it if it does.
• The other available modes besides "w" are "r" for read, which requires the file to exist, and "a" for append, which also creates a missing file. Adding the plus sign opens the file for both reading and writing.
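
The differences between the modes can be checked with a short experiment. This is only a sketch: the temporary directory and the strings written are assumptions chosen for illustration, while the mode letters are the ones listed above.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
path = os.path.join(workdir, "hi.txt")

# "r" does not create a file: opening a missing file for reading fails.
try:
    open(path, "r")
    missing_ok = True
except FileNotFoundError:
    missing_ok = False

# "w" creates the file (or empties an existing one).
with open(path, "w") as f:
    f.write("first")

# "a" keeps the existing content and writes at the end.
with open(path, "a") as f:
    f.write(" second")

with open(path, "r") as f:
    after_append = f.read()

# "w+" also empties the file, but the same handle can read back what it wrote.
with open(path, "w+") as f:
    f.write("third")
    f.seek(0)                           # move back to the start before reading
    after_w_plus = f.read()

print(missing_ok, after_append, after_w_plus)
```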
Writing in a file

file = open("t.txt","w+")
file.write("hi")
file.close()

The t.txt file now contains the text "hi".


File open(), close() and iteration
def main():
    f = open("hello.txt", "w+")
    for p in range(10):
        f.write("This is line%d\n" % p)
    f.close()

if __name__ == "__main__":
    main()

hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
How to Append Data to a File
def main():
    f = open("hello.txt", "a+")
    for p in range(3):
        f.write("appended line%d\n" % p)
    f.close()

if __name__ == "__main__":
    main()

hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
appended line0
appended line1
appended line2
• The "a" mode creates a new file if it does not exist, and the plus sign additionally allows reading from it. In our case the file already exists, so nothing new is created and the lines are added at the end.
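
One detail of "a+" that is easy to miss: the file position starts at the end, so reading straight away returns nothing. The sketch below illustrates this under assumed conditions (a temporary directory and a one-line starting file).

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
path = os.path.join(workdir, "hello.txt")

# Start with one existing line, as if the file had been written earlier.
with open(path, "w") as f:
    f.write("This is line0\n")

with open(path, "a+") as f:
    f.write("appended line0\n")         # always lands at the end of the file
    at_end = f.read()                   # position is at the end: nothing to read
    f.seek(0)                           # go back to the beginning
    everything = f.read()

print(repr(at_end))
print(everything)
```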
How to Read a File
• Besides creating a .txt file from Python, you can also open one in read mode ("r").
• Ex:-
def main():
    f = open("hello.txt", "r")
    if f.mode == "r":
        contents = f.read()
        print(contents)
    f.close()

if __name__ == "__main__":
    main()

hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
appended line0
appended line1
appended line2
How to Read a File line by line

• You can also read your .txt file line by line, which is convenient when the file holds a lot of data.
• readlines() returns every line as an item of a list, so the data can be handled one line at a time.
• Ex:-
def main():
    f = open("hello.txt", "r")
    f1 = f.readlines()
    for x in f1:
        print(x)
    f.close()

if __name__ == "__main__":
    main()

hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
…..
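
Because readlines() loads every line into memory at once, a very large file is often read by looping over the file object itself, which yields one line at a time. A minimal sketch, assuming a small three-line file created in a temporary directory:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
path = os.path.join(workdir, "hello.txt")

# Build a small sample file to read back.
with open(path, "w") as f:
    for p in range(3):
        f.write("This is line%d\n" % p)

lines_seen = []
with open(path, "r") as f:
    for line in f:                      # the file object yields one line per step
        lines_seen.append(line.rstrip("\n"))  # drop the trailing newline

print(lines_seen)
```

Stripping the newline also avoids the blank line that print() would otherwise add after each line.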
Writing multiple lines to a file at once
fh = open("hello.txt","w")
lines_of_text = ["One line of text here\n", "and another line here"]
fh.writelines(lines_of_text)
fh.close()

hello.txt
One line of text here
and another line here
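
Note that writelines() does not add line breaks itself, which is why the first string in the example ends with "\n". The sketch below shows the difference; the temporary file location is an assumption for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
path = os.path.join(workdir, "hello.txt")

lines_of_text = ["One line of text here", "and another line here"]

# Without newline characters the strings are written back to back.
with open(path, "w") as fh:
    fh.writelines(lines_of_text)
with open(path, "r") as fh:
    run_together = fh.read()

# Adding "\n" to each item puts every string on its own line.
with open(path, "w") as fh:
    fh.writelines(line + "\n" for line in lines_of_text)
with open(path, "r") as fh:
    separated = fh.read().splitlines()

print(run_together)
print(separated)
```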
With statement
• You can also work with file objects using the with statement.
• It provides cleaner syntax and exception handling when you work with files, which is why it is good practice to use the with statement where applicable.
• One bonus of this method is that any file opened is closed automatically when the block ends. This leaves less to worry about during cleanup.
Ex:-1
with open("hello.txt", "w") as f:
    f.write("Hello World")

Ex:-2 : To read a file line by line
with open("hello.txt", "r") as f:
    data = f.readlines()
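
The automatic closing can be observed through the closed attribute of the file object. In this sketch the temporary directory and the simulated error are assumptions added only to demonstrate the behaviour.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
path = os.path.join(workdir, "hello.txt")

with open(path, "w") as f:
    f.write("Hello World")
closed_after_with = f.closed            # the file is closed once the block ends

# The file is closed even when the block is left because of an error.
try:
    with open(path, "r") as g:
        data = g.readlines()
        raise ValueError("simulated failure")   # hypothetical error
except ValueError:
    pass
closed_after_error = g.closed

print(closed_after_with, closed_after_error, data)
```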
Splitting Lines in a Text File

with open("hello.text", "r") as f:
    data = f.readlines()
    for line in data:
        words = line.split()
        print(words)

Output
['hello', 'world', 'how', 'are', 'you', 'today?']
['today', 'is', 'Saturday']
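
A self-contained version of the same idea is sketched below. It assumes the file holds the two sentences implied by the sample output and writes them to a temporary location first, so the snippet can run on its own.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
path = os.path.join(workdir, "hello.text")

# Sample content inferred from the output shown on the slide (assumption).
with open(path, "w") as f:
    f.write("hello world how are you today?\ntoday is Saturday\n")

all_words = []
with open(path, "r") as f:
    for line in f:
        words = line.split()            # splits on whitespace, drops the newline
        all_words.append(words)

print(all_words)
```

With no argument, split() treats any run of spaces, tabs, or newlines as a single separator, so punctuation such as the question mark stays attached to its word.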
Word count example
wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'

wordlist = wordstring.split()

# For every word, count how often it occurs in the whole list.
wordfreq = []
for w in wordlist:
    wordfreq.append(wordlist.count(w))

print("String\n" + wordstring + "\n")
print("List\n" + str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")
# In Python 3, zip() must be turned into a list before it can be displayed.
print("Pairs\n" + str(list(zip(wordlist, wordfreq))))
Output
String
it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness
List
['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age', 'of', 'wisdom', 'it', 'was', 'the', 'age', 'of', 'foolishness']
Frequencies
[4, 4, 4, 1, 4, 2, 4, 4, 4, 1, 4, 2, 4, 4, 4, 2, 4, 1, 4, 4, 4, 2, 4, 1]
Pairs
[('it', 4), ('was', 4), ('the', 4), ('best', 1), ('of', 4), ('times', 2), ('it', 4), ('was', 4), ('the', 4), ('worst', 1), ('of', 4), ('times', 2), ('it', 4), ('was', 4), ('the', 4), ('age', 2), ('of', 4), ('wisdom', 1), ('it', 4), ('was', 4), ('the', 4), ('age', 2), ('of', 4), ('foolishness', 1)]
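
The pairs above repeat each word once per occurrence. If one entry per distinct word is preferred, the standard library's collections.Counter is a common alternative; this is a sketch of that swapped-in technique, not the method used on the slide.

```python
from collections import Counter

wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'

# Counter tallies each distinct word in a single pass over the list.
wordfreq = Counter(wordstring.split())

print(wordfreq["it"])       # count for one word
print(len(wordfreq))        # number of distinct words
print(wordfreq.most_common(3))
```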
Removing stop words Example
• The process of converting data into something a computer can understand is referred to as pre-processing. One of the major forms of pre-processing is to filter out data that carries little meaning. In natural language processing, such low-value words are referred to as stop words.
• Stop Words: A stop word is a commonly used word (such as "the", "a", "an", "in") that a search engine has been programmed to ignore.
To check the list of stop words, type the following commands in the Python shell.

import nltk
# The stop word corpus must be installed once with nltk.download('stopwords').
from nltk.corpus import stopwords
set(stopwords.words('english'))

{'ourselves', 'hers', 'between', 'yourself', 'but', 'again', 'there', 'about', 'once', 'during', 'out',
'very', 'having', 'with', 'they', 'own', 'an', 'be', 'some', 'for', 'do', 'its', 'yours', 'such', 'into',
'of', 'most', 'itself', 'other', 'off', 'is', 's', 'am', 'or', 'who', 'as', 'from', 'him', 'each', 'the',
'themselves', 'until', 'below', 'are', 'we', 'these', 'your', 'his', 'through', 'don', 'nor', 'me',
'were', 'her', 'more', 'himself', 'this', 'down', 'should', 'our', 'their', 'while', 'above', 'both',
'up', 'to', 'ours', 'had', 'she', 'all', 'no', 'when', 'at', 'any', 'before', 'them', 'same', 'and',
'been', 'have', 'in', 'will', 'on', 'does', 'yourselves', 'then', 'that', 'because', 'what', 'over',
'why', 'so', 'can', 'did', 'not', 'now', 'under', 'he', 'you', 'herself', 'has', 'just', 'where', 'too',
'only', 'myself', 'which', 'those', 'i', 'after', 'few', 'whom', 't', 'being', 'if', 'theirs', 'my',
'against', 'a', 'by', 'doing', 'it', 'how', 'further', 'was', 'here', 'than'}

Note: You can modify the list by adding words of your choice to the english file in the stopwords directory of your NLTK data.
Ex:- removing stop words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sent = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

# Short form: keep every token that is not a stop word.
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# Equivalent long form, written as an explicit loop.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print(filtered_sentence)
Output
['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
['This', 'sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
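
In the output, 'This' survives because the comparison is case-sensitive and the stop word list is all lowercase. One possible adjustment is to lowercase each token before the check. The sketch below uses a small hand-written stop set and a pre-split token list, both assumptions, so that it runs without NLTK installed.

```python
# Hypothetical mini stop list for illustration; NLTK's English list is much longer.
stop_words = {"this", "is", "a", "off", "the"}

# Tokens copied from the output of the example above.
word_tokens = ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing',
               'off', 'the', 'stop', 'words', 'filtration', '.']

# Compare the lowercase form so that 'This' matches 'this'.
filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]

print(filtered_sentence)
```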
Performing the Stopwords operations in a file
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

# Read the whole input file into one string. The words are separated with
# str.split(); word_tokenize would also need a string as input, not a file.
with open("text.txt") as file1:
    line = file1.read()

words = line.split()

# Open the output file once and append every word that is not a stop word.
with open('filteredtext.txt', 'a') as appendFile:
    for r in words:
        if r not in stop_words:
            appendFile.write(" " + r)
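
For larger inputs the same filtering can be done one line at a time, keeping the line structure of the original file. This is a sketch under stated assumptions: the stop set is a small hand-written stand-in for NLTK's list, and the sample sentence and temporary directory exist only for illustration.

```python
import os
import tempfile

# Hypothetical mini stop list; in practice this would come from NLTK.
stop_words = {"the", "is", "a", "of"}

workdir = tempfile.mkdtemp()            # throwaway directory (assumption)
src = os.path.join(workdir, "text.txt")
dst = os.path.join(workdir, "filteredtext.txt")

# Sample input file (assumption).
with open(src, "w") as f:
    f.write("the age of wisdom is a gift\n")

# Read and write line by line; both files are closed when the block ends.
with open(src, "r") as infile, open(dst, "w") as outfile:
    for line in infile:
        kept = [w for w in line.split() if w not in stop_words]
        outfile.write(" ".join(kept) + "\n")

with open(dst, "r") as f:
    result = f.read()

print(result)
```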
Command Line arguments
• The Python sys module provides access to command-line arguments through sys.argv. This serves two purposes:
• sys.argv is the list of command-line arguments.
• len(sys.argv) is the number of command-line arguments.
• Ex:-

import sys
print("Number of arguments:", len(sys.argv), "arguments.")
print("Argument List:", str(sys.argv))

If you save this as test.py and run it from the command line:
$ python test.py arg1 arg2 arg3

Output
Number of arguments: 4 arguments.
Argument List: ['test.py', 'arg1', 'arg2', 'arg3']

The script name itself is counted as the first entry, sys.argv[0].
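
Since sys.argv[0] holds the script name, programs usually separate it from the arguments the user typed. The helper below is a hypothetical illustration (describe_args is not part of the sys module), and it is fed a hand-written list so the snippet behaves the same however it is launched.

```python
import sys

def describe_args(argv):
    """Split an argv-style list into the script name and the user arguments."""
    script_name = argv[0]       # the first entry is the script itself
    user_args = argv[1:]        # everything after it was typed by the user
    return script_name, user_args

# Example list standing in for: python test.py arg1 arg2 arg3
example_argv = ["test.py", "arg1", "arg2", "arg3"]
script_name, user_args = describe_args(example_argv)

print(script_name)
print(len(user_args), user_args)

# In a real script the call would be: describe_args(sys.argv)
```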
