Files - Python
Mrs.S.Karthiga
Files
A file is a contiguous set of bytes used
to store data.
This data is organized in a specific
format and can be anything as simple
as a text file or as complicated as a
program executable.
In the end, these bytes are stored as binary 1s and 0s,
which is the form the computer processes directly.
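To make the idea of "a set of bytes" concrete, here is a small sketch that writes a two-letter text file and then reopens it in binary mode to look at the stored bytes. The file name demo.txt and the temporary folder are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical file inside a throwaway temporary folder.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Store two characters as text.
with open(path, "w", encoding="utf-8") as f:
    f.write("Hi")

# Reopen the same file in binary mode ("rb") to see the raw bytes.
with open(path, "rb") as f:
    raw = f.read()

print(raw)                               # b'Hi'
print(list(raw))                         # [72, 105]
print([format(b, "08b") for b in raw])   # ['01001000', '01101001']
```

Each character of "Hi" is stored as one byte here, and each byte is just a pattern of eight 1s and 0s.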
Opening a File
In Python, there is no need to import an external library to read and
write files. Python provides a built-in function for creating, writing
and reading files.
Syntax:-
file = open("a.txt")
This is done by invoking the built-in open() function, which opens the file in read mode by default.
Create a file
f = open("hi.txt","w+")
We declared the variable f to open a file named hi.txt.
open() takes 2 arguments here: the file that we want to open and a
string that represents the kind of permission or operation we
want to perform on the file.
Here we used the letter "w" in our argument, which indicates write:
it creates the file if it does not exist and empties it if it does.
The plus sign means the file is opened for reading as well as writing.
The other available options besides "w" are "r" for read (the file
must already exist) and "a" for append (the file is created if it is
not there).
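The difference between the modes is easiest to see by trying them one after another. The following is a minimal sketch, assuming a hypothetical file modes.txt in a temporary folder so that no real file is touched.

```python
import os
import tempfile

# Hypothetical file inside a throwaway temporary folder.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# "w" creates the file (or empties an existing one) and allows writing.
f = open(path, "w")
f.write("first\n")
f.close()

# "a" keeps the existing content and adds new data at the end.
f = open(path, "a")
f.write("second\n")
f.close()

# "r" opens an existing file for reading only.
f = open(path, "r")
after_append = f.read()
f.close()

# Opening with "w" again discards what was there before.
f = open(path, "w")
f.write("third\n")
f.close()

f = open(path, "r")
after_rewrite = f.read()
f.close()

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_rewrite))   # 'third\n'
```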
Writing in a file
file = open("t.txt","w+")
file.write("hi")
file.close()
The file t.txt now contains the message "hi".
File open(), close() and iteration
def main():
    f = open("hello.txt","w+")
    for p in range(10):
        f.write("This is line%d\n" % p)
    f.close()
if __name__=="__main__":
    main()
hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
How to Append Data to a File
def main():
    f = open("hello.txt","a+")
    for p in range(3):
        f.write("appended line%d\n" % p)
    f.close()
if __name__=="__main__":
    main()
hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
appended line0
appended line1
appended line2
The mode "a" adds new data at the end of the file and creates a
new file if it does not exist; the plus sign additionally allows
reading. In our case we already have the file, so no new file
is created.
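One detail of "a+" is worth a short sketch: after opening, the position is at the end of the file, so to read the content back you first have to move to the beginning with seek(0). The temporary folder below is an assumption for illustration only.

```python
import os
import tempfile

# Hypothetical copy of hello.txt inside a throwaway temporary folder.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# Start with one existing line.
with open(path, "w") as f:
    f.write("This is line0\n")

# "a+" = append and read. Writes always go to the end of the file.
f = open(path, "a+")
f.write("appended line0\n")
f.seek(0)              # jump back to the start before reading
contents = f.read()
f.close()

print(contents)
```

Without the call to seek(0), read() would start at the end of the file and return an empty string.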
How to Read a File
Not only can you create a .txt file from Python, you can also
open a .txt file in read mode ("r").
Ex:-
def main():
    f = open("hello.txt","r")
    if f.mode=="r":
        contents = f.read()
        print(contents)
    f.close()
if __name__=="__main__":
    main()
hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
appended line0
appended line1
appended line2
How to Read a File line by line
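You can also read your .txt file line by line, which makes the data easier to process one line at a time.
readlines() returns all the lines of the file as a list. Note that it still loads the whole file into
memory; for very large files, loop over the file object itself.
Ex:-
def main():
    f = open("hello.txt","r")
    f1 = f.readlines()
    for x in f1:
        print(x, end="")   # each line already ends with a newline
    f.close()
if __name__=="__main__":
    main()
hello.txt
This is line0
This is line1
This is line2
This is line3
This is line4
This is line5
…..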
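For files that are too big to hold in memory, a common approach is to loop over the file object directly, which hands out one line at a time. The sketch below builds a small three-line file in a temporary folder (an assumption for illustration) and reads it that way.

```python
import os
import tempfile

# Hypothetical copy of hello.txt inside a throwaway temporary folder.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

with open(path, "w") as f:
    for p in range(3):
        f.write("This is line%d\n" % p)

seen = []
f = open(path, "r")
for line in f:                   # the file object yields one line per loop step
    text = line.rstrip("\n")     # drop the newline at the end of the line
    seen.append(text)
    print(text)
f.close()
```

Only the current line is held in the variable line at each step; the list seen is kept here just to check the result.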
Writing multiple lines to a file at once
fh = open("hello.txt","w")
lines_of_text = ["One line of text here\n", "and another line here"]
fh.writelines(lines_of_text)
fh.close()
One line of text here
and another line here
With statement
You can also work with file objects using the with
statement.
It is designed to provide much cleaner syntax and
exception handling when you are working with files. That
explains why it's good practice to use the with statement
where applicable.
One bonus of using this method is that any files opened
will be closed automatically after you are done. This
leaves less to worry about during cleanup.
Ex:-1
with open("hello.txt", "w") as f:
    f.write("Hello World")
Ex:-2 : To read a file line by line
with open("hello.txt", "r") as f:
    data = f.readlines()
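The automatic closing can be checked with the closed attribute of the file object. The following sketch, using a hypothetical file in a temporary folder, shows that the file is closed after the with block, even when an error happens inside it.

```python
import os
import tempfile

# Hypothetical copy of hello.txt inside a throwaway temporary folder.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

with open(path, "w") as f:
    f.write("Hello World")
closed_after_block = f.closed        # True: closed without calling f.close()

try:
    with open(path, "r") as g:
        data = g.read()
        raise ValueError("simulated failure")   # an error inside the block
except ValueError:
    pass
closed_after_error = g.closed        # True: closed even though an error occurred

print(closed_after_block, closed_after_error)
```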
Splitting Lines in a Text File
with open("hello.txt", "r") as f:
    data = f.readlines()
for line in data:
    words = line.split()
    print(words)
Output
['hello', 'world', 'how', 'are', 'you', 'today?']
['today', 'is', 'Saturday']
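Why split() is called without arguments here: in that form it treats any run of whitespace as one separator and also drops the newline at the end of the line. A short sketch with a made-up line (note the double space and the trailing newline):

```python
# Made-up line with two spaces after "world" and a newline at the end.
line = "hello world  how are you today?\n"

words = line.split()        # split on any whitespace
by_space = line.split(" ")  # split on exactly one space character

print(words)      # ['hello', 'world', 'how', 'are', 'you', 'today?']
print(by_space)   # ['hello', 'world', '', 'how', 'are', 'you', 'today?\n']
```

Splitting on " " leaves an empty string and keeps the newline, which is usually not what you want for lines read from a file.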
Word count example
wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'
wordlist = wordstring.split()
wordfreq = []
for w in wordlist:
wordfreq.append(wordlist.count(w))
print("String\n" + wordstring +"\n")
print("List\n" + str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")
print("Pairs\n" + str(list(zip(wordlist, wordfreq))))
Output
String
it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness
List
['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age', 'of', 'wisdom', 'it', 'was', 'the', 'age', 'of', 'foolishness']
Frequencies
[4, 4, 4, 1, 4, 2, 4, 4, 4, 1, 4, 2, 4, 4, 4, 2, 4, 1, 4, 4, 4, 2, 4, 1]
Pairs
[('it', 4), ('was', 4), ('the', 4), ('best', 1), ('of', 4), ('times', 2), ('it', 4), ('was', 4), ('the', 4), ('worst', 1), ('of', 4), ('times', 2), ('it', 4), ('was', 4), ('the', 4), ('age', 2), ('of', 4), ('wisdom', 1), ('it', 4), ('was', 4), ('the', 4), ('age', 2), ('of', 4), ('foolishness', 1)]
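The pairs above repeat every word as often as it occurs. If one count per distinct word is enough, one possible alternative (not the method used on this slide) is Counter from the standard collections module, sketched here on the same string.

```python
from collections import Counter

wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'

# Counter goes through the word list once and keeps one count per distinct word.
counts = Counter(wordstring.split())

print(counts["it"])      # 4
print(counts["times"])   # 2
print(counts["best"])    # 1
print(len(counts))       # 10 distinct words
```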
Removing stop words Example
The process of converting data to something a computer
can understand is referred to as pre-processing. One of
the major forms of pre-processing is to filter out useless
data. In natural language processing, useless words (data)
are referred to as stop words.
Stop Words: A stop word is a commonly used word (such as
“the”, “a”, “an”, “in”) that a search engine has been
programmed to ignore.
To check the list of stop words, you can type the following commands in the Python shell.
import nltk
from nltk.corpus import stopwords
# nltk.download('stopwords')   # needed once if the stop word corpus is not installed yet
set(stopwords.words('english'))
{'ourselves', 'hers', 'between', 'yourself', 'but', 'again', 'there', 'about', 'once', 'during', 'out',
'very', 'having', 'with', 'they', 'own', 'an', 'be', 'some', 'for', 'do', 'its', 'yours', 'such', 'into',
'of', 'most', 'itself', 'other', 'off', 'is', 's', 'am', 'or', 'who', 'as', 'from', 'him', 'each', 'the',
'themselves', 'until', 'below', 'are', 'we', 'these', 'your', 'his', 'through', 'don', 'nor', 'me',
'were', 'her', 'more', 'himself', 'this', 'down', 'should', 'our', 'their', 'while', 'above', 'both',
'up', 'to', 'ours', 'had', 'she', 'all', 'no', 'when', 'at', 'any', 'before', 'them', 'same', 'and',
'been', 'have', 'in', 'will', 'on', 'does', 'yourselves', 'then', 'that', 'because', 'what', 'over',
'why', 'so', 'can', 'did', 'not', 'now', 'under', 'he', 'you', 'herself', 'has', 'just', 'where', 'too',
'only', 'myself', 'which', 'those', 'i', 'after', 'few', 'whom', 't', 'being', 'if', 'theirs', 'my',
'against', 'a', 'by', 'doing', 'it', 'how', 'further', 'was', 'here', 'than'}
Note: You can even modify the list by adding words of your choice to the english text file in the
stopwords directory.
Ex:- removing stop words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# The tokenizer data must be downloaded once with nltk.download()
# (for example the 'punkt' package; the exact name depends on the NLTK version).
example_sent = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)
# Short form: a list comprehension
filtered_sentence = [w for w in word_tokens if w not in stop_words]
# Equivalent long form: an explicit loop
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
print(word_tokens)
print(filtered_sentence)
Output
['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the',
'stop', 'words', 'filtration', '.']
['This', 'sample', 'sentence', ',', 'showing', 'stop', 'words',
'filtration', '.']
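Notice that 'This' survives in the output above: the stop word list is lowercase and the comparison is case-sensitive. One way to handle this is to compare the lowercase form of each word. The sketch below assumes a tiny hand-picked stop word set and plain split() instead of NLTK, so that it runs without any downloads; it is not NLTK's list or tokenizer.

```python
# Tiny hand-picked stop word set for illustration only (not NLTK's list).
stop_words = {"this", "is", "a", "off", "the"}

example_sent = "This is a sample sentence, showing off the stop words filtration."

# Plain whitespace split, so punctuation stays attached to the words.
tokens = example_sent.split()

# Compare the lowercase form so that "This" matches "this".
filtered = [w for w in tokens if w.lower() not in stop_words]

print(filtered)
# ['sample', 'sentence,', 'showing', 'stop', 'words', 'filtration.']
```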
Performing the Stopwords operations in a file
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
# Read the whole file into one string, then split it into words.
# (word_tokenize could be used instead of split(); it accepts a string as input, not a file.)
with open("text.txt") as file1:
    line = file1.read()
words = line.split()
# Open the output file once, in append mode, and write every word that is not a stop word.
with open('filteredtext.txt','a') as appendFile:
    for r in words:
        if r not in stop_words:
            appendFile.write(" "+r)
Command Line arguments
The Python sys module provides access to any command-line arguments via sys.argv. This
serves two purposes:
sys.argv is the list of command-line arguments.
len(sys.argv) is the number of command-line arguments.
Ex:-
import sys
print("Number of arguments:", len(sys.argv), "arguments.")
print("Argument List:", str(sys.argv))
If you save this as test.py and run it from the command line
$ python test.py arg1 arg2 arg3
Output
Number of arguments: 4 arguments.
Argument List: ['test.py', 'arg1', 'arg2', 'arg3']
The script name itself is sys.argv[0], so it is included in the count.
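Because the first entry of sys.argv is the script name, programs usually separate it from the arguments the user typed. A minimal sketch follows; the helper name describe_args is hypothetical, and a hand-written list stands in for sys.argv so the example runs anywhere.

```python
def describe_args(argv):
    """Split an argv-style list into the script name and the user's arguments."""
    script = argv[0]    # the script name comes first
    args = argv[1:]     # everything after it was typed by the user
    return script, args

# Stand-in for sys.argv after running: python test.py arg1 arg2 arg3
script, args = describe_args(["test.py", "arg1", "arg2", "arg3"])

print(script)      # test.py
print(args)        # ['arg1', 'arg2', 'arg3']
print(len(args))   # 3
```

In a real script you would import sys and call describe_args(sys.argv) instead of passing the list by hand.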