UnixCommands Day1


Unix commands for data editing

Daniela Lourenco

BLUPF90 TEAM, 02/2023

Hands on…getting some data

cp -r /home/guest002/course/labs/lab1linux_une .

curl "http://nce.ads.uga.edu/wiki/lib/exe/fetch.php?media=lab1Linux_une.zip" -o lab1linux.zip
Popular commands
head file        prints the first 10 lines
head -20 file    prints the first 20 lines
tail file        prints the last 10 lines
less file        shows a file line by line or page by page
less -S file     same as above, without wrapping long lines
wc -l file       counts the number of lines
grep text file   finds lines that contain text
cat file1 file2  concatenates files
sort             sorts a file
cut              cuts specific columns
join             joins lines of two files on specific columns
paste            pastes lines of two files
expand           replaces TAB with spaces
uniq             retains unique lines of a sorted file
head / tail
head pedigree.txt
head -20 pedigree.txt
tail pedigree.txt

Example output (10 lines):
UGA42011 UGA41101 UGA34199
UGA42012 UGA41101 UGA38407
UGA42013 UGA41101 UGA39798
UGA42014 UGA41101 UGA37367
UGA42015 UGA41101 UGA40507
UGA42016 UGA41101 UGA34449
UGA42017 UGA41101 UGA37465
UGA42018 UGA41101 UGA40205
UGA42019 UGA41101 UGA37513
UGA42020 UGA41101 UGA34836
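A minimal sketch of how head and tail can be combined to pull out a block of lines from the middle of a file. The file demo_ped.txt and its IDs are made up for illustration (generated with seq so the expected lines are known); it is not the course pedigree. head -n 20 is the POSIX spelling of head -20.

```shell
# Hypothetical demo file: 30 lines, "ID1" ... "ID30".
tmpdir=$(mktemp -d)
seq 1 30 | sed 's/^/ID/' > "$tmpdir/demo_ped.txt"

# First 3 lines and last 2 lines.
first3=$(head -n 3 "$tmpdir/demo_ped.txt")
last2=$(tail -n 2 "$tmpdir/demo_ped.txt")

# Lines 11-15: take the first 15 lines, then keep the last 5 of those.
middle=$(head -n 15 "$tmpdir/demo_ped.txt" | tail -n 5)

printf '%s\n' "$first3" "$last2" "$middle"
```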
Genomics - huge volume of information
• Example 50kv2 (54609 SNP)
• For 104 individuals
• Illumina final report file:
• 5,679,346 records
• 302 MB

• Not efficient to read/edit with regular editors (vi, vim, gedit…)

less command
• Lets you view the contents of a file and move forward and backward
• For files with long lines, use option -S (disables line wrapping)
less -S genotypes.txt
Counting lines/characters inside files
• Command wc counts the number of lines/words/bytes
wc genotypes.txt
2024 4048 91108336 genotypes.txt

• Number of lines in one or more files

wc -l genotypes.txt pedigree.txt
2024 genotypes.txt
10000 pedigree.txt
12024 total
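When the count is needed inside a script, it may be handy to get the number without the file name. The sketch below assumes a made-up three-line file (demo.txt is hypothetical) and reads it through standard input, which makes wc print the number only.

```shell
tmpdir=$(mktemp -d)
printf 'a b\nc d\ne f\n' > "$tmpdir/demo.txt"   # hypothetical 3-line file

# With a file argument, wc prints the count followed by the file name.
wc -l "$tmpdir/demo.txt"

# Reading from standard input gives just the number, which is easier to store.
# tr removes the padding spaces that some wc implementations add.
nlines=$(wc -l < "$tmpdir/demo.txt" | tr -d ' ')
nwords=$(wc -w < "$tmpdir/demo.txt" | tr -d ' ')
echo "lines=$nlines words=$nwords"
```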
Concatenating files
Put content of file1 and file2 in output_file
cat file1 file2 > output_file

Add content of file3 to output_file using >> redirection

Append content at the end of the file

cat file3 >> output_file
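The difference between > and >> can be checked on tiny files. This is only a sketch: the file contents are invented, and the files live in a temporary directory so nothing real is overwritten.

```shell
tmpdir=$(mktemp -d)
printf '1\n2\n' > "$tmpdir/file1"
printf '3\n'    > "$tmpdir/file2"
printf '4\n'    > "$tmpdir/file3"

# > creates output_file, or overwrites it if it already exists.
cat "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/output_file"
# >> appends to the end of output_file.
cat "$tmpdir/file3" >> "$tmpdir/output_file"
after_append=$(cat "$tmpdir/output_file")

# Using > again discards the previous content.
cat "$tmpdir/file3" > "$tmpdir/output_file"
after_overwrite=$(cat "$tmpdir/output_file")

printf '%s\n' "$after_append" "---" "$after_overwrite"
```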

paste / expand
paste          merges files line by line with a TAB delimiter
expand         replaces TAB with spaces
paste -d " "   merges files line by line with a space delimiter

head file1 file2
==> file1 <==
1
2
3

==> file2 <==
a
b
c

paste file1 file2 | head
1	a
2	b
3	c

paste -d " " file1 file2 | head
1 a
2 b
3 c
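A small sketch that puts paste and expand side by side, assuming the same toy contents for file1 (1, 2, 3) and file2 (a, b, c). The tab stop of 4 passed to expand is an arbitrary choice for the demonstration.

```shell
tmpdir=$(mktemp -d)
printf '1\n2\n3\n' > "$tmpdir/file1"
printf 'a\nb\nc\n' > "$tmpdir/file2"

# Default delimiter is a TAB.
tab_joined=$(paste "$tmpdir/file1" "$tmpdir/file2")
# -d ' ' uses a single space instead.
space_joined=$(paste -d ' ' "$tmpdir/file1" "$tmpdir/file2")
# expand -t 4 replaces each TAB with spaces up to the next multiple of 4 columns.
expanded=$(paste "$tmpdir/file1" "$tmpdir/file2" | expand -t 4)

printf '%s\n' "$expanded"
```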
sort
• Sorts a file in alphanumeric order, specifying which column should be sorted
(the +POS -POS form is an older, obsolete zero-based syntax; prefer -k)
sort -k 2,2 file4 > a     or     sort +1 -2 file4 > a
sort -k 1,1 file4 > b     or     sort +0 -1 file4 > b

• Sorts a file in numeric order

sort -nk 2,2 file4 > a     or     sort -n +1 -2 file4 > a
sort -nk 1,1 file4 > b     or     sort -n +0 -1 file4 > b

• Sorts a file in reverse numeric order

sort -nrk 2,2 file4 > a     or     sort -nr +1 -2 file4 > a

• Sorts based on column 1 then column 2

sort -k1,1 -k2,2 file4 > ab
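Why -n matters is easiest to see on a small case. The sketch below assumes a made-up file4 with a label in column 1 and a number in column 2; LC_ALL=C is set only to make the text ordering reproducible across systems.

```shell
tmpdir=$(mktemp -d)
# Hypothetical file4: column 1 is a label, column 2 a number.
printf 'b 10\na 9\nc 100\n' > "$tmpdir/file4"

# Text comparison on column 2: "10" < "100" < "9".
alpha=$(LC_ALL=C sort -k 2,2 "$tmpdir/file4")
# Numeric comparison on column 2: 9 < 10 < 100.
numeric=$(LC_ALL=C sort -n -k 2,2 "$tmpdir/file4")
# Reverse numeric: 100 > 10 > 9.
reverse=$(LC_ALL=C sort -n -r -k 2,2 "$tmpdir/file4")

printf '%s\n' "$alpha" "---" "$numeric" "---" "$reverse"
```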
join
• Merges two files by column 1 in both (they should be sorted)

join -1 1 -2 1 phenotypes.txt pedigree.txt > new_file

• Merges two files by column 1 in both (sorting at the same time, using bash process substitution)

join -1 1 -2 1 <(sort -k1,1 phenotypes.txt) <(sort -k1,1 pedigree.txt) > new_file
OR (older, obsolete sort syntax)
join -1 1 -2 1 <(sort +0 -1 phenotypes.txt) <(sort +0 -1 pedigree.txt) > new_file

• Prints only the lines of the first file that have no match in the second (the joined lines are suppressed)
join -v1 phenotypes.txt pedigree.txt > new_file
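A sketch of both join modes on miniature files. The file names phen.txt and ped.txt and all IDs are invented for the example; the inputs are sorted into separate files first so the commands also run in shells without process substitution.

```shell
tmpdir=$(mktemp -d)
# Hypothetical miniature files; column 1 is the animal ID in both.
printf 'A3 5.1\nA1 4.2\nA9 6.0\n' > "$tmpdir/phen.txt"
printf 'A1 S1 D1\nA2 S1 D2\nA3 S2 D3\n' > "$tmpdir/ped.txt"

# join expects both inputs sorted on the join field.
LC_ALL=C sort -k 1,1 "$tmpdir/phen.txt" > "$tmpdir/phen.srt"
LC_ALL=C sort -k 1,1 "$tmpdir/ped.txt"  > "$tmpdir/ped.srt"

# Lines whose ID appears in both files.
matched=$(LC_ALL=C join -1 1 -2 1 "$tmpdir/phen.srt" "$tmpdir/ped.srt")
# -v 1: only lines of the first file that found no partner in the second.
unmatched=$(LC_ALL=C join -v 1 "$tmpdir/phen.srt" "$tmpdir/ped.srt")

printf '%s\n' "$matched" "---" "$unmatched"
```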
grep
• grep finds patterns within a file and lists all lines that match the pattern
grep UGA42014 pedigree.txt

• grep -v shows all lines that do not match the pattern

grep -v UGA pedigree.txt

• For a pattern with spaces, quote it (-e marks the next argument as the pattern)

grep -e "pattern with spaces" file1
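The grep variants above can be tried on a three-line toy pedigree. This is a sketch with invented IDs; -c (count matching lines) is an extra standard option not shown on the slide.

```shell
tmpdir=$(mktemp -d)
# Hypothetical pedigree: animal, sire, dam (0 = unknown parent).
printf 'UGA1 UGA7 0\nUGA2 0 0\nXYZ3 UGA7 UGA2\n' > "$tmpdir/ped.txt"

with_sire=$(grep 'UGA7' "$tmpdir/ped.txt")     # lines containing UGA7
without=$(grep -v 'UGA7' "$tmpdir/ped.txt")    # lines NOT containing UGA7
n_match=$(grep -c 'UGA7' "$tmpdir/ped.txt")    # -c: number of matching lines
spaced=$(grep -e '0 0' "$tmpdir/ped.txt")      # quoted pattern with a space

printf '%s\n' "$with_sire" "---" "$without" "---" "$n_match" "---" "$spaced"
```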
sed
• sed is a stream editor
It reads the input file and applies commands to the lines that match a pattern
• Substitution of a pattern (any character can serve as the delimiter, e.g. / or :)
sed 's/pattern1/new pattern/g' file > newfile
sed 's:pattern1:new pattern:g' file > newfile
sed 's:UGA:DL:g' pedigree.txt > dl.temp

• Substitution of a pattern in the same file (GNU sed; BSD/macOS sed needs -i '')

sed -i 's/pattern1/new pattern/g' file

• Substitution of a pattern in a specific line (e.g., line 24)

sed '24s/pattern1/new pattern/' file > newfile

• Deletes lines that contain "pattern to match"

sed '/pattern to match/d' file
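A sketch showing what the g flag and a line address change, assuming a made-up three-line pedigree. Everything is written to variables rather than edited in place, so the input stays untouched.

```shell
tmpdir=$(mktemp -d)
# Hypothetical pedigree: animal, sire, dam (0 = unknown parent).
printf 'UGA1 UGA7 UGA8\nUGA2 UGA7 UGA9\nUGA3 0 0\n' > "$tmpdir/ped.txt"

all=$(sed 's/UGA/DL/g' "$tmpdir/ped.txt")        # every occurrence on every line
first_only=$(sed 's/UGA/DL/' "$tmpdir/ped.txt")  # no g: first occurrence per line
line2=$(sed '2s/UGA/DL/g' "$tmpdir/ped.txt")     # substitution on line 2 only
dropped=$(sed '/ 0 0$/d' "$tmpdir/ped.txt")      # delete lines ending in " 0 0"

printf '%s\n' "$all" "---" "$first_only" "---" "$line2" "---" "$dropped"
```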
awk
AWK is a language for text processing, typically used as a data extraction and reporting tool

Named after its authors: Alfred Aho, Peter Weinberger, Brian Kernighan
awk

• Interpreted programming language that processes the data stream of a file line by line

• Very useful and fast command to work with text files

• Can be used as a database query program

• Selects specific columns or creates new ones
• Selects specific rows matching some criteria

• Can be used with if/else and for structures

awk Implicit variables
NF - number of fields
NR - record number
FS - input field separator
OFS - output field separator
• Print column 1 and the last column of the pedigree file:
awk '{print $1,$NF}' pedigree.txt > anim_dam.temp
• Print all columns:
awk '{print $0}' phenotypes.txt > all_phen.temp
• Print column 1 when column 3 equals 2:
awk '{if ($3==2) print $1}' phenotypes.txt > fem.temp
• Print columns 3 and 4 skipping the first 1000 lines:
awk '{if (NR>1000) print $3,$4}' phenotypes.txt > part.temp
• Print length of column 2 from line 1:
awk '{if (NR==1) print length($2)}' genotypes.txt

• Process CSV files

awk 'BEGIN {FS=","} {print $1,$2,$3}' pedigree.txt > ped_out.temp
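The four implicit variables can be seen working together in one command. This sketch assumes a made-up CSV (demo.csv, with animal, sire, dam and a sex code in column 4); the column layout is an assumption for the example, not the layout of the course files.

```shell
tmpdir=$(mktemp -d)
# Hypothetical CSV: animal,sire,dam,sex (sex coded 1 or 2).
printf 'A1,S1,D1,1\nA2,S1,D2,2\nA3,S2,D3,2\n' > "$tmpdir/demo.csv"

# -F, sets FS to a comma; OFS is what each comma in print puts between fields.
# For rows with sex == 2, print the record number, the ID, the last field
# and the number of fields, separated by TABs.
out=$(awk -F, 'BEGIN {OFS="\t"} $4 == 2 {print NR, $1, $NF, NF}' "$tmpdir/demo.csv")

printf '%s\n' "$out"
```

A plain FS="," splits on every comma, so it is only suitable for CSV files without quoted fields that contain commas.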
awk hash tables
• Arrays can be indexed by alphanumeric variables in an efficient way

• awk version to count progeny by sire

• sire id is column 2

awk '{ sire[$2]+=1} END { for (i in sire) {print "Sire " i, sire[i]}}' pedigree.txt
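The order in which "for (i in sire)" visits the keys is not guaranteed, so the counts may come out in any order. A sketch of one way to get a stable listing, assuming a made-up four-animal pedigree: pipe the result through sort, largest count first.

```shell
tmpdir=$(mktemp -d)
# Hypothetical pedigree: animal, sire, dam.
printf 'A1 S1 D1\nA2 S1 D2\nA3 S2 D3\nA4 S1 D1\n' > "$tmpdir/ped.txt"

# Count progeny per sire in an array, then sort by count (descending) and sire ID.
counts=$(awk '{ sire[$2] += 1 } END { for (i in sire) print i, sire[i] }' "$tmpdir/ped.txt" \
  | LC_ALL=C sort -k 2,2nr -k 1,1)

printf '%s\n' "$counts"
```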
awk
• awk can be used for pretty much anything related to data processing in Unix
(file1 holds the values 1, 2 and 3, one per line)
• Sum of elements in column 1
awk '{ sumf += $1 } END { print sumf}' file1
6

• Sum of squares of element in column 1

awk '{ sumf += $1*$1 } END { print sumf}' file1
14

• Average of elements in column 1

awk '{ sumf += $1 } END { print sumf/NR}' file1
2
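The three calculations above can also be done in a single pass over the file. This is a sketch assuming file1 holds 1, 2 and 3; the NR > 0 check is an added guard so an empty file does not cause a division by zero.

```shell
tmpdir=$(mktemp -d)
printf '1\n2\n3\n' > "$tmpdir/file1"

# s = sum, ss = sum of squares; the mean is s divided by the number of records.
stats=$(awk '{ s += $1; ss += $1*$1 } END { if (NR > 0) print s, ss, s/NR }' "$tmpdir/file1")

echo "$stats"
```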
uniq
• Command uniq lists the unique lines of a file (the input must be sorted, because only adjacent duplicates are merged)
• Option -c counts the number of times each level occurs in a file

Example: counting progeny by sire in a pedigree file

awk '$2>0{print $2}' pedigree.txt | sort | uniq -c > s.temp

or, equivalently:

awk '{if ($2>0) print $2}' pedigree.txt | sort | uniq -c > s.temp
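A sketch of why the sort step is needed before uniq -c, assuming a made-up pedigree in which sire S1 appears on non-adjacent lines and one animal has an unknown sire coded 0.

```shell
tmpdir=$(mktemp -d)
# Hypothetical pedigree: animal, sire, dam (0 = unknown parent).
printf 'A1 S1 D1\nA2 S2 D2\nA3 S1 D3\nA4 0 0\n' > "$tmpdir/ped.txt"

# Without sort, the two S1 lines are not adjacent, so uniq reports 3 groups.
unsorted=$(awk '$2 != 0 {print $2}' "$tmpdir/ped.txt" | uniq -c | wc -l | tr -d ' ')

# With sort, each sire is counted once; the last awk swaps to "sire count".
sorted=$(awk '$2 != 0 {print $2}' "$tmpdir/ped.txt" | LC_ALL=C sort | uniq -c | awk '{print $2, $1}')

echo "groups without sort: $unsorted"
printf '%s\n' "$sorted"
```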
Useful commands for Linux
• Several tutorials on the web

• unixcombined.pdf from the Misztal web site

• http://nce.ads.uga.edu/~ignacy/ads8200/unixcombined.pdf

• Online
• https://tldp.org/LDP/Bash-Beginners-Guide/Bash-Beginners-Guide.pdf
