Linux_Commands_Developer_Data_Engineer

The document provides a comprehensive list of essential Linux commands for developers and data engineers, categorized into file and directory management, text processing, networking, data engineering-specific tools, process management, version control with Git, system monitoring, disk usage, archive and compression, and development utilities. Each command is accompanied by a brief description and example usage. This serves as a quick reference guide for performing various tasks in a Linux environment.

Uploaded by

studyhacks88

Linux Commands for Developers and Data Engineers

File and Directory Management

ls - Lists files and directories in the current directory. Example: ls -l shows details such as permissions, owner, size, and modification time.

cd - Changes the current directory. Example: cd /home navigates to the /home directory.

pwd - Displays the current working directory.

mkdir - Creates a new directory. Example: mkdir project creates a folder named 'project'.

rm - Deletes files or directories. Example: rm file.txt deletes 'file.txt'. Use rm -r to delete a directory and everything in it.

cp - Copies files or directories. Example: cp file1.txt file2.txt copies file1.txt to file2.txt.

mv - Moves or renames files and directories. Example: mv old.txt new.txt renames old.txt to new.txt.

find - Searches for files and directories. Example: find / -name file.txt searches the whole filesystem, starting at /, for entries named 'file.txt'.
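
The commands above can be tried together in a scratch directory. The following is a minimal sketch, not a fixed recipe: the temporary directory and the file names (project, old.txt, copy.txt, new.txt) are made up for illustration, and mktemp is used only to keep the demo away from real files.

```shell
# Work inside a throwaway directory so nothing else on the system is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                           # create a directory
echo "hello" > project/old.txt          # create a small file to work with
cp project/old.txt project/copy.txt     # copy the file
mv project/old.txt project/new.txt      # rename the original
ls -l project                           # list both files with details

# Search from the current directory (.) rather than from /.
found=$(find . -name 'new.txt')
echo "$found"

rm project/copy.txt                     # delete a single file
remaining=$(ls project)                 # only new.txt should be left

# Leave the sandbox and remove it recursively.
cd "$OLDPWD"
rm -r "$workdir"
```

Starting find at . instead of / keeps the search fast and avoids permission errors from system directories.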

Text Processing (Critical for Data Engineering)

cat - Displays file contents. Example: cat file.txt shows the content of 'file.txt'.

less - Views file content page by page. Example: less file.txt.

grep - Searches for patterns in files. Example: grep 'error' log.txt prints the lines of log.txt that contain 'error'.

awk - Processes and analyzes text data. Example: awk '{print $1}' file.txt prints the first whitespace-separated field of each line.

sed - Performs text substitution and manipulation. Example: sed 's/old/new/g' file.txt prints the file with every 'old' replaced by 'new'; the file itself is unchanged unless -i is used.

cut - Extracts specific columns from files. Example: cut -d',' -f2 file.csv extracts the second column.

sort - Sorts file contents. Example: sort file.txt sorts lines alphabetically.

uniq - Removes adjacent duplicate lines, so input is usually sorted first. Example: sort file.txt | uniq outputs each distinct line once.

wc - Counts lines, words, or characters. Example: wc -l file.txt counts lines.
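
These tools are most useful when chained with pipes. The sketch below shows one possible combination on a small CSV; the sample rows, host names, and variable names are invented for illustration and are not taken from any real log.

```shell
# Hypothetical sample data, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
id,status,host
1,error,web01
2,ok,web02
3,error,web01
4,ok,web03
5,error,web02
EOF

# grep -c counts the lines that contain the pattern.
error_count=$(grep -c 'error' "$data")

# awk skips the header (NR > 1) and prints the third comma-separated field;
# sort groups identical hosts together so that uniq can drop the repeats.
unique_hosts=$(awk -F',' 'NR > 1 {print $3}' "$data" | sort | uniq | wc -l)

# cut extracts the status column (header included);
# uniq -c prefixes each distinct value with its count.
cut -d',' -f2 "$data" | sort | uniq -c

# sed writes the edited text to standard output; the file itself is unchanged.
fail_count=$(sed 's/error/fail/g' "$data" | grep -c 'fail')
still_error=$(grep -c 'error' "$data")

echo "errors=$error_count hosts=$unique_hosts"
rm "$data"
```

Sorting before uniq matters here: without it, repeated hosts that are not on neighboring lines would be counted more than once.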

Networking

ping - Tests connectivity to a host. Example: ping google.com.

curl - Fetches data from URLs. Example: curl http://example.com downloads the page content.

wget - Downloads files from the internet. Example: wget http://example.com/file.zip.

scp - Securely copies files between servers. Example: scp file.txt user@host:/path transfers file.txt to /path on the remote host.

netstat - Displays network connections, routing tables, etc.

ss - Displays socket statistics. Example: ss -tuln lists listening TCP and UDP sockets with numeric addresses and ports.

ftp - Transfers files using the FTP protocol. Example: ftp hostname.

Data Engineering-Specific Tools

hdfs dfs - Manages Hadoop Distributed File System (HDFS). Example: hdfs dfs -ls / lists HDFS contents.

spark-submit - Submits Spark jobs. Example: spark-submit app.py runs a PySpark application.

sqoop - Transfers data between Hadoop and relational databases.

kafka-console-producer - Publishes messages to a Kafka topic.

kafka-console-consumer - Reads messages from a Kafka topic.

flume-ng - Configures Flume agents to ingest data streams.

Process Management

ps - Displays currently running processes. Example: ps aux shows all processes with details.

top - Displays real-time system resource usage and running processes.

htop - An interactive process viewer (similar to top).

kill - Sends a signal to a process identified by its PID; the default signal, SIGTERM, asks the process to terminate. Example: kill 1234 terminates the process with PID 1234.

bg - Resumes a suspended job in the background.

fg - Brings a background or suspended job to the foreground.
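
As a rough illustration of how ps and kill fit together in a script, the sketch below starts a harmless sleep process, checks that it exists, and terminates it. The 300-second sleep is an arbitrary stand-in for a real long-running job.

```shell
sleep 300 &                  # start a long-running process in the background
pid=$!                       # $! holds the PID of the most recent background job

# ps -p limits the listing to one PID.
ps -p "$pid" > /dev/null && echo "process $pid is running"

kill "$pid"                  # sends SIGTERM, the default signal

# wait collects the exit status; a process ended by a signal
# reports 128 plus the signal number (15 for SIGTERM).
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

Reading the PID from $! avoids searching the ps output by name, which could match an unrelated process.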

Version Control (Git)

git init - Initializes a new Git repository.

git clone - Clones an existing repository. Example: git clone <repo_url>.

git add - Stages changes for commit. Example: git add file.txt.

git commit - Commits staged changes. Example: git commit -m 'message'.

git push - Pushes changes to a remote repository. Example: git push origin main.

git pull - Fetches and merges changes from a remote repository.
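
The basic init, add, commit, and clone cycle can be rehearsed entirely on the local disk. This is a sketch under a few assumptions: git is installed, the user name and email are placeholders passed only for this one commit, and the repository paths are temporary directories created for the demo.

```shell
# Assumes git is installed. The identity below is a placeholder for this demo only.
repo=$(mktemp -d)
clone=$(mktemp -d)

git init -q "$repo"
echo "first draft" > "$repo/file.txt"

git -C "$repo" add file.txt
git -C "$repo" -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m 'Add file.txt'

# Count the commits reachable from the current branch.
commits=$(git -C "$repo" rev-list --count HEAD)

# Cloning from a local path works the same way as cloning from a URL.
git clone -q "$repo" "$clone/copy"
cloned=$(cat "$clone/copy/file.txt")

echo "commits=$commits cloned='$cloned'"
rm -rf "$repo" "$clone"
```

In the clone, the original repository is registered as the remote named origin, which is the name git push origin main refers to.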

System Monitoring and Disk Usage

df - Displays disk space usage. Example: df -h shows human-readable disk usage.

du - Shows directory size. Example: du -sh /home gives the size of /home.

free - Displays memory usage. Example: free -h shows human-readable memory usage.

uptime - Shows how long the system has been running.
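
For scripts, the human-readable -h output is harder to parse than plain numbers. The sketch below is one way to capture disk figures in variables; the temporary directory, the 1 MiB test file written with dd, and the variable names are assumptions made for the demo.

```shell
dir=$(mktemp -d)
# Write roughly 1 MiB of zero bytes so there is something to measure.
dd if=/dev/zero of="$dir/blob.bin" bs=1024 count=1024 2>/dev/null

du -sh "$dir"                        # human-readable total for the directory
dir_kb=$(du -sk "$dir" | cut -f1)    # the same total in kilobytes, for scripts

# df -P prints one line per filesystem in a fixed POSIX layout;
# field 5 of that line is the percentage of space in use.
used_pct=$(df -P "$dir" | awk 'NR == 2 {print $5}')

echo "directory: ${dir_kb} KB, filesystem: ${used_pct} used"
rm -r "$dir"
```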

Archive and Compression

tar - Archives files. Example: tar -cvf archive.tar file.txt creates an archive.

gzip - Compresses files. Example: gzip file.txt replaces 'file.txt' with the compressed 'file.txt.gz'.

gunzip - Decompresses files. Example: gunzip file.txt.gz decompresses 'file.txt.gz'.
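
A full round trip shows how the three commands relate. The following sketch archives a file, compresses and decompresses the archive, extracts it, and checks that the result matches the original; the directory layout and file contents are invented for the example.

```shell
work=$(mktemp -d)
printf 'line one\nline two\n' > "$work/file.txt"

# -C switches to $work first, so the archive stores the relative name file.txt.
tar -cvf "$work/archive.tar" -C "$work" file.txt

gzip "$work/archive.tar"        # creates archive.tar.gz and removes archive.tar
gunzip "$work/archive.tar.gz"   # restores archive.tar and removes the .gz file

mkdir "$work/out"
tar -xvf "$work/archive.tar" -C "$work/out"   # -x extracts instead of creating

# cmp exits with status 0 only when the two files are byte-for-byte identical.
roundtrip=failed
cmp "$work/file.txt" "$work/out/file.txt" && roundtrip=ok
echo "round trip: $roundtrip"

rm -r "$work"
```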

Development Utilities

vim - Edits text files in the terminal. Example: vim file.txt opens 'file.txt' for editing.

nano - A simple text editor. Example: nano file.txt opens 'file.txt' for editing.

ssh - Connects to remote servers securely. Example: ssh user@host.

screen - Allows detached terminal sessions. Example: screen starts a session.
