Untitled Document

Uploaded by

Anmol Shubham

Here is the fully detailed and complete version of your development environment setup, merging all components and examples:

---

## Development Environment Setup

### 1. Python and Libraries Setup

#### a. Python Version
- **Windows**: Python 3.10.11
- **Ubuntu**: Python 3.10.12
- **Usage**: Python is the base language used for various data processing libraries, machine learning frameworks, and other utilities.

**Example**:

```bash
# Windows
(venv) PS C:\Users\Olive\IdeaProjects\Anmol> python --version
Python 3.10.11

# Ubuntu
$ python3 --version
Python 3.10.12
```
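A script can also verify the interpreter version at startup instead of relying on a manual check. The following is a minimal sketch: the `MIN_VERSION` constant and the `check_python` helper are illustrative names, and the 3.10 floor is an assumption based on the versions listed above.

```python
import sys

# Assumed minimum, taken from the 3.10.x versions listed above.
MIN_VERSION = (3, 10)


def check_python(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter's (major, minor) is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if check_python() else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
```

Comparing only the major and minor numbers keeps the check valid across patch releases such as 3.10.11 and 3.10.12.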

#### b. Installed Python Packages

- **Packages**: fastavro, PyArrow, NumPy, psutil, PySpark, etc.
- **Usage**: These packages cover Avro serialization (fastavro), columnar in-memory data (PyArrow), numerical computing (NumPy), system monitoring (psutil), and big data processing (PySpark).

**Example**:

```bash
# List installed packages
(venv) PS C:\Users\Olive\IdeaProjects\Anmol> pip list
Package    Version
---------- --------
fastavro   1.9.7
numpy      2.1.1
psutil     6.0.0
pyarrow    17.0.0
pyspark    3.5.2
```
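The same information can be read from inside Python, which is useful when a script needs to confirm its dependencies before running. This sketch uses the standard library's `importlib.metadata`; the `installed_versions` helper is a hypothetical name introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Package names taken from the `pip list` output above.
    names = ["fastavro", "numpy", "psutil", "pyarrow", "pyspark"]
    for name, ver in installed_versions(names).items():
        print(f"{name:<10} {ver or 'not installed'}")
```

Returning `None` for a missing package, rather than raising, lets the caller report every missing dependency in one pass.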

### 2. Python Libraries and Their Usage

#### a. **PyArrow**
- **Version**: 17.0.0
- **Usage**: In-memory columnar data representation (Apache Arrow), efficient for exchanging data between systems such as pandas and Spark without costly conversions.

**Example**:

```python
import pyarrow as pa

data = {'column1': [1, 2, 3], 'column2': ['A', 'B', 'C']}

table = pa.Table.from_pydict(data)
print(table)
```

#### b. **Avro**
- **Version**: 1.10.2
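- **Usage**: A data serialization system widely used with Hadoop. It stores records in a compact binary format together with their schema, so different systems can exchange data. The example below uses the `avro` package, which is separate from the `fastavro` package listed above.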

**Example**:

```python
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader

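# Parse the schema file (optional for reading: the reader uses the schema embedded in data.avro)
with open("example.avsc", "r") as schema_file:
    schema = avro.schema.parse(schema_file.read())

# Read and print every record in the Avro container file
with open("data.avro", "rb") as f:
    reader = DataFileReader(f, DatumReader())
    for record in reader:
        print(record)
    reader.close()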
```
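Because an `.avsc` schema file is a JSON document, its structure can be inspected with the standard library before it is handed to an Avro reader. The sketch below assumes a hypothetical `User` record schema, since the contents of `example.avsc` are not shown here; `field_names` is likewise an illustrative helper.

```python
import json

# Hypothetical schema text standing in for the contents of example.avsc.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""


def field_names(schema_text):
    """Return the field names declared by a record schema."""
    schema = json.loads(schema_text)
    return [field["name"] for field in schema.get("fields", [])]


print(field_names(SCHEMA_TEXT))
```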

#### c. **JSON**
- **Version**: Built-in (the `json` module in the Python standard library)
- **Usage**: Serializes Python objects to JSON text and parses JSON text back into Python objects. JSON is the common interchange format for web APIs.

**Example**:

```python
import json

# Create a dictionary and convert it to JSON format
data = {'name': 'Alice', 'age': 30}
json_data = json.dumps(data)
print(json_data)

# Load JSON back to Python dict
loaded_data = json.loads(json_data)
print(loaded_data)
```
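The example above works with strings in memory. For files, `json.dump` and `json.load` are the counterparts. The following sketch writes a small version manifest to disk and reads it back; the `save_and_reload` helper and the `versions.json` file name are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Versions copied from the package list earlier in this document.
versions = {"fastavro": "1.9.7", "numpy": "2.1.1", "pyspark": "3.5.2"}


def save_and_reload(data, path):
    """Write `data` to `path` as indented JSON, then read it back."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(versions, Path(tmp) / "versions.json")

print(reloaded == versions)
```

Passing `sort_keys=True` keeps the file stable between runs, which makes it easier to diff under version control.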

#### d. **PySpark**
- **Version**: 3.5.2
- **Usage**: PySpark is used for big data processing via Apache Spark, facilitating parallel
computation and large-scale analytics.

**Example**:

```python
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("Check Spark SQL Version") \
        .master("local[*]") \
        .getOrCreate()

    # Print Spark SQL version
    print("Spark SQL Version:", spark.version)  # Output: Spark SQL Version: 3.5.2
    spark.stop()
```

---

### 3. Cloudera Environment Setup

#### a. Cloudera Version

- **Version**: Cloudera 5.12.0
- **Usage**: Managing and deploying Hadoop-based data processing systems.

**Example**:
```bash
$ cloudera-manager-server --version
Cloudera Manager Server: 5.12.0
```

#### b. Hadoop Version

- **Version**: Hadoop 2.6.0-cdh5.12.0
- **Usage**: Framework for distributed storage and processing of large datasets across multiple nodes.

**Example**:

```bash
$ hadoop version
Hadoop 2.6.0-cdh5.12.0
```

#### c. YARN Version

- **Version**: 2.6.0-cdh5.12.0
- **Usage**: Resource management and job scheduling in Hadoop.

**Example**:

```bash
$ yarn version
Hadoop 2.6.0-cdh5.12.0
```

#### d. HDFS Version

- **Version**: 2.6.0-cdh5.12.0
- **Usage**: Distributed file system for scalable data storage.

**Example**:

```bash
$ hdfs version
Hadoop 2.6.0-cdh5.12.0
```

#### e. Hive Version

- **Version**: Hive 1.1.0-cdh5.12.0
- **Usage**: Data warehousing system for large-scale data analysis with SQL-like queries.

**Example**:
```bash
$ hive --version
Hive 1.1.0-cdh5.12.0
```

#### f. Sqoop Version

- **Version**: Sqoop 1.4.6-cdh5.12.0
- **Usage**: Tool to transfer data between Hadoop and relational databases.

**Example**:

```bash
$ sqoop version
Sqoop 1.4.6-cdh5.12.0
```

#### g. Spark Version

- **Version**: Spark 2.3.1
- **Usage**: Large-scale data processing for batch and streaming analytics.

**Example**:

```bash
$ spark-submit --version
Spark 2.3.1
```

#### h. Pig Version

- **Version**: Pig 0.12.0-cdh5.12.0
- **Usage**: High-level data processing language for analyzing large datasets.

**Example**:

```bash
$ pig -version
Pig 0.12.0-cdh5.12.0
```

#### i. HBase Version

- **Version**: HBase 1.2.0-cdh5.12.0
- **Usage**: Non-relational, distributed database for large data storage with random access.

**Example**:
```bash
$ hbase version
HBase 1.2.0-cdh5.12.0
```

#### j. Impala Version

- **Version**: Impala Shell v2.9.0-cdh5.12.0
- **Usage**: SQL query execution engine for real-time, low-latency queries on HDFS data.

**Example**:

```bash
$ impala-shell --version
Impala Shell v2.9.0-cdh5.12.0
```
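Most component versions in this section follow the pattern `<upstream version>-cdh<CDH release>`, for example `2.6.0-cdh5.12.0`. When comparing environments it can help to split the two parts programmatically. This is a minimal sketch using a regular expression; `CDH_PATTERN` and `split_cdh_version` are hypothetical names, and the pattern is inferred only from the version strings shown above.

```python
import re

# Matches strings such as "2.6.0-cdh5.12.0": upstream version, then CDH release.
CDH_PATTERN = re.compile(r"(?P<upstream>\d+(?:\.\d+)*)-cdh(?P<cdh>\d+(?:\.\d+)*)")


def split_cdh_version(text):
    """Return (upstream, cdh) from the first CDH-style version in `text`, else None."""
    match = CDH_PATTERN.search(text)
    if match is None:
        return None
    return match.group("upstream"), match.group("cdh")


# Sample lines taken from the command outputs in this section.
for line in ["Hadoop 2.6.0-cdh5.12.0", "Impala Shell v2.9.0-cdh5.12.0", "Spark 2.3.1"]:
    print(line, "->", split_cdh_version(line))
```

Strings without a `-cdh` suffix, such as the Spark version above, yield `None` rather than a partial match.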

---

### 4. Additional Cloudera Environment Information

#### a. AWS CLI Version

- **Version**: aws-cli/1.16.188
- **Usage**: Managing AWS services from the command line.

**Example**:

```bash
$ aws --version
aws-cli/1.16.188 Python/2.6.6
```

#### b. Bash Version

- **Version**: GNU bash, version 4.1.2
- **Usage**: Command-line shell for executing commands and scripts.

**Example**:

```bash
$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
```

#### c. Linux Distribution Information

- **Linux**: Ubuntu 22.04.3 LTS
- **Usage**: Operating system details for system management and compatibility checks.

**Example**:

```bash
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
```
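The version checks in this document can be collected by a script rather than run one at a time. The sketch below assumes the tools are invoked exactly as in the examples above; `tool_version` is an illustrative helper that returns `None` when a tool is not on the `PATH`, so it can run on machines where only some components are installed.

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Run a version command and return its first output line, or None if unavailable."""
    if shutil.which(command[0]) is None:
        return None  # Tool is not installed or not on PATH.
    completed = subprocess.run(
        command, capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr, so fall back to it.
    output = completed.stdout.strip() or completed.stderr.strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    # Commands taken from the examples in this document.
    for cmd in (["hadoop", "version"], ["hive", "--version"], ["aws", "--version"]):
        print(" ".join(cmd), "->", tool_version(cmd))
    print(tool_version([sys.executable, "--version"]))
```

Only the first output line is kept because several of these tools print build details after the version itself.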

---

This setup lists the essential components, their versions, and example commands for the development environment, giving an overview of the tools and technologies used across the different platforms.
