Untitled Document
Untitled Document
---
a. Python Version
- Windows: Python 3.10.11
- Ubuntu: Python 3.10.12
- Usage: Python is the base language used for various data processing libraries, machine
learning frameworks, and other utilities.
Example:
```bash
# Windows
(venv) PS C:\Users\Olive\IdeaProjects\Anmol> python --version
Python 3.10.11
# Ubuntu
$ python3 --version
Python 3.10.12
```
**Example**:
```bash
# List installed packages
(venv) PS C:\Users\Olive\IdeaProjects\Anmol> pip list
Package Version
---------- --------
fastavro 1.9.7
numpy 2.1.1
psutil 6.0.0
pyarrow 17.0.0
pyspark 3.5.2
```
#### a. **PyArrow**
- **Version**: 17.0.0
- **Usage**: In-memory data representation, highly efficient for data conversion between
different formats like Pandas and Spark.
**Example**:
```python
import pyarrow as pa
#### b. **Avro**
- **Version**: 1.10.2
- **Usage**: A data serialization system used for Hadoop and other systems, enabling data
exchange between different formats.
**Example**:
```python
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader
#### c. **JSON**
- **Version**: Built-in Python library
- **Usage**: JSON is essential for data interchange between APIs and processing JSON
formatted data.
**Example**:
```python
import json
#### d. **PySpark**
- **Version**: 3.5.2
- **Usage**: PySpark is used for big data processing via Apache Spark, facilitating parallel
computation and large-scale analytics.
**Example**:
```python
from pyspark.sql import SparkSession
if __name__ == "__main__":
spark = SparkSession.builder \
.appName("Check Spark SQL Version") \
.master("local[*]").getOrCreate()
---
**Example**:
```bash
$ cloudera-manager-server --version
Cloudera Manager Server: 5.12.0
```
**Example**:
```bash
$ hadoop version
Hadoop 2.6.0-cdh5.12.0
```
**Example**:
```bash
$ yarn version
Hadoop 2.6.0-cdh5.12.0
```
**Example**:
```bash
$ hdfs version
HDFS 2.6.0-cdh5.12.0
```
**Example**:
```bash
$ hive --version
Hive 1.1.0-cdh5.12.0
```
**Example**:
```bash
$ sqoop version
Sqoop 1.4.6-cdh5.12.0
```
**Example**:
```bash
$ spark-submit --version
Spark 2.3.1
```
**Example**:
```bash
$ pig -version
Pig 0.12.0-cdh5.12.0
```
**Example**:
```bash
$ hbase version
HBase 1.2.0-cdh5.12.0
```
**Example**:
```bash
$ impala-shell --version
Impala Shell v2.9.0-cdh5.12.0
```
---
**Example**:
```bash
$ aws --version
aws-cli/1.16.188 Python/2.6.6
```
**Example**:
```bash
$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
```
**Example**:
```bash
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
```
---
This setup details all the essential components, versions, and examples for your development
environment, ensuring a comprehensive overview of the tools and technologies used across
different platforms. Let me know if you need further details or explanations!