Big Data Security 20100BTCSDSI07268
Practical No. 1
DISHA DHAMDHERE 20100BTCSDSI07268
Step 2: Click Download VirtualBox. It's a blue button in the middle of the page. Doing so will
open the downloads page.
Step 3: Click Windows hosts. You'll see this link below the "VirtualBox 7.0.12 platform
packages" heading. The VirtualBox EXE file will begin downloading onto your computer.
Step 4: Open the VirtualBox EXE file. Go to the location to which the EXE file downloaded
and double-click the file. Doing so will open the VirtualBox installation window.
Step 6: Click Finish when prompted. It's in the lower-right side of the window. Make sure that
you don't uncheck the "Start" box before doing this. Doing so will close the installation window
and open VirtualBox. Now that you've installed and opened VirtualBox, you can create a virtual
machine in order to run any operating system on your PC.
Practical No. 2
Aim: Hadoop Installation and Configuration.
Prerequisites
1. Hardware Requirement
* RAM — Min. 8GB, if you have SSD in your system then 4GB RAM would also work.
* CPU — Min. Quad core, with at least 1.80GHz
2. JRE 1.8 — Offline installer for JRE
3. Java Development Kit — 1.8
4. Software for unzipping, such as 7-Zip or WinRAR
* I will be using 64-bit Windows for the process. Please check whether your system is x86 or
x64 and download the matching version of all the software.
5. Download Hadoop zip
* I am using Hadoop-2.9.2; you can use any other STABLE version of Hadoop.
Once extracted, we get a new file, hadoop-2.9.2.tar. We then need to extract this tar file
as well.
1. Setting JAVA_HOME
Open Environment Variables and click “New” under “User variables”.
2. Setting HADOOP_HOME
1. Creating Folders
We need to create a folder named data in the Hadoop directory, with two subfolders
inside it: namenode and datanode.
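The folder layout described here can also be created with a short script. This is a minimal sketch, assuming Python is available on the machine; the helper name create_hdfs_dirs is hypothetical, and the Hadoop directory passed to it is whatever location you extracted Hadoop to.

```python
from pathlib import Path


def create_hdfs_dirs(hadoop_home):
    """Create data/namenode and data/datanode under hadoop_home.

    hadoop_home is the directory Hadoop was extracted to (an assumption
    here; this guide does not fix the location).
    """
    data_dir = Path(hadoop_home) / "data"
    created = []
    for sub in ("namenode", "datanode"):
        target = data_dir / sub
        # parents=True also creates the data folder itself;
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

For example, calling create_hdfs_dirs with the path of your extracted hadoop-2.9.2 folder returns the two folder paths, which are the values needed later for the namenode and datanode directory settings.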
* core-site.xml
* hdfs-site.xml
* mapred-site.xml
* yarn-site.xml
* hadoop-env.cmd
1. Editing core-site.xml
Right click on the file, select edit and paste the following content within <configuration>
</configuration> tags.
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
2. Editing hdfs-site.xml
Right click on the file, select edit and paste the following content within
<configuration> </configuration> tags.
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>PATH~1\namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>PATH~2\datanode</value>
<final>true</final>
</property>
NOTE: Replace PATH~1 and PATH~2 with the paths of the namenode and datanode
folders that we created earlier.
3. Editing mapred-site.xml
Right click on the file, select edit and paste the following content within
<configuration> </configuration> tags.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
4. Editing yarn-site.xml
Right click on the file, select edit and paste the following content within <configuration>
</configuration> tags.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
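Because each snippet has to sit inside a single <configuration> element, a quick check after editing can catch paste mistakes. The following is a minimal sketch in Python, not part of Hadoop; the helper name read_properties is hypothetical, and the sample text is the core-site.xml content from step 1 wrapped in its <configuration> tags.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop site file."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            raise ValueError("every <property> needs <name> and <value>")
        props[name.strip()] = value.strip()
    return props


# core-site.xml as it should look after the edit in step 1.
CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

print(read_properties(CORE_SITE))
# prints {'fs.defaultFS': 'hdfs://localhost:9000'}
```

The same function can be pointed at the text of the other three site files to confirm that the expected property names are present.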
Verifying hadoop-env.cmd
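Right click on the file, select edit and check that JAVA_HOME is set correctly.
We can either keep the reference to the JAVA_HOME environment variable that we configured
earlier or replace it with the actual JDK path. If you hard-code the path, use the location of
the JDK installed on your machine; the prerequisites above list JDK 1.8.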
set JAVA_HOME=%JAVA_HOME%
OR
set JAVA_HOME="C:\Program Files\Java\jdk-21"
Replacing bin
The last step in configuring Hadoop is to download and replace the bin folder.
* Go to this GitHub Repo and download the bin folder as a zip.
* Extract the zip and copy all the files present under bin folder to %HADOOP_HOME%\bin.
Note: If you are using a different version of Hadoop, search for the bin folder that matches
that version and download it.
Launching Hadoop
This will open 4 new cmd windows, each running a different Hadoop daemon:
* Namenode
* Datanode
* Resourcemanager
* Nodemanager
Practical No. 3
Aim: Create a directory in Hadoop.
Theory:
mkdir: Creates a directory. In HDFS there is no home directory by default, so let’s create it
first.
Syntax:
hdfs dfs -mkdir /<folder_name>
Output:
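In the browser, open localhost:9870 (the NameNode web UI port in Hadoop 3.x; Hadoop 2.x, such as 2.9.2, uses localhost:50070).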
Practical No. 4
Aim: Commands of Linux Operating system.
Theory:
1. ls ⇒ directory listing
4. cd ⇒ change to home
5. pwd ⇒ shows current directory
6. mkdir dir ⇒ create a directory dir
7. rm file ⇒ delete the file
12. cp -r dir1 dir2 ⇒ copy dir1 to dir2; create dir2 if it is not present.
14. ln -s file link ⇒ create a symbolic link to file
16. cat > file ⇒ places standard input into the file
20. tail -f file ⇒ output the contents of the file as it grows, starting with the
last 10 lines
Practical No. 5
Step 3: To list all the databases present in Hive, run the command:
Syntax:
hive(default)> show databases;
Step 4: To use the database created in step 2, run the command:
Syntax:
hive(default)> use name_of_database;
Step 5: To create a table, use the following command:
Syntax:
hive(name_of_database)> create table table_name
> (
> id int,
> name string,
> city string
> );
Step 6: The table is now created. To insert records into the table, run the command:
hive(name_of_database)> insert into table table_name
> values (101,'Ayush','Saxena');
Step 7: To display all records present in the table, run the query:
hive(name_of_database)> select * from table_name;
Practical No. 6
Syntax:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
The default behaviour is RESTRICT, where DROP DATABASE fails if the database is not
empty. To drop the tables in the database as well, use DROP DATABASE … CASCADE.
Practical No. 7
For example, to create a table called zipcodes with four columns (RecordNumber, Country,
City, and Zipcode) and partitioned by State, you can use the following HiveQL command:
EXAMPLE 2
Practical No. 8
EXAMPLE -1:
EXAMPLE 2:
Practical No. 9
Aim: Pig Commands
Theory:
Apache Pig is a tool/platform for analyzing large datasets and performing extended data
operations. Pig is used with Hadoop. All pig scripts internally get converted into map-reduce
tasks and then get executed. It can handle structured, semi-structured, and unstructured data. Pig
stores its result in HDFS.
Programmers who are not comfortable with Java usually struggle to write Hadoop programs, i.e.,
map-reduce tasks. Pig Latin, which is quite similar to SQL, is a boon for them. Its
multi-query approach reduces the length of the code.
So overall, it is a concise and effective way of programming. Pig Commands can invoke code in
many languages like JRuby, Jython, and Java.
Commands:
1. Dump Command: This command is used to display all data loaded.
6. Cogroup: This operator is used to group two relations using a particular column.
Left Outer: The left outer Join operation returns all rows from the left table, even if there
are no matches in the right relation.
Right Outer: The right outer join operation returns all rows from the right table, even if
there are no matches in the left table.
Cross: The CROSS operator computes the cross-product of two or more relations.
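The difference between the left outer join, right outer join, and cross operations described above can be seen on a small example. The sketch below is plain Python rather than Pig Latin and only mimics the row-level behaviour; the two relations and their rows are made up for illustration, and the function names are hypothetical.

```python
from itertools import product

# Two made-up relations: (id, name) and (id, city).
people = [(1, "Asha"), (2, "Ravi"), (3, "Meena")]
cities = [(1, "Indore"), (3, "Pune"), (4, "Delhi")]


def left_outer_join(left, right):
    """Keep every left row; pad with None when no right row shares the key.

    Assumes both relations are non-empty and keyed on their first field.
    """
    result = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[0] == l_row[0]]
        if matches:
            result.extend(l_row + r_row for r_row in matches)
        else:
            result.append(l_row + (None,) * len(right[0]))
    return result


def right_outer_join(left, right):
    """Keep every right row; pad with None when no left row shares the key."""
    result = []
    for r_row in right:
        matches = [l_row for l_row in left if l_row[0] == r_row[0]]
        if matches:
            result.extend(l_row + r_row for l_row in matches)
        else:
            result.append((None,) * len(left[0]) + r_row)
    return result


def cross(left, right):
    """Pair every left row with every right row (the cross-product)."""
    return [l_row + r_row for l_row, r_row in product(left, right)]


print(left_outer_join(people, cities))
print(right_outer_join(people, cities))
print(len(cross(people, cities)))
```

In the left outer join, the row with id 2 is kept and padded with None because no city row matches it. In the right outer join, the row with id 4 is kept and padded on the left instead. The cross-product has 3 x 3 = 9 rows.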
Practical No. 10
Aim: Exploring the IBM Guardium Interface.
Theory:
IBM Security Guardium is a comprehensive data security and protection platform. It’s designed
to safeguard sensitive data across a wide range of data environments, including databases, data
warehouses, cloud platforms, and big data environments.
IBM Security Guardium is part of a family of data security software in the IBM Security
portfolio. This includes Guardium Data Protection, which offers additional features such as near-
real-time threat response workflows, and automated compliance auditing and reporting. Another
product in the family is Guardium Data Encryption, which provides data encryption and key
management software.
Exploring IBM Guardium Interface
1. To access the Guardium GUI, log in with user labadmin and password guardium.
2. The banner is the blue bar at the top of the interface. Perform the following tasks:
a) To view notifications, click the Notification icon. Disregard warning notices about
certificate expiration and missing Guardium DB Partitions.
b) To view items awaiting approval, click the To-Do List icon. The to-do list is empty and
there are no audit processes with pending results.
c) To view help, click the Help icon.
1. To view the Guardium production documentation, click Guardium Help.
2. Close the Guardium Help window.
3. To view the functions enabled and system information, click Help > About Guardium.
4. Close the About Guardium window.
d) To view the options you can use to customize the look and feel of your account and
update additional account information, click the account list. The following series of
tasks is done through this menu.
1. To customize the navigation menu, click Customize. The Customize Navigation Menu
panel is shown. The Available Tools and Reports area shows available menu items and
the Navigation Menu area shows menu items in use.
2. To expand and collapse the Tools, click Tools.
3. To expand and collapse the Reports, click Reports.
4. To view the Setup menu items in use, click Setup > Quick Start.
5. To view the Tools and Views menu items in use, click Tools and Views and scroll down
to see all the items.
6. To close the Customize Navigation Menu panel, click Cancel.
7. To customize the user or role, click Customize User/Role.
8. To see the available menu items for a user like accessmgr, click user accessmgr.
9. To see the list of available roles, click the Roles tab.
10. To edit account details like password and email, go back to the account menu and click
Edit Account Details.
11. To close the account details window, click Cancel.
Practical No. 11
Aim: Setting up data classification.
To protect sensitive data, you must first identify and classify it.
Steps to classify data in your database environment.
You create a new classification policy that searches for credit card numbers and populates the
Sensitive Objects group with the table name and column name for each detected incident.
1. Use the Group Builder to view members of a group.
Before getting started, examine the current contents of the Sensitive Objects group in the
Group Builder.
a. In the left navigation menu, go to Setup > Tools and Views > Group Builder
(Legacy).
The Group Filter opens.
b. Leave the fields blank and click Next. The Modify Existing Groups panel opens.
c. In the Modify Existing Groups list, scroll down, select Sensitive Objects, and
click the Edit icon. The Manage Members for Selected Group panel opens.
d. Notice the default Sensitive Objects group members that are Guardium defaults.
List these members for comparison at the end of this lab.
e. When you are finished examining the contents of this group, scroll down and
click Back. You return to the Modify Existing Groups panel.
2. To add new members to the Sensitive Objects group, create a Classification Policy.
a. In the left navigation menu, click Discover > Classifications > Classification
Policy Builder. The Classification Policy Finder panel opens.
b. To create a new classification policy, click the New icon. The Classification
Policy Definition panel opens.
d. Click Apply.
3. Add a Search for Data rule to the policy.
a. Click Edit Rules.
The Classification Policy Rules panel opens.
The Classification Rule #1 For Classification Policy “Lab PCI Classification Policy” opens.
(Schema.Object).
f. Click Save. The panel named Classification Rule #1 For Classification Policy
“Lab PCI Classification Policy” opens again.
g. Scroll down to confirm that an action is listed under Classification Rule Actions.
h. To return to the Classification Policy Rules panel, click Apply and click Back.
The new rule,
i. To return to the Classification Policy Finder panel, scroll down and click Back
again.
b. To create a Classification Process to run the Classifier Policy that was just
created, click the New icon. The Define Classification Process panel opens.
c. In the Process Description field, enter Lab PCI Classification Process.
d. In the Classification Policy list, select Lab PCI Classification Policy.
The datasource requires the operating system access credentials for the user db2inst1
(on the database server). In the Password field, enter P@ssw0rd.
h. Scroll down in the datasource definition window and click Apply. You see a
notice that the datasource information has been saved.
i. Now you test the datasource to ensure that Guardium can connect to the database
using the information in the datasource. Click Test Connection. A window
opens.
The Run Once Now and View Results buttons are enabled.
ii. In the left navigation menu, click Discover > Classifications >
Guardium Job Queue.
d. Verify that the job is either waiting in the queue, running, or completed.
Practical No. 12
Aim: Configure and run a vulnerability assessment
Steps to configure and run a database vulnerability assessment.
1. Ensure that the labadmin user has access to the vulnerability assessment tools.
a. To launch the Guardium GUI, double-click the Firefox icon on the desktop.
b. To access the Guardium GUI, log on as user accessmgr with password
guardium. The User Browser window opens.
c. To view the roles for user labadmin, click Roles. The Roles for Lab Admin form
opens.
d. To enable vulnerability assessment for user labadmin, scroll down, select the
7. There is a classifier datasource you can use with this assessment. Select the
9. You set up a Security Assessment and defined the database for it to use. However,
you did not specify which tests it should perform.
To configure the tests to perform, click Configure Tests. The Assessment Test
Selections window opens.
10. Scroll down to the Tests available for addition section.
13. To return to the Security Assessment Finder window, scroll down and click Return.
15. On the confirmation window that indicates the test is in the Guardium job queue,
click OK.
17. If the job does not have a status of Completed, click the Refresh icon.
18. View a comprehensive report available through the Security Assessment Builder.
a. In the left navigation menu, go to Harden > Vulnerability
Assessment > Assessment Builder.
Lab_VA is auto-selected. Click View Results.
b. In the Show only window, select Fail from the Score column, and click Apply.
Note that the results are filtered to only show assessment failures.
20. To return to the browser version of the report, close the PDF.
Practical No. 13
Aim: Use the report to harden database and validate assessment.
Using the Report to Harden the Database:
1. In the security assessment report, scroll through the assessment test results.
2. Notice that for tests that failed, there are recommendations, including
suggested commands, to fix the vulnerability.
3. Scroll back up to find the third and fourth assessment tests with the following
names:
– No PUBLIC access to SYSCAT.AUDITPOLICIES and
SYSIBM.SYSAUDITPOLICIES
– No PUBLIC access to SYSCAT.AUDITUSE and SYSIBM.SYSAUDITUSE
Note the cause of failure and the recommendations, which include the
database commands to remediate the failures.
4. To access the database server, close the report window, minimize the
Firefox browser, and double-click the PuTTY icon on the desktop.
5. To open the database server session, select Linux DB Server from the Saved
Sessions list, and click Open.
To log in to the database server, type db2inst1 for the login name and type guardium for the
password.
8. Apply the recommendations from the two tests in step 3. There are often two
commands to run, separated by a period. In this case, run them as two
separate commands. This example shows one long command:
REVOKE ALL ON SYSCAT.AUDITPOLICIES FROM PUBLIC.
REVOKE ALL ON SYSIBM.SYSAUDITPOLICIES FROM PUBLIC
Instead, run each command separately and remove the periods at the end:
11. On the confirmation window that indicates the test is in the Guardium job queue,
click OK.
12. In the left navigation menu, go to Harden > Vulnerability Assessment >
Guardium Job Queue.
13. If the job does not have a status of Completed, click the Refresh icon.
14. To view the results, in the left navigation menu, go to Harden > Vulnerability
Assessment > Assessment Builder.
15. To view the results of the assessment, click View Results.
The result summary shows an improvement in the pass rate. The assessment result history graphs
the progress.
16. To filter the results to only show tests that have a status of Pass, click Filter / Sort
Controls.
17. To configure the filter, select Pass from the Score column, and click Apply.
18. Scroll down and view the details of the vulnerabilities you addressed.
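Step 8 above asks for the period-separated recommendation to be run as separate commands with the trailing periods removed. As a rough sketch of that splitting rule (an illustration in Python, not a Guardium or Db2 tool; the helper name split_recommendation is hypothetical), a period is treated as a separator only when whitespace or the end of the text follows it, so the dot inside names such as SYSCAT.AUDITPOLICIES is left alone.

```python
import re


def split_recommendation(text):
    """Split a period-separated recommendation into separate SQL commands.

    A period counts as a separator only when it is followed by whitespace
    or ends the text, so the dot inside SYSCAT.AUDITPOLICIES is preserved.
    """
    parts = re.split(r"\.(?:\s+|$)", text.strip())
    return [part.strip() for part in parts if part.strip()]


# The recommendation text as shown in step 8.
recommendation = (
    "REVOKE ALL ON SYSCAT.AUDITPOLICIES FROM PUBLIC. "
    "REVOKE ALL ON SYSIBM.SYSAUDITPOLICIES FROM PUBLIC"
)

for command in split_recommendation(recommendation):
    print(command)
```

This prints the two REVOKE statements on separate lines, each without a trailing period, which is the form in which they are run one at a time on the database server.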
Practical No. 14
1. In the Navigation menu, click Discover > Classification > Discover Sensitive Data. The
Discover Sensitive Data pane opens.
7. To define the classification rules for discovery, click Next. The classification rules
section is displayed.
The classification rules for different types of credit cards are already populated, as part of
the PCI template.
When a rule name begins with guardium://CREDIT_CARD, and there is a valid credit
card number pattern in the Search Expression box, the classification policy uses the Luhn
algorithm, which is a widely used algorithm for validating identification numbers such as
credit card numbers. It also uses standard pattern matching.
Templates for universal patterns like credit card numbers and email addresses are
displayed for all Language menu selections.
8. Select the first classification rule, guardium://CREDIT_CARD credit card.
9. Click the Edit icon. The Edit Rule pane opens.
10. Click Next.
You see the details of the rule criteria, such as the regular expression that is used to
search for credit card numbers and the types of objects (tables, views) where the search
occurs.
11. To see the actions associated with this rule, click Next. The Actions section is displayed.
The PCI template provides an action, which is to add the objects that the search finds to
the group PCI Cardholder Sensitive objects.
13. To configure the data sources where the discovery will run, click Next. The “Where to
Search” section is displayed.
In this section, we choose where the search for sensitive data runs. We can choose
one or more data sources, or groups of data sources, as targets. In this lab, there is a single
data source in the Available data sources table. Before we select the data source, we
test it to ensure that it connects properly to the target database.
14. Select the osprey_db2inst1_DB2 data source and click the Edit icon. The “Update
datasource” window opens.
16. To return to the Where to search section, close the “Update datasource” window.
17. Click the Move Right arrow to move osprey_db2inst1_DB2 to the list of selected
datasources.
Because you tested the datasource and the test was successful, it displays a green
checkmark icon in the Status column.
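The Luhn algorithm mentioned under step 7 is a simple checksum. The sketch below shows the standard algorithm in Python for illustration only; it is not Guardium's implementation, and the function name luhn_valid is hypothetical. Starting from the rightmost digit, every second digit is doubled (subtracting 9 if the result exceeds 9), and the number is accepted when the total is a multiple of 10.

```python
def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum.

    Non-digit characters such as spaces or hyphens are skipped.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a well-known test number)
print(luhn_valid("4111 1111 1111 1112"))  # False (last digit changed)
```

A pattern match alone would accept any 16-digit string of the right shape; the checksum is what filters out most random digit sequences, which reduces false positives during discovery.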
Practical No. 15
Aim: Refine discovery results.
Now we refine the results to exclude false positives that do not hold sensitive data. We assume
that the table named CC1 is a test table that does not hold sensitive data.
1. In the Review report section, click the Filter field, then type CC1 and press Enter.
The report entries are filtered to show only four entries, which correspond to table name
CC1.
3. From the Add to Group drop-down menu, select Add to Group of Tables to Exclude.
The Select Excluded Group dialog box opens.
4. Click the new group icon. The “Create new group” dialog box opens.
5. Enter the description Lab skip objects and click Save.
6. Close the informational message. The Select Exclude Group dialog box opens again.
7. To complete group selection and close the Select Exclude Group dialog box, click OK.
8. Close the Success dialog. The Discover Sensitive Data pane is displayed again.
13. To edit the advanced options, click Show advanced options, and then scroll down.
14. In the Exclude Table field, enter Lab, and select Lab skip objects.
15. To save the rule, click Save. The Create New Discovery Scenario pane is displayed again.
Practical No. 16
2. Click the Add icon. The New Receiver dialog box opens.
3. In the Role field, to filter roles, enter audit and select audit from the drop-down menu.
This choice allows any Guardium user with the audit user role to view the report.
4. Click Sign off. This selection means that the receiver must sign off on the report instead
of just viewing it.
5. Click OK. The Audit table updates with the new receiver.
Practical No. 17
Aim: Verify that the PCI Cardholder Sensitive Objects group is updated.
We verify that the sensitive tables that your discovery process finds are added to the appropriate
group.
1. Go to Protect > Security Policies > Group Builder.
2. To filter the entries, in the Filter field, type pci and press Enter.
3. Select the PCI Cardholder Sensitive objects group.
The dialog box shows that the group is associated with your discovery. You can also
view which queries this group is associated with.
6. Close the group details dialog box.
7. To view group members, select the group and click the Edit icon. Then, click the
Members tab.
Practical No. 18
Aim: Configure auto-discovery of subnet.
We configure Guardium to scan for new databases across your subnet, targeting specific ports
for probes.
1. Log in to the Guardium GUI with user admin and password guardium.
3. To create a database discovery process, click the New icon. The Auto-discovery Process
Builder page opens.
4. To name the process, in the Process name field, type Discover Databases.
5. To save the process, click Apply.
6. To add the IP range to scan, in the Host(s) field, type 10.0.100.*.
7. To add the ports to scan, in the Port(s) field, type 1000-6000.
Complete the configuration of the hosts and ports to scan. You scan the entire 10.0.100 subnet for
hosts with open ports in the 1000-6000 range. To identify the type of databases that exist on the
hosts, Auto-discovery probes any discovered hosts with open ports within the range you set up.
For the probe to run after the scan, do not clear the default Run probe after scan checkbox.
8. To add the host & port combination to the process, click Add scan.
9. To begin the database scan, click Run Once Now. You see a confirmation that the
process is active.
10. To close the confirmation window, click OK.
View the progress of the Auto-discovery process.
a. Scroll down.
b. Click Progress/Summary.
The Auto-discovery process progress page opens. This page details the current
progress by task within the process.
11. To view the current progress of the task, expand the Hosts/Ports section.
Useful information is displayed, which details the progress of the task. At this point in the
scan, 21 host systems & 42 open ports are discovered.
12. To update the progress of the task, click Refresh. The process completes.
13. To view the run details of the task, expand the Hosts/Ports section.
15. View databases discovered by the scan: Click the User Interface Search field. Type datab. Press
Enter or click the Search Icon. Click any Discovered Databases report. The Discovered
Databases report opens and shows that nine databases across three hosts were found. The
types of databases found were MSSQL, MySQL, Postgres, Oracle, and Sybase. As
Database Auto-discovery runs, the Guardium collector sends a handshake to each open
port. If there is a database listening on that port, it responds in such a way that Guardium
is able to determine that it is Oracle, Db2, or any other supported database.
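To get a feel for the size of the scan configured in steps 6 and 7, the host pattern and port range can be expanded with a few lines of Python. This is only a sketch of the arithmetic: the helper names are hypothetical, and treating 10.0.100.* as all 256 addresses of a /24 network is an assumption about how the wildcard is interpreted, not documented Guardium behaviour.

```python
import ipaddress


def expand_hosts(pattern):
    """Expand a pattern such as '10.0.100.*' into individual IPv4 addresses.

    Only a trailing '*' in the last octet is handled (assumed to mean the
    whole /24); anything else is treated as a single address.
    """
    if pattern.endswith(".*"):
        network = ipaddress.ip_network(pattern[:-2] + ".0/24")
        return [str(address) for address in network]
    return [str(ipaddress.ip_address(pattern))]


def expand_ports(port_range):
    """Expand '1000-6000' (inclusive) or a single port such as '50000'."""
    first, _, last = port_range.partition("-")
    start = int(first)
    end = int(last) if last else start
    return range(start, end + 1)


hosts = expand_hosts("10.0.100.*")
ports = expand_ports("1000-6000")
print(len(hosts), "addresses x", len(ports), "ports =",
      len(hosts) * len(ports), "host/port combinations")
```

Under these assumptions the scan covers 256 addresses and 5001 ports, which is over a million host/port combinations. That is why scanning an entire subnet with a wide port range takes noticeably longer than targeting known hosts.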
Practical No. 19
Aim: Configure auto-discovery of specific hosts.
Scans of entire subnets or large ranges might take a long time to complete. Therefore, to be more
efficient, it is common to target specific hosts and ports that are known to have databases on
them.
We configure Guardium to scan for new databases, targeting specific hosts and ports. For each
host and port combination, we set up a target scan and add them to the Auto-discovery process.
We use port ranges that are known to be used by Db2 databases.
1. View the Auto-discovery configuration:
a. Click the Discover icon.
b. Go to Database Discovery > Auto-discovery Configuration. The Auto-Discovery
Process Selector page opens.
b. Click Progress/Summary.
The Auto-discovery Process Page opens. The process is running and there is a task for each host
and port combination.
8. To view the progress of the host 10.0.100.197 / 50000-60000 task, expand the section.
9. To view the progress of the host 10.0.100.207 / 50000-60000 task, expand the section.
10. For a complete view of both tasks, scroll down.
11. After you review the details for each scan, click Refresh.
The scan is complete. Note that because you set up the scan to target specific hosts as
opposed to an entire subnet, like in the first scan, it takes seconds to complete.