Cloud Computing

Abstract—In today's virtual and widely distributed networks, sensitive data is regularly handed over from a distributor to trusted third parties, and the security and reliability of this service must be safeguarded according to the users' demands. A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is later leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been gathered independently by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party. The idea of modifying the data itself to detect leakage is not a new approach. Generally, sensitive data is leaked by the agents, and the specific agent responsible for the leak should be identified at an early stage; detecting the flow of data from the distributor to the agents is therefore essential. This project presents a data leakage detection system that uses various allocation strategies and assesses the likelihood that the leaked data came from one or more agents. For secure transactions, access control policies allow only authorized users to access sensitive data, preventing leakage by sharing information only with trusted parties; in addition, adding fake records to the data set makes leaks detectable and further improves the probability of identifying leakages in the system. Finally, it was decided to implement this mechanism on a cloud server.
INTRODUCTION
In this project report, I develop a model for identifying guilty agents. I also present algorithms for distributing objects to agents in a way that improves the chances of identifying a leaker. Finally, I consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were later leaked, the distributor can be more confident that this agent was guilty. I also consider an optimization in which the leaked data is compared with the original data, and the third party who leaked it is inferred from the comparison. An approximation technique is used to single out guilty agents. The model provides data allocation strategies that improve the probability of identifying leakages. There is also an application in which a distributor distributes and manages files containing sensitive information, handing them out to users on request. A log is maintained for every request; it is later used to find the overlap with the leaked file set, to estimate the associated risk, and to assess the guilt probability.

Data leakage happens every day, when confidential business information such as customer or patient data, source code, design specifications, price lists, intellectual property, trade secrets, and forecasts and budgets in spreadsheets leaks out. Leaked information leaves the company unprotected and moves outside the jurisdiction of the corporation, and this uncontrolled data leakage puts the business in a vulnerable position. Once the data is no longer within the company's domain, the company is at serious risk. When cybercriminals cash out or sell this data for profit, it costs the organization money, damages its competitive advantage, brand, and reputation, and destroys customer trust. To address this problem, we develop a model for assessing the guilt of agents.

The distributor gives data to agents intelligently, for example by adding fake objects to the distributed sets, in order to improve the chances of detecting a guilty agent. The distributor can then assess the likelihood that the leaked data came from one or more agents, as opposed to having been gathered independently by other means. If the distributor finds enough evidence that an agent leaked data, it may stop doing business with that agent or initiate legal proceedings. The problem has mainly one constraint and one objective: the distributor's constraint is to satisfy the agents by providing the number of objects they request (or the objects that satisfy their conditions), and the objective is to maximize the chance of identifying the agent who leaks any of the distributed objects.
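To make the guilt assessment concrete, the following is a minimal sketch, not the project's exact implementation. It assumes that each leaked object was gathered independently with probability p, and otherwise came, with equal likelihood, from one of the agents that received it; all names in the snippet are illustrative.

```cpp
// Sketch of a guilt-probability estimate for one agent, given the leaked set.
// Assumptions (illustrative, not prescribed by this report): a leaked object was
// obtained independently with probability p; otherwise it was leaked by one of
// the agents holding it, each equally likely.
#include <iostream>
#include <set>
#include <vector>

using Object = int;

double guiltProbability(const std::vector<std::set<Object>>& given,  // given[i] = objects sent to agent i
                        const std::set<Object>& leaked,              // objects found leaked
                        std::size_t agent, double p)
{
    double probInnocent = 1.0;
    for (Object t : leaked) {
        if (given[agent].count(t) == 0)
            continue;                                    // this agent never received t
        std::size_t holders = 0;
        for (const auto& r : given)                      // how many agents received t
            if (r.count(t)) ++holders;
        // Chance that this particular leaked object did NOT come from this agent.
        probInnocent *= 1.0 - (1.0 - p) / static_cast<double>(holders);
    }
    return 1.0 - probInnocent;                           // estimated probability of guilt
}

int main()
{
    std::vector<std::set<Object>> given = {{1, 2, 3}, {2, 3, 4}};
    std::set<Object> leaked = {2, 3, 4};
    std::cout << "agent 0: " << guiltProbability(given, leaked, 0, 0.2) << '\n';
    std::cout << "agent 1: " << guiltProbability(given, leaked, 1, 0.2) << '\n';
}
```

Under these assumptions, fake objects strengthen the estimate: a fake record handed to only one agent has a single holder, so its appearance in the leaked set pushes that agent's guilt probability sharply toward one.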
LITERATURE SURVEY
The guilt detection approach presented here is related to the data provenance problem: tracing the lineage of objects essentially amounts to detecting the guilty agents, and it assumes some prior knowledge of how a data view is created from its data sources. Our objects and sets are more general. As far as the data allocation strategies are concerned, the most closely related work is watermarking, which is used as a means of establishing original ownership of distributed objects. Finally, there is also a large body of work on mechanisms that allow only authorized users to access sensitive data through access control policies. Such approaches prevent data leakage in some sense, by sharing information only with trusted parties. However, these policies are restrictive and may make it impossible to satisfy agents' requests.
BASICS OF CLOUD COMPUTING
Key to the definition of cloud computing is the "cloud" itself. For many purposes, the cloud is a large group of interconnected computers. These computers can be personal computers or network servers, and they can be public or private. For example, Google hosts a cloud that consists of both smallish PCs and larger servers. Google's cloud is a private one (that is, Google owns it) that is publicly accessible (by Google's users). This cloud of computers extends beyond a single company or enterprise. The applications and data served by the cloud are available to a broad group of users, cross-enterprise and cross-platform. Access is via the Internet: any authorized user can access these documents and applications from any computer over any Internet connection. And, to the user, the technology and infrastructure behind the cloud is invisible. It isn't apparent (and, in most cases, doesn't matter) whether cloud services are based on HTTP, HTML, XML, JavaScript, or other specific technologies. From Google's perspective, there are six key properties of cloud computing:
1. Cloud Computing is user-centric. Once you as a
user are connected to the cloud, whatever is
stored there -- documents, messages, images,
applications, whatever – becomes yours. In
addition, not only is the data yours, but you can
also share it with others. In effect, any device that
accesses your data in the cloud also becomes
yours.
2. Cloud computing is task-centric. Instead of
focusing on the application and what it can do, the
focus is on what you need done and how the
application can do it for you. Traditional
applications (word processing, spreadsheets,
email, and so on) are becoming less important
than the documents they create.
3. Cloud computing is powerful. Connecting
hundreds or thousands of computers together in a
cloud creates a wealth of computing power
impossible with a single desktop PC.
4. Cloud computing is accessible. Because data is
stored in the cloud, users can instantly retrieve
more information from multiple repositories.
You’re not limited to a single source of data, as
you are with a desktop PC.
5. Cloud computing is intelligent. With all the various
data stored on the computers in the cloud, data
mining and analysis are necessary to access that
information in an intelligent manner.
6. Cloud computing is programmable. Many of the
tasks necessary with cloud computing must be
automated. For example, to protect the integrity
of the data, information stored on a single
computer in the cloud must be replicated on other
computers in the cloud. If that one computer goes
offline, the cloud’s programming automatically
redistributes that computer’s data to a new
computer in the cloud (a hedged sketch of this replication idea follows this list). Computing in the cloud may provide additional infrastructure and flexibility.
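The following is a purely illustrative sketch of the replication behaviour described in point 6; the class, names, and replication factor are assumptions for the sake of the example, not a description of any real cloud service. Each object is stored on up to k nodes, and when a node goes offline its objects are copied onto other live nodes.

```cpp
// Illustrative sketch of replicating objects across cloud nodes.
#include <iostream>
#include <map>
#include <set>
#include <string>

struct Cluster {
    std::set<std::string> nodes;                              // live nodes
    std::map<std::string, std::set<std::string>> replicas;    // object -> nodes holding it
    std::size_t k = 3;                                        // assumed replication factor

    // Store an object on up to k live nodes.
    void store(const std::string& object) {
        auto& holders = replicas[object];
        for (const auto& node : nodes) {
            if (holders.size() >= k) break;
            holders.insert(node);
        }
    }

    // When a node goes offline, copy every object it held onto another live node.
    void nodeOffline(const std::string& failed) {
        nodes.erase(failed);
        for (auto& [object, holders] : replicas) {
            if (holders.erase(failed) == 0) continue;         // this node held no copy
            for (const auto& node : nodes) {
                if (holders.count(node) == 0) { holders.insert(node); break; }
            }
        }
    }
};

int main() {
    Cluster c;
    c.nodes = {"node-a", "node-b", "node-c", "node-d"};
    c.store("report.docx");
    c.nodeOffline("node-a");                                  // copies are restored elsewhere
    std::cout << "replicas left: " << c.replicas["report.docx"].size() << '\n';
}
```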
SOFTWARE & HARDWARE REQUIREMENTS
Memory leaks are a class of bugs in which the application fails to release memory when it is no longer needed. Over time, memory leaks affect the performance of both the particular application and the operating system. A large leak might result in unacceptable response times due to excessive paging. Eventually the application, as well as other parts of the operating system, will experience failures.

Windows will free all memory allocated by the application on process termination, so
short-running applications will not affect overall system performance significantly.
However, leaks in long-running processes like services or even Explorer plug-ins can
greatly impact system reliability and might force the user to reboot Windows in order
to make the system usable again.

Applications can allocate memory in several ways, and each type of allocation can result in a leak if it is not freed after use. Here are some examples of common allocation patterns (a short sketch of two of these patterns follows the list):

• Heap memory allocated via the HeapAlloc function or its C/C++ runtime equivalents malloc and new
• Direct allocations from the operating system via the VirtualAlloc function
• Kernel handles created via Kernel32 APIs such as CreateFile, CreateEvent, or CreateThread, which hold kernel memory on behalf of the application
• GDI and USER handles created via User32 and Gdi32 APIs (by default, each process has a quota of 10,000 handles)
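As a hedged illustration of two of the patterns above, the sketch below shows a heap allocation that is never released with HeapFree and a kernel event handle that is never closed with CloseHandle. The function names in the snippet are only illustrative.

```cpp
// Illustrative only: two common Windows leak patterns described above.
#include <windows.h>

void LeakHeapMemory()
{
    // Heap allocation via HeapAlloc; leaks because HeapFree is never called.
    void* buffer = HeapAlloc(GetProcessHeap(), 0, 4096);
    if (buffer == nullptr)
        return;
    // ... use buffer ...
    // Missing: HeapFree(GetProcessHeap(), 0, buffer);
}

void LeakKernelHandle()
{
    // Kernel handle via CreateEvent; leaks because CloseHandle is never called.
    HANDLE event = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    if (event == nullptr)
        return;
    // ... use the event ...
    // Missing: CloseHandle(event);
}
```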

Best Practices
Monitoring the resource consumption of your application over time is the first step
in detecting and diagnosing memory leaks. Use Windows Task Manager and add the
following columns: "Commit Size", "Handles", "User Objects", and "GDI Objects". This
will allow you to establish a baseline for your application and monitor resource usage
over time.
The following Microsoft tools provide more-detailed information and can help to
detect and diagnose leaks for the various allocation types in your application:

• Performance Monitor and Resource Monitor are part of Windows 7 and can monitor and graph resource use over time
• The latest version of Application Verifier can diagnose heap leaks on Windows 7
• UMDH, which is part of the Debugging Tools for Windows, analyzes the heap memory allocations for a given process and can help find leaks and other unusual usage patterns
• Xperf is a sophisticated performance analysis tool with support for heap allocation traces
• The CRT Debug Heap tracks heap allocations and can help you build your own heap debugging features

Certain coding and design practices can limit the number of leaks in your code.

• Use smart pointers in C++ code, both for heap allocations and for Win32 resources such as kernel HANDLEs. The C++ Standard Library provides the auto_ptr class for heap allocations (superseded by std::unique_ptr in modern C++). For other allocation types you will need to write your own classes. The ATL library provides a rich set of classes for automatic resource management of both heap objects and kernel handles (a hedged sketch of this pattern follows this list)
• Use compiler COM support classes such as _com_ptr_t to encapsulate your COM interface pointers into "smart pointers" and assist with reference counting. There are similar classes for other COM data types: _bstr_t and _variant_t
• Monitor your .NET code for unusual memory usage. Managed code is not immune to memory leaks; see "Tracking down managed memory leaks" for how to find GC leaks
• Be aware of leak patterns in web client-side code. Circular references between COM objects and scripting engines like JScript can cause large leaks in web applications. "Understanding and Solving Internet Explorer Leak Patterns" has more information on these kinds of leaks. You can use the JavaScript Memory Leak Detector to debug memory leaks in your code. While Windows Internet Explorer 8, which ships with Windows 7, mitigates most of these issues, older browsers are still vulnerable to these bugs
• Avoid using multiple exit paths from a function. Allocations assigned to variables at function scope should be freed in one particular block at the end of the function
• Do not use exceptions in your code without freeing all local variables in functions. If you use native exceptions, free all your allocations inside the __finally block. If you use C++ exceptions, all your heap and handle allocations need to be wrapped in smart pointers
• Do not discard or reinitialize a PROPVARIANT object without calling the PropVariantClear function
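As a minimal sketch of the smart-pointer advice above: std::unique_ptr owns the heap allocation, and a custom deleter extends the same RAII idea to a Win32 kernel handle. The HandleCloser, UniqueHandle, and NoLeaks names here are illustrative, not part of any library mentioned above.

```cpp
// Sketch only: RAII wrappers for a heap allocation and a Win32 kernel handle.
#include <windows.h>
#include <memory>

// Custom deleter so unique_ptr can own a kernel HANDLE (illustrative name).
struct HandleCloser {
    void operator()(HANDLE h) const {
        if (h != nullptr && h != INVALID_HANDLE_VALUE)
            CloseHandle(h);
    }
};
using UniqueHandle = std::unique_ptr<void, HandleCloser>;

void NoLeaks()
{
    // Heap allocation released automatically when the pointer goes out of scope.
    auto buffer = std::make_unique<char[]>(4096);

    // Kernel handle closed automatically by the custom deleter.
    UniqueHandle event(CreateEventW(nullptr, FALSE, FALSE, nullptr));

    // ... use buffer.get() and event.get() ...
}   // No explicit delete or CloseHandle needed here.
```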
Code Analysis Basics
Before we go on, there are a few concepts that you should understand: "sources", "sinks", and "data flow". In code-analysis speak, a "source" is the code that allows a vulnerability to happen, whereas a "sink" is where the vulnerability actually happens.

Take command injection vulnerabilities, for example. A "source" in this case could be a function that takes in user input, whereas the "sink" would be a function that executes system commands. If untrusted user input can get from "source" to "sink" without proper sanitization or validation, there is a command injection vulnerability. Many common vulnerabilities can be identified by tracking this "data flow" from appropriate sources to corresponding sinks; a small sketch follows.
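For instance, here is a deliberately vulnerable sketch (hypothetical code, not taken from any project discussed here) in which user input flows from the source straight into a command-executing sink:

```cpp
// Hypothetical example of a source-to-sink command injection.
#include <cstdlib>
#include <iostream>
#include <string>

int main()
{
    std::string filename;
    std::getline(std::cin, filename);          // SOURCE: untrusted user input

    // No sanitization or validation happens between source and sink.
    std::string command = "cat " + filename;
    std::system(command.c_str());              // SINK: executes a system command

    // Input such as "notes.txt; rm -rf ~" would run an attacker-chosen command.
    return 0;
}
```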

Quick Start Hunting

If you are short on time, focusing on a few issues can help you discover the most common and severe problems.

Start by searching for strings, keywords, and code patterns known to be indicators of vulnerabilities or misconfiguration. For example, hardcoded credentials such as API keys, encryption keys, and database passwords can be discovered by grepping for keywords such as "key", "secret", or "password", or with a regex search for hex or base64 strings. Don't forget to search your git history for these strings as well.

Unchecked use of dangerous functions and outdated dependencies are also a huge source of bugs. Grep for dangerous functions and see whether they are reachable by user-controlled data; for example, you can search for strings like "system()" and "eval()" to find potential command injections. Search through your dependencies to see if any of them are outdated. A small scanner sketch follows.
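As a rough, hedged illustration of this keyword-and-regex hunting (the patterns below are only starting points, not a complete rule set), a tiny scanner might look like this:

```cpp
// Minimal sketch: flag lines that look like hardcoded secrets or dangerous calls.
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char* argv[])
{
    if (argc < 2) {
        std::cerr << "usage: scan <source-file>\n";
        return 1;
    }

    // Keywords that often accompany hardcoded credentials, plus risky sinks.
    const std::regex secretPattern(R"((key|secret|password)\s*=\s*["'][^"']+["'])",
                                   std::regex::icase);
    const std::regex dangerousCall(R"((system|eval)\s*\()");

    std::ifstream in(argv[1]);
    std::string line;
    for (int lineNo = 1; std::getline(in, line); ++lineNo) {
        if (std::regex_search(line, secretPattern))
            std::cout << "possible hardcoded secret at line " << lineNo << ": " << line << '\n';
        if (std::regex_search(line, dangerousCall))
            std::cout << "dangerous function call at line " << lineNo << ": " << line << '\n';
    }
    return 0;
}
```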

Digging Deeper

You can complement the above strategy with a more extensive source code review if you have time.

Focus on areas of code that deal with user input. User input locations such as HTTP request parameters, HTTP headers, HTTP request paths, database entries, file reads, and file uploads provide the entry points for attackers to exploit the application's vulnerabilities. Tracing data flow from these functions to corresponding sinks can help you find common vulnerabilities such as stored XSS, SQL injection, shell uploads, and XXE.

Then, review code that performs critical functionality in the application. This includes code that deals with authorization, authentication, and other logic critical to business functions. Look at the protection mechanisms implemented and see if you can bypass them. At the same time, check how business and user data are being transported. Is sensitive information transported and stored safely?

Finally, look out for configuration issues specific to your application. Make sure that your application uses secure settings according to best practices.

Automate the Process Using ShiftLeft CORE

As you can see, manual code review can be quite tedious and time-consuming. Using SAST (static application security testing) tools is a great way to speed up the process. Good SAST tools identify vulnerable patterns for you so that you can focus on analyzing the impact and exploitability of the vulnerability.

Now, let's search for a vulnerability using ShiftLeft's SAST tool, ShiftLeft CORE. We will be analyzing the source code of an example application, shiftleft-java-demo. Register for a free CORE account here. After you register, you will be taken to a dashboard.

From there, go to "Add App" on the top right, and select "Public and Private repos".

After you authorize ShiftLeft to access your GitHub repositories and click on "Click to see a list of your repositories", you should see a list of your repos available for analysis. If you choose to analyze one of your GitHub repos, all you have to do is click on it and ShiftLeft will automatically import it for analysis.

For now, we will be using a built-in demo app, so go to "Java > Demo" and click "Next".

You should now see a new application on your dashboard. ShiftLeft is working hard to find vulnerabilities in the application.

CONCLUSION
Data leakage is a silent type of threat. An insider, such as an employee, can intentionally or accidentally leak sensitive information. This sensitive information can be electronically distributed via e-mail, Web sites, FTP, instant messaging, spreadsheets, databases, and any other electronic means available, all without your knowledge. To assess the risk of distributing data, two things are important: the first is a data allocation strategy that helps to distribute the tuples among the agents with minimum overlap, and the second is the calculation of the guilt probability, which is based on the overlap of an agent's data set with the leaked data set.
