Unit 1
Graph databases are used to store graph-based data and are queried with specialized query
languages such as SPARQL.
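To make the idea of querying graph-based data concrete, here is a minimal sketch in plain Python. It is not a graph database and does not run SPARQL; it only mimics the pattern-matching style of such a query over (subject, predicate, object) triples. The node names, relations, and the `match` helper are all made up for illustration.

```python
# Toy graph stored as (subject, predicate, object) triples.
# All names and relations here are hypothetical.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "fishing"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly what a SPARQL pattern such as
#   SELECT ?who WHERE { :alice :knows ?who }
# asks for: every object linked to "alice" by "knows".
known_by_alice = [o for (_, _, o) in match(TRIPLES, subject="alice", predicate="knows")]
print(known_by_alice)
```

A real graph database adds indexing, persistence, and a full query language on top of this basic pattern-matching idea.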
[Figure: data science process steps: 2 Retrieving data, 3 Data preparation, 4 Data exploration, 5 Data modeling]
1.4.6 Scheduling tools
Scheduling tools help you automate repetitive tasks and trigger jobs based on events such as
adding a new file to a folder. They are similar to tools such as cron on Linux.
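The event-based trigger described above can be sketched as a simple polling pass over a folder. This is only an illustration using the Python standard library, not how any particular scheduling tool works; the helper names `new_files` and `run_once` and the file name `sales.csv` are assumptions made for the example.

```python
import os
import tempfile


def new_files(folder, seen):
    """Return names in `folder` not yet in `seen`, and record them as seen."""
    added = sorted(set(os.listdir(folder)) - seen)
    seen.update(added)
    return added


def run_once(folder, seen, job):
    """One polling pass: call `job(path)` for every newly added file."""
    return [job(os.path.join(folder, name)) for name in new_files(folder, seen)]


# Demo in a temporary folder; the "job" here just reports the file name.
with tempfile.TemporaryDirectory() as folder:
    seen = set(os.listdir(folder))
    first_pass = run_once(folder, seen, os.path.basename)   # nothing new yet
    open(os.path.join(folder, "sales.csv"), "w").close()    # a file arrives
    second_pass = run_once(folder, seen, os.path.basename)  # job is triggered
    third_pass = run_once(folder, seen, os.path.basename)   # already handled

print(first_pass, second_pass, third_pass)
```

A scheduler would call something like `run_once` on a timer or react to file-system notifications, and would also handle retries and job dependencies.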
1.4.7 Benchmarking tools
This class of tools was developed to optimize your big data installation by providing
standardized profiling suites. A profiling suite is taken from a representative set of big data
jobs. Benchmarking and optimizing the big data infrastructure and configuration usually
aren’t jobs for data scientists themselves but for professionals who specialize in setting up
IT infrastructure.
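As a rough illustration of what a profiling suite does, the sketch below times a small set of representative jobs and reports the best run of each. The two jobs are hypothetical stand-ins chosen for the example; real benchmarking tools run far larger workloads across a cluster.

```python
import time


def benchmark(jobs, repeats=3):
    """Run each job `repeats` times; return the best wall-clock time per job."""
    timings = {}
    for name, job in jobs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            job()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


# Hypothetical stand-ins for representative jobs.
JOBS = {
    "sort": lambda: sorted(range(100_000, 0, -1)),
    "aggregate": lambda: sum(x * x for x in range(100_000)),
}

timings = benchmark(JOBS)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Running the same suite before and after a configuration change gives a like-for-like comparison, which is the point of standardizing the set of jobs.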
1.4.8 System deployment
Setting up a big data infrastructure isn’t an easy task, and assisting engineers in deploying new
applications into the big data cluster is where system deployment tools shine.
1.4.9 Service programming
You usually have no idea of the architecture or technology used by everyone keen on using your predictions.
Service tools excel here by exposing big data applications to other applications as a service.
Data scientists sometimes need to expose their models through services. The best-known
example is the REST service; REST stands for representational state transfer.
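A minimal sketch of exposing a model through a REST-style service, using only the Python standard library. The `predict` function is a hypothetical stand-in for a trained model, and the `/predict` path and JSON field names are assumptions made for this example, not a prescribed interface.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(x):
    """Hypothetical stand-in for a trained model."""
    return 2 * x + 1


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read the JSON request body and answer with a JSON prediction.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["x"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port for the demo.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any client that speaks HTTP and JSON can now use the model.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("POST", "/predict", body=json.dumps({"x": 3}),
             headers={"Content-Type": "application/json"})
result = json.loads(conn.getresponse().read())
conn.close()
server.shutdown()
server.server_close()
print(result)
```

The client needs to know nothing about how the model is implemented, which is why a service interface suits consumers whose architecture and technology you cannot predict.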
1.4.10 Security
You probably need fine-grained control over access to the data but don’t want to
manage this on an application-by-application basis. Big data security tools allow you to have
central and fine-grained control over access to the data.
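The idea of central, fine-grained access control can be sketched as a single policy table consulted by every read, instead of checks scattered across applications. The roles, dataset, columns, and helper names below are all hypothetical; real big data security tools enforce such policies at the platform level.

```python
# Hypothetical central policy: role -> dataset -> columns the role may see.
POLICY = {
    "analyst": {"sales": {"region", "amount"}},
    "auditor": {"sales": {"region", "amount", "customer_id"}},
}


def allowed_columns(role, dataset):
    """Look up the columns a role may read; unknown roles get nothing."""
    return POLICY.get(role, {}).get(dataset, set())


def read(role, dataset, rows):
    """Return rows restricted to permitted columns; raise if access is denied."""
    columns = allowed_columns(role, dataset)
    if not columns:
        raise PermissionError(f"{role!r} has no access to {dataset!r}")
    return [{k: v for k, v in row.items() if k in columns} for row in rows]


SALES = [{"region": "EU", "amount": 120, "customer_id": 42}]

analyst_view = read("analyst", "sales", SALES)
auditor_view = read("auditor", "sales", SALES)
print(analyst_view)
print(auditor_view)
```

Because every application goes through the same `read` path, changing who may see a column means editing one policy entry rather than each application.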