Data engineering Flow-
Generation-Ingestion-Transformation-Serving.
Data engineering is the bridge between data producers and data consumers.
Data producers on different platforms generate data, and consumers use it for
analysis and decision making.
Data engineering can be done on streaming data (e.g. Kafka and Flink) as well as on stored data.
Because so many platforms and apps are in use today, and they produce large amounts of data,
data engineering is needed to manage that data.
Compute vs Storage:
ETL -
E- Extraction: the process of pulling data out of its sources.
T- Transformation: converting the data into more usable formats.
L- Loading: writing the data into storage.
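A minimal sketch of the three ETL steps in Python, assuming a small hypothetical CSV export as the source and an in-memory SQLite database as the storage target. The table name `orders`, its columns, and the cleaning rules (trim and lowercase names, drop rows with no amount) are illustrative choices, not part of any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export with messy formatting.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,7
3, carol,
"""

def extract(raw: str) -> list[dict]:
    """E: pull rows out of the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """T: trim/lowercase names, cast amounts to float, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].strip().lower(), float(amount)))
    return cleaned

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """L: write the cleaned rows into a storage table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 'alice', 10.5), (2, 'bob', 7.0)]
```

In a real pipeline each step would usually be a separate job or tool, but the order of operations is the same: extract, then transform, then load.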
On-premises- the company buys its own servers and networking hardware and stores the data on its own premises.
• Hadoop - Allows data engineers to handle data at the scale of terabytes and petabytes.
Modern data stack: made up of a collection of open-source platforms and third-party tools that
connect together.
Unstructured Data (no fixed row/column schema)
Examples: audio, video, images, log files.
Deep learning and neural networks are used to detect both large-scale and fine-grained (micro) features in such data.
Event Streams-
Challenges:-
Messages come in asynchronously.
Ordering: messages may arrive out of order.
Duplication of data: the same message may be delivered more than once.
Idempotency: An operation is idempotent when it produces the same result no matter how many
times you run it (important for handling duplicate data).
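A minimal sketch of why idempotency matters for duplicate deliveries, assuming hypothetical messages that each carry a unique `id` and add an `amount` to an account balance. The naive consumer counts a redelivered message twice; the idempotent consumer remembers which IDs it has already applied and skips them, so replaying the same batch leaves the result unchanged. Here the set of processed IDs lives in memory; a real consumer would need to store it durably (an assumption of this sketch).

```python
def naive_consume(messages: list[dict]) -> dict[str, int]:
    """Adds every message's amount: a redelivered message is counted twice."""
    balances: dict[str, int] = {}
    for msg in messages:
        balances[msg["account"]] = balances.get(msg["account"], 0) + msg["amount"]
    return balances

def idempotent_consume(messages: list[dict], balances: dict[str, int],
                       processed_ids: set[str]) -> dict[str, int]:
    """Skips message IDs already applied, so replays leave balances unchanged."""
    for msg in messages:
        if msg["id"] in processed_ids:
            continue  # already applied: applying again would change the result
        balances[msg["account"]] = balances.get(msg["account"], 0) + msg["amount"]
        processed_ids.add(msg["id"])
    return balances

# Hypothetical batch in which message "m2" was delivered twice.
batch = [
    {"id": "m1", "account": "A", "amount": 100},
    {"id": "m2", "account": "B", "amount": 50},
    {"id": "m2", "account": "B", "amount": 50},  # duplicate delivery
]

print(naive_consume(batch))  # {'A': 100, 'B': 100}

balances: dict[str, int] = {}
seen: set[str] = set()
idempotent_consume(batch, balances, seen)
idempotent_consume(batch, balances, seen)  # replay the whole batch
print(balances)  # {'A': 100, 'B': 50}
```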
Popular event streaming platforms - AWS SQS, Amazon Kinesis, RabbitMQ, Kafka,
Pulsar, Spark Streaming.
HDD- Hard disk drive - traditional magnetic disk drive with a rotating platter and a moving read/write arm.
Serialisation: turning data into a byte stream so it can easily be saved and transported.
Data is serialised into a standard format, sent around, and deserialised on the receiving end.
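A minimal round trip using JSON as the standard format, with Python's built-in `json` module. The `record` contents are hypothetical. JSON is only one option; binary formats such as Avro or Protocol Buffers are also common in data pipelines.

```python
import json

# Hypothetical event record produced by an application.
record = {"user_id": 42, "event": "click", "tags": ["promo", "mobile"]}

# Serialise: dict -> JSON text -> UTF-8 bytes (a byte stream that can be saved or sent).
payload: bytes = json.dumps(record).encode("utf-8")
print(payload)  # b'{"user_id": 42, "event": "click", "tags": ["promo", "mobile"]}'

# ... payload is written to disk or sent over the network ...

# Deserialise on the receiving end: bytes -> text -> dict.
received = json.loads(payload.decode("utf-8"))
print(received == record)  # True
```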
Strong consistency: the system does not allow read operations until all the nodes holding replicated data
have been updated.
Eventual consistency: user read requests are not halted until all the replicas are updated; instead, the
update process is eventual.
Some users might receive old data, but eventually all the replicas are updated with the latest data.
ACID VS BASE-
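ACID (Atomicity, Consistency, Isolation, Durability)- typically a single machine, strong consistency
BASE (Basically Available, Soft state, Eventual consistency)- distributed systems, eventual consistency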
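A minimal sketch of the "A" (atomicity) in ACID on a single-machine database, using Python's built-in `sqlite3` module. The `accounts` table, its `CHECK (balance >= 0)` rule, and the `transfer` function are hypothetical. Using the connection as a context manager commits on success and rolls back if an exception escapes, so a transfer whose debit violates the constraint also undoes its credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction: both updates succeed or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the whole transaction was rolled back

ok1 = transfer(conn, "alice", "bob", 30)   # succeeds
ok2 = transfer(conn, "bob", "alice", 500)  # debit fails, so the credit is undone too
print(ok1, ok2)  # True False
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 50)]
```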
Storage systems:-