Reading 1

See, a few applications actually resided in Sybase. Sybase was the outdated one, so we were involved in migrating the Sybase applications to Oracle. As we know, Oracle is more optimized than Sybase, so we had to create the objects from the table level onwards, column by column, and whatever procedures were present in Sybase we migrated into Oracle using some optimization techniques. Yes, yes, correct. It took around one and a half to two years. See, so many applications resided in Sybase. We were involved in migrating them, because we had to analyze each application, implement it in Oracle, take it through development and then production, and then work on defects. That definitely takes a long time. It's not just a database upgrade; the applications had to be migrated from Sybase to Oracle, and we had to do testing with the Java environment as well. It was an application migration. The front end was Java. There were around 20 applications whose data was in Sybase and whose front end was in Java, so we needed to migrate from one system to the other and integrate Oracle with Java. The thing is, we had to analyze what needed to be implemented, then go through development, then production, and then defects. It took that long. I'm not sure why you're asking why.
Since I started my career very late, after a four-year gap, I joined as a contract employee. I had only SQL and PL/SQL, and then they gave me training on cloud computing with Azure. So I came from a SQL background to Azure Blob Storage, Azure Data Lake Storage Gen2, Azure Data Factory, and Azure Synapse as the data warehouse. We have used Logic Apps for triggering mail alerts, Databricks at a basic level for validation purposes, and Azure SQL Database. They only recently started giving training on Databricks, so we just have the subscription and have started using it for validation: before running the pipeline, we need to check whether the source has made any changes to the source file or not. For that, we just load the file into a DataFrame and check it. So I can say we are still at a basic level, in the learning phase.
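As a rough illustration of that validation step (a minimal sketch; the file path and expected column list are hypothetical, not from the project), the idea is to load the source file into a DataFrame and compare its header against what the pipeline expects:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical path and expected column list for the incoming source file.
source_path = "/mnt/source/sales_file.csv"
expected_columns = ["order_id", "customer_id", "amount", "order_date"]

# Load the file into a DataFrame and check whether the source changed its columns.
df = spark.read.csv(source_path, header=True)
if set(df.columns) != set(expected_columns):
    raise ValueError(f"Source file columns changed: {df.columns}")
```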
Okay, in my project we are using Visual Studio 2022, where first we have to fetch the changes from remote to local. Then we have to create a development feature branch. Under the Git options we go to the Commit All option with a valid commit message. If you come down, you will see a pull request option, where you have to pull the changes from our feature branch into the development branch; it will basically validate whether the same changes have been made by any other team member or not. Once we are done with that, we go to Azure DevOps, and there, under Repositories, you can check the pull request on the development branch. Once you create the pull request from the development branch to the staging or master branch, the reviewer will review the changes and merge them into the master branch. Then it will run a continuous integration pipeline and a release pipeline, which itself creates artifacts like a DACPAC file at the end, which has the summary of the changes, the code, and the infrastructure.
We do have a Scala script handy with the config variables and mount variables, so we just keep reusing the same thing, changing the data store name and the key values using access keys. We can do this using a shared access signature as well. Then we can read the file in PySpark using df = spark.read.csv followed by the path of the file. As for Databricks, I only have basic-level knowledge, not too much, actually.
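For example (a minimal sketch, not the project's actual script; the storage account, container, secret scope, and file names are hypothetical), the reusable configuration and the read would look roughly like this in a Databricks notebook, where spark and dbutils are provided by the environment:

```python
# Hypothetical storage account, container, and secret scope/key names.
storage_account = "mystorageacct"
container = "raw"
access_key = dbutils.secrets.get(scope="my-scope", key="storage-access-key")

# Register the access key for this data store, then read the CSV into a DataFrame.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", access_key
)
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/input/sales.csv"
df = spark.read.csv(path, header=True)
```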
In the case of Azure Data Factory, the flatten transformation is used when you have a column which consists of multiple values and you want separate records for them in the target; then we can go with the flatten transformation in Azure Data Factory.
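The flatten transformation itself is configured inside a mapping data flow, but the same idea, shown as a hedged PySpark analogue with made-up column names, is an explode over the multi-valued column so each value becomes its own record:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one row whose 'products' column holds several values.
df = spark.createDataFrame([("C001", "tv,phone,laptop")], ["customer_id", "products"])

# Split the multi-valued column and explode it into separate records,
# which is what the flatten transformation produces in the target.
flattened = df.withColumn("product", explode(split("products", ","))).drop("products")
flattened.show()
```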
Delta Lake actually has the features of a data warehouse: it can provide schema enforcement, schema and data versioning, and ACID properties on transactions. Whereas a data lake, I can say, holds data of any type or any size and also allows us to do some analytics, so it is like a two-in-one option for storing and analyzing. Data lakes store structured, semi-structured, and unstructured data, and the data will be chunked if it is greater than two GB and replicated into three copies, so we can process the data in parallel. Delta Lake can also serve as a staging area.
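As a small sketch of those Delta Lake features (assuming a Spark cluster with Delta Lake support; the path and sample data are hypothetical), writing a DataFrame in Delta format gives ACID transactions and versioned data that can be read back by version:

```python
from pyspark.sql import SparkSession

# Assumes a Databricks/Spark environment where Delta Lake is already enabled.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "tv"), (2, "phone")], ["order_id", "product"])

delta_path = "/mnt/delta/sales"

# The Delta transaction log provides ACID guarantees and schema enforcement on write.
df.write.format("delta").mode("overwrite").save(delta_path)

# Time travel: read an earlier version of the same data.
previous = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
```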
So, if it is a customized mapping, we have to customize it, but you said we have multiple files, so we cannot do manual intervention for each one. Otherwise, if you don't have the tables on the target side, you can select the Create table option, and it creates the table on the target side itself. I'm not able to recollect it exactly, but yes, we can also do it by using mapping data flows.
Using the select transformation we can do it: all the columns will be fetched from the source file, whatever columns are present, and then we can move them to the target. Actually, the thing is, first we need to access the source file inside the mapping data flow, and there we keep the select transformation, which copies all the column names from the source file. Here we can add a rule by clicking Add mapping and then selecting Rule-based mapping. It takes two inputs: the condition on which the columns have to match, and the name expression for the mapped column. We have to give both values.
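As a hedged PySpark analogue of that rule-based mapping (the matching condition and rename rule here are made up for illustration), we give the same two inputs: a condition that selects the columns and a name expression applied to each match:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical source columns; the "rule" keeps every column starting with 'cust'
# and maps each one to an upper-cased name.
df = spark.createDataFrame([(1, "Asha", "Hyderabad")], ["cust_id", "cust_name", "city"])
mapped = df.select([col(c).alias(c.upper()) for c in df.columns if c.startswith("cust")])
mapped.show()
```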
Yeah, with Azure Data Lake, see, we can dump any type of data irrespective of its size; it will store structured, semi-structured, and unstructured data, and we don't need to bother about the data size or cleaning the data first. The data will be divided into chunks if it is larger than two GB and stored in chunked format. Whereas a data warehouse holds the data in structured format, like tables with columns and rows. Data lakes also support some analytical tools, so it provides storage and analysis as a two-in-one option, by using Azure Data Lake Analytics, and also with machine learning or R, Scala, and Spark we can analyze the data in the data lake.
So if we are using Azure Data Factory to move the data from on-premises to SQL Server, then first we need to create a linked service with the source as the on-premises server. First we need to download the self-hosted integration runtime on the on-premises machine, execute the .exe file, and copy-paste the same key while creating the linked service. Going forward, that particular virtual machine will act as the self-hosted integration runtime for our pipeline. Basically, the integration runtime provides the computational power to create and maintain the infrastructure and manage it. Then we have to create a Get Metadata activity, which will fetch all the files from that particular folder. These files can be passed to the ForEach activity by creating a parameter and adding the dynamic content @activity('Get Metadata').output.childItems, where the child items refer to the file names. Inside the ForEach we can keep a Copy Data activity, which captures the file name from that particular iteration using @item().name, so it fetches the file name and copies the file to the destination. If you're configuring the target as Blob Storage, then we have to create a linked service for the Blob Storage with an access key as the authentication method; the integration runtime would be the Azure default (AutoResolve) integration runtime, and the dataset would be in CSV file format, where we need to pick the delimited text format while creating the dataset.
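The Get Metadata, ForEach, and Copy Data activities themselves are configured in the pipeline rather than written in code, but as a hedged sketch of the same control flow, here is a local-filesystem analogue in Python with hypothetical folder paths: list the child items, loop over them, and copy each file by name.

```python
from pathlib import Path
import shutil

# Hypothetical folders standing in for the on-premises source and the target store.
source_folder = Path("/data/incoming")
target_folder = Path("/data/landing")
target_folder.mkdir(parents=True, exist_ok=True)

# "Get Metadata" analogue: fetch the child items (files) of the source folder.
child_items = [p for p in source_folder.iterdir() if p.is_file()]

# "ForEach" + "Copy Data" analogue: for each item, copy the file by its name.
for item in child_items:
    shutil.copy2(item, target_folder / item.name)  # the @item().name part
    print(f"Copied {item.name}")
```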
Using the GROUP BY function with HAVING COUNT(customer_id) > 1, we can achieve this. If the table is too huge, then we can go with the ROW_NUMBER ranking function.
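As a minimal sketch (assuming a hypothetical customers table with a customer_id column, queried here through Spark SQL), both approaches look like this:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Approach 1: GROUP BY ... HAVING COUNT(*) > 1 lists the duplicated customer IDs.
duplicates = spark.sql("""
    SELECT customer_id, COUNT(*) AS cnt
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
""")

# Approach 2: ROW_NUMBER() flags every duplicate row beyond the first occurrence,
# which scales better when deduplicating a very large table.
numbered = spark.sql("""
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY customer_id) AS rn
    FROM customers
""")
extra_rows = numbered.filter("rn > 1")
```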
Yes, yes, ranking functions. See, actually, while creating a linked service, if you are writing the credentials there directly, it can definitely lead to a data breach. So in order to avoid that, we are storing the credentials inside Key Vault and fetching them dynamically. First, we need to create a linked service with the Key Vault, and then we can fetch the values. Key Vault stores the keys in encrypted, cryptographic form for security purposes. First, we need to go to the Key Vault and import or upload the credentials there manually.
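In the pipeline itself, the secret is referenced through the Key Vault linked service, but as a hedged sketch of fetching the same secret programmatically (the vault URL and secret name are hypothetical, using the azure-identity and azure-keyvault-secrets packages):

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault URL and secret name; credentials come from the signed-in identity.
vault_url = "https://my-keyvault.vault.azure.net"
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# The secret is stored encrypted in Key Vault and retrieved here at run time.
storage_key = client.get_secret("storage-account-key").value
```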
We are applying the transformations using mapping data flows: select transformations, derived column transformations, lookup transformations, group by and aggregate functions, and joins with cross-reference tables, and also data masking and encryption in the case of very sensitive data.
We do have a log table. At the end of the pipeline, the run data will be recorded into the log table, so we will be fetching the results from the log table. For triggers, we use the normal schedule trigger, the tumbling window trigger, and event-based triggers. If we want to run the pipeline at periodic intervals of time, then we can go with the tumbling window trigger; also we can set a dependency, like if my pipeline depends on another pipeline's data, then we can go with this, and also if we want to fetch some of the data for past dates, then we can go with that. Yes, we do have some dependencies, and a few jobs need to trigger at periodic intervals of time. Yes, we are getting some transactional files, so we are running the pipeline four times a day, every four hours, using the tumbling window trigger. As soon as it fires, the pipeline is executed, and we have the buffer time within that window only. Yes, event-based triggers are used to trigger the pipeline based on the arrival or deletion of a file, but we are going with the arrival of the file. Suppose we are maintaining sales data; then as soon as the sales files arrive in Blob Storage, our trigger will initiate the pipeline.
A linked service is the connection string to any of the data stores. While creating or establishing the connection between any data store and Azure Data Factory, we need to fill in all the required fields for the linked service, like authentication, authorization, subscription, data store name, and integration runtime. Whereas a dataset is created for the source and destination to point to the exact location of the data which we are going to access with Azure Data Factory.
Going with the Get Metadata activity and passing that particular input folder, it will fetch all the files from that folder, but you can filter them using the last modified date under the settings of the source. So if you want to fetch only today's files, then we can go with that. If you have any wildcard pattern to fetch the file names, then you can give the wildcard pattern there itself.
Yes, we can give dynamic content under the file name to capture that, by using a concatenation expression that appends the date of that particular day to the file name, something like @concat('sales_', formatDateTime(utcNow(), 'yyyyMMdd'), '.csv').
See, under the file name we have to give the dynamic content there. So if you are getting multiple files from the Get Metadata activity, they can be passed through the ForEach, where we give the dynamic content @activity('Get Metadata').output.childItems; here the child items mean the file names. But you fetch only the files matching the given wildcard pattern, so inside the wildcard pattern you can give something like *_<date>.
Using derived column transformations we can do it. The tumbling window trigger is actually for invoking the pipeline at frequent intervals of time. Suppose I want to trigger the pipeline in the 7 PM to 9 PM window; then we can go with this, and it will trigger the same pipeline in multiple windows, like 7 to 8 PM and 8 to 9 PM. Also, we can fetch properties like the window start time and window end time, we can fetch the data for past dates as well, and we can set a dependency: if my pipeline is dependent on any other pipeline, then I can set the dependency so that it runs only once that pipeline has triggered. I'm not sure; can you put the same question in a different way?
