Oozie

 Aug 8th 2020

**** Oozie: job scheduling tool used to schedule Hadoop batch jobs. Port 11000 (Hive 10000, Oozie 11000). It submits its jobs to YARN.

Why Oozie in Hadoop?

Example: 3 jobs need to run every hour.

A Hadoop admin can't manually run the (sqoop import, hive query, sqoop export) commands every hour.

How can we automate these kinds of requirements?

Different types of jobs in Oozie? 1. workflow jobs 2. coordinator jobs (time-based) 3. bundle jobs

1. Workflow: a DAG (Directed Acyclic Graph) of actions.

2. Coordinator: consists of workflow jobs (sequences of actions) run on a time interval, i.e. executing a set of jobs at a regular frequency (start, end, timezone, frequency).

3. Bundle: a package of coordinator jobs (and the workflows they trigger) managed together.
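
For the "3 jobs every hour" case, a coordinator definition might look like the sketch below. The app name `hourly-etl`, the directory, the dates and the `workflowAppUri` property are placeholders assumed for illustration; `coord:hours(1)` is Oozie's EL function for a one-hour frequency.

```shell
# Write a minimal coordinator.xml (names and dates are illustrative assumptions).
# The quoted 'EOF' stops the shell from expanding ${...}; Oozie resolves those itself.
mkdir -p hourly-etl
cat > hourly-etl/coordinator.xml <<'EOF'
<coordinator-app name="hourly-etl" frequency="${coord:hours(1)}"
                 start="2020-08-08T00:00Z" end="2020-08-09T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS directory that holds workflow.xml; defined in job.properties -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF
```

The four attributes on `coordinator-app` are the same (start, end, timezone, frequency) listed above.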

Mainly used for scheduling batch jobs, not real-time jobs (batch jobs: jobs that run on data already present in HDFS, e.g. a Hive query).

Streaming jobs, by contrast, process live data coming from web servers, remote ingestors, syslogs (Kafka + Spark jobs; Spark is a processing engine, comparable to MapReduce, that can run on YARN).

edureka.co/blog/apaxhe-oozie-tutorial

*** List Oozie jobs?

oozie jobs
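
A slightly fuller sketch of listing jobs from the terminal. The host name is an assumption; `OOZIE_URL` is the environment variable the CLI reads when `-oozie` is not passed. The `command -v` guard is only there so the snippet fails softly on a machine without the Oozie client.

```shell
# Assumed server URL; replace the host with your own Oozie server (default port 11000).
export OOZIE_URL="http://oozie-host.example.com:11000/oozie"

if command -v oozie >/dev/null 2>&1; then
  # Workflow jobs that are currently running, at most 10 rows
  oozie jobs -jobtype wf -filter status=RUNNING -len 10
  # Coordinator jobs
  oozie jobs -jobtype coordinator
else
  echo "oozie CLI not found on this machine" >&2
fi
```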

**** Oozie daemons? Oozie server (gateway installed on the edge node, Oozie service installed on the master server)
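
To check from the edge node that the Oozie server is up, something like the following should work. The host name is an assumed placeholder.

```shell
# Assumed server URL (host is a placeholder, 11000 is the Oozie port noted above).
OOZIE_URL="http://oozie-host.example.com:11000/oozie"

if command -v oozie >/dev/null 2>&1; then
  # Reports the server's system mode when the server is reachable
  oozie admin -oozie "$OOZIE_URL" -status
else
  echo "oozie CLI not found on this machine" >&2
fi
```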

*** Developers write the job.properties file and schedule Oozie jobs (the admin will not do this)     # oozie job --oozie .......

Steps to submit an Oozie job:

1. Create the job.properties file

2. Create the workflow.xml file

3. Put the workflow in HDFS and submit the job
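
The three steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the `demo-wf` directory, the host names and ports, the HDFS path, and the single shell action that just runs `echo hello`.

```shell
# 1. job.properties (stays on the edge node). Hosts, ports and paths are assumptions.
mkdir -p demo-wf
cat > demo-wf/job.properties <<'EOF'
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/demo-wf
EOF

# 2. workflow.xml: a one-action DAG (start -> shell action -> end, or kill on error)
cat > demo-wf/workflow.xml <<'EOF'
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="say-hello"/>
  <action name="say-hello">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>echo</exec>
      <argument>hello</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

# 3. Upload the app directory to HDFS, then submit and start the job
if command -v hdfs >/dev/null 2>&1 && command -v oozie >/dev/null 2>&1; then
  hdfs dfs -put -f demo-wf "/user/$(whoami)/"
  oozie job -oozie http://oozie-host.example.com:11000/oozie \
    -config demo-wf/job.properties -run
else
  echo "hdfs/oozie clients not found; files written to ./demo-wf only" >&2
fi
```

Note that only workflow.xml has to be in HDFS; job.properties is read locally by the `oozie job` command.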

Different ways to access Oozie? 1. terminal (CLI) 2. Hue UI


YARN schedulers (FIFO, Fair, Capacity): assign resources to jobs (through queues)

Oozie: job scheduling, runs a job at a particular time


Tell me about yourself? No personal things; speak more about technical skills: years of experience, roles and responsibilities.

Cluster architecture? Masters, slaves, total capacity, how much data, hardware configs.

Data flow in your project? How the data arrives (Kafka), where it comes from, and where you export it for visualization (important question).

HDFS, YARN, ZooKeeper, Hive, HBase, Kafka, Sqoop, Oozie, Spark

As an admin: Oozie, job types, commands
