Aug 8th 2020
**** Oozie: a job scheduling tool used to schedule Hadoop batch jobs. Default port 11000 (for comparison, Hive uses 10000). Oozie submits the jobs it schedules to YARN.
Why Oozie in Hadoop?
Example: 3 jobs need to run every hour.
A Hadoop admin can't manually run the commands (Sqoop import, Hive query, Sqoop export) every hour.
How can we automate these kinds of requirements? With a scheduler such as Oozie.
Different types of jobs in Oozie? 1. Workflow jobs 2. Coordinator jobs (time-triggered) 3. Bundle jobs
1. Workflow: a DAG (directed acyclic graph) of actions
2. Coordinator: runs workflow jobs at a time interval, i.e. executes a set of jobs (a sequence of actions) at a regular frequency, defined by start, end, timezone and frequency
3. Bundle: a combination of workflow and coordinator jobs (a bundle groups coordinator jobs so they can be managed together)
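To make the workflow/coordinator distinction concrete, here is a small Python model (not Oozie code, and not how Oozie is configured): the three hourly jobs from the example are a workflow DAG, and a coordinator is what triggers that whole workflow at each nominal time between start and end. The action names, the dates and the end-exclusive schedule are assumptions for illustration; real coordinators are defined in XML and also deal with timezones, data dependencies and concurrency. Requires Python 3.9+ for `graphlib`.

```python
# Requires Python 3.9+ for graphlib.
from datetime import datetime, timedelta, timezone
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the actions it depends on.
WORKFLOW = {
    "sqoop_import": set(),
    "hive_query": {"sqoop_import"},
    "sqoop_export": {"hive_query"},
}


def action_order(dag):
    """Return a valid execution order for a workflow DAG.

    Raises graphlib.CycleError if the graph contains a cycle, which is
    exactly what "acyclic" rules out.
    """
    return list(TopologicalSorter(dag).static_order())


def coordinator_runs(start, end, frequency):
    """Nominal trigger times from start (inclusive) to end (exclusive)."""
    runs = []
    t = start
    while t < end:
        runs.append(t)
        t += frequency
    return runs


if __name__ == "__main__":
    start = datetime(2020, 8, 8, 0, 0, tzinfo=timezone.utc)
    end = start + timedelta(hours=3)
    for nominal_time in coordinator_runs(start, end, timedelta(hours=1)):
        # Each coordinator run triggers one instance of the whole workflow.
        print(nominal_time.isoformat(), "->", " > ".join(action_order(WORKFLOW)))
```

With a 1-hour frequency and a 3-hour window this yields three runs (00:00, 01:00 and 02:00 UTC), each executing sqoop_import, then hive_query, then sqoop_export.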
Oozie is mainly used for scheduling batch jobs, not real-time jobs (batch jobs run on data that is already present in HDFS, e.g. a Hive query).
Streaming jobs, by contrast, process live data coming from web servers, remote ingestors and syslogs (e.g. Kafka + Spark jobs, where Spark is the processing engine and can itself run on YARN).
edureka.co/blog/apache-oozie-tutorial
***List Oozie jobs?
oozie jobs
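A small Python helper that builds the `oozie jobs` command line, assuming the Oozie CLI is installed on the machine you run it from. `-oozie`, `-jobtype` (wf, coordinator or bundle), `-len` and `-filter` are options of the Oozie CLI; the host name is hypothetical, and 11000 is the default port noted at the top of this post.

```python
import shutil
import subprocess

VALID_JOB_TYPES = ("wf", "coordinator", "bundle")


def oozie_jobs_cmd(oozie_url, jobtype="wf", length=None, status=None):
    """Build the argument list for `oozie jobs`.

    jobtype: 'wf' (workflow, the CLI default), 'coordinator' or 'bundle'.
    length:  maximum number of jobs to list (-len).
    status:  optional status filter, e.g. 'RUNNING' (-filter status=...).
    """
    if jobtype not in VALID_JOB_TYPES:
        raise ValueError(f"unsupported jobtype: {jobtype!r}")
    cmd = ["oozie", "jobs", "-oozie", oozie_url, "-jobtype", jobtype]
    if length is not None:
        cmd += ["-len", str(length)]
    if status is not None:
        cmd += ["-filter", f"status={status}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical Oozie server host; 11000 is the default Oozie port.
    url = "http://oozie-host.example.com:11000/oozie"
    cmd = oozie_jobs_cmd(url, jobtype="coordinator", status="RUNNING")
    print(" ".join(cmd))
    # Only run it when the Oozie client is actually installed on this machine.
    if shutil.which("oozie"):
        subprocess.run(cmd, check=False)
```

Building the argument list separately from running it keeps the command easy to inspect (or paste into a terminal) before anything touches the cluster.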
****Oozie daemons? The Oozie server (the Oozie gateway is installed on the edge node and the Oozie server service on a master node)
***Developers write the job.properties file and schedule Oozie jobs (the admin does not do this) # oozie job --oozie .......
To submit an Oozie job:
1. Create the job.properties file
2. Create the workflow.xml file
3. Put the workflow in HDFS and submit the job
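Steps 1 and 2 can be sketched in Python, generating a minimal job.properties and a workflow.xml with a single Hive action (start, then hive-query, then end, or kill on error). The host names, HDFS path, app name and `query.hql` script are hypothetical. `nameNode` and `jobTracker` are conventional property names that the workflow references as `${...}`, while `oozie.wf.application.path` and `oozie.use.system.libpath` are Oozie properties; the schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:hive-action:0.5`) are an assumption to check against your Oozie version.

```python
import xml.etree.ElementTree as ET

WF_NS = "uri:oozie:workflow:0.5"
HIVE_NS = "uri:oozie:hive-action:0.5"


def job_properties(name_node, job_tracker, app_path):
    """Step 1: render job.properties (kept locally, passed with -config)."""
    props = {
        "nameNode": name_node,
        "jobTracker": job_tracker,
        "oozie.use.system.libpath": "true",
        # HDFS directory that holds workflow.xml (and the Hive script).
        "oozie.wf.application.path": "${nameNode}" + app_path,
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


def workflow_xml(app_name, script):
    """Step 2: render workflow.xml: start -> hive-query -> end, kill on error."""
    wf = ET.Element("workflow-app", {"xmlns": WF_NS, "name": app_name})
    ET.SubElement(wf, "start", {"to": "hive-query"})

    action = ET.SubElement(wf, "action", {"name": "hive-query"})
    hive = ET.SubElement(action, "hive", {"xmlns": HIVE_NS})
    # ${...} placeholders are resolved by Oozie from job.properties.
    ET.SubElement(hive, "job-tracker").text = "${jobTracker}"
    ET.SubElement(hive, "name-node").text = "${nameNode}"
    ET.SubElement(hive, "script").text = script
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(wf, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}"
    ET.SubElement(wf, "end", {"name": "end"})
    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical cluster addresses and HDFS path.
    print(job_properties("hdfs://namenode:8020", "resourcemanager:8032", "/user/etl/apps/hourly-etl"))
    print(workflow_xml("hourly-etl", "query.hql"))
```

Step 3 then happens from the edge node: upload the application directory (workflow.xml plus query.hql) to that HDFS path with `hdfs dfs -put`, and submit with `oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run`.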
Different ways to access Oozie? 1. Terminal (Oozie CLI) 2. Hue UI
YARN schedulers (FIFO, Fair, Capacity): assign resources to jobs through queues
Oozie: job scheduling, i.e. runs a job at a particular time
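To contrast the two roles: Oozie decides when a job runs, while the YARN scheduler decides how much of the cluster each running job gets. The Python below is a toy model of that second part, not YARN's implementation: FIFO serves jobs in submission order, fair sharing is modelled as max-min fairness, and the Capacity scheduler is reduced to a fixed guaranteed percentage per queue (the real one also lets queues use idle capacity beyond their guarantee). The demands and queue names are made up.

```python
def fifo_allocation(capacity, demands):
    """FIFO: jobs are served in submission order until the cluster is full."""
    alloc = []
    remaining = capacity
    for demand in demands:
        take = min(demand, remaining)
        alloc.append(take)
        remaining -= take
    return alloc


def fair_allocation(capacity, demands):
    """Fair sharing as max-min fairness: every job gets an equal share,
    and whatever a small job doesn't need is redistributed to the others."""
    alloc = [0] * len(demands)
    remaining = capacity
    # Visit jobs from the smallest demand to the largest.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for served, i in enumerate(order):
        share = remaining / (len(order) - served)
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc


def capacity_shares(capacity, queue_percent):
    """Capacity scheduler, simplified: each queue is guaranteed a fixed percentage."""
    return {queue: capacity * pct / 100 for queue, pct in queue_percent.items()}


if __name__ == "__main__":
    demands = [80, 30, 50]  # made-up container demands of three jobs
    print("FIFO:", fifo_allocation(100, demands))
    print("Fair:", fair_allocation(100, demands))
    print("Capacity:", capacity_shares(100, {"etl": 60, "adhoc": 40}))
```

For 100 containers and demands of 80, 30 and 50, FIFO gives [80, 20, 0] (the last job waits), while the max-min fair split gives [35.0, 30, 35.0].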
Tell me about yourself? No personal details; speak more about technical skills: years of experience, roles and responsibilities
Cluster architecture? Masters, slaves, total capacity, how much data, hardware configs
Data flow in your project? How data comes in (Kafka), where you get the data from, and where you export data for visualization (important question)
HDFS, YARN, ZooKeeper, Hive, HBase, Kafka, Sqoop, Oozie, Spark
As an admin: Oozie, job types, commands