TEZ-4631: Include an official script that installs hadoop and tez and runs a simple example DAG by abstractdog · Pull Request #414 · apache/tez · GitHub

TEZ-4631: Include an official script that installs hadoop and tez and runs a simple example DAG #414


Merged
2 commits merged into apache:master on May 28, 2025

Conversation

@abstractdog (Contributor) commented May 16, 2025:

Running a simple Tez example from the terminal can be challenging, especially since the installation guide isn’t always up to date. In the long term, providing both a clear web presence and a convenience script would be essential for maintaining the project’s health and earning users’ trust.

The introduced dev-support/bin folder structure follows Hadoop's: https://github.com/apache/hadoop/tree/trunk/dev-support/bin

@abstractdog abstractdog requested a review from ayushtkn May 16, 2025 07:49
@tez-yetus (comment marked as outdated)

@ayushtkn (Member) left a comment:

Do we need to specify mapreduce.framework.name as yarn as well?

For me, it never used to work unless I specified export HADOOP_USER_CLASSPATH_FIRST=true.

Does it work for you without that? Even BigTop had to add it:
https://github.com/apache/bigtop/pull/1246/files#diff-f68b85f9302907e466b58d438376afb074df98fdbe571d30c188cd1767ff11eeR18

#$HADOOP_HOME/sbin/stop-dfs.sh
#$HADOOP_HOME/sbin/stop-yarn.sh

hdfs namenode -format
ayushtkn (Member) replied:

Did you run it twice? Usually, if there was a previous installation and you run namenode -format, it asks with a prompt, "are you sure you want to delete", and you need to give Y:

2025-05-16 13:43:11,940 INFO snapshot.SnapshotManager: SkipList is disabled
2025-05-16 13:43:11,942 INFO util.GSet: Computing capacity for map cachedBlocks
2025-05-16 13:43:11,942 INFO util.GSet: VM type       = 64-bit
2025-05-16 13:43:11,942 INFO util.GSet: 0.25% max memory 7.1 GB = 18.2 MB
2025-05-16 13:43:11,942 INFO util.GSet: capacity      = 2^21 = 2097152 entries
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2025-05-16 13:43:12,062 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2025-05-16 13:43:12,062 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2025-05-16 13:43:12,064 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2025-05-16 13:43:12,064 INFO util.GSet: VM type       = 64-bit
2025-05-16 13:43:12,064 INFO util.GSet: 0.029999999329447746% max memory 7.1 GB = 2.2 MB
2025-05-16 13:43:12,064 INFO util.GSet: capacity      = 2^18 = 262144 entries
Re-format filesystem in Storage Directory root= /tmp/hadoop-ayushsaxena/dfs/name; location= null ? (Y or N) 

abstractdog (Contributor, Author) replied:

There is a -force option for namenode format; let me try.

abstractdog (Contributor, Author) replied:

-force worked
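In the script, the non-interactive variant would look something like this (a sketch; only the -force flag is confirmed by the thread, the surrounding lines are assumptions):

```shell
# -force reformats without the interactive "(Y or N)" prompt shown above,
# so the script can run unattended even over a previous installation's
# leftover metadata directory.
hdfs namenode -format -force
```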

Comment on lines 81 to 82
hadoop fs -mkdir /apps/
hadoop fs -mkdir /apps/tez-$TEZ_VERSION
ayushtkn (Member) suggested:

hadoop fs -mkdir -p /apps/tez-$TEZ_VERSION

abstractdog (Contributor, Author) replied:

ack, will do

hadoop fs -copyFromLocal words.txt /words.txt

# finally run the example
hadoop jar $TEZ_HOME/tez-examples-$TEZ_VERSION.jar orderedwordcount /words.txt /words_out
ayushtkn (Member) commented:

Can we do yarn jar instead?

abstractdog (Contributor, Author) replied:

I haven't used the yarn executable so far; fine with changing, but for the record: what advantages does it have?

ayushtkn (Member) replied:

AFAIK for YARN it should be yarn jar
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop.cmd#L186-L190

If you have YARN opts and such defined, it would shoot a warning as well. hadoop jar was for MR jobs, though it doesn't fail for Tez jobs today.
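Concretely, the suggested change to the submission line would look like this (a sketch based on the script excerpt quoted earlier in the thread):

```shell
# Before: the generic launcher, historically intended for MapReduce jobs.
hadoop jar $TEZ_HOME/tez-examples-$TEZ_VERSION.jar orderedwordcount /words.txt /words_out

# After: the YARN-aware launcher, which picks up YARN-specific opts and
# avoids the deprecation warning mentioned above.
yarn jar $TEZ_HOME/tez-examples-$TEZ_VERSION.jar orderedwordcount /words.txt /words_out
```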


# configure this if needed, by default it will use the latest stable versions in the current directory
export TEZ_VERSION=$(curl -s "https://downloads.apache.org/tez/" | grep -oP '\K[0-9]+\.[0-9]+\.[0-9]+(?=/)' | sort -V | tail -1) # e.g. 0.10.4
export HADOOP_VERSION=$(curl -s "https://downloads.apache.org/hadoop/common/" | grep -oP 'hadoop-\K[0-9]+\.[0-9]+\.[0-9]+(?=/)' | sort -V | tail -1) # e.g. 3.4.1
ayushtkn (Member) commented:

Shouldn't the Hadoop version come from the POM? The latest version isn't always going to work with Tez.

abstractdog (Contributor, Author) replied:

Good question; it depends on what we want to achieve with this script. Here is what I can think of:

  1. get Hadoop from the Tez pom.xml, as you advised
  2. both HADOOP_VERSION and TEZ_VERSION could be taken from the env if already defined (letting the user define any version for experimentation)

ayushtkn (Member) replied:

I'd say: if the user defines it, use it; else get it from the POM. I believe that is what the Hive Docker build script does as well.

abstractdog (Contributor, Author) replied:

yeah, makes sense, let me do the same
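The agreed precedence can be sketched as plain POSIX shell. The function name, the sample version numbers, and the sed-based POM parsing are illustrative assumptions, not the script's actual implementation:

```shell
#!/bin/sh
# Sketch: a version the user exported wins; otherwise fall back to the
# value parsed from the Tez pom.xml.

resolve_version() {
  # $1: value from the environment (possibly empty), $2: value from the POM
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# A POM property like <hadoop.version>3.4.1</hadoop.version> could be
# extracted with sed (here fed from printf for a self-contained demo):
pom_version=$(printf '<hadoop.version>3.4.1</hadoop.version>\n' |
  sed -n 's:.*<hadoop.version>\(.*\)</hadoop.version>.*:\1:p')

resolve_version "" "$pom_version"        # user did not export anything
resolve_version "3.3.6" "$pom_version"   # user override wins
```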

Comment on lines 14 to 17
cd $HADOOP_STACK_HOME
wget -nc https://a 8000 rchive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
wget -nc https://archive.apache.org/dist/tez/$TEZ_VERSION/apache-tez-$TEZ_VERSION-bin.tar.gz

ayushtkn (Member) commented:

Is some caching possible? Like, if it is already present, we don't download it again?

abstractdog (Contributor, Author) replied:

-nc (--no-clobber) is exactly what takes care of this
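For readers unfamiliar with the flag: -nc makes wget skip the download entirely when the target file already exists, which is what gives the script its caching behavior. The same guard, written out explicitly without wget (function name and file name are illustrative assumptions):

```shell
#!/bin/sh
# Sketch of the caching behavior that wget's -nc (--no-clobber) provides.
fetch_once() {
  # $1: local file name, $2: URL (the actual download command is elided)
  if [ -e "$1" ]; then
    echo "cached: $1"
  else
    echo "downloading: $1"
    : # wget "$2" -O "$1" would go here
  fi
}

touch hadoop-3.4.1.tar.gz   # simulate a previous run having downloaded it
fetch_once hadoop-3.4.1.tar.gz https://example.org/hadoop-3.4.1.tar.gz
rm -f hadoop-3.4.1.tar.gz
```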

@abstractdog (Contributor, Author) commented May 18, 2025:

Do we need to specify mapreduce.framework.name as yarn as well?

For me, it never used to work unless I specified export HADOOP_USER_CLASSPATH_FIRST=true.

Does it work for you without that? Even BigTop had to add it: https://github.com/apache/bigtop/pull/1246/files#diff-f68b85f9302907e466b58d438376afb074df98fdbe571d30c188cd1767ff11eeR18

Yeah, I can see this workaround happening everywhere, but here it has just worked OOTB, maybe due to some state of already-defined env vars like HADOOP_CLASSPATH? I don't know. What about:

  1. I keep playing with it to see if I can reproduce their problems
  2. can you try the script on your side to see if it works? if it works for you too without the additional export, we might want to publish it as is, proving that no further classpath hacks are needed

Let me check mapreduce.framework.name as well; for me, the script simply ran a Tez DAG, so I haven't configured anything more... but this is really interesting, I'll investigate.

@abstractdog (Contributor, Author) commented May 20, 2025:

Do we need to specify mapreduce.framework.name as yarn as well? For me, it never used to work unless I specified export HADOOP_USER_CLASSPATH_FIRST=true. Does it work for you without that? Even BigTop had to add it: https://github.com/apache/bigtop/pull/1246/files#diff-f68b85f9302907e466b58d438376afb074df98fdbe571d30c188cd1767ff11eeR18


Wow, that's indeed needed; otherwise I get an exotic exception like:

java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto
	at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.getSerializedSize(DAGProtos.java:21080)
	at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
	at org.apache.tez.common.TezUtils.writeConfInPB(TezUtils.java:162)
	at org.apache.tez.common.TezUtils.createByteStringFromConf(TezUtils.java:82)
	at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createMRInputPayload(MRInputHelpers.java:717)
	at org.apache.tez.mapreduce.input.MRInput$MRInputHelpersInternal.createMRInputPayload(MRInput.java:712)
	at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.createGeneratorDataSource(MRInput.java:336)
	at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.build(MRInput.java:266)
	at org.apache.tez.examples.OrderedWordCount.createDAG(OrderedWordCount.java:130)
	at org.apache.tez.examples.OrderedWordCount.runJob(OrderedWordCount.java:200)
	at org.apache.tez.examples.TezExampleBase._execute(TezExampleBase.java:245)
	at org.apache.tez.examples.TezExampleBase.run(TezExampleBase.java:126)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
	at org.apache.tez.examples.OrderedWordCount.main(OrderedWordCount.java:208)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:241)

adding this export right before the Tez DAG submission

Maybe it ran successfully before because I ran the steps one by one with environment exports already in place, so something was messed up.
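Based on the thread, the fix lands in the script as a single export just before the submission step. A sketch, reusing the submission line quoted earlier (the explanatory comments are my reading of the IllegalAccessError above, not text from the PR):

```shell
# Make user-supplied jars precede Hadoop's own jars on the classpath, so
# Tez's bundled protobuf is loaded instead of Hadoop's older copy; mixing
# the two produces the IllegalAccessError in DAGProtos shown above.
export HADOOP_USER_CLASSPATH_FIRST=true

# finally run the example
hadoop jar $TEZ_HOME/tez-examples-$TEZ_VERSION.jar orderedwordcount /words.txt /words_out
```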

@tez-yetus commented:

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 28m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 2m 23s Maven dependency ordering for branch
_ Patch Compile Tests _
+0 🆗 mvndep 0m 8s Maven dependency ordering for patch
+1 💚 codespell 0m 4s No new issues.
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shellcheck 0m 0s No new issues.
_ Other Tests _
+0 🆗 asflicense 0m 0s ASF License check generated no output?
31m 54s
Subsystem Report/Notes
Docker ClientAPI=1.49 ServerAPI=1.49 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-414/2/artifact/out/Dockerfile
GITHUB PR #414
Optional Tests dupname asflicense codespell detsecrets shellcheck shelldocs
uname Linux 301ac7039060 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-414/src/.yetus/personality.sh
git revision master / bd94d8b
Max. process+thread count 60 (vs. ulimit of 5500)
modules C: U:
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-414/2/console
versions git=2.34.1 maven=3.6.3 codespell=2.0.0 shellcheck=0.7.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org

This message was automatically generated.

@abstractdog abstractdog requested a review from ayushtkn May 21, 2025 12:22
@ayushtkn (Member) left a comment:

Tried locally, and it works for me.

LGTM


@abstractdog abstractdog merged commit e847435 into apache:master May 28, 2025
2 checks passed
abstractdog added a commit to abstractdog/tez that referenced this pull request Jun 3, 2025
… runs a simple example DAG (apache#414) - addendum ASF license
abstractdog added a commit that referenced this pull request Jun 4, 2025
… runs a simple example DAG (#414) - addendum ASF license + shellcheck fixes (#417) (Laszlo Bodor reviewed by Ayush Saxena)