TEZ-4631: Include an official script that installs hadoop and tez and runs a simple example DAG by abstractdog · Pull Request #414 · apache/tez

Status: Open · wants to merge 1 commit into master
Conversation

@abstractdog (Contributor) commented May 16, 2025

Running a simple Tez example from the terminal can be challenging, especially since the installation guide isn’t always up to date. In the long term, providing both a clear web presence and a convenience script would be essential for maintaining the project’s health and earning users’ trust.

The introduced dev-support/bin folder structure follows Hadoop's: https://github.com/apache/hadoop/tree/trunk/dev-support/bin

@abstractdog abstractdog requested a review from ayushtkn May 16, 2025 07:49
@tez-yetus commented:

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 27m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 2m 16s Maven dependency ordering for branch
_ Patch Compile Tests _
+0 🆗 mvndep 0m 9s Maven dependency ordering for patch
+1 💚 codespell 0m 4s No new issues.
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shellcheck 0m 0s No new issues.
_ Other Tests _
+0 🆗 asflicense 0m 0s ASF License check generated no output?
30m 13s
Subsystem Report/Notes
Docker ClientAPI=1.49 ServerAPI=1.49 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-414/1/artifact/out/Dockerfile
GITHUB PR #414
Optional Tests dupname asflicense codespell detsecrets shellcheck shelldocs
uname Linux 45e3c0032967 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-414/src/.yetus/personality.sh
git revision master / 85bdf17
Max. process+thread count 61 (vs. ulimit of 5500)
modules C: U:
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-414/1/console
versions git=2.34.1 maven=3.6.3 codespell=2.0.0 shellcheck=0.7.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org

This message was automatically generated.

@ayushtkn (Member) left a comment:

Do we need to specify mapreduce.framework.name as yarn as well?

For me it never used to work unless I specified export HADOOP_USER_CLASSPATH_FIRST=true.

Does it work for you without that? Even BigTop had to add it:
https://github.com/apache/bigtop/pull/1246/files#diff-f68b85f9302907e466b58d438376afb074df98fdbe571d30c188cd1767ff11eeR18
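A minimal sketch of the settings this comment refers to, as an assumption of what the install script may need (the $TEZ_HOME layout is hypothetical; the exact config is what is under review here):

```shell
# Sketch, not the script's actual contents: make the Tez jars win over
# the Hadoop-bundled ones on the client classpath, as the reviewer suggests.
# $TEZ_HOME is a hypothetical install location.
export HADOOP_USER_CLASSPATH_FIRST=true
export TEZ_CONF_DIR="$TEZ_HOME/conf"
export HADOOP_CLASSPATH="$TEZ_CONF_DIR:$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH"
```

Setting mapreduce.framework.name (in mapred-site.xml) would additionally route MapReduce-API jobs through YARN, which is the other half of the question above.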

#$HADOOP_HOME/sbin/stop-dfs.sh
#$HADOOP_HOME/sbin/stop-yarn.sh

hdfs namenode -format
Review comment (Member):

Did you run it twice? If there is a previous installation and you run namenode -format again, it asks for a prompt ("are you sure you want to delete?") and you need to answer Y:

2025-05-16 13:43:11,940 INFO snapshot.SnapshotManager: SkipList is disabled
2025-05-16 13:43:11,942 INFO util.GSet: Computing capacity for map cachedBlocks
2025-05-16 13:43:11,942 INFO util.GSet: VM type       = 64-bit
2025-05-16 13:43:11,942 INFO util.GSet: 0.25% max memory 7.1 GB = 18.2 MB
2025-05-16 13:43:11,942 INFO util.GSet: capacity      = 2^21 = 2097152 entries
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2025-05-16 13:43:12,062 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2025-05-16 13:43:12,062 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2025-05-16 13:43:12,064 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2025-05-16 13:43:12,064 INFO util.GSet: VM type       = 64-bit
2025-05-16 13:43:12,064 INFO util.GSet: 0.029999999329447746% max memory 7.1 GB = 2.2 MB
2025-05-16 13:43:12,064 INFO util.GSet: capacity      = 2^18 = 262144 entries
Re-format filesystem in Storage Directory root= /tmp/hadoop-ayushsaxena/dfs/name; location= null ? (Y or N) 
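One way to avoid the re-format prompt on a second run is to guard the format step. A sketch, assuming a hypothetical helper name and name-dir argument; hdfs namenode -format also accepts -force and -nonInteractive flags to bypass the prompt outright:

```shell
# Sketch: only format the namenode on first run, so re-running the
# install script never hits the interactive re-format prompt.
# format_namenode_once and the name_dir argument are hypothetical.
format_namenode_once() {
  local name_dir="$1"
  if [ -d "$name_dir/current" ]; then
    # a formatted namenode leaves a current/ directory behind
    echo "namenode already formatted, skipping"
  else
    # -nonInteractive aborts instead of prompting if old data exists
    hdfs namenode -format -nonInteractive
  fi
}
```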


$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

Review comment (Member):

Can we start the historyserver as well? Otherwise, once the Tez job is done, you can't navigate to it via the ResourceManager UI:

 $HADOOP_HOME/bin/mapred --daemon start historyserver

Comment on lines +81 to +82
hadoop fs -mkdir /apps/
hadoop fs -mkdir /apps/tez-$TEZ_VERSION
Review comment (Member):

hadoop fs -mkdir -p /apps/tez-$TEZ_VERSION

hadoop fs -copyFromLocal words.txt /words.txt

# finally run the example
hadoop jar $TEZ_HOME/tez-examples-$TEZ_VERSION.jar orderedwordcount /words.txt /words_out
Review comment (Member):

Can we do yarn jar instead?


# configure this if needed, by default it will use the latest stable versions in the current directory
export TEZ_VERSION=$(curl -s "https://downloads.apache.org/tez/" | grep -oP '\K[0-9]+\.[0-9]+\.[0-9]+(?=/)' | sort -V | tail -1) # e.g. 0.10.4
export HADOOP_VERSION=$(curl -s "https://downloads.apache.org/hadoop/common/" | grep -oP 'hadoop-\K[0-9]+\.[0-9]+\.[0-9]+(?=/)' | sort -V | tail -1) # e.g. 3.4.1
Review comment (Member):

Shouldn't the Hadoop version come from the pom? The latest version is not always going to work with Tez.
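A sketch of this suggestion: read the Hadoop version from the Tez root pom.xml instead of taking the latest release. It assumes the script runs from a source checkout whose pom defines a <hadoop.version> property; pom_hadoop_version is a hypothetical helper (GNU grep -P, which the script already relies on):

```shell
# Sketch: extract <hadoop.version> from a pom.xml.
# pom_hadoop_version is a hypothetical helper name.
pom_hadoop_version() {
  local pom="$1"
  grep -oP '<hadoop.version>\K[0-9]+\.[0-9]+\.[0-9]+' "$pom" | head -1
}
```

For example, HADOOP_VERSION=$(pom_hadoop_version pom.xml), falling back to the curl-based latest-stable lookup when no checkout is present.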

Comment on lines +14 to +17
cd $HADOOP_STACK_HOME
wget -nc https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
wget -nc https://archive.apache.org/dist/tez/$TEZ_VERSION/apache-tez-$TEZ_VERSION-bin.tar.gz

Review comment (Member):

Is some caching possible? E.g. if the file is already present, we don't download it again.
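Worth noting that wget's -nc (no-clobber) flag, which the script already passes, skips the download when the file exists locally, so re-runs are cached. A checksum check on top of that guards against a truncated earlier download; Apache publishes .sha512 sidecar files for its releases. A sketch with a hypothetical helper, assuming the sidecar is in a format GNU sha512sum -c accepts:

```shell
# Sketch: verify a downloaded tarball against its .sha512 sidecar.
# verify_sha512 is a hypothetical helper; wget -nc already provides
# the "don't re-download if present" caching asked about above.
verify_sha512() {
  # expects <file>.sha512 next to <file>
  sha512sum -c "${1}.sha512"
}
```

Usage would be along the lines of: wget -nc "$url" && wget -nc "$url.sha512" && verify_sha512 "${url##*/}".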
