TEZ-4631: Include an official script that installs hadoop and tez and runs a simple example DAG #414
🎊 +1 overall
This message was automatically generated.
Do we need to set mapreduce.framework.name to yarn as well?
For me, it never used to work unless I specified export HADOOP_USER_CLASSPATH_FIRST=true.
Does it work for you without that? Even BigTop had to add it:
https://github.com/apache/bigtop/pull/1246/files#diff-f68b85f9302907e466b58d438376afb074df98fdbe571d30c188cd1767ff11eeR18
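A minimal sketch of how the two settings asked about here could be applied by a setup script. The helper name `write_mapred_site` and the choice of target directory are hypothetical; `mapreduce.framework.name` and `HADOOP_USER_CLASSPATH_FIRST` are the names from the comment above. Whether either setting is actually required for the Tez example is the open question, and this sketch does not settle it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal mapred-site.xml into the given conf dir.
# Note: this overwrites any mapred-site.xml already present there.
write_mapred_site() {
  local conf_dir="$1"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# Put user-supplied classpath entries (e.g. the Tez jars) ahead of Hadoop's own.
export HADOOP_USER_CLASSPATH_FIRST=true
```

A script would call it as `write_mapred_site "$HADOOP_HOME/etc/hadoop"` before starting the daemons.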
#$HADOOP_HOME/sbin/stop-dfs.sh
#$HADOOP_HOME/sbin/stop-yarn.sh

hdfs namenode -format
Did you run it twice? Usually, if there is a previous installation and you run namenode -format, it prompts you to confirm the deletion, and you need to answer Y:
2025-05-16 13:43:11,940 INFO snapshot.SnapshotManager: SkipList is disabled
2025-05-16 13:43:11,942 INFO util.GSet: Computing capacity for map cachedBlocks
2025-05-16 13:43:11,942 INFO util.GSet: VM type = 64-bit
2025-05-16 13:43:11,942 INFO util.GSet: 0.25% max memory 7.1 GB = 18.2 MB
2025-05-16 13:43:11,942 INFO util.GSet: capacity = 2^21 = 2097152 entries
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2025-05-16 13:43:11,949 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2025-05-16 13:43:12,062 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2025-05-16 13:43:12,062 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2025-05-16 13:43:12,064 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2025-05-16 13:43:12,064 INFO util.GSet: VM type = 64-bit
2025-05-16 13:43:12,064 INFO util.GSet: 0.029999999329447746% max memory 7.1 GB = 2.2 MB
2025-05-16 13:43:12,064 INFO util.GSet: capacity = 2^18 = 262144 entries
Re-format filesystem in Storage Directory root= /tmp/hadoop-ayushsaxena/dfs/name; location= null ? (Y or N)
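One way to avoid the prompt on a second run is to format only when no NameNode metadata exists yet. The sketch below assumes the default name directory seen in the log above (`/tmp/hadoop-<user>/dfs/name`) and relies on a formatted name directory containing `current/VERSION`. The helper names `namenode_is_formatted` and `format_if_needed` are hypothetical and not part of the script under review.

```shell
#!/usr/bin/env bash
# Assumption: default name dir, as in the log above (/tmp/hadoop-<user>/dfs/name).
NAME_DIR="${NAME_DIR:-/tmp/hadoop-$(id -un)/dfs/name}"

# Hypothetical helper: succeeds if the directory already holds NameNode metadata.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Hypothetical helper: format only on a fresh installation.
format_if_needed() {
  if namenode_is_formatted "$1"; then
    echo "skip format: $1 already formatted"
  else
    # -nonInteractive makes the command abort instead of prompting
    # if the name directory unexpectedly exists.
    hdfs namenode -format -nonInteractive
  fi
}
```

The script would then call `format_if_needed "$NAME_DIR"` in place of the bare `hdfs namenode -format`. If a clean re-format on every run is preferred instead, `hdfs namenode -format -force` skips the prompt and overwrites the existing metadata.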

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
Can we start the historyserver as well? Otherwise, once the Tez job is done, you can't navigate to it via the ResourceManager UI:
$HADOOP_HOME/bin/mapred --daemon start historyserver
hadoop fs -mkdir /apps/
hadoop fs -mkdir /apps/tez-$TEZ_VERSION
hadoop fs -mkdir -p /apps/tez-$TEZ_VERSION
hadoop fs -copyFromLocal words.txt /words.txt

# finally run the example
hadoop jar $TEZ_HOME/tez-examples-$TEZ_VERSION.jar orderedwordcount /words.txt /words_out
Can we use yarn jar instead?

# configure this if needed, by default it will use the latest stable versions in the current directory
export TEZ_VERSION=$(curl -s "https://downloads.apache.org/tez/" | grep -oP '\K[0-9]+\.[0-9]+\.[0-9]+(?=/)' | sort -V | tail -1) # e.g. 0.10.4
export HADOOP_VERSION=$(curl -s "https://downloads.apache.org/hadoop/common/" | grep -oP 'hadoop-\K[0-9]+\.[0-9]+\.[0-9]+(?=/)' | sort -V | tail -1) # e.g. 3.4.1
Shouldn't the Hadoop version come from the pom? The latest version is not always going to work with Tez.
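A rough sketch of what reading the version from the pom could look like. It assumes the script has access to a Tez source checkout whose `pom.xml` declares a `<hadoop.version>` property; the helper name `hadoop_version_from_pom` and the `TEZ_SRC` variable in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first <hadoop.version> value found in a pom.xml.
hadoop_version_from_pom() {
  sed -n 's:.*<hadoop\.version>\([^<]*\)</hadoop\.version>.*:\1:p' "$1" | head -n 1
}
```

Usage would be along the lines of `export HADOOP_VERSION=$(hadoop_version_from_pom "$TEZ_SRC/pom.xml")`. Where Maven is available, `mvn help:evaluate -Dexpression=hadoop.version -q -DforceStdout` is an alternative that also resolves the property through Maven instead of text matching.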
cd $HADOOP_STACK_HOME
wget -nc https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
wget -nc https://archive.apache.org/dist/tez/$TEZ_VERSION/apache-tez-$TEZ_VERSION-bin.tar.gz
Is some caching possible? For example, if the archive is already present, we don't download it again.
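Note that `wget -nc` (no-clobber) in the diff already skips a download when a file of the same name exists in the target directory. A slightly more defensive sketch is shown below; it also avoids treating an interrupted download as a cached file. The helper name `fetch_once` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download url into dir unless a non-empty copy already exists.
fetch_once() {
  local url="$1" dir="$2"
  local file="$dir/${url##*/}"   # file name = last path segment of the URL
  if [ -s "$file" ]; then
    echo "cached: $file"
    return 0
  fi
  # Download under a temporary name and rename on success, so an
  # interrupted transfer is not mistaken for a complete archive.
  wget -q -O "$file.part" "$url" && mv "$file.part" "$file"
}
```

For example: `fetch_once "https://archive.apache.org/dist/tez/$TEZ_VERSION/apache-tez-$TEZ_VERSION-bin.tar.gz" "$HADOOP_STACK_HOME"`.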
Running a simple Tez example from the terminal can be challenging, especially since the installation guide isn’t always up to date. In the long term, providing both a clear web presence and a convenience script would be essential for maintaining the project’s health and earning users’ trust.
The introduced dev-support/bin folder structure follows Hadoop's: https://github.com/apache/hadoop/tree/trunk/dev-support/bin