[Spark] on Windows 10 - Required setup for running Spark on Windows 10
forgiveall
2021. 4. 18. 14:32
Spark on Windows 10
When using Spark in a Windows environment, an HDFS-related error occurs.
On Windows you have to go to the extra trouble of downloading the winutils.exe file, placing it in a specific location, and setting the HADOOP_HOME environment variable.
1. Error
Running Spark on Windows fails with the following error:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:382)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:397)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:390)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:274)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:262)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:807)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:303)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2555)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at com.tutorial.spark.SimpleApp.main(SimpleApp.java:10)
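The `null\bin\winutils.exe` path in the first line is the tell: Hadoop's `Shell` class builds the winutils path from the `hadoop.home.dir` JVM system property (falling back to the `HADOOP_HOME` environment variable), and here neither is set. As a minimal sketch (the class and helper names below are hypothetical, and `C:\hadoop` is the assumed install location), the property can also be set per-process before the first `SparkSession` is created:

```java
// Hypothetical sketch: point hadoop.home.dir at the directory that
// contains bin\winutils.exe, as a per-JVM alternative to the
// HADOOP_HOME environment variable. Assumes winutils.exe has already
// been placed under C:\hadoop\bin.
public class WinutilsSetup {

    // Assumed location of the Hadoop home directory.
    static String hadoopHome() {
        return "C:\\hadoop";
    }

    public static void main(String[] args) {
        // Must run before the first SparkSession/SparkContext is built,
        // because Shell resolves winutils.exe in a static initializer.
        System.setProperty("hadoop.home.dir", hadoopHome());
        System.out.println(System.getProperty("hadoop.home.dir")); // prints C:\hadoop
    }
}
```

Setting HADOOP_HOME system-wide, as described in the solution below, is the more common fix; the system property only affects the current JVM.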
2. Problem
...
3. Solution
1. Download winutils.exe to C:\hadoop\bin.
2. Set the environment variables:
- HADOOP_HOME = C:\hadoop
- Append %HADOOP_HOME%\bin to Path
3. Run CMD or PowerShell with administrator privileges.
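The steps above can also be done from an elevated PowerShell window. A configuration sketch, assuming winutils.exe was already saved to C:\hadoop\bin:

```powershell
# Run in an elevated (administrator) PowerShell window.
# Persist HADOOP_HOME for the current user (takes effect in new shells).
setx HADOOP_HOME "C:\hadoop"
# Append the bin directory to the persisted PATH. Note: setx stores the
# expanded value and truncates values longer than 1024 characters, so the
# Environment Variables GUI is the safer route for long PATHs.
setx PATH "$env:PATH;C:\hadoop\bin"

# Also set both in the current session so no restart is needed.
$env:HADOOP_HOME = "C:\hadoop"
$env:PATH = "$env:PATH;C:\hadoop\bin"

# Sanity check: this should print the full path to winutils.exe.
Get-Command winutils.exe | Select-Object -ExpandProperty Source
```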
4. References
- How To Install Apache Spark On Windows 10: https://phoenixnap.com/kb/install-spark-on-windows-10
- Problems running Hadoop on Windows: https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems
- winutils/hadoop-2.7.1/bin/: https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
- Running spark-shell on Windows: https://javacan.tistory.com/entry/%EC%9C%88%EB%8F%84%EC%9A%B0%EC%97%90%EC%84%9C-sparkshell-%EC%8B%A4%ED%96%89%ED%95%98%EA%B8%B0
- Installing Apache Spark - Windows 10 environment: https://alphahackerhan.tistory.com/9