TOOL/Spark 2021. 4. 18. 14:32

Spark

Spark on Windows 10

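When you run Spark on Windows, it fails with a Hadoop (HDFS) related error.

On Windows you have to go to the trouble of downloading winutils.exe, placing it in a specific location, and setting the HADOOP_HOME environment variable.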

1. Error

Running Spark on Windows produces the following error.

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:382)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:397)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:390)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:274)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:262)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:807)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
    at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:303)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2555)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at com.tutorial.spark.SimpleApp.main(SimpleApp.java:10)
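
The "null" at the start of the path in the first line of the trace suggests that no Hadoop home directory was configured when the path was built. The sketch below is a simplified illustration of how such a message can arise from string concatenation with a null value. The class and method names are hypothetical, and this is not Hadoop's actual code.

```java
// Hypothetical demo class; a simplified illustration, not Hadoop's actual code.
class NullHadoopHomeDemo {

    // Builds the error text from a Hadoop home value that may be null.
    // In Java, concatenating a null String yields the literal text "null".
    static String winutilsErrorMessage(String hadoopHome) {
        String fullExeName = hadoopHome + "\\bin\\winutils.exe";
        return "Could not locate executable " + fullExeName + " in the Hadoop binaries.";
    }

    public static void main(String[] args) {
        // With HADOOP_HOME unset, getenv returns null and the path starts with "null".
        System.out.println(winutilsErrorMessage(System.getenv("HADOOP_HOME")));
    }
}
```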

2. Problem

...

3. Solution

  1. Download winutils.exe to C:\hadoop\bin.

  2. Set the environment variables

    • HADOOP_HOME
        C:\hadoop
    • Path
        %HADOOP_HOME%\bin
  3. Run CMD or PowerShell with administrator privileges.
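
Before starting Spark again, it can help to confirm that the settings above are visible to the JVM. The following is a minimal sketch: `WinutilsCheck` and `resolveWinutils` are hypothetical names, not part of Spark or Hadoop. It also assumes that the `hadoop.home.dir` system property, if set, takes precedence over HADOOP_HOME.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper for checking the setup; not part of Spark or Hadoop.
class WinutilsCheck {

    // Returns the expected location of winutils.exe, or empty if no Hadoop home is configured.
    // Assumption: the hadoop.home.dir system property wins over the HADOOP_HOME variable.
    static Optional<Path> resolveWinutils(String homeProperty, String homeEnv) {
        String home = (homeProperty != null && !homeProperty.isEmpty()) ? homeProperty : homeEnv;
        if (home == null || home.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(home, "bin", "winutils.exe"));
    }

    public static void main(String[] args) {
        Optional<Path> winutils = resolveWinutils(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));

        if (!winutils.isPresent()) {
            System.out.println("No Hadoop home is configured (HADOOP_HOME is not set).");
        } else if (Files.isRegularFile(winutils.get())) {
            System.out.println("Found " + winutils.get());
        } else {
            System.out.println("Hadoop home is set, but the file is missing: " + winutils.get());
        }
    }
}
```

Environment variables are read when a process starts, so a terminal or IDE that was already open needs to be restarted before this check (or Spark) sees the new HADOOP_HOME value.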

4. Reference