코알못

[pyspark] UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY 본문

ETC

[pyspark] UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY

코린이s 2023. 9. 14. 10:05
728x90

오류

pyspark.errors.exceptions.captured.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY] Unsupported subquery expression: Correlated scalar subqueries must be aggregated to return at most one row.;

서브쿼리에 행이 1개 이상 반환될수 있는 쿼리라 아래와 같이 수정

- 이전 

from pyspark.sql import SparkSession
import dateutil.tz as tz
from datetime import timedelta,datetime
from dateutil.relativedelta import relativedelta
import sys
argument=sys.argv

kst = tz.gettz('Asia/Seoul')
ksttime = datetime.now(tz=kst)
today = ksttime.astimezone(kst)
if len(argument) > 1:
    today=datetime.strptime(argument[1],'%Y-%m-%d')
yesterday = today - timedelta(days=1)
day_before_yesterday = today - timedelta(days=2)
ago_month = today + relativedelta(months=-1)

appname="song_analysis_daily"
spark = SparkSession.builder.appName(appname).config("spark.home", "/usr/lib/spark").config("spark.sql.debug.maxToStringFields","2000").enableHiveSupport().getOrCreate()
TODAY="2023-09-14"
query=f"""select a,
(    
    SELECT  b
    FROM    TB_DATA AS D
    WHERE   A.d = D.d
) AS like_cnt,
c
FROM TB_TEST AS A
WHERE TODAY={TODAY}
"""


- 변경 : 쿼리에 행 하나 나올 수 있도록 FIRST 함수 추가

from pyspark.sql import SparkSession
import dateutil.tz as tz
from datetime import timedelta,datetime
from dateutil.relativedelta import relativedelta
import sys
argument=sys.argv

kst = tz.gettz('Asia/Seoul')
ksttime = datetime.now(tz=kst)
today = ksttime.astimezone(kst)
if len(argument) > 1:
    today=datetime.strptime(argument[1],'%Y-%m-%d')
yesterday = today - timedelta(days=1)
day_before_yesterday = today - timedelta(days=2)
ago_month = today + relativedelta(months=-1)

appname="song_analysis_daily"
spark = SparkSession.builder.appName(appname).config("spark.home", "/usr/lib/spark").config("spark.sql.debug.maxToStringFields","2000").enableHiveSupport().getOrCreate()
TODAY="2023-09-14"
query=f"""select a,
(    
    SELECT  FIRST(b)
    FROM    TB_DATA AS D
    WHERE   A.d = D.d
) AS like_cnt,
c
FROM TB_TEST AS A
WHERE TODAY={TODAY}
"""
728x90
Comments