코알못
[pyspark] UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY
Error
pyspark.errors.exceptions.captured.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY] Unsupported subquery expression: Correlated scalar subqueries must be aggregated to return at most one row.;
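The correlated scalar subquery could return more than one row for each outer row, so I changed the query as shown in the before/after code below.
- Before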
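Spark raises this error during analysis, based on the shape of the query rather than on the data, so it appears even when every key happens to match a single row. Before choosing an aggregate it can still help to know whether the join key really is duplicated. The following is a minimal sketch of such a check. It uses the standard-library `sqlite3` module and made-up sample rows only so that it runs without a cluster; the table and column names (`TB_DATA`, `d`, `b`) follow the placeholders in this post, and the same `GROUP BY ... HAVING` statement could be passed to `spark.sql` instead.

```python
import sqlite3

# Hypothetical sample rows; TB_DATA, d and b follow the post's placeholder names.
rows = [("k1", 10), ("k1", 20), ("k2", 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TB_DATA (d TEXT, b INTEGER)")
conn.executemany("INSERT INTO TB_DATA (d, b) VALUES (?, ?)", rows)

# Keys that match more than one row are the ones for which
# an aggregate such as FIRST(b) has to pick one value among several.
duplicate_check = """
SELECT d, COUNT(*) AS row_cnt
FROM TB_DATA
GROUP BY d
HAVING COUNT(*) > 1
"""
duplicates = conn.execute(duplicate_check).fetchall()
print(duplicates)
```

If the check returns no rows, any aggregate gives the same result; if it returns rows, the choice of aggregate decides which value ends up in `like_cnt`.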
from pyspark.sql import SparkSession
import dateutil.tz as tz
from datetime import timedelta,datetime
from dateutil.relativedelta import relativedelta
import sys
argument=sys.argv
kst = tz.gettz('Asia/Seoul')
ksttime = datetime.now(tz=kst)
today = ksttime.astimezone(kst)
# An optional first argument (YYYY-MM-DD) overrides the run date.
if len(argument) > 1:
    today = datetime.strptime(argument[1], '%Y-%m-%d')
yesterday = today - timedelta(days=1)
day_before_yesterday = today - timedelta(days=2)
ago_month = today + relativedelta(months=-1)
appname="song_analysis_daily"
spark = SparkSession.builder.appName(appname).config("spark.home", "/usr/lib/spark").config("spark.sql.debug.maxToStringFields","2000").enableHiveSupport().getOrCreate()
TODAY="2023-09-14"
query=f"""select a,
(
SELECT b
FROM TB_DATA AS D
WHERE A.d = D.d
) AS like_cnt,
c
FROM TB_TEST AS A
WHERE TODAY='{TODAY}'
"""
- After: added the FIRST function so that the subquery returns a single row
from pyspark.sql import SparkSession
import dateutil.tz as tz
from datetime import timedelta,datetime
from dateutil.relativedelta import relativedelta
import sys
argument=sys.argv
kst = tz.gettz('Asia/Seoul')
ksttime = datetime.now(tz=kst)
today = ksttime.astimezone(kst)
# An optional first argument (YYYY-MM-DD) overrides the run date.
if len(argument) > 1:
    today = datetime.strptime(argument[1], '%Y-%m-%d')
yesterday = today - timedelta(days=1)
day_before_yesterday = today - timedelta(days=2)
ago_month = today + relativedelta(months=-1)
appname="song_analysis_daily"
spark = SparkSession.builder.appName(appname).config("spark.home", "/usr/lib/spark").config("spark.sql.debug.maxToStringFields","2000").enableHiveSupport().getOrCreate()
TODAY="2023-09-14"
query=f"""select a,
(
SELECT FIRST(b)
FROM TB_DATA AS D
WHERE A.d = D.d
) AS like_cnt,
c
FROM TB_TEST AS A
WHERE TODAY='{TODAY}'
"""
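One caveat: `FIRST` is non-deterministic in Spark, because its result depends on row order, which can change after a shuffle. If the key can match several rows, a deterministic aggregate such as `MAX` or `MIN` may be a safer choice, although whether it matches the intended meaning of `like_cnt` depends on the data. The sketch below builds the same query with a selectable aggregate and a quoted date literal. `build_query` is a hypothetical helper introduced only for illustration, not part of the original script.

```python
from datetime import datetime


def build_query(run_date: str, aggregate: str = "MAX") -> str:
    """Return the post's query with the scalar subquery wrapped in an aggregate."""
    # Fail early on malformed dates instead of sending them to Spark.
    datetime.strptime(run_date, "%Y-%m-%d")
    if aggregate not in {"MAX", "MIN", "FIRST"}:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    return f"""SELECT a,
       (
           SELECT {aggregate}(b)
           FROM TB_DATA AS D
           WHERE A.d = D.d
       ) AS like_cnt,
       c
FROM TB_TEST AS A
WHERE TODAY = '{run_date}'
"""


query = build_query("2023-09-14")
print(query)
```

With the `spark` session from the script above, the result could then be run as `spark.sql(build_query(TODAY))`.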