[Kaggle] Hotel Booking Demand Dataset Analysis 2 (Data Preprocessing)


The previous part of this series is available in the post below.

[Data Analysis/Kaggle] - [Kaggle] Hotel Booking Demand Dataset Analysis 1 (Data Overview / Questions)


Hotel Booking Demand dataset analysis. Dataset overview: booking information for city and resort hotels. Purpose of the analysis: examine hotel room bookings and cancellations. Dataset source: https://www.kaggle.com/jessemo..

sks8410.tistory.com

3. Data Preprocessing

3-1 Renaming Columns

# Rename the hotel, is_canceled, and is_repeated_guest columns

hotel = hotel.rename(columns = {"hotel" : "hotel_type", "is_canceled" : "canceled", "is_repeated_guest" : "repeated_guest"})

hotel.columns
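One thing worth knowing about rename: by default it ignores keys that do not match any column, so a typo in the mapping goes unnoticed. The following is a minimal sketch on a made-up toy DataFrame (the values and the `toy`, `rename_map`, and `missing_keys` names are assumptions for illustration, not part of the original notebook) that checks the mapping before applying it.

```python
import pandas as pd

# Toy stand-in for the hotel DataFrame (values are made up for illustration).
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel"],
    "is_canceled": [0, 1],
    "is_repeated_guest": [0, 0],
    "lead_time": [10, 45],
})

rename_map = {
    "hotel": "hotel_type",
    "is_canceled": "canceled",
    "is_repeated_guest": "repeated_guest",
}

# rename() skips keys that match no column, so verify the keys first.
missing_keys = [name for name in rename_map if name not in toy.columns]
assert not missing_keys, f"columns not found: {missing_keys}"

renamed = toy.rename(columns=rename_map)
print(list(renamed.columns))
```

Columns that are not in the mapping, such as lead_time here, keep their original names.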

3-2 Handling Missing Values

3-2-1 children column

hotel["children"].value_counts()

Most values in the children column are 0, so the missing values are filled with 0 as well.

# Fill missing values in the children column with 0

hotel["children"] = hotel["children"].fillna(0)

hotel["children"].isnull().sum()
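The decision above amounts to filling with the most frequent value. As a rough sketch on made-up data (the `children` Series below and the cast to int are assumptions added for illustration), the share of each value can be checked before filling:

```python
import numpy as np
import pandas as pd

# Made-up sample; the real column comes from the hotel DataFrame.
children = pd.Series(
    [0.0, 0.0, 2.0, np.nan, 0.0, 1.0, np.nan, 0.0], name="children"
)

# Share of each value among the non-missing entries.
shares = children.value_counts(normalize=True)
most_common = shares.idxmax()
print(shares)

# Fill with the most frequent value; with no NaN left, the column
# can be stored as integers (an optional extra step).
filled = children.fillna(most_common).astype(int)
print(filled.isnull().sum())
```

The integer cast is optional; it only works once no missing values remain.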

3-2-2 country column

print("Share of rows with a missing country value: {:.2f}%".format(len(hotel[hotel["country"].isnull()]) / len(hotel["country"]) * 100))

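There is no obvious replacement for the missing country values, and they account for only a small share of the rows, so those rows are dropped.

# Drop rows with a missing country value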

hotel = hotel.dropna(subset = ["country"])

hotel["country"].isnull().sum()
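To make the effect of this step concrete, here is a small sketch on made-up data (the `toy` frame, its values, and the `reset_index` call are assumptions for illustration). It computes the missing share with `isnull().mean()`, which gives the same ratio as the `len`-based expression, and counts how many rows are removed.

```python
import pandas as pd

# Made-up sample with one missing country out of five rows.
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "FRA", "PRT"],
    "adults": [2, 1, 2, 2, 3],
})

# The mean of a boolean mask is the fraction of True values.
missing_share = toy["country"].isnull().mean() * 100
print("{:.2f}%".format(missing_share))

rows_before = len(toy)
# reset_index is optional; it renumbers the rows after the drop.
cleaned = toy.dropna(subset=["country"]).reset_index(drop=True)
rows_dropped = rows_before - len(cleaned)
print(rows_dropped)
```

Without `reset_index`, the remaining rows keep their original index labels, which leaves gaps where rows were dropped.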

3-2-3 agent column

print("Share of rows with a missing agent value: {:.2f}%".format(len(hotel[hotel["agent"].isnull()]) / len(hotel["agent"]) * 100))

There is no obvious replacement for the missing agent values, so they are replaced with Unknown.

# Replace missing values in the agent column with Unknown

hotel["agent"] = hotel["agent"].fillna("Unknown")

hotel["agent"].isnull().sum()
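One side effect to keep in mind: if agent holds numeric IDs read as floats (an assumption about how the CSV was loaded), filling with the string "Unknown" leaves the column with a mix of floats and strings. A possible alternative, sketched on made-up data, converts the IDs to string labels first so the column has a single type:

```python
import numpy as np
import pandas as pd

# Made-up agent IDs stored as floats, as happens when NaN is present.
agent = pd.Series([9.0, np.nan, 240.0, 9.0, np.nan], name="agent")

# Nullable Int64 keeps the missing values and removes the trailing ".0";
# the string dtype then turns each ID into a label, and the remaining
# missing entries become "Unknown".
agent_label = agent.astype("Int64").astype("string").fillna("Unknown")
print(agent_label.tolist())
```

This is only one option; keeping the original fillna call is fine when the column is used purely as a categorical label.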

3-2-4 company column

print("Share of rows with a missing company value: {:.2f}%".format(len(hotel[hotel["company"].isnull()]) / len(hotel) * 100))

Missing values make up most of the rows in the company column and no replacement value is known, so the column is excluded from this analysis.

# Drop the company column

hotel = hotel.drop("company", axis = 1)

hotel.columns
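The same reasoning can be applied to every column at once. The sketch below uses made-up data and a hypothetical 50% cut-off (both are assumptions, not values from the original analysis) to list the columns whose missing rate exceeds the threshold and drop them.

```python
import numpy as np
import pandas as pd

# Made-up sample: company is mostly missing, agent only partly.
toy = pd.DataFrame({
    "company": [np.nan, np.nan, 40.0, np.nan, np.nan],
    "agent": [9.0, np.nan, 240.0, 9.0, 14.0],
    "adults": [2, 1, 2, 2, 3],
})

threshold = 0.5  # hypothetical cut-off chosen for illustration

# Fraction of missing values per column.
missing_rate = toy.isnull().mean()
to_drop = missing_rate[missing_rate > threshold].index.tolist()

reduced = toy.drop(columns=to_drop)
print(to_drop)
print(list(reduced.columns))
```

Here `drop(columns=...)` is equivalent to passing the labels with `axis = 1`.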
