[Other Data] Wine Data Analysis 1 (Data Inspection / Asking Questions / Data Preprocessing)

Wine Data Analysis

A dataset of red and white wine samples from the Vinho Verde region of Portugal.

Data source: https://archive.ics.uci.edu/ml/datasets/wine+quality

1. Data Inspection

# Load the basic packages

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use("seaborn-v0_8")  # the "seaborn" style was renamed to "seaborn-v0_8" in Matplotlib 3.6
sns.set(font_scale = 1)
sns.set_style("whitegrid")

import plotly.express as px

import chart_studio.plotly as py
import cufflinks as cf
cf.go_offline(connected=True)

import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()

from plotly.subplots import make_subplots

import missingno as msno

import warnings  # hide warning messages
warnings.filterwarnings(action='ignore')

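plt.rcParams['font.family'] = 'S-Core Dream'  # use a font that includes Korean glyphs (must be installed locally)
# plt.rc("font", family = "AppleGothic")  # alternative Korean font on macOS
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign from rendering as a broken glyph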

# Load the data

red = pd.read_csv("data/winequality-red.csv", sep = ";")  # fields are separated by semicolons
white = pd.read_csv("data/winequality-white.csv", sep = ";")  # fields are separated by semicolons

print(red.shape)
print(white.shape)

#red.head()
white.head()
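
The `sep = ";"` argument matters because the files use semicolons instead of commas between fields. A minimal sketch of the difference, using a two-row in-memory sample (the sample values are made up for illustration and are not taken from the dataset):

```python
import io
import pandas as pd

# Hypothetical two-row sample in a semicolon-separated layout
sample = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;3.2;5\n"

# Default separator (comma): every line is read as one single column
wrong = pd.read_csv(io.StringIO(sample))

# Semicolon separator: each field becomes its own column
right = pd.read_csv(io.StringIO(sample), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

If `red.shape` or `white.shape` reports only one column, a wrong separator is the likely cause.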

<Column descriptions>
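- fixed acidity: fixed (non-volatile) acids
- volatile acidity: volatile acids
- citric acid: citric acid
- residual sugar: residual sugar
- chlorides: chlorides
- free sulfur dioxide: free sulfur dioxide
- total sulfur dioxide: total sulfur dioxide
- density: density
- pH: pH (acidity)
- sulphates: sulphates
- alcohol: alcohol content
- quality: quality (a score between 0 and 10)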
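
Before analysis, it can help to confirm that a loaded frame actually carries these twelve columns. The following is a rough sketch, not part of the original notebook; `EXPECTED_COLUMNS`, `missing_columns`, and the `demo` frame are hypothetical names introduced here for illustration:

```python
import pandas as pd

# Column names as listed in the description above
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Hypothetical frame that lacks most columns, to show what the check reports
demo = pd.DataFrame({"pH": [3.51], "alcohol": [9.4], "quality": [5]})
print(missing_columns(demo))
```

An empty list from `missing_columns(red)` or `missing_columns(white)` would mean every described column is present.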

# Check data types and non-null counts

#red.info()
white.info()

# Summary statistics for the numeric columns

#red.describe()
white.describe()

# Check for missing values

#red.isnull().sum()
white.isnull().sum()

Neither the red nor the white dataset has any missing values.
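
Since the cell above runs the check on one frame at a time, a small helper can report both at once. This is a sketch under assumptions: `missing_summary` is a hypothetical helper, and the two demo frames are invented stand-ins for `red` and `white`, with one NaN planted on purpose so the output is not all zeros:

```python
import numpy as np
import pandas as pd

def missing_summary(frames: dict) -> pd.DataFrame:
    """Count missing values per column for each named DataFrame."""
    return pd.DataFrame({name: df.isnull().sum() for name, df in frames.items()})

# Hypothetical stand-ins for red and white; one NaN is planted on purpose
red_demo = pd.DataFrame({"pH": [3.5, 3.2], "quality": [5, 6]})
white_demo = pd.DataFrame({"pH": [3.0, np.nan], "quality": [6, 7]})

summary = missing_summary({"red": red_demo, "white": white_demo})
print(summary)

# True only when no value is missing in any frame
print(summary.sum().sum() == 0)
```

With the real data, `missing_summary({"red": red, "white": white})` should show zeros throughout if the statement above holds.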

2. Asking Questions

- How many wines are there at each quality score?
- How is the data in each column distributed?
- Which wine characteristics are most closely related to wine quality?
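
One possible way to approach these three questions in pandas is sketched here. It is not necessarily the method used later in this series; the `demo` frame and its values are invented for illustration, and Pearson correlation is only one of several ways to measure how closely a feature relates to quality:

```python
import pandas as pd

# Hypothetical miniature dataset; values are invented for illustration only
demo = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "volatile acidity": [0.70, 0.66, 0.50, 0.40, 0.35, 0.30],
    "quality": [5, 5, 6, 6, 7, 7],
})

# Question 1: how many wines fall into each quality score?
quality_counts = demo["quality"].value_counts().sort_index()

# Question 2: a numeric summary of each column's distribution
distribution = demo.describe()

# Question 3: Pearson correlation of each feature with quality,
# ordered by absolute strength so negative relationships are not buried
corr_with_quality = (
    demo.corr()["quality"]
    .drop("quality")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)

print(quality_counts)
print(distribution)
print(corr_with_quality)
```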

3. Data Preprocessing

3-1 Combining the Data

# Add a column to red and white that identifies the wine type

red["tag"] = "r"
white["tag"] = "w"

print(red.shape)
print(white.shape)

#red.head()
white.head()

# Stack the red and white data vertically

wine = pd.concat([red, white])

print(wine.shape)
wine.head()
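
Note that `pd.concat` keeps each frame's original index by default, so the combined frame contains repeated index labels. The sketch below shows the effect and one way to avoid it with `ignore_index=True`; the two small demo frames are hypothetical stand-ins for the tagged `red` and `white` frames:

```python
import pandas as pd

# Hypothetical stand-ins for the tagged red and white frames
red_demo = pd.DataFrame({"quality": [5, 6], "tag": "r"})
white_demo = pd.DataFrame({"quality": [6, 7, 8], "tag": "w"})

# Plain concat keeps each frame's own index, so labels 0 and 1 repeat
stacked = pd.concat([red_demo, white_demo])
print(stacked.index.is_unique)  # False

# ignore_index=True renumbers the rows from 0 to n-1
wine_demo = pd.concat([red_demo, white_demo], ignore_index=True)
print(wine_demo.index.is_unique)  # True

# Row counts per tag should match the sizes of the original frames
print(wine_demo["tag"].value_counts().to_dict())
```

Duplicate labels are harmless for most aggregate operations, but they can cause surprises with label-based selection such as `wine.loc[0]`, which would return one red and one white row.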
