Practice

Data preparation

Loading the required libraries and dataset

Goal: predict whether a tumor is breast cancer (malignant) or not

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

print(cancer.DESCR)
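Besides reading the full description, it can help to check the size of the data and the class balance directly. The following is a minimal sketch using only the attributes of the object returned by `load_breast_cancer`; the variable name `counts` is introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# 569 samples, 30 numeric features
print(cancer.data.shape)

# Target encoding: 0 = malignant, 1 = benign
print(list(cancer.target_names))

# Number of samples per class (illustrative helper variable)
counts = {
    str(name): int(n)
    for name, n in zip(cancer.target_names, np.bincount(cancer.target))
}
print(counts)
```

Note that the target value 1 means benign, so "class 1" in the later plots is the non-cancer group.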

Result

Drawing a pairplot of the mean, standard error, and worst values (10 features each)

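from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

# Build a DataFrame of the 30 features and append the target as "class"
df = pd.DataFrame(cancer.data, columns = cancer.feature_names)
df["class"] = cancer.target

# Columns 10-19 are the standard error features; "class" is plotted alongside them
sns.pairplot(df[["class"] + list(df.columns[10:20])])
plt.show()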
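The slice `df.columns[10:20]` works because the 30 features are stored in three consecutive blocks of 10. A small sketch that makes the grouping explicit (the dictionary `groups` is a name made up for this example):

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
names = list(cancer.feature_names)

# The three blocks of 10 features, in the order they appear in the dataset
groups = {
    "mean": names[0:10],    # e.g. "mean radius"
    "error": names[10:20],  # e.g. "radius error" (standard error)
    "worst": names[20:30],  # e.g. "worst radius"
}

for label, cols in groups.items():
    print(label, cols)
```

Passing `groups["mean"]` or `groups["worst"]` in place of `df.columns[10:20]` would draw the pairplot for the other two blocks.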

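Code explanation
Result

Inspecting the dataset

The further apart the blue and orange distributions sit (ideally at opposite ends), the better that feature separates the two classes
In this case, however, every feature shows some degree of overlap between the classes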

# Selected mean and worst features; the last entry is the label column
cols = ["mean radius", "mean texture", "mean smoothness", "mean compactness",
"mean concave points", "worst radius", "worst texture", "worst smoothness", "worst compactness",
"worst concave points", "class"]

# One histogram per feature, colored by class
for c in cols[:-1]:
    sns.histplot(df, x=c, hue=cols[-1], bins=50, stat='probability')
    plt.show()
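The overlap seen in the histograms can also be summarized as a number. The sketch below is one possible approach, not part of the original exercise: for each feature it divides the gap between the two class means by a pooled standard deviation (a simple average of the two class variances, which is an assumption made here for simplicity). Larger values suggest less overlap. The names `feature_cols`, `mean_gap`, `pooled_std`, and `separation` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df["class"] = cancer.target

feature_cols = ["mean radius", "mean texture", "mean smoothness", "mean compactness",
                "mean concave points", "worst radius", "worst texture",
                "worst smoothness", "worst compactness", "worst concave points"]

grouped = df.groupby("class")[feature_cols]
class_means = grouped.mean()
class_vars = grouped.var()

# Absolute gap between the class means (0 = malignant, 1 = benign)
mean_gap = (class_means.loc[0] - class_means.loc[1]).abs()

# Pooled standard deviation, assuming equal weight for both classes
pooled_std = ((class_vars.loc[0] + class_vars.loc[1]) / 2) ** 0.5

# Standardized gap per feature, largest (best separated) first
separation = (mean_gap / pooled_std).sort_values(ascending=False)
print(separation)
```

Features near the top of the printed list should correspond to histograms where the two colors overlap the least.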

Result

Code explanation