Basic Statistics | Percentile (numpy.percentile)

Percentile

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, interpolation=None)

Computes the q-th percentile of the data along the specified axis.

▪Parameters

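‣ a : array_like, the input array whose percentiles are computed

‣ q : array_like of float, the percentile or sequence of percentiles to compute (each in the range 0 to 100)

‣ axis : optional, the axis or axes along which the percentiles are computed; with the default None, the array is flattened first

‣ out : optional, default None; an alternative output array in which to store the result

‣ overwrite_input : optional bool, default False; if True, the input array a may be modified by intermediate calculations to save memory

‣ method : optional, default 'linear'; the method used to estimate the percentile, one of: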

1. 'inverted_cdf'

2. 'averaged_inverted_cdf'

3. 'closest_observation'

4. 'interpolated_inverted_cdf'

5. 'hazen'

6. 'weibull'

7. 'linear'

8. 'median_unbiased'

9. 'normal_unbiased'

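‣ keepdims : optional bool, default False; if True, the reduced axes are kept in the result as dimensions of size one, so the result broadcasts correctly against the input

‣ interpolation : deprecated since v1.22.0; use method instead

▪Returns

‣ percentile : scalar or ndarray, the computed percentile(s)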
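The effect of passing a sequence for q and of keepdims is easiest to see from the result shapes. The following is a minimal sketch using a small array chosen for illustration; the variable names are not part of the NumPy API.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# When q is a sequence, the first axis of the result indexes the percentiles.
quartiles = np.percentile(b, [25, 50, 75], axis=1)
print(quartiles.shape)   # (3, 2)

# Without keepdims the reduced axis disappears from the result.
reduced = np.percentile(b, 50, axis=1)
print(reduced.shape)     # (2,)

# With keepdims=True the reduced axis is kept with length 1.
kept = np.percentile(b, 50, axis=1, keepdims=True)
print(kept.shape)        # (2, 1)

# The (2, 1) result broadcasts against the (2, 3) input,
# here subtracting each row's 50th percentile from that row.
centered = b - kept
print(centered)
```

Keeping the reduced axis is mainly useful when the result is combined with the original array, as in the last step.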

Examples

<Example 01>

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([[1, 2, 3], [4, 5, 6]])

print('1D Array')
print(np.percentile(a, 25))
print(np.percentile(a, 50))
print(np.percentile(a, 75))

print('2D Array')
print(np.percentile(b, 25))
print(np.percentile(b, 50))
print(np.percentile(b, 75))

1D Array
2.0
3.0
4.0
2D Array
2.25
3.5
4.75
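The non-integer results for the 2D array come from the default 'linear' method, which interpolates between the two nearest sorted values. The sketch below reproduces those numbers by hand; linear_percentile is a hypothetical helper written for illustration, not a NumPy function, and it only covers the default method on a flattened array.

```python
import numpy as np

def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile by linear interpolation."""
    x = np.sort(np.asarray(values, dtype=float).ravel())
    pos = (x.size - 1) * q / 100.0      # fractional index into the sorted data
    lo = int(np.floor(pos))             # index of the value just below
    hi = min(lo + 1, x.size - 1)        # index of the value just above
    frac = pos - lo                     # distance between the two neighbours
    return x[lo] + (x[hi] - x[lo]) * frac

b = np.array([[1, 2, 3], [4, 5, 6]])
for q in (25, 50, 75):
    # Print the manual result next to NumPy's result for comparison.
    print(q, linear_percentile(b, q), np.percentile(b, q))
```

For q = 25 the fractional index is (6 - 1) * 0.25 = 1.25, so the result lies a quarter of the way from 2 to 3, which gives 2.25.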

<Example 02>

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([[1, 2, 3], [4, 5, 6]])

print('1D Array')
print(np.percentile(a, 25))
print(np.percentile(a, 50))
print(np.percentile(a, 75))

print('2D Array - Case: axis=0')
print(np.percentile(b, 25, axis=0))
print(np.percentile(b, 50, axis=0))
print(np.percentile(b, 75, axis=0))

print('2D Array - Case: axis=1')
print(np.percentile(b, 25, axis=1))
print(np.percentile(b, 50, axis=1))
print(np.percentile(b, 75, axis=1))

1D Array
2.0
3.0
4.0
2D Array - Case: axis=0
[1.75 2.75 3.75]
[2.5 3.5 4.5]
[3.25 4.25 5.25]
2D Array - Case: axis=1
[1.5 4.5]
[2. 5.]
[2.5 5.5]
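One common use of per-axis percentiles is the interquartile range (the 75th percentile minus the 25th). The following sketch computes it for the same array by requesting both percentiles in one call; the names q25, q75 and iqr are chosen for illustration only.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# Request both percentiles at once; the first axis of the result indexes q.
q25, q75 = np.percentile(b, [25, 75], axis=0)
iqr_cols = q75 - q25
print(iqr_cols)   # [1.5 1.5 1.5]

q25, q75 = np.percentile(b, [25, 75], axis=1)
iqr_rows = q75 - q25
print(iqr_rows)   # [1. 1.]
```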

<Example 03>

import matplotlib.pyplot as plt
import numpy as np

a = np.arange(4)
p = np.linspace(0, 100, 6001)
ax = plt.gca()
lines = [
    ('linear', '-', 'C0'),
    ('inverted_cdf', ':', 'C1'),
    # Almost the same as `inverted_cdf`:
    ('averaged_inverted_cdf', '-.', 'C1'),
    ('closest_observation', ':', 'C2'),
    ('interpolated_inverted_cdf', '--', 'C1'),
    ('hazen', '--', 'C3'),
    ('weibull', '-.', 'C4'),
    ('median_unbiased', '--', 'C5'),
    ('normal_unbiased', '-.', 'C6'),
    ]
for method, style, color in lines:
    ax.plot(
        p, np.percentile(a, p, method=method),
        label=method, linestyle=style, color=color)
ax.set(
    title='Percentiles for different methods and data: ' + str(a),
    xlabel='Percentile',
    ylabel='Estimated percentile value',
    yticks=a)
ax.legend()
plt.show()

Result: a line plot with one line per method (x-axis: Percentile, y-axis: Estimated percentile value).
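As a numeric complement to the plot, the sketch below prints the estimate each method gives for a single percentile on the same data. It assumes NumPy 1.22 or later, where the method parameter is available; the choice of q = 50 is arbitrary.

```python
import numpy as np

a = np.arange(4)  # same data as Example 03: [0 1 2 3]

methods = [
    'inverted_cdf',
    'averaged_inverted_cdf',
    'closest_observation',
    'interpolated_inverted_cdf',
    'hazen',
    'weibull',
    'linear',
    'median_unbiased',
    'normal_unbiased',
]

# Evaluate the 50th percentile with every method and collect the results.
results = {m: float(np.percentile(a, 50, method=m)) for m in methods}

for name, value in results.items():
    print(f'{name:>26}: {value}')
```

Changing q in the call shows where the methods diverge, which corresponds to reading the plot at a different x position.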