Setting random values in a pandas column (how to count duplicate values in a pandas column)

Updated: 2022-04-02 02:10:17  Category: Scripts

pandas

The code is as follows:

import pandas as pd
import numpy as np

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6]
})
print(salaries)

# keep='first': the first occurrence is not flagged, later repeats are.
print(salaries['Bonus'].duplicated(keep='first'))
print(salaries[salaries['Bonus'].duplicated(keep='first')].index)
print(salaries[salaries['Bonus'].duplicated(keep='first')])

# keep='last': the last occurrence is not flagged, earlier repeats are.
print(salaries['Bonus'].duplicated(keep='last'))
print(salaries[salaries['Bonus'].duplicated(keep='last')].index)
print(salaries[salaries['Bonus'].duplicated(keep='last')])

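The output is as follows (captured on an older pandas release; recent versions keep the columns in insertion order and print Index rather than Int64Index):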

   Bonus  Salary  Year   name
0      2       1  2016   BOSS
1      2       2  2016  Lilei
2      2       3  2016  Lilei
3      2       4  2016    Han
4      3       5  2017   BOSS
5      4       6  2017   BOSS
6      5       7  2017    Han
7      6       8  2017   BOSS

0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
Name: Bonus, dtype: bool

Int64Index([1, 2, 3], dtype='int64')

   Bonus  Salary  Year   name
1      2       2  2016  Lilei
2      2       3  2016  Lilei
3      2       4  2016    Han

0     True
1     True
2     True
3    False
4    False
5    False
6    False
7    False
Name: Bonus, dtype: bool

Int64Index([0, 1, 2], dtype='int64')

   Bonus  Salary  Year   name
0      2       1  2016   BOSS
1      2       2  2016  Lilei
2      2       3  2016  Lilei
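
The calls above flag repeats but do not count them. A minimal sketch of one way to do the counting, reusing the same salaries frame: keep=False flags every member of a duplicated group, and value_counts() tallies how often each value occurs. The names all_dupes, dupe_rows and dupe_counts are illustrative and not part of the original article.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6],
})

# keep=False flags every row whose Bonus value occurs more than once,
# including the first and last occurrence.
all_dupes = salaries['Bonus'].duplicated(keep=False)
dupe_rows = salaries[all_dupes]

# value_counts() tallies each distinct Bonus value; keep those seen more than once.
counts = salaries['Bonus'].value_counts()
dupe_counts = counts[counts > 1]

print(dupe_rows.index.tolist())
print(dupe_counts)
```

Here only the Bonus value 2 is repeated, so dupe_rows holds rows 0 to 3 and dupe_counts holds a single entry for the value 2 with a count of 4.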

Without pandas

In NumPy, the corresponding operations are mainly as follows.

Suppose we have the array

a = np.array([1, 2, 1, 3, 3, 3, 0])

and want to find the duplicated values [1 3].

The following approaches can be used:

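Method 1

m = np.zeros_like(a, dtype=bool)
# Mark the first occurrence of each distinct value.
m[np.unique(a, return_index=True)[1]] = True
# Everything left unmarked is a repeat: array([1, 3, 3])
a[~m]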

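Method 2

# Same idea as Method 1 in one line (originally np.in1d, deprecated since NumPy 2.0): array([1, 3, 3])
a[~np.isin(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]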

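Method 3

Note that a itself is not unique, so passing assume_unique=True breaks the function's stated precondition and the result may differ between NumPy versions.

np.setxor1d(a, np.unique(a), assume_unique=True)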

Method 4

u, i = np.unique(a, return_inverse=True)
# Count occurrences per distinct value and keep those seen more than once: array([1, 3])
u[np.bincount(i) > 1]

Method 5

s = np.sort(a, axis=None)
# In sorted order, an element equal to its right-hand neighbour is a repeat: array([1, 3, 3])
s[:-1][s[1:] == s[:-1]]
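
As a further option not listed in the original article, np.unique also accepts return_counts=True (available since NumPy 1.9), which returns the occurrence counts directly. The sketch below assumes the same array a; the names values, counts and duplicates are illustrative.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# return_counts=True yields each distinct value together with how often it occurs.
values, counts = np.unique(a, return_counts=True)

# Values that occur more than once, each listed a single time.
duplicates = values[counts > 1]

print(duplicates)          # [1 3]
print(counts[counts > 1])  # [2 3]
```

This returns each duplicated value exactly once along with its count, which matches the stated goal of [1 3] without any post-processing.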

Reference: https://stackoverflow.com/questions/11528078/determining-duplicate-values-in-an-array

Original article: https://blog.csdn.net/hguo11/article/details/82556171
