3-3-2-9. Boolean Indexing, Set Operations, and Sorting

Up to now we’ve seen how to make slices and select elements of a NumPy array using indices. This is useful when we know the exact indices of the elements we want to select. However, there are many situations in which we don’t know the indices of the elements we want. For example, suppose we have a 10,000 by 10,000 array of random integers ranging from 1 to 15,000 and we only want to select the integers that are less than 20. Boolean indexing can help us in these cases by letting us select elements using logical arguments instead of explicit indices. Let’s see some examples.

Consider this five by five array ranging from zero to 24. We can use Boolean indexing to select elements greater than 10, like this. Instead of indices, we are using a Boolean expression. Let’s also get the elements that are less than or equal to seven. And now those that are both greater than 10 and less than 17. We can also use Boolean indexing to assign the value negative one to the elements that are between 10 and 17.

In addition to Boolean indexing, NumPy also allows for set operations. This is useful when comparing two NumPy arrays, for example, to find common elements. Consider these two rank one arrays. We can create arrays for the intersection, difference, and union, like this. We can also sort NumPy arrays.

Let’s use NumPy’s sort function to sort rank one and rank two arrays in different ways. As with other functions we saw before, sort can also be used as a method. However, there’s a big difference in whether the original array is modified. When sort is used as a function, it sorts the NumPy array out of place, meaning it doesn’t change the original array. However, when you use sort as a method, the array is sorted in place, meaning the original array is changed.

Let’s create an unsorted rank one array. We can sort x using sort as a function. This will sort x out of place and leave the original array as is. As you can see, np.sort did sort the x array, but x itself did not change. Notice that this sorts the array and keeps repeated values. If you want only the sorted unique elements of x, you can combine it with the unique function, like this. Now, let’s see how we can sort arrays in place by using sort as a method. Here’s x again. If we sort x like this, we will see that this affects the original x and sorts it.

When sorting rank two arrays, we need to tell the sort function whether we are sorting by rows or by columns. This is done by using the axis keyword. Here is an unsorted rank two array. We can sort X by rows like this, which you can see here, or we can sort X by columns like this, which you can see up here.

Boolean Indexing, Set Operations, and Sorting

Up to now we have seen how to make slices and select elements of an ndarray using indices. This is useful when we know the exact indices of the elements we want to select. However, there are many situations in which we don’t know the indices of the elements we want to select. For example, suppose we have a 10,000 x 10,000 ndarray of random integers ranging from 1 to 15,000 and we only want to select those integers that are less than 20. Boolean indexing can help us in these cases by allowing us to select elements using logical arguments instead of explicit indices. Let’s see some examples:

Example 1. Boolean indexing

# We import NumPy, which all the examples below rely on
import numpy as np

# We create a 5 x 5 ndarray that contains integers from 0 to 24
X = np.arange(25).reshape(5, 5)

# We print X
print()
print('Original X = \n', X)
print()

# We use Boolean indexing to select elements in X:
print('The elements in X that are greater than 10:', X[X > 10])
print('The elements in X that are less than or equal to 7:', X[X <= 7])
print('The elements in X that are between 10 and 17:', X[(X > 10) & (X < 17)])

# We use Boolean indexing to assign the elements that are between 10 and 17 the value of -1
X[(X > 10) & (X < 17)] = -1

# We print X
print()
print('X = \n', X)
print()

Original X =
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

The elements in X that are greater than 10: [11 12 13 14 15 16 17 18 19 20 21 22 23 24]
The elements in X that are less than or equal to 7: [0 1 2 3 4 5 6 7]
The elements in X that are between 10 and 17: [11 12 13 14 15 16]

X =
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 -1 -1 -1 -1]
 [-1 -1 17 18 19]
 [20 21 22 23 24]]
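It can help to look at the Boolean mask on its own. The following is a minimal sketch rather than part of the original lesson: the names mask, outside, rng, big, and small are chosen here for illustration, and the random array is scaled down to 1,000 x 1,000 (an assumption made to keep memory use low) instead of the 10,000 x 10,000 array mentioned earlier.

```python
import numpy as np

# A comparison on an ndarray returns an ndarray of booleans with the same shape
X = np.arange(25).reshape(5, 5)
mask = X > 10
print(mask.dtype, mask.shape)    # bool (5, 5)

# True counts as 1, so summing the mask counts the matching elements
print(mask.sum())                # 14

# Conditions are combined with & (and), | (or), and ~ (not);
# each comparison needs its own parentheses
outside = X[(X <= 7) | ~(X < 17)]
print(outside)

# Scaled-down version of the random-integer scenario (illustrative names)
rng = np.random.default_rng(0)
big = rng.integers(1, 15001, size=(1000, 1000))   # integers from 1 to 15,000
small = big[big < 20]
print(small.size, 'elements are less than 20')
```

Note that Boolean indexing always returns a rank 1 ndarray, regardless of the shape of the array being indexed.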

In addition to Boolean indexing, NumPy also allows for set operations. This is useful when comparing ndarrays, for example, to find common elements between two ndarrays. Let’s see some examples:

Example 2. Set operations

# We create a rank 1 ndarray
x = np.array([1,2,3,4,5])

# We create a rank 1 ndarray
y = np.array([6,7,2,8,4])

# We print x
print()
print('x = ', x)

# We print y
print()
print('y = ', y)

# We use set operations to compare x and y:
print()
print('The elements that are both in x and y:', np.intersect1d(x,y))
print('The elements that are in x that are not in y:', np.setdiff1d(x,y))
print('All the elements of x and y:',np.union1d(x,y))

x = [1 2 3 4 5]

y = [6 7 2 8 4]

The elements that are both in x and y: [2 4]
The elements that are in x that are not in y: [1 3 5]
All the elements of x and y: [1 2 3 4 5 6 7 8]
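A few related calls are worth knowing alongside the three used above. This is a short sketch that goes slightly beyond the lesson: np.setxor1d and np.isin are standard NumPy functions, but they were not part of the original example, and the variable name in_y is chosen here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 8, 4])

# setdiff1d is not symmetric: swapping the arguments changes the result
print(np.setdiff1d(y, x))     # [6 7 8]

# Elements that are in exactly one of the two arrays
print(np.setxor1d(x, y))      # [1 3 5 6 7 8]

# isin returns a Boolean mask over x, which can be used for Boolean indexing
in_y = np.isin(x, y)
print(in_y)                   # [False  True False  True False]
print(x[in_y])                # [2 4]

# The set functions return sorted, unique values even if the input repeats them
print(np.union1d([3, 1, 3], [2, 1]))   # [1 2 3]
```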

numpy.ndarray.sort method

Syntax:

ndarray.sort(axis=-1, kind=None, order=None)

The method above sorts an array in place. All arguments are optional; see the NumPy documentation for numpy.ndarray.sort for details.

As with other functions we saw before, sort can be used as a method as well as a function. The difference lies in whether the original array is modified.

  • When numpy.sort() is used as a function, it sorts the ndarray out of place, meaning that it returns a sorted copy and doesn’t change the original ndarray.
  • On the other hand, when you use numpy.ndarray.sort() as a method, it sorts the ndarray in place, meaning that the original array itself is changed to the sorted one.

Let’s see some examples:

Example 3. Sort arrays using sort() function

# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))

# We print x
print()
print('Original x = ', x)

# We sort x and print the sorted array using sort as a function.
print()
print('Sorted x (out of place):', np.sort(x))

# When we sort out of place the original array remains intact. To see this we print x again
print()
print('x after sorting:', x)

Original x = [9 6 4 4 9 4 8 4 4 7]

Sorted x (out of place): [4 4 4 4 4 6 7 8 9 9]

x after sorting: [9 6 4 4 9 4 8 4 4 7]

Notice that np.sort() sorts the array but keeps any repeated values in the result. If we want each value to appear only once, we can use the np.unique() function, which returns the sorted unique elements of an array. Let’s see how we can get the sorted unique elements of x above:

# Returns the sorted unique elements of an array
print(np.unique(x))

[4 6 7 8 9]
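If you also want to know how often each value occurs, np.unique() can report that as well. The sketch below is an addition to the lesson: it hard-codes the values printed in Example 3 so that the result is reproducible, and the names values and counts are chosen here for illustration.

```python
import numpy as np

# The values printed for x in Example 3, written out so the result is repeatable
x = np.array([9, 6, 4, 4, 9, 4, 8, 4, 4, 7])

# return_counts=True also returns how many times each unique value appears
values, counts = np.unique(x, return_counts=True)
print(values)   # [4 6 7 8 9]
print(counts)   # [5 1 1 1 2]
```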

Now let’s see how we can sort ndarrays in place by using sort as a method:

Example 4. Sort rank-1 arrays using sort() method

# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))

# We print x
print()
print('Original x = ', x)

# We sort x in place using sort as a method.
x.sort()

# When we sort in place the original array is changed to the sorted array. To see this we print x again
print()
print('x after sorting:', x)

Original x = [9 9 8 1 1 4 3 7 2 8]

x after sorting: [1 1 2 3 4 7 8 8 9 9]
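One common pitfall follows from the in-place behaviour: the method returns None, so assigning its result back to a variable loses the array. The sketch below illustrates this with the values printed in Example 4; the names y, result, original, and backup are chosen here for illustration.

```python
import numpy as np

# The values printed for x in Example 4, written out so the result is repeatable
x = np.array([9, 9, 8, 1, 1, 4, 3, 7, 2, 8])

# np.sort returns a new array and leaves x untouched
y = np.sort(x)
print(np.shares_memory(x, y))   # False

# x.sort() sorts in place and returns None
result = x.sort()
print(result)                   # None
print(x)                        # [1 1 2 3 4 7 8 8 9 9]

# To keep the original order available, copy before sorting in place
original = np.array([3, 1, 2])
backup = original.copy()
original.sort()
print(backup, original)         # [3 1 2] [1 2 3]
```

In other words, writing x = x.sort() would leave x set to None, so call the method on its own line.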

numpy.sort function

Syntax:

numpy.sort(array, axis=-1, kind=None, order=None)

It returns a sorted copy of an array. The axis argument denotes the axis along which to sort. For an array with ndim dimensions it can take integer values from -ndim to ndim-1, where negative values count back from the last axis, or it can be None. For a 2-D ndarray, the possible values behave as follows:

  • If nothing is specified, the default value is axis = -1, which sorts along the last axis. For a 2-D ndarray, the last axis is axis 1.
  • If axis = None is specified explicitly, the array is flattened before sorting, and a 1-D array is returned.
  • If axis = 0 is specified, the function sorts the values within each column, one column at a time, without moving values between columns. In the final output, each column is sorted individually.
  • If axis = 1 is specified, the function does the same within each row. In the final output, each row is sorted individually.

Tip: You can read axis = 0 as “down” and axis = 1 as “across” the given 2-D array, which helps you use axis correctly in your methods and functions.

Refer to the NumPy documentation for numpy.sort for details about the optional arguments.
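The axis values described above can be checked on a small fixed array. This is a minimal sketch with a hand-written 3 x 3 array (chosen here for illustration) so that the results do not depend on random numbers.

```python
import numpy as np

X = np.array([[3, 1, 2],
              [9, 7, 8],
              [6, 4, 5]])

# The default axis=-1 is the last axis, which is axis=1 for a 2-D array
print(np.array_equal(np.sort(X), np.sort(X, axis=1)))   # True

# axis=0 sorts each column ("down")
print(np.sort(X, axis=0))
# [[3 1 2]
#  [6 4 5]
#  [9 7 8]]

# axis=None flattens first and returns a 1-D array
print(np.sort(X, axis=None))    # [1 2 3 4 5 6 7 8 9]
```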

When sorting rank 2 ndarrays, we need to specify to the np.sort() function whether we are sorting by rows or columns. This is done by using the axis keyword. Let’s see some examples:

Example 5. Sort rank-2 arrays by specific axis.

# We create an unsorted rank 2 ndarray
X = np.random.randint(1,11,size=(5,5))

# We print X
print()
print('Original X = \n', X)
print()

# We sort the columns of X and print the sorted array
print()
print('X with sorted columns :\n', np.sort(X, axis = 0))

# We sort the rows of X and print the sorted array
print()
print('X with sorted rows :\n', np.sort(X, axis = 1))

Original X =
[[6 1 7 6 3]
 [3 9 8 3 5]
 [6 5 8 9 3]
 [2 1 5 7 7]
 [9 8 1 9 8]]

X with sorted columns :
[[2 1 1 3 3]
 [3 1 5 6 3]
 [6 5 7 7 5]
 [6 8 8 9 7]
 [9 9 8 9 8]]

X with sorted rows :
[[1 3 6 6 7]
 [3 3 5 8 9]
 [3 5 6 8 9]
 [1 2 5 7 7]
 [1 8 8 9 9]]
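The axis keyword works the same way with the in-place method, and a sorted result can be reversed with a slice to get descending order. The sketch below is an addition to the lesson: it uses a small hand-written array (chosen here for illustration), and reversing with [:, ::-1] is one common idiom, not the only way to sort in descending order.

```python
import numpy as np

X = np.array([[6, 1, 7],
              [3, 9, 8],
              [2, 5, 4]])

# Descending order within each row: sort ascending, then reverse the columns
print(np.sort(X, axis=1)[:, ::-1])
# [[7 6 1]
#  [9 8 3]
#  [5 4 2]]

# The method form accepts axis too, and modifies X in place
X.sort(axis=0)
print(X)
# [[2 1 4]
#  [3 5 7]
#  [6 9 8]]
```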
