Exploring Natural Sorting in Python


 


 

What Is Natural Sorting, And Why Do We Need It?

 

When working with Python iterables such as lists, sorting is a common operation you'll perform. To sort lists, you can use the list method sort() to sort a list in place, or the sorted() function, which returns a new sorted list.
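As a quick illustration of the difference between the two, here is a minimal sketch using a made-up list of numbers (the variable names are only for illustration):

```python
numbers = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched
ordered = sorted(numbers)
print(ordered)   # [1, 2, 3]
print(numbers)   # [3, 1, 2]

# list.sort() reorders the list in place and returns None
result = numbers.sort()
print(numbers)   # [1, 2, 3]
print(result)    # None
```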

The sorted() function works fine when you have a list of numbers or strings containing letters. But what about strings containing alphanumeric characters, such as filenames, directory names, version numbers, and more? The sorted() function performs lexicographic sorting.

Take a look at this simple example:

# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]

sorted_filenames = sorted(filenames)
print(sorted_filenames)

 

You'll get the following output:

Output >>> ['file1.txt', 'file10.txt', 'file2.txt']

 

Well, 'file10.txt' comes before 'file2.txt' in the output. Not the intuitive sorting order we're hoping for. This is because the sorted() function compares strings character by character using their ASCII (Unicode code point) values, not the numeric values of the digits they contain: '1' sorts before '2', so 'file10.txt' precedes 'file2.txt'. Enter natural sorting.

Natural sorting is a sorting technique that arranges elements in a way that reflects their natural order, particularly for alphanumeric data. Unlike lexicographic sorting, natural sorting interprets the numerical value of digits within strings and arranges them accordingly, resulting in a more meaningful and expected sequence.
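To make the idea concrete, here is a rough standard-library sketch of a natural sort key. It is not how natsort is implemented; natural_key is a hypothetical helper written only for illustration, and it assumes every string has the same text-then-number layout so that chunks line up when compared:

```python
import re

def natural_key(text):
    """Hypothetical helper: split text into digit and non-digit chunks,
    converting the digit chunks to integers so they compare numerically."""
    return [int(chunk) if chunk.isdecimal() else chunk
            for chunk in re.split(r"(\d+)", text)]

filenames = ["file10.txt", "file2.txt", "file1.txt"]

print(natural_key("file10.txt"))
# ['file', 10, '.txt']

print(sorted(filenames, key=natural_key))
# ['file1.txt', 'file2.txt', 'file10.txt']
```

Because 10 is compared with 2 as an integer rather than '1' with '2' as characters, 'file2.txt' now comes before 'file10.txt'.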

In this tutorial, we'll explore natural sorting with the Python library natsort.

 

Getting Started

 

To get started, you can install the natsort library using pip:

$ pip install natsort

As a best practice, install the required package in a virtual environment for the project. Because natsort requires Python 3.7 or later, make sure you're using a recent Python version, preferably Python 3.11 or later. To learn how to manage different Python versions, read Too Many Python Versions to Manage? Pyenv to the Rescue.

 

Natural Sorting Basic Examples

 
We'll start with simple use cases where natural sorting is useful:

  • Sorting file names: When working with file names containing digits, natural sorting ensures that files are ordered in the natural, intuitive order.
  • Version sorting: Natural sorting is also helpful for ordering strings of version numbers, ensuring that versions are sorted based on their numerical values rather than their ASCII values, which may not reflect the desired versioning sequence.

Now let's proceed to code these examples.

 

Sorting Filenames

 
Now that we've installed the natsort library, we can import it into our Python script and use the different functions that the library provides.

Let's revisit the first example of sorting file names (the one we saw at the beginning of the tutorial), where the lexicographic sorting with the sorted() function was not what we wanted.

Now let's sort the same list using the natsorted() function like so:

import natsort

# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]

# Sort filenames naturally
sorted_filenames = natsort.natsorted(filenames)
print(sorted_filenames)

 

In this example, the natsorted() function from the natsort library is used to sort the list of file names naturally. As a result, the file names are arranged in the expected numerical order:

Output >>> ['file1.txt', 'file2.txt', 'file10.txt']

 

Sorting Version Numbers

 
Let's take another similar example where we have strings denoting versions:

import natsort

# List of version numbers
versions = ["v-1.10", "v-1.2", "v-1.5"]

# Sort versions naturally
sorted_versions = natsort.natsorted(versions)

print(sorted_versions)

 

Here, the natsorted() function is applied to sort the list of version numbers naturally. The resulting sorted list maintains the correct numerical order of the versions:

Output >>> ['v-1.2', 'v-1.5', 'v-1.10']
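Note that in this ordering the "10" in "v-1.10" is read as the integer ten, not as the decimal fraction 1.10, which is why it lands after "v-1.5". The following standard-library sketch shows the same comparison explicitly. The version_key helper is hypothetical and only mimics the result for these simple strings:

```python
import re

def version_key(version):
    """Hypothetical helper: turn 'v-1.10' into the tuple (1, 10)."""
    return tuple(int(number) for number in re.findall(r"\d+", version))

versions = ["v-1.10", "v-1.2", "v-1.5"]

print([version_key(v) for v in versions])
# [(1, 10), (1, 2), (1, 5)]

print(sorted(versions, key=version_key))
# ['v-1.2', 'v-1.5', 'v-1.10']
```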

 

Customizing Sorting with a Key

 

When using the built-in sorted() function, you might have used the key parameter to customize the sort order. Similarly, the natsorted() function also takes the optional key parameter, which you can use to sort based on specific criteria.

Let's take an example: we have file_data, which is a list of tuples. The first element in each tuple (at index 0) is the file name and the second item (at index 1) is the size of the file.

Say we want to sort based on the file size in ascending order. So we set the key parameter to lambda x: x[1] so that the file size at index 1 is used as the sorting key:

import natsort

# List of tuples containing filename and size
file_data = [
("data_20230101_080000.csv", 100),
("data_20221231_235959.csv", 150),
("data_20230201_120000.csv", 120),
("data_20230115_093000.csv", 80)
]

# Sort file data based on file size
sorted_file_data = natsort.natsorted(file_data, key=lambda x: x[1])

# Print sorted file data
for filename, size in sorted_file_data:
    print(filename, size)

 

Here's the output:

data_20230115_093000.csv 80
data_20230101_080000.csv 100
data_20230201_120000.csv 120
data_20221231_235959.csv 150
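Since the key here returns plain integers, the built-in sorted() function with the same key produces the same order. The sketch below uses only the standard library to show how the key function drives the result; the by_size name is just for illustration:

```python
file_data = [
    ("data_20230101_080000.csv", 100),
    ("data_20221231_235959.csv", 150),
    ("data_20230201_120000.csv", 120),
    ("data_20230115_093000.csv", 80),
]

# The key function picks the file size (index 1) out of each tuple
by_size = sorted(file_data, key=lambda item: item[1])

for filename, size in by_size:
    print(filename, size)
# data_20230115_093000.csv 80
# data_20230101_080000.csv 100
# data_20230201_120000.csv 120
# data_20221231_235959.csv 150
```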

 

Case-Insensitive Sorting of Strings

 

Another use case where natural sorting is helpful is when you need case-insensitive sorting of strings. Again, lexicographic sorting based on ASCII values will not give the desired results, because all uppercase letters sort before lowercase ones.

To perform case-insensitive sorting, we can set alg to natsort.ns.IGNORECASE, which will ignore the case when sorting. The alg keyword argument controls the algorithm that natsorted() uses:

import natsort

# List of strings with mixed case
words = ["apple", "Banana", "cat", "Dog", "Elephant"]

# Sort words naturally with case-insensitivity
sorted_words = natsort.natsorted(words, alg=natsort.ns.IGNORECASE)

print(sorted_words)

 

Here, the list of words with mixed case is sorted naturally with case-insensitivity:

Output >>> ['apple', 'Banana', 'cat', 'Dog', 'Elephant']
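For comparison, here is what the built-in sorted() does with the same list, first with the default case-sensitive ordering and then with str.casefold as the key. This is a standard-library sketch for contrast, not a substitute for the natsort option above:

```python
words = ["apple", "Banana", "cat", "Dog", "Elephant"]

# Default ordering: uppercase letters sort before lowercase ones
print(sorted(words))
# ['Banana', 'Dog', 'Elephant', 'apple', 'cat']

# Folding case in the key gives a case-insensitive ordering
print(sorted(words, key=str.casefold))
# ['apple', 'Banana', 'cat', 'Dog', 'Elephant']
```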

 

Wrapping Up

 

And that's a wrap! In this tutorial, we reviewed the limitations of lexicographic sorting and how natural sorting can be a good alternative when working with alphanumeric strings. You can find all the code on GitHub.

We started with simple examples and also looked at sorting based on custom keys and handling case-insensitive sorting in Python. Next, you may explore other capabilities of the natsort library. I'll see you all soon in another Python tutorial. Until then, keep coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
