CleverCSV, a fantastic Python library

Author: Not bald programmer
Introduction

CleverCSV is a Python library that offers a smarter, more flexible way to work with CSV files than the standard library's csv module. It detects the dialect of a CSV file (its delimiter, quote character, and escape character) by checking how consistent the parsed rows and cell types are under each candidate dialect, which avoids the read errors caused by differing CSV formats. It is especially useful for files with a complex structure or non-standard delimiters.
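
To make "dialect" concrete, here is a minimal sketch that uses only the standard library's csv module and makes no CleverCSV calls. The sample text and the helper name parse are made up for illustration. The same semicolon-delimited text parses into sensible columns only when the right delimiter is supplied, and finding that delimiter automatically is the job CleverCSV takes on.

```python
import csv
import io

# Illustrative sample: semicolon-delimited, with a comma inside each price.
SAMPLE = "name;price\nwidget;1,50\ngadget;2,75\n"


def parse(text, delimiter):
    """Parse CSV text with an explicitly chosen delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


wrong = parse(SAMPLE, ",")  # comma assumed: columns are split in the wrong place
right = parse(SAMPLE, ";")  # the delimiter the text actually uses

print(wrong)
print(right)
```

With the comma, the header stays glued together as one cell and each price is cut in half; with the semicolon, every row has two clean cells.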

Installation

Installing CleverCSV is a simple process that can be done with Python's package manager, pip. Open your terminal or command prompt and enter the following command:

pip install clevercsv

Make sure your pip version is up-to-date to avoid any installation-related issues.
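
To confirm that the installation worked, one option is to ask Python which version it can see. This is a minimal sketch using the standard importlib.metadata module; the helper name installed_version is hypothetical and not part of CleverCSV.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("clevercsv")
print("clevercsv:", version if version else "not installed")
```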

Usage

Once CleverCSV is installed, you can read CSV files with it in the following ways:

  1. Import the necessary modules:
import csv
import clevercsv
  2. Read the file with clevercsv.read_dataframe(), which detects the delimiter and quote character automatically and returns a pandas DataFrame (pandas must be installed; clevercsv.read_table() returns a plain list of rows instead):
dataframe = clevercsv.read_dataframe("your_file.csv")
  3. If you want more control, call clevercsv.detect_dialect() first to detect the CSV dialect, then convert it with to_csv_dialect() and pass it to the standard csv.reader:
dialect = clevercsv.detect_dialect("your_file.csv")
with open("your_file.csv", newline='') as csvfile:
    reader = csv.reader(csvfile, dialect=dialect.to_csv_dialect())
    for row in reader:
        print(row)
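
CleverCSV also mirrors the csv module's interface (Sniffer, reader), so detection and reading can be applied to text that is already in memory. The following is a minimal sketch of that idea. The sample text is made up, and the fallback to the standard csv module is an assumption added only so the snippet still runs where CleverCSV is not installed; on messy files the two sniffers will not always agree.

```python
import io

try:
    import clevercsv as csv_mod  # improved dialect detection
except ImportError:
    import csv as csv_mod  # fallback so the sketch still runs (assumption)

# Illustrative semicolon-delimited sample.
SAMPLE = "a;b;c\n1;2;3\n4;5;6\n"

# Sniffer.sniff() inspects the text and returns the detected dialect.
dialect = csv_mod.Sniffer().sniff(SAMPLE)

# reader() takes an iterable of lines plus the detected dialect.
rows = list(csv_mod.reader(io.StringIO(SAMPLE), dialect))

print(dialect.delimiter)
print(rows)
```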

Code samples

Because CleverCSV mainly detects and reads CSV files automatically, basic usage takes only a few lines. A multi-step example gives a fuller picture. It will:

  1. Generate a complex CSV file.
  2. Use CleverCSV to detect the CSV dialect.
  3. Read the CSV file.
  4. Do some data manipulation.
  5. Write the modified data back to a new CSV file.

The process is split into functions, each handling one step (steps 2 and 3 share a function).

import clevercsv
import pandas as pd
import numpy as np
import os

# Step 1: generate a complex CSV file
def generate_complex_csv(filename, rows=100):
    data = {
        "Column1": np.random.rand(rows),
        # Each entry is a two-element list of letters, split into two columns below
        "Column2;Column3": np.random.choice(['a', 'b', 'c', 'd'], size=(rows, 2), replace=True).tolist(),
        "Column4": np.random.randint(0, 100, size=rows)
    }
    df = pd.DataFrame(data)

    # Split "Column2;Column3" into two columns and merge them back into the DataFrame
    df[["Column2", "Column3"]] = pd.DataFrame(df["Column2;Column3"].tolist(), index=df.index)
    df.drop("Column2;Column3", axis=1, inplace=True)

    # Write the data to a CSV file, using ";" as the delimiter
    df.to_csv(filename, sep=';', index=False)

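# Steps 2 and 3: detect the dialect and read the CSV
def read_csv_with_clevercsv(filename):
    # detect_dialect() returns a SimpleDialect (delimiter, quotechar, escapechar)
    dialect = clevercsv.detect_dialect(filename)
    print("Detected dialect:", dialect)
    # read_dataframe() runs its own detection and returns a pandas DataFrame
    return clevercsv.read_dataframe(filename)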

# Step 4: do some data manipulation
def manipulate_data(df):
    # As an example operation, square the values in Column4
    df["Column4"] = df["Column4"] ** 2
    return df

# Step 5: write the data back to a CSV file
def write_data_to_csv(df, filename):
    df.to_csv(filename, index=False)

# Main entry point
def main():
    # Set the file names
    input_filename = 'complex_data.csv'
    output_filename = 'processed_data.csv'

    # Generate the CSV
    generate_complex_csv(input_filename)

    # Read the CSV file
    df = read_csv_with_clevercsv(input_filename)
    print("Original Data:")
    print(df.head())

    # Manipulate the data
    manipulated_df = manipulate_data(df)
    print("\nManipulated Data:")
    print(manipulated_df.head())

    # Write the result to a new CSV file
    write_data_to_csv(manipulated_df, output_filename)

    # Clean up the generated files
    os.remove(input_filename)
    os.remove(output_filename)

if __name__ == "__main__":
    main()
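
One detail of step 5: to_csv() is called without a sep argument, so the output file uses commas even though the input used ";". If the output should keep the input's delimiter, it can be passed along when writing (with pandas, through the sep argument of to_csv()). The sketch below shows the same idea with only the standard library; the helper name write_rows and the sample rows are hypothetical.

```python
import csv
import io


def write_rows(rows, delimiter):
    """Serialize rows to CSV text using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Column1", "Column4"], ["0.5", "81"]]
text = write_rows(rows, ";")
print(text)

# Reading the text back with the same delimiter recovers the original rows.
round_trip = list(csv.reader(io.StringIO(text), delimiter=";"))
print(round_trip == rows)
```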

Summary

CleverCSV is a very useful library that handles CSV files in many formats intelligently, especially when the structure is irregular or the delimiter is not the usual comma. Its basic functionality needs only a few lines of code, and it fits easily into larger scripts that add their own data manipulation and processing steps. When dealing with unknown or irregular CSV data, CleverCSV is a tool worth trying.