One. Background

Testing big data projects often requires a large volume of test data, and real data at that scale is hard to obtain during on-site testing, so testers need to quickly simulate data that meets the requirements. An automation tool or test script makes this far easier.

For big data testing, you should definitely try Python's Faker library

Two. Problem introduction

In Internet big data testing, when you need to quickly simulate tens of thousands of user records (including username, IP address, hostname, access URL, and so on), is there a method worth recommending?

This is where Python's Faker library comes in: it is simple, convenient, and pleasant to use!

Three. Function introduction

What information can the Faker library simulate? It mainly covers people, addresses, company information, document information, Internet information, and so on. See the official documentation for details: https://faker.readthedocs.io/en/master/#.

Four. Case introduction

1. Interface test - simulate the interface input parameters

Interface testing is part of almost every tester's career. In interface testing, we usually pass in parameter values according to the interface definition and verify that the response is correct. For example, suppose an interface takes an int parameter with a lower limit of 0 and an upper limit of 100. When writing automated interface tests, we typically use equivalence class partitioning and boundary value analysis to pick a few values, such as 0, 100, -1, 101, and 50, but such a small set does not cover the input space thoroughly.

To enrich the test data in cases like this, we can use Faker's Python data functions to randomly generate an int (fake.pyint()), a float (fake.pyfloat(left_digits=None, right_digits=None, positive=False)), and so on.

2. Business testing - big data user information simulation

Some projects need simulated user information for business testing (for example, a nucleic acid testing monitoring system, or a concurrency test that collects information from a large number of users). Suppose we need to generate 10,000 users, each with a username, contact number, email address, date of birth, city, company, and ID number. How can we simulate them with the Faker library?

A simple data example is shown in the following figure:

Running it produces simulated users as shown in the following figure:

3. Security testing - Internet access information simulation

Some Internet access security projects need to simulate users' Internet access records and decide from them whether a user has behaved dangerously. For example, a company's security detection system ingests employees' Internet access records (including the employee machine's host_name, the source IP of the accessing machine, the URL of the visited website, the IP of the visited website, and so on) to monitor whether employees visit dangerous websites or dangerous IPs. So when testing such a system without access to real user records, how can the Faker library simulate test data for business testing of the system under test?

A simple code example of data simulation is shown in the following figure:

Running it produces simulated user Internet access records as shown in the following figure:

Five. Q&A

Why are the usernames, addresses, and other information generated in the examples above in Chinese, and how can English information be generated?

To simulate Chinese information, initialize the library with Faker(locale='zh_CN'). The default Faker() initialization generates English information. We can also generate information in other languages: for Japanese, for example, initialize with Faker(locale='ja_JP').

How can we generate customized information, such as text built from custom words?

The ext_word_list parameter can be used. For example:
