This script will generate randomised sample data that conforms to the specified DDL and restrictions for all tables in the database (BookEdition, BookCopy, loan, and Client). Your test database must meet the assumptions below; otherwise, this script will not work.
- All tables and columns are named exactly as in the project specification
- Any triggers or integrity checks are implemented correctly (the data generated here should not be rejected on insert or update)
- You have a basic understanding of how to run Python executables and install any dependencies
To use this script, you need the following modules installed:
- names (pip3 install names)
- pandas (pip3 install pandas)
If you are using a Python distribution such as Anaconda or Miniconda, follow its conventions for installing modules.
The following dependencies are part of the Python standard library and should already be available:
- random
- datetime
- sqlite3
To run the script:
- Fork this repository, or copy the code in the "sql-script.py" file into your own Python file.
- Make sure the Python file is in the same directory as your test database (the file with the .db extension).
- Run the script with "python3 sql-script.py".
- When prompted, enter the name of your test database, including the file extension (.db).
- If the database was found, your tables should now be populated with sample data, and a success message should be printed when the script finishes.
If the script does not work, there are likely errors in your test database. Some likely issues are:
- Incorrectly named tables
- Incorrect schema (columns in the wrong order)
- Incorrectly named columns
- You entered the database name incorrectly
- Your constraints and/or integrity checks are incorrect and preventing insertion or updating of data
If this script cannot find your database, it will create a new one with the same name. Because the script contains no DDL, any tables created this way will have no keys, constraints, or integrity checks. Double-check your table DDL and make sure the tables being populated are the ones YOU have written.
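Python's `sqlite3.connect` creates an empty database file when the given path does not exist, which is how a mistyped name ends up as a brand-new database. As a rough diagnostic sketch (the helpers `open_existing_db` and `missing_tables` are hypothetical and not part of sql-script.py), the code below refuses to open a file that does not exist and reports which of the expected tables are missing. Names are compared case-insensitively because SQLite table names are case-insensitive.

```python
import os
import sqlite3

# Table names as listed in this README (assumed to match your DDL).
EXPECTED_TABLES = {"BookEdition", "BookCopy", "loan", "Client"}


def open_existing_db(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file instead of silently creating a new one."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No database file found at {path!r}")
    return sqlite3.connect(path)


def missing_tables(conn: sqlite3.Connection) -> set:
    """Return the expected table names that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    # SQLite table names are case-insensitive, so compare in lower case.
    existing = {name.lower() for (name,) in rows}
    return {t for t in EXPECTED_TABLES if t.lower() not in existing}
```

For example, `missing_tables(open_existing_db("library.db"))` (with a hypothetical file name) returns an empty set when all four tables exist.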
While care was taken to write this script, you should not rely solely on the data it generates to ensure that your database is 100% correct. Bugs can occur in strange places unbeknownst to the user or developer. Make sure you also test your database with some of your own data that you know should work.
If you find any bugs, please contact me ASAP so I can address them.
This script mostly uses list comprehensions to generate data.
BookEdition:
- The ISBN values were generated by a function called checkSum, which takes a 4-digit number and converts it into a 5-digit checksum
- The author values were generated using the names module, which generates a single random name when called
- The publicationDate values were generated as random integers between 1900 and 2022
- The genre values were generated by defining a template list of a few genres and selecting a random element from it for each row
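A minimal sketch of the BookEdition generation, assuming rows ordered as (ISBN, author, publicationDate, genre). The check-digit rule inside `check_sum` is an assumption (a weighted mod-10 digit appended to the 4-digit number), and the real checkSum may use a different rule. The genre list is hypothetical, and a placeholder string stands in for the names module so the sketch runs with the standard library only.

```python
import random

# Hypothetical genre list; the script's actual template list may differ.
GENRES = ["Fantasy", "Horror", "Romance", "Science Fiction", "Mystery"]


def check_sum(n: int) -> int:
    """Append a check digit to a 4-digit number, giving a 5-digit value.

    Assumption: weights 4, 3, 2, 1 applied to the digits, summed, mod 10.
    """
    if not 1000 <= n <= 9999:
        raise ValueError("expected a 4-digit number")
    digits = [int(d) for d in str(n)]
    total = sum(w * d for w, d in zip(range(4, 0, -1), digits))
    return n * 10 + total % 10


def book_editions(count: int) -> list:
    """Generate (ISBN, author, publicationDate, genre) rows."""
    seeds = random.sample(range(1000, 10000), count)  # distinct 4-digit numbers
    return [
        (
            check_sum(seed),
            f"Author {i}",  # placeholder for a name from the names module
            random.randint(1900, 2022),  # inclusive bounds assumed
            random.choice(GENRES),
        )
        for i, seed in enumerate(seeds)
    ]
```

Drawing the seeds with `random.sample` keeps them distinct, and appending a single digit cannot make two different seeds collide, so the generated ISBNs are unique as well.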
BookCopy:
- The ISBN values were selected randomly from the values already generated for BookEdition
- The copyNumber values were generated by iterating through the ISBN values and keeping a running count of the occurrences of each value, so the first copy of an ISBN is numbered 1, the second 2, and so on
- The daysLoaned values were all initialised to 0
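The running count used for copyNumber can be sketched as follows, assuming BookCopy rows of the form (ISBN, copyNumber, daysLoaned); `book_copies` is a hypothetical name. Because each ISBN's counter only ever increases, every (ISBN, copyNumber) pair is unique.

```python
import random
from collections import defaultdict


def book_copies(isbns: list, count: int) -> list:
    """Generate (ISBN, copyNumber, daysLoaned) rows for BookCopy."""
    chosen = [random.choice(isbns) for _ in range(count)]
    seen = defaultdict(int)  # ISBN -> number of copies generated so far
    rows = []
    for isbn in chosen:
        seen[isbn] += 1
        rows.append((isbn, seen[isbn], 0))  # daysLoaned starts at 0
    return rows
```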
Client:
- The clientId values are the integers from 1 to 60
- The name values were randomly generated by the names module
- The residence values were generated by defining a template list of a few residences and selecting a random element from it for each row
loan:
First, a randomised list of (ISBN, copyNumber) tuples was drawn from BookCopy so that every loan refers to a book copy that actually exists.
- The clientId values were randomly selected from the existing clientId values in Client
- The ISBN values were taken from the randomised list of tuples
- The copyNumber values were taken from the same randomised list of tuples
- The dateOut values were generated by a function called randomDateString, which generates a random date between 2015 and 2022 inclusive
- The dateBack values were all set to NULL
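A sketch of the loan generation under stated assumptions: dates are stored as ISO 8601 "YYYY-MM-DD" strings, rows are ordered (clientId, ISBN, copyNumber, dateOut, dateBack), and `random_date_string` and `loans` are hypothetical stand-ins for randomDateString and the script's own list comprehensions.

```python
import datetime
import random


def random_date_string(start_year: int = 2015, end_year: int = 2022) -> str:
    """Return a random 'YYYY-MM-DD' date from 1 Jan start_year to 31 Dec end_year."""
    start = datetime.date(start_year, 1, 1)
    end = datetime.date(end_year, 12, 31)
    offset = random.randint(0, (end - start).days)  # inclusive of both ends
    return (start + datetime.timedelta(days=offset)).isoformat()


def loans(copies: list, client_ids: list, count: int) -> list:
    """Generate (clientId, ISBN, copyNumber, dateOut, dateBack) rows.

    (ISBN, copyNumber) pairs come from existing BookCopy rows, so every
    loan refers to a copy that exists. dateBack starts as None (NULL).
    """
    pairs = random.choices(copies, k=count)
    return [
        (random.choice(client_ids), isbn, copy_no, random_date_string(), None)
        for isbn, copy_no in pairs
    ]
```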
First, a new list of dateBack values was created using a function called randomDayAdder, which takes two arguments: maxDate and date (the dateOut value). The function randomly adds between 4 and 20 days to the date. It skips any date equal to maxDate and leaves its dateBack as NULL, so some loans remain open and the database is also tested with books that have not yet been returned.
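The randomDayAdder behaviour can be sketched like this, assuming ISO date strings and an inclusive range of 4 to 20 days; `random_day_adder` is a hypothetical name, and how maxDate is chosen in the real script is not stated here.

```python
import datetime
import random
from typing import Optional


def random_day_adder(max_date: str, date: str) -> Optional[str]:
    """Return date plus 4-20 days, or None when date equals max_date.

    A None result becomes a NULL dateBack, leaving that loan open.
    """
    if date == max_date:
        return None
    day = datetime.date.fromisoformat(date)
    return (day + datetime.timedelta(days=random.randint(4, 20))).isoformat()
```

One possible usage, assuming maxDate is the latest dateOut, is `[random_day_adder(max(date_outs), d) for d in date_outs]`, which leaves the most recent loans unreturned.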
Following this, the loan rows were iterated one at a time: an insert statement added each loan row as it was before the book was returned, and immediately afterwards the inserted row was updated if its corresponding generated dateBack value was not NULL. This tests the trigger portion of the project specification. See Note 1.
- Note 1: Because bulk random (but conforming) sample data is inserted, the update statement from the project specification had to be modified to ensure that the correct row in the loan table was updated. The specification's update statement does not check the clientId, nor whether the return date is later than the loan date. As a result, inserting and updating bulk sample data led to negative values in the loanedDays column of the BookCopy table, which should not be possible. The modified query ensures that only the correct row is updated.
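The insert-then-update loop and the modified update can be sketched against an in-memory database. The schema, the trigger name `add_days` and its day-counting rule, the helper `load_loans`, and the exact WHERE clause are all assumptions standing in for your own DDL and the project specification; the `dateBack IS NULL` condition is an extra guard not mentioned above.

```python
import sqlite3

# Hypothetical minimal schema and trigger standing in for your own DDL.
SCHEMA = """
CREATE TABLE BookCopy (ISBN INTEGER, copyNumber INTEGER, daysLoaned INTEGER);
CREATE TABLE loan (clientId INTEGER, ISBN INTEGER, copyNumber INTEGER,
                   dateOut TEXT, dateBack TEXT);
CREATE TRIGGER add_days AFTER UPDATE OF dateBack ON loan
WHEN NEW.dateBack IS NOT NULL
BEGIN
    UPDATE BookCopy
    SET daysLoaned = daysLoaned
        + CAST(julianday(NEW.dateBack) - julianday(NEW.dateOut) AS INTEGER)
    WHERE ISBN = NEW.ISBN AND copyNumber = NEW.copyNumber;
END;
"""

INSERT_LOAN = "INSERT INTO loan VALUES (?, ?, ?, ?, NULL)"

# Restricting on clientId and requiring the return date to be after dateOut
# keeps the update (and therefore the trigger) on the intended loan row.
RETURN_LOAN = """
UPDATE loan SET dateBack = ?
WHERE clientId = ? AND ISBN = ? AND copyNumber = ?
  AND dateBack IS NULL AND julianday(?) > julianday(dateOut)
"""


def load_loans(conn, loan_rows, date_backs):
    """Insert each loan as still out, then return it if a dateBack exists."""
    for (client_id, isbn, copy_no, date_out), date_back in zip(loan_rows, date_backs):
        conn.execute(INSERT_LOAN, (client_id, isbn, copy_no, date_out))
        if date_back is not None:
            conn.execute(RETURN_LOAN, (date_back, client_id, isbn, copy_no, date_back))
    conn.commit()
```

Without the clientId and date conditions, an update keyed only on ISBN and copyNumber can also match other loans of the same copy, and the trigger may then subtract a later dateOut from an earlier dateBack, which is one way negative day counts can appear.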