I was just about to suggest using a multi-character separator or a character from a foreign language. However, .separator does not allow multi-character strings or even multi-byte characters.

I would use a multi-character separator with a very low probability of appearing in the text to import, and then parse the document using a custom script. Here is an example in Python 3 using $$$$$$ as a separator; choose a different string if needed.

    import sqlite3

    cell_separator = "$$$$$$"
    import_file_name = "import.txt"  # placeholder path, not given in the original

    db_connection = sqlite3.connect("database.sqlite")
    db_cursor = db_connection.cursor()
    db_cursor.execute("CREATE TABLE imported (a TEXT, b TEXT, c INTEGER)")
    insert_query = "INSERT INTO imported (a, b, c) VALUES (?, ?, ?)"

    with open(import_file_name, 'r') as import_file:
        for line in import_file:
            # Split each line at each cell_separator into a list of strings.
            # Strip each one of the columns to remove whitespace.
            cleaned_columns = [column.strip()
                               for column in line.split(cell_separator)]
            db_cursor.execute(insert_query, tuple(cleaned_columns))
    db_connection.commit()

The default settings for importing a CSV file into SQLite are a comma as the column separator and a newline as the row separator, so even if you omit both, your code will work. It's the column separator that is mentioned in the error.

As the content is encoded as UTF-8, are you able to use "special" characters (those not present in ANSI) as your delimiter? I've had a similar problem working around SSIS in Azure not being able to deal with quoted commas in CSV, which I solved by preprocessing the files to use a different delimiter. In the end I settled on ‖ (U+2016, double vertical bar), as it looks like a delimiter if the content needs to be visually inspected for debugging purposes. This may not be suitable for you, as it may appear in your mathematical content, but there are many others you could try.

Perhaps U+1F4A9 (pile of poo), which I initially used but moved away from in case the client accidentally saw the intermediate content, didn't like it, and didn't buy the "we had to pick a character you'd never use" explanation. That, and other emoji, might not be suitable if your corpus includes informal text. There are many uncommon punctuation symbols you could try (though many may fall foul of your mathematical content requirement), such as a fleuron (❧). Also, as others have suggested, you could head for the other end of the table and use something from the control characters block, like BEL (variously represented as \a, ^G, U+0007).

"I suppose I could escape whatever separators may be in the text."

Because of the general nature of your target corpus, I suspect that you will need to do this no matter what single-character delimiter you choose, and probably for multi-character options too. You never know when your chosen uncommon character(s) might randomly turn up in informal text (such as when people use them for text effects like ɘꙅɿɘvɘЯ and t̸̢͔̬͉̦̅̋ḧ̸̖͓͎̤̖̲͔͔̹̘͋̆̔͌̿͘ḯ̸̛̛͙͈͍̬̺̙͊͂s̴̩̮̓), or when they turn out to be used in international text, either because they really are intended for that use or because some old encoding used them as a substitute for a character that was not yet available directly.
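For what it's worth, the escaping idea mentioned in this thread could look roughly like the sketch below. It is only an illustration under my own assumptions: the helper names (escape_field, join_fields, split_fields) are hypothetical, the backslash as escape character is my choice, and it handles single-character delimiters only.

```python
ESCAPE = "\\"  # assumed escape character; not specified anywhere in the thread


def escape_field(field, delimiter, escape=ESCAPE):
    """Prefix every escape character and delimiter inside a field with the escape."""
    if len(delimiter) != 1 or delimiter == escape:
        raise ValueError("delimiter must be one character and differ from the escape")
    # Escape the escape character first so that it stays unambiguous.
    return field.replace(escape, escape + escape).replace(delimiter, escape + delimiter)


def join_fields(fields, delimiter, escape=ESCAPE):
    """Build one line from a list of fields, escaping each field."""
    return delimiter.join(escape_field(f, delimiter, escape) for f in fields)


def split_fields(line, delimiter, escape=ESCAPE):
    """Split a line on unescaped delimiters and undo the escaping."""
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == escape:
            # Take the following character literally; a trailing lone
            # escape character is kept as-is.
            current.append(next(chars, escape))
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Round trip with a field that itself contains the delimiter.
row = ["norm ‖x‖", "plain text", "back\\slash"]
line = join_fields(row, "‖")
restored = split_fields(line, "‖")
```

With this in place the choice of delimiter matters less, since any occurrence inside the data survives the round trip. The cost is that the preprocessing and the import script both have to agree on the escape rule.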