====== VisiData ====== **[[https://www.visidata.org/|VisiData]]** is an interactive CLI table explorer with vim key bindings. Here is [[https://jsvine.github.io/intro-to-visidata/index.html|a really great tutorial]]. These are notes to myself. ====== Source Table Recommendations ====== Gather data with more granularity than you think you need. You often need three tables: - **Logs**: Include date, build (includes platform ID), product ID, params of log, total count, unique devices. - **Device Activity**: Includes date, build, product ID, active time, active device count. - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.) ====== Open VisiData and Set Data Types ====== $ vd logs.csv active_devices.csv id_name.csv ===== Set Column Types, Widths, Names, Importance ===== ^ Key ^ Meaning ^ | _ | Adjust widths of columns | | @ | Column type is date. | | ! | Column is important. (Will be key for table merging.) | | # | Column is integer. (Set aggregator to sum.) | | + | Set aggregator, usually to "sum" | | ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]%%...%%|..))'' | | %%^%% | rename the column. Might have to be "product_id" or "platform_id" | | = | Use Python function to create new column. Ex, hex to dec: ''int(curcol,16)'' | | : | Split column by regex | | - | Hide column | | S | Go to "Sheets" sheet, to select another sheet to format. | | q | Leave the sheet. (''gq'' quit all sheets) | | F | Make a Frequency Table for the selected column. (''gF'' for selected columns) | | U | Undo. ''R'' Redo. | | d | Delete row. (I only use this on Sheets sheet.) | | , | Select all rows that match this column's value | | " | Open duplicate sheet with only selected rows | ===== Inspecting Columnar Data ===== ^ Key ^ Meaning ^ | I | Describe all columns, errors, distinct, mode, mean, median, stdev, etc. | | i | Add a column of incrementing numbers (useful for '.' charts) | | . | Requires an "important" numeric column for row data. Make chart. | | O | Options to enable "numeric_binning" and set number of "histogram_bins" (use 'e') | | F | Frequency table of row counts, or histogram if numeric_binning is true | Calculating a percentage-of-total column for a numeric column: ^ Key ^ Meaning ^ | # | Set column type to "int". | | I | Describe all columns. (Highlight the "sum" cell for the column of interest.) | | ~ | Convert that column to "text" so it'll be copied correctly for pasting later. | | zy | Yank the value of the sum. | | q | Quit the Describe sheet. | | = | New column. Enter ''curcol/'', use Ctrl+y to paste the column sum value. | ====== Case Study Link: Exported CSV from PG&E ====== Visit [[vd-pge]]. ====== Cast Study: Merging Two Tables, logs and metadata ====== ==== Protip: Use column view to set multiple columns at once ==== ^ Key ^ Meaning ^ | C | Go to column view, each row is data for a column. | | t | Select rows, select all integer rows. | | ge | Go to "type" column, enter ''ge'' (global edit), type "int" | | | Go to "aggregators", enter ''ge'', type "sum" | | q | Leave that table. | For example, your "logs.csv" would end up looking like: {{:general:logs.png?direct|}} Your active_devices.csv would look like this: {{:general:active_devices.png?direct|}} ====== Join Sheets ====== Use ''!'' to set Key columns. Use ''F'' (or ''gF'') to remove dupes of Keys by making a Frequency Table. ^ Key ^ Meaning ^ | | The key columns to join have to match names. Check and rename with ''^''. | | S | Go to Sheets sheet | | t | (toggle) Select the sheets you want to join by key columns. | | & | (or g&) Join sheets. Type "inner" for an inner join. | | gu | Next time you're in Sheets, "global unselect" to unselect the sheets you selected. | ==== Protip: Remove insignificant noisy rows ==== ^ Key ^ Meaning ^ | %%z|%% | Select rows matching Python expression. Type "blits > 10000" | | " | Make a new sheet with selected rows | And the merge of the logs and active devices sheets would look like: {{:general:merge_logs_active.png?direct|}} Do a similar thing to join the Joined Sheet with the product Name sheet. ====== Process Data ====== In your final Joined sheet, add rows that are calculations of other rows. ^ Key ^ Meaning ^ | HL | (JK for rows) Move columns | | - | Remove columns | | | Maybe use ''C'' to assure desired columns are ints and aggregators are "sum" | | ! | Toggle key columns. | | gF | global Frequency Table so key columns are unique. | | = | Add column. Enter ''count_sum * 100 / active_devices_sum'' | | % | Set new column as float. | | _ | Resize columns | | ] | Sort by column | | . | (''g.'' for multiple columns) Make a graph based on Key column. | {{:general:joined_sheets_pct.png?direct|}} Now you should have some basic info to keep exploring. When you want to save what you've done: ^ Key ^ Meaning ^ | Ctrl+s | Save sheet | | Ctrl+d | Save command log (cmdlog). Use extension ".vd". You can replay what you've done later! | The commandlog is a CSV file too. The first lines are the open file commands. So to replay, you only have to specify the command log like so: $ vd --play=my_cmdlog.vd --replay-wait=0.5 ====== Lists in Cells for Frequency Tables ====== Sometimes you want one of the columns in a Frequency Table to be a list of unique values. Let's say the column title is "my_column", then: ^ Key ^ Meaning ^ | + | Set the aggregator to "List" | | F | Make a Frequency Table for the selected column. (gF for selected columns) | | =, ','.join(set(my_column)) | Create a new column of a comma delimited Python Set of cell entries. |