User Tools

Site Tools


vd

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
vd [2021/06/11 11:45] – created dblumevd [2024/05/13 11:23] (current) – [Process Data] dblume
Line 1: Line 1:
 ====== VisiData ====== ====== VisiData ======
  
-**[[https://www.visidata.org/|VisiData]]** is CLI table explorer with vim key bindings. These are notes to myself.+**[[https://www.visidata.org/|VisiData]]** is an interactive CLI table explorer with vim key bindings. Here is [[https://jsvine.github.io/intro-to-visidata/index.html|a really great tutorial]]. These are notes to myself.
  
-====== Tips ====== +====== Source Table Recommendations ======
- +
-===== Source Table Recommendations =====+
  
 Gather data with more granularity than you think you need. You often need three tables: Gather data with more granularity than you think you need. You often need three tables:
Line 13: Line 11:
   - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.)   - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.)
  
-===== Open VisiData and Set Data Types =====+====== Open VisiData and Set Data Types ======
  
   $ vd logs.csv active_devices.csv id_name.csv   $ vd logs.csv active_devices.csv id_name.csv
  
-==== Set Column Types, Widths, Names, Importance ====+===== Set Column Types, Widths, Names, Importance =====
  
 ^ Key ^ Meaning ^ ^ Key ^ Meaning ^
Line 25: Line 23:
 | # | Column is integer. (Set aggregator to sum.) | | # | Column is integer. (Set aggregator to sum.) |
 | + | Set aggregator, usually to "sum" | | + | Set aggregator, usually to "sum" |
-| ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]...|..))'' |+| ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]%%...%%|..))'' |
 | %%^%% | rename the column. Might have to be "product_id" or "platform_id" | | %%^%% | rename the column. Might have to be "product_id" or "platform_id" |
 +| = | Use Python function to create new column. Ex, hex to dec: ''int(curcol,16)'' |
 +| : | Split column by regex |
 | - | Hide column | | - | Hide column |
 | S | Go to "Sheets" sheet, to select another sheet to format. | | S | Go to "Sheets" sheet, to select another sheet to format. |
Line 34: Line 34:
 | d | Delete row. (I only use this on Sheets sheet.) | | d | Delete row. (I only use this on Sheets sheet.) |
 | , | Select all rows that match this column's value | | , | Select all rows that match this column's value |
 +| " | Open duplicate sheet with only selected rows |
  
-=== Protip: Use column view to set multiple columns at once ===+===== Inspecting Columnar Data ===== 
 + 
 +^ Key ^ Meaning ^ 
 +| I | Describe all columns, errors, distinct, mode, mean, median, stdev, etc. | 
 +| i | Add a column of incrementing numbers (useful for '.' charts) | 
 +| . | Requires an "important" numeric column for row data. Make chart. | 
 +| O | Options to enable "numeric_binning" and set number of "histogram_bins" (use 'e') | 
 +| F | Frequency table of row counts, or histogram if numeric_binning is true | 
 + 
 +Calculating a percentage-of-total column for a numeric column: 
 + 
 +^ Key ^ Meaning ^ 
 +| # | Set column type to "int". | 
 +| I | Describe all columns. (Highlight the "sum" cell for the column of interest.) | 
 +| ~ | Convert that column to "text" so it'll be copied correctly for pasting later. | 
 +| zy | Yank the value of the sum. | 
 +| q | Quit the Describe sheet. | 
 +| = | New column. Enter ''curcol/'', use Ctrl+y to paste the column sum value. | 
 +====== Case Study Link: Exported CSV from PG&E ====== 
 + 
 +Visit [[vd-pge]]. 
 + 
 +====== Cast Study: Merging Two Tables, logs and metadata ====== 
 + 
 +==== Protip: Use column view to set multiple columns at once ====
  
 ^ Key ^ Meaning ^ ^ Key ^ Meaning ^
Line 44: Line 69:
 | q | Leave that table. | | q | Leave that table. |
  
-===== Join Sheets =====+For example, your "logs.csv" would end up looking like: 
 + 
 +{{:general:logs.png?direct|}} 
 + 
 +Your active_devices.csv would look like this: 
 + 
 +{{:general:active_devices.png?direct|}} 
 + 
 +====== Join Sheets ======
  
 Use ''!'' to set Key columns. Use ''F'' (or ''gF'') to remove dupes of Keys by making a Frequency Table. Use ''!'' to set Key columns. Use ''F'' (or ''gF'') to remove dupes of Keys by making a Frequency Table.
  
 ^ Key ^ Meaning ^ ^ Key ^ Meaning ^
 +|   | The key columns to join have to match names. Check and rename with ''^''. |
 | S | Go to Sheets sheet | | S | Go to Sheets sheet |
 | t | (toggle) Select the sheets you want to join by key columns. | | t | (toggle) Select the sheets you want to join by key columns. |
 | & | (or g&) Join sheets. Type "inner" for an inner join. | | & | (or g&) Join sheets. Type "inner" for an inner join. |
 | gu | Next time you're in Sheets, "global unselect" to unselect the sheets you selected. | | gu | Next time you're in Sheets, "global unselect" to unselect the sheets you selected. |
 +
 +==== Protip: Remove insignificant noisy rows ====
 +
 +^ Key ^ Meaning ^
 +| %%z|%% | Select rows matching Python expression. Type "blits > 10000" |
 +| " | Make a new sheet with selected rows |
 +
 +And the merge of the logs and active devices sheets would look like:
 +
 +{{:general:merge_logs_active.png?direct|}}
  
 Do a similar thing to join the Joined Sheet with the product Name sheet. Do a similar thing to join the Joined Sheet with the product Name sheet.
  
-===== Process Data =====+====== Process Data ======
  
 In your final Joined sheet, add rows that are calculations of other rows. In your final Joined sheet, add rows that are calculations of other rows.
Line 72: Line 116:
 | . | (''g.'' for multiple columns) Make a graph based on Key column. | | . | (''g.'' for multiple columns) Make a graph based on Key column. |
  
-Now you should have some basic info to keep exploring.+{{:general:joined_sheets_pct.png?direct|}}
  
 +Now you should have some basic info to keep exploring. When you want to save what you've done:
  
 +^ Key ^ Meaning ^
 +| Ctrl+s | Save sheet |
 +| Ctrl+d | Save command log (cmdlog). Use extension ".vd". You can replay what you've done later! |
 +
 +The commandlog is a CSV file too. The first lines are the open file commands. So to replay, you only have to specify the command log like so:
 +
 +  $ vd --play=my_cmdlog.vd --replay-wait=0.5
 +
 +====== Lists in Cells for Frequency Tables ======
 +
 +Sometimes you want one of the columns in a Frequency Table to be a list of unique values. Let's say the column title is "my_column", then:
 +
 +^ Key ^ Meaning ^
 +| + | Set the aggregator to "List" |
 +| F | Make a Frequency Table for the selected column. (gF for selected columns) |
 +| =, ','.join(set(my_column)) | Create a new column of a comma delimited Python Set of cell entries. |
vd.1623437125.txt.gz · Last modified: 2023/04/12 20:44 (external edit)