User Tools

Site Tools


vd

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
vd [2021/06/11 12:18] – [VisiData] dblumevd [2024/05/13 11:23] (current) – [Process Data] dblume
Line 3: Line 3:
 **[[https://www.visidata.org/|VisiData]]** is an interactive CLI table explorer with vim key bindings. Here is [[https://jsvine.github.io/intro-to-visidata/index.html|a really great tutorial]]. These are notes to myself. **[[https://www.visidata.org/|VisiData]]** is an interactive CLI table explorer with vim key bindings. Here is [[https://jsvine.github.io/intro-to-visidata/index.html|a really great tutorial]]. These are notes to myself.
  
-====== Tips ====== +====== Source Table Recommendations ======
- +
-===== Source Table Recommendations =====+
  
 Gather data with more granularity than you think you need. You often need three tables: Gather data with more granularity than you think you need. You often need three tables:
Line 13: Line 11:
   - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.)   - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.)
  
-===== Open VisiData and Set Data Types =====+====== Open VisiData and Set Data Types ======
  
   $ vd logs.csv active_devices.csv id_name.csv   $ vd logs.csv active_devices.csv id_name.csv
  
-==== Set Column Types, Widths, Names, Importance ====+===== Set Column Types, Widths, Names, Importance =====
  
 ^ Key ^ Meaning ^ ^ Key ^ Meaning ^
Line 25: Line 23:
 | # | Column is integer. (Set aggregator to sum.) | | # | Column is integer. (Set aggregator to sum.) |
 | + | Set aggregator, usually to "sum" | | + | Set aggregator, usually to "sum" |
-| ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]...|..))'' |+| ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]%%...%%|..))'' |
 | %%^%% | rename the column. Might have to be "product_id" or "platform_id" | | %%^%% | rename the column. Might have to be "product_id" or "platform_id" |
 +| = | Use Python function to create new column. Ex, hex to dec: ''int(curcol,16)'' |
 +| : | Split column by regex |
 | - | Hide column | | - | Hide column |
 | S | Go to "Sheets" sheet, to select another sheet to format. | | S | Go to "Sheets" sheet, to select another sheet to format. |
Line 36: Line 36:
 | " | Open duplicate sheet with only selected rows | | " | Open duplicate sheet with only selected rows |
  
-=== Protip: Use column view to set multiple columns at once ===+===== Inspecting Columnar Data ===== 
 + 
 +^ Key ^ Meaning ^ 
 +| I | Describe all columns, errors, distinct, mode, mean, median, stdev, etc. | 
 +| i | Add a column of incrementing numbers (useful for '.' charts) | 
 +| . | Requires an "important" numeric column for row data. Make chart. | 
 +| O | Options to enable "numeric_binning" and set number of "histogram_bins" (use 'e') | 
 +| F | Frequency table of row counts, or histogram if numeric_binning is true | 
 + 
 +Calculating a percentage-of-total column for a numeric column: 
 + 
 +^ Key ^ Meaning ^ 
 +| # | Set column type to "int". | 
 +| I | Describe all columns. (Highlight the "sum" cell for the column of interest.) | 
 +| ~ | Convert that column to "text" so it'll be copied correctly for pasting later. | 
 +| zy | Yank the value of the sum. | 
 +| q | Quit the Describe sheet. | 
 +| = | New column. Enter ''curcol/'', use Ctrl+y to paste the column sum value. | 
 +====== Case Study Link: Exported CSV from PG&E ====== 
 + 
 +Visit [[vd-pge]]. 
 + 
 +====== Cast Study: Merging Two Tables, logs and metadata ====== 
 + 
 +==== Protip: Use column view to set multiple columns at once ====
  
 ^ Key ^ Meaning ^ ^ Key ^ Meaning ^
Line 53: Line 77:
 {{:general:active_devices.png?direct|}} {{:general:active_devices.png?direct|}}
  
-===== Join Sheets =====+====== Join Sheets ======
  
 Use ''!'' to set Key columns. Use ''F'' (or ''gF'') to remove dupes of Keys by making a Frequency Table. Use ''!'' to set Key columns. Use ''F'' (or ''gF'') to remove dupes of Keys by making a Frequency Table.
Line 63: Line 87:
 | & | (or g&) Join sheets. Type "inner" for an inner join. | | & | (or g&) Join sheets. Type "inner" for an inner join. |
 | gu | Next time you're in Sheets, "global unselect" to unselect the sheets you selected. | | gu | Next time you're in Sheets, "global unselect" to unselect the sheets you selected. |
 +
 +==== Protip: Remove insignificant noisy rows ====
 +
 +^ Key ^ Meaning ^
 +| %%z|%% | Select rows matching Python expression. Type "blits > 10000" |
 +| " | Make a new sheet with selected rows |
  
 And the merge of the logs and active devices sheets would look like: And the merge of the logs and active devices sheets would look like:
Line 70: Line 100:
 Do a similar thing to join the Joined Sheet with the product Name sheet. Do a similar thing to join the Joined Sheet with the product Name sheet.
  
-===== Process Data =====+====== Process Data ======
  
 In your final Joined sheet, add rows that are calculations of other rows. In your final Joined sheet, add rows that are calculations of other rows.
Line 98: Line 128:
   $ vd --play=my_cmdlog.vd --replay-wait=0.5   $ vd --play=my_cmdlog.vd --replay-wait=0.5
  
 +====== Lists in Cells for Frequency Tables ======
  
 +Sometimes you want one of the columns in a Frequency Table to be a list of unique values. Let's say the column title is "my_column", then:
 +
 +^ Key ^ Meaning ^
 +| + | Set the aggregator to "List" |
 +| F | Make a Frequency Table for the selected column. (gF for selected columns) |
 +| =, ','.join(set(my_column)) | Create a new column of a comma delimited Python Set of cell entries. |
vd.1623439117.txt.gz · Last modified: 2023/04/12 20:44 (external edit)